Description of the language-detection tag

Language detection, or language identification, is the task of identifying the language(s) in which a fragment of text is written.

From Wikipedia:

In natural language processing, language identification or language guessing is the problem of determining which natural language given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods.
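Below is a minimal sketch of the statistical text-categorization approach in the quote. It assumes a multinomial naive Bayes classifier over character trigrams, with add-one smoothing and a uniform prior over languages. This is one common choice, not the only method. The names `char_ngrams` and `NaiveBayesLanguageID` are hypothetical, and the training sentences are toy data. A real system needs much larger training corpora per language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding the text with a space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageID:
    """Hypothetical multinomial naive Bayes over character n-grams (uniform prior assumed)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram occurrences
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # every n-gram seen in any training text

    def train(self, language, text):
        grams = char_ngrams(text, self.n)
        self.counts.setdefault(language, Counter()).update(grams)
        self.totals[language] = self.totals.get(language, 0) + len(grams)
        self.vocab.update(grams)

    def score(self, language, text):
        """Log-likelihood of text under the language's add-one-smoothed n-gram model."""
        counts = self.counts[language]
        # +1 in the denominator reserves probability mass for n-grams never seen in training
        denom = self.totals[language] + len(self.vocab) + 1
        return sum(math.log((counts[g] + 1) / denom) for g in char_ngrams(text, self.n))

    def predict(self, text):
        """Return the language whose model gives the text the highest log-likelihood."""
        return max(self.counts, key=lambda lang: self.score(lang, text))


# Toy usage with one training sentence per language (illustrative only)
model = NaiveBayesLanguageID()
model.train("en", "the quick brown fox jumps over the lazy dog and the cat")
model.train("de", "der schnelle braune fuchs springt über den faulen hund und die katze")
print(model.predict("the dog and the fox"))     # → en
print(model.predict("der hund und die katze"))  # → de
```

Character n-grams are used here instead of whole words because they still give useful evidence on short inputs and on words never seen in training.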


One of the main bottlenecks for language identification systems is distinguishing between closely related languages. Similar languages, such as Serbian and Croatian or Indonesian and Malay, show significant lexical and structural overlap, which makes it hard for systems to tell them apart.
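One rough way to see this overlap in your own data is to compare the sets of character n-grams found in sample texts from two languages. The sketch below uses Jaccard similarity over character trigrams. The function names are hypothetical, and this measure is an illustrative assumption, not a standard benchmark. A high score between samples of two languages means that n-gram-based classifiers have fewer distinguishing features to work with.

```python
def ngram_profile(text, n=3):
    """Set of distinct character n-grams in text, padded with a space on each side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard_overlap(text_a, text_b, n=3):
    """Fraction of distinct n-grams shared by the two texts (0.0 = none, 1.0 = identical sets)."""
    a, b = ngram_profile(text_a, n), ngram_profile(text_b, n)
    if not a and not b:
        return 0.0  # both texts too short to produce any n-grams
    return len(a & b) / len(a | b)
```

For example, call `jaccard_overlap(sample_lang1, sample_lang2)` on comparable samples drawn from your own corpora for each pair of languages you need to separate.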

Input data and results from the DSL (Discriminating between Similar Languages) shared task of the VarDial workshop at COLING 2014 are available at http://corporavm.uni-koeln.de/vardial/sharedtask.html.