Using a unigram model with the KenLM Python wrapper
I am trying to build a kenlm Model from a unigram ARPA file
in the Python wrapper. However, I get the following error:
Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/lm_1b/lm_1b/preprocessed_data/lm1b-1gram.tsv
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Traceback (most recent call last):
File "kenlm.pyx", line 119, in kenlm.Model.__init__ (python/kenlm.cpp:2603)
RuntimeError: lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&) [with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException.
This ngram implementation assumes at least a bigram model. Byte: 25
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "process_experiment.py", line 45, in <module>
create_logprob_corpus_vectors.create(tokenized_line_file, logprob_file)
File "/home/ubuntu/lm_1b/lm_1b/create_probabilities_from_raw_data/create_logprob_corpus_vectors.py", line 37, in create
klm_ngram_model = kenlm.Model(op.join(filenames.preproc_dir, 'lm1b-1gram.tsv'))
File "kenlm.pyx", line 122, in kenlm.Model.__init__ (python/kenlm.cpp:2740)
OSError: Cannot read model '/home/ubuntu/lm_1b/lm_1b/preprocessed_data/lm1b-1gram.tsv' (lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&) [with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException. This ngram implementation assumes at least a bigram model. Byte: 25)
How can I use a unigram model?