Using a unigram model in the KenLM Python wrapper

I am trying to build a kenlm Model from a unigram ARPA file in the Python wrapper. However, I get the following error:

Loading the LM will be faster if you build a binary file.
Reading /home/ubuntu/lm_1b/lm_1b/preprocessed_data/lm1b-1gram.tsv
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Traceback (most recent call last):
  File "kenlm.pyx", line 119, in kenlm.Model.__init__ (python/kenlm.cpp:2603)
RuntimeError: lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&) [with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException.
This ngram implementation assumes at least a bigram model. Byte: 25

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "process_experiment.py", line 45, in <module>
    create_logprob_corpus_vectors.create(tokenized_line_file, logprob_file)
  File "/home/ubuntu/lm_1b/lm_1b/create_probabilities_from_raw_data/create_logprob_corpus_vectors.py", line 37, in create
    klm_ngram_model = kenlm.Model(op.join(filenames.preproc_dir, 'lm1b-1gram.tsv'))
  File "kenlm.pyx", line 122, in kenlm.Model.__init__ (python/kenlm.cpp:2740)
OSError: Cannot read model '/home/ubuntu/lm_1b/lm_1b/preprocessed_data/lm1b-1gram.tsv' (lm/model.cc:100 in void lm::ngram::detail::GenericModel<Search, VocabularyT>::InitializeFromARPA(int, const char*, const lm::ngram::Config&) [with Search = lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>; VocabularyT = lm::ngram::ProbingVocabulary] threw FormatLoadException. This ngram implementation assumes at least a bigram model. Byte: 25)

How can I use a unigram model?
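As the traceback says, kenlm's loader refuses any ARPA file without at least a bigram section. A workaround sometimes suggested (a sketch only, not from the original question; the `pad_unigram_arpa` helper and the dummy `<s> <unk>` bigram are my own choices) is to pad the unigram ARPA file with a single near-zero-probability bigram so the loader accepts it:

```python
def pad_unigram_arpa(src_path, dst_path):
    # Copy a unigram ARPA file, declaring one dummy bigram in the
    # \data\ header and emitting a matching \2-grams: section, so that
    # kenlm's "assumes at least a bigram model" check passes.
    with open(src_path) as f:
        lines = f.read().splitlines()
    out = []
    for line in lines:
        if line.strip() == "\\end\\":
            # Insert the dummy bigram section just before \end\.
            out.append("\\2-grams:")
            out.append("-9\t<s> <unk>")  # negligible log10 probability
            out.append("")
        out.append(line)
        if line.startswith("ngram 1="):
            out.append("ngram 2=1")  # header must match the section below
    with open(dst_path, "w") as f:
        f.write("\n".join(out) + "\n")
```

After padding, `kenlm.Model` should load the file, and unigram scores for real tokens are unchanged; whether the dummy bigram is acceptable depends on how the scores are used downstream.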
