Understanding and using the Stanford NLP coreference resolution tool (in Python 3.7)

Question

I am trying to understand the Stanford NLP coreference tools. This is my code, and it works:

import os
os.environ["CORENLP_HOME"] = "/home/daniel/StanfordCoreNLP/stanford-corenlp-4.0.0"

from stanza.server import CoreNLPClient

text = 'When he came from Brazil, Daniel was fortified with letters from Conan but otherwise did not know a soul except Herbert. Yet this giant man from the Northeast, who had never worn an overcoat or experienced a change of seasons, did not seem surprised by his past.'

with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner', 'parse', 'depparse','coref'],
               properties={'annotators': 'coref', 'coref.algorithm' : 'neural'},timeout=30000, memory='16G') as client:

    ann = client.annotate(text)

# Collect every coreference chain together with the metadata of its mentions.
chains = ann.corefChain
chain_dict=dict()
for index_chain,chain in enumerate(chains):
    chain_dict[index_chain]={}
    chain_dict[index_chain]['ref']=''
    chain_dict[index_chain]['mentions']=[{'mentionID':mention.mentionID,
                                          'mentionType':mention.mentionType,
                                          'number':mention.number,
                                          'gender':mention.gender,
                                          'animacy':mention.animacy,
                                          'beginIndex':mention.beginIndex,
                                          'endIndex':mention.endIndex,
                                          'headIndex':mention.headIndex,
                                          'sentenceIndex':mention.sentenceIndex,
                                          'position':mention.position,
                                          'ref':'',
                                          } for mention in chain.mention ]


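# Recover the text of each mention from its token span inside its sentence
# (beginIndex and endIndex are used as token offsets within that sentence).
for k,v in chain_dict.items():
    print('key',k)
    mentions=v['mentions']
    for mention in mentions:
        words_list = ann.sentence[mention['sentenceIndex']].token[mention['beginIndex']:mention['endIndex']]
        mention['ref']=' '.join(t.word for t in words_list)
        print(mention['ref'])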

I tried three algorithms:

statistical (the code above, with 'coref.algorithm' set to 'statistical'). Results:

he
this giant man from the Northeast , who had never worn an overcoat or experienced a change of seasons
Daniel
his

neural. Results:

this giant man from the Northeast , who had never worn an overcoat or experienced a change of seasons ,
his

deterministic (I got the error below)

 > Starting server with command: java -Xmx16G -cp
 > /home/daniel/StanfordCoreNLP/stanford-corenlp-4.0.0/*
 > edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout
 > 30000 -threads 5 -maxCharLength 100000 -quiet True -serverProperties
 > corenlp_server-9fedd1e9dfb14c9e.props -preload
 > tokenize,ssplit,pos,lemma,ner,parse,depparse,coref
 > 
 > Traceback (most recent call last):
 > 
 >   File "<ipython-input-58-0f665f07fd4d>", line 1, in <module>
 >     runfile('/home/daniel/Documentos/Working Papers/Leader traits/Code/20200704 - Modeling
 > Organizing/understanding_coreference.py',
 > wdir='/home/daniel/Documentos/Working Papers/Leader
 > traits/Code/20200704 - Modeling Organizing')
 > 
 >   File
 > "/home/daniel/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py",
 > line 827, in runfile
 >     execfile(filename, namespace)
 > 
 >   File
 > "/home/daniel/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py",
 > line 110, in execfile
 >     exec(compile(f.read(), filename, 'exec'), namespace)
 > 
 >   File "/home/daniel/Documentos/Working Papers/Leader
 > traits/Code/20200704 - Modeling
 > Organizing/understanding_coreference.py", line 21, in <module>
 >     ann = client.annotate(text)
 > 
 >   File
 > "/home/daniel/anaconda3/lib/python3.7/site-packages/stanza/server/client.py",
 > line 470, in annotate
 >     r = self._request(text.encode('utf-8'), request_properties, **kwargs)
 > 
 >   File
 > "/home/daniel/anaconda3/lib/python3.7/site-packages/stanza/server/client.py",
 > line 404, in _request
 >     raise AnnotationException(r.text)
 > 
 > AnnotationException: java.lang.RuntimeException:
 > java.lang.IllegalArgumentException: No enum constant
 > edu.stanford.nlp.coref.CorefProperties.CorefAlgorithmType.DETERMINISTIC

Questions:

1. Why do I get this error with the deterministic algorithm?
2. Any piece of code using Stanford NLP in Python seems much slower than comparable code using spaCy or NLTK. I know there is no binding involved in those other libraries. But, for example, when I use from nltk.parse.stanford import StanfordDependencyParser for dependency parsing, it is much faster than this StanfordNLP library. Is there any way to speed up this CoreNLPClient in Python?
3. I will use this library on long texts. Is it better to process the whole text in smaller pieces? Can long texts lead to wrong coreference resolution results (I have seen very strange results from this coreference library on long texts)? Is there an optimal size?
4. Results:

   The results of the statistical algorithm seem better. I expected the neural algorithm to give the best result. Do you agree with me? The statistical algorithm yields 4 valid mentions, whereas the neural algorithm yields only 2.

Am I missing something?

python-3.x nlp coreference-resolution stanford-stanza

user2065691, Jul 5 '20 at 02:31

1 answer

Answer 1 · user2795141, Aug 14 '20 at 18:53

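1. You can find the list of supported algorithms in the Java documentation: link. The exception says that DETERMINISTIC is not a constant of the CorefAlgorithmType enum, so 'deterministic' is not an accepted value for coref.algorithm. As far as I know, CoreNLP's deterministic system is a separate annotator named dcoref rather than an option of coref.
2. You can start the server once and then just keep using it, something like: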
```
# Here's the slowest part: the models are being loaded
client = CoreNLPClient(...)

ann = client.annotate(text)

...

client.stop()
```
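
To make the reuse in point 2 a bit more concrete, here is a minimal sketch. The helper name annotate_all is hypothetical and not part of stanza; it only assumes that the client object has an annotate(text) method, as CoreNLPClient does in the question's code. The idea is simply that the cost of starting the Java server and loading the models is paid once, not once per text.

```python
def annotate_all(client, texts):
    """Annotate every text with one already-constructed client.

    `client` is any object exposing `annotate(text)`, for example a stanza
    CoreNLPClient. Reusing the same client means the Java server and its
    models are loaded once instead of once per document.
    """
    return [client.annotate(text) for text in texts]


# Hypothetical usage with the annotators from the question
# (list_of_texts is an illustrative name):
#
# from stanza.server import CoreNLPClient
# with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'lemma',
#                                'ner', 'parse', 'depparse', 'coref'],
#                    timeout=30000, memory='16G') as client:
#     annotations = annotate_all(client, list_of_texts)
```

Whether this closes the gap with spaCy or NLTK is not something this sketch shows; it only removes the repeated startup cost.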

But I can't give you any hints regarding 3 and 4.
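
On point 3, the following is only a sketch and not a claim about an optimal size. The server command in the question shows -maxCharLength 100000, which I take to be a cap on request length, so very long texts would need splitting anyway. One simple approach, assuming paragraphs are separated by blank lines, is to pack whole paragraphs into chunks under a character budget. The function name chunk_paragraphs and the default budget of 50000 are illustrative assumptions. Note the trade-off: coreference chains cannot cross chunk boundaries, so a pronoun in one chunk will not be linked to an antecedent in another.

```python
def chunk_paragraphs(text, max_chars=50000):
    """Split text into chunks of whole paragraphs, each at most max_chars long.

    Paragraphs are assumed to be separated by blank lines. A single paragraph
    longer than max_chars is kept as its own oversized chunk rather than being
    cut in the middle of a sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to client.annotate in turn, with the same client reused across chunks.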