Parse NLTK tree output into a list of noun phrases

Question

I have the following text:

text = '''If you're in construction or need to pass fire inspection, or just want fire resistant materials for peace of mind, this is the one to use. Check out 3rd party sellers as well Skylite'''

I ran an NLTK chunker over it and got a tree as output.

import nltk

# Split the text into sentences, tokenize each sentence, then POS-tag the tokens.
sentences = nltk.sent_tokenize(text)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
sentences = [nltk.pos_tag(sent) for sent in sentences]

grammar = """NP: {<DT>?<JJ>*<NN.*>+}
       RELATION: {<V.*>}
                 {<DT>?<JJ>*<NN.*>+}
       ENTITY: {<NN.*>}"""

cp = nltk.RegexpParser(grammar)
for i in sentences:
    result = cp.parse(i)
    print(result)
    print(type(result))
    result.draw()

The output for the first sentence is:

(S If/IN you/PRP (RELATION 're/VBP) in/IN (NP construction/NN) or/CC (NP need/NN) to/TO (RELATION pass/VB) (NP fire/NN inspection/NN) ,/, or/CC just/RB (RELATION want/VB) (NP fire/NN) (NP resistant/JJ materials/NNS) for/IN (NP peace/NN) of/IN (NP mind/NN) ,/, this/DT (RELATION is/VBZ) (NP the/DT one/NN) to/TO (RELATION use/VB) ./.)

How can I get the noun phrases as a list of strings, like this:

[construction, need, fire inspection, fire, resistant materials, peace, mind, the one]

Any suggestions, please?

python nltk text-chunking

user9388802, Feb 21 '18 at 03:53

Answers

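You can filter the subtrees by label, as shown below:

import nltk

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentences[1])  # `sentences` is the POS-tagged list from the question
# subtrees() returns a lazy generator that yields only the subtrees labeled 'NP'
np_subtrees = result.subtrees(filter=lambda t: t.label() == 'NP')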
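The generator above yields `Tree` objects, not strings, so it still has to be consumed. A minimal sketch of that last step follows. The tokens are tagged by hand (the tags are an assumption) so the example runs without any tokenizer or tagger models, and it uses the question's `<NN.*>+` pattern, because a bare `<NN>` matches only singular common nouns and would skip tags such as NNS and NNP.

```python
import nltk

# Hand-tagged tokens (assumed tags) so no NLTK model downloads are needed.
tagged = [("Check", "VB"), ("out", "RP"), ("3rd", "CD"), ("party", "NN"),
          ("sellers", "NNS"), ("as", "RB"), ("well", "RB"), ("Skylite", "NNP")]

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
result = cp.parse(tagged)

# Each leaf is a (word, tag) pair; keep the words and join them per NP subtree.
noun_phrases = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in result.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['party sellers', 'Skylite']
```

Passing `filter=` to `subtrees()` and filtering with an `if` inside the comprehension are equivalent here; the former just keeps the condition next to the traversal.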

user6472962, Nov 29 '20 at 07:57

Accepted answer

Something like this:

# `cp` and `sentences` are the parser and the POS-tagged sentences from the question.
noun_phrases_list = [[' '.join(leaf[0] for leaf in tree.leaves())
                      for tree in cp.parse(sent).subtrees()
                      if tree.label() == 'NP']
                     for sent in sentences]
#[['construction', 'need', 'fire inspection', 'fire', 'resistant materials', 
#  'peace', 'mind', 'the one'], 
# ['party sellers', 'Skylite']]
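The result above is one list per sentence, whereas the question asks for a single list of strings. One way to flatten it is sketched below. The helper name `extract_noun_phrases` is hypothetical, and the sentences are tagged by hand (assumed tags) so the example runs without downloading tagger models.

```python
from itertools import chain

import nltk


def extract_noun_phrases(tagged_sentence, parser, label="NP"):
    """Return the words of every chunk labeled `label` as one string per chunk."""
    tree = parser.parse(tagged_sentence)
    return [
        " ".join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


# Hand-tagged sentences (assumed tags); in the question these come from nltk.pos_tag.
tagged_sentences = [
    [("this", "DT"), ("is", "VBZ"), ("the", "DT"), ("one", "NN"),
     ("to", "TO"), ("use", "VB"), (".", ".")],
    [("Check", "VB"), ("out", "RP"), ("3rd", "CD"), ("party", "NN"),
     ("sellers", "NNS"), ("as", "RB"), ("well", "RB"), ("Skylite", "NNP")],
]

cp = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

# chain.from_iterable concatenates the per-sentence lists into one flat list.
flat_noun_phrases = list(
    chain.from_iterable(extract_noun_phrases(sent, cp) for sent in tagged_sentences)
)
print(flat_noun_phrases)  # ['the one', 'party sellers', 'Skylite']
```

Keeping the per-sentence lists and flattening only at the end preserves the option of mapping each phrase back to the sentence it came from.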

user4492932, Feb 21 '18 at 04:18