How can I extract phrases from CoreNLPParser?

Question

As you can see in the image, the parser returns NP, VP, PP, NP. I want to be able to access all the phrases at different depths. For example, at depth = 1 there are two phrases, NP and VP; at depth = 2 there are some other phrases; at depth = 3 there are others again. How can I access the phrases at depth = n with Python?

python-3.x nlp stanford-nlp pycorenlp

user10431154 Aug 7 '19 at 00:26

1 answer

Solution

user4793732 Aug 7 '19 at 03:14 · Accepted Answer

package edu.stanford.nlp.examples;

import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;

import java.util.*;
import java.util.stream.*;

public class ConstituencyParserExample {

    public static void main(String[] args) {
        String text = "The little lamb climbed the big mountain.";
        // set up pipeline properties
        Properties props = new Properties();
        // set the list of annotators to run
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,parse");
        // build pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        // create a document object
        CoreDocument document = new CoreDocument(text);
        // annotate the document
        pipeline.annotate(document);
        // collect constituents down to maxDepth (0 is the root level), keeping only NPs
        int maxDepth = 5;
        for (CoreSentence sentence : document.sentences()) {
            Set<Constituent> constituents = sentence.constituencyParse().constituents(
                    new LabeledScoredConstituentFactory(), maxDepth).stream().filter(
                            x -> x.label().value().equals("NP")).collect(Collectors.toSet());
            for (Constituent constituent : constituents) {
                System.out.println("---");
                System.out.println("label: "+constituent.label().value());
                System.out.println(sentence.tokens().subList(constituent.start(), constituent.end()+1));
            }
        }
    }
}
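
The Java code above filters constituents by label up to a maximum depth. For the Python side of the question, here is a minimal sketch that collects the phrases at exactly depth n. It assumes the parse is available as a Penn Treebank bracketed string, for example from str() on the nltk Tree that CoreNLPParser returns, or from the "parse" field of the JSON output in pycorenlp. The helper names parse_brackets, leaves and phrases_at_depth are hypothetical and not part of any library. Depth 0 is the top node of the string, so if the string starts with ROOT, the NP and VP that the question calls depth = 1 are at n = 2 here.

```python
import re

# Sample bracketed parse for the sentence used in the Java example (assumed output shape).
SAMPLE = ("(ROOT (S (NP (DT The) (JJ little) (NN lamb)) "
          "(VP (VBD climbed) (NP (DT the) (JJ big) (NN mountain))) (. .)))")


def parse_brackets(s):
    """Parse a bracketed string into nested (label, children) tuples; words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words covered by a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


def phrases_at_depth(tree, n):
    """Return (label, text) for every phrasal node exactly n levels below the top node."""
    level = [tree]
    for _ in range(n):
        level = [c for nd in level for c in nd[1] if not isinstance(c, str)]
    # Skip POS-tag nodes, i.e. nodes whose children are all words.
    return [(label, " ".join(leaves((label, kids))))
            for label, kids in level
            if any(not isinstance(k, str) for k in kids)]


tree = parse_brackets(SAMPLE)
print(phrases_at_depth(tree, 2))
# [('NP', 'The little lamb'), ('VP', 'climbed the big mountain')]
print(phrases_at_depth(tree, 3))
# [('NP', 'the big mountain')]
```

If an nltk Tree is already at hand, the same level-by-level walk can be done on the Tree object directly instead of re-parsing its string form; the string-based version is used here only to keep the sketch free of dependencies.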