Conditional Frequency Distribution

Question

Hi :) I am really new to Python and NLP and am working through the NLTK book from O'Reilly. I'm currently stuck on the task about plotting and tabulating with a Conditional Frequency Distribution. The task is the following: "find out which days of the week are most newsworthy, and which are most romantic. Define a variable called days containing a list of days of the week, i.e. ['Monday', ...]. Now tabulate the counts for these words using cfd.tabulate(samples=days). Now try the same thing using plot in place of tabulate. You may control the output order of days with the help of an extra parameter: samples=['Monday', ...]."

This is my code:

    import nltk
    from nltk.corpus import brown
    days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
    genre_day = [(genre, day)
                 for genre in ['news', 'romance']
                 for day in days]
    cfd = nltk.ConditionalFreqDist(genre_day)
    tabulated = cfd.tabulate(conditions=['news', 'romance'],
                             sample=days, cumulative=True)

What I have as an outcome is this:

[Image: what I got]

Could someone please explain why I get this output instead of a count of how often each day is used per genre in the corpus? I will be very grateful for any help.

nlp nltk nltk-book

Jul 29, 2021 at 01:52

1 Answer

Answer 1 · user13737721 · Dec 10, 2022 at 10:53

Look at the list comprehension you pass to ConditionalFreqDist:

    genre_day = [(genre, day)
                 for genre in ['news', 'romance']
                 for day in days]

It never reads the corpus. It creates a list of pairs combining every genre with every day, something like [('news', 'Sunday'), ('news', 'Monday'), ..., ('romance', 'Saturday')]. So each genre gets a count of exactly one for each day, and because you pass True for the cumulative parameter, tabulate shows the running total of those ones: 1, 2, ..., 7.
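The effect described above can be reproduced without NLTK or the Brown corpus. The following is a minimal sketch that uses collections.Counter as a stand-in for nltk.ConditionalFreqDist (an assumption made only so the example is self-contained); the names counts, per_day and running are illustrative and do not come from the question.

```python
from collections import Counter
from itertools import accumulate

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday']

# Same comprehension as in the question: no corpus text is involved,
# so every (genre, day) pair appears exactly once.
genre_day = [(genre, day)
             for genre in ['news', 'romance']
             for day in days]

# Stand-in for ConditionalFreqDist: one Counter of days per genre.
counts = {genre: Counter() for genre in ['news', 'romance']}
for genre, day in genre_day:
    counts[genre][day] += 1

per_day = [counts['news'][day] for day in days]
running = list(accumulate(per_day))  # what a cumulative table would show

print(per_day)  # [1, 1, 1, 1, 1, 1, 1]
print(running)  # [1, 2, 3, 4, 5, 6, 7]
```

The running totals are the same for 'romance', which is why both rows of the table look identical.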

To count how often each day of the week occurs in the text, you should use this instead:

    genre_day = [(genre, day)
                 for genre in ['news', 'romance']
                 for word in brown.words(categories=genre)
                 for day in days
                 if word == day]

For each genre it iterates over the words of that genre, and the pair (genre, day) is added to the list only if the word is one of the days.

Suppose the text in the 'news' genre is "Sunday Apple Sunday". The list comprehension will produce [("news", "Sunday"), ("news", "Sunday")], which gives a count of 2 for 'Sunday'.
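The toy example can be checked with a short sketch. Here a hypothetical dictionary toy_words stands in for brown.words(categories=genre), and collections.Counter stands in for the frequency distribution; the 'romance' words are made up for illustration.

```python
from collections import Counter

days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday']

# Hypothetical toy corpus (assumption): replaces brown.words(categories=genre).
toy_words = {
    'news': ['Sunday', 'Apple', 'Sunday'],
    'romance': ['Monday', 'love'],
}

# Same shape as the corrected comprehension: a pair is emitted only
# when a word from the genre's text equals one of the day names.
genre_day = [(genre, day)
             for genre in ['news', 'romance']
             for word in toy_words[genre]
             for day in days
             if word == day]

pair_counts = Counter(genre_day)

print(genre_day)
# [('news', 'Sunday'), ('news', 'Sunday'), ('romance', 'Monday')]
print(pair_counts[('news', 'Sunday')])  # 2
```

With the real corpus, the same genre_day list would be passed to nltk.ConditionalFreqDist and tabulated with cfd.tabulate(samples=days), as the exercise text describes. Note that the exercise spells the parameter samples, not sample.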