unnest_tokens fails to handle vectors in R with the tidytext package

Question

I want to use the tidytext package to create a column of n-grams ("Ngrams") with the following code:

library(tidytext)

unnest_tokens(tbl = president_tweets,
              output =  bigrams,
              input = text,
              token = "ngrams", 
              n = 2)

But when I run this, I get the following error message:

error: unnest_tokens expects all columns of input to be atomic vectors (not lists)

My text column consists of many tweets. Its rows look like the following and are of class character.

president_tweets$text <- c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.", 
    "Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!", 
    "A  story in the @washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!", 
    "Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!", 
    "DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!", 
    "70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"
    )

---------Update:----------

It looks like the sentimetr or exploratory package caused a conflict. I reloaded my packages without them, and now it works again!
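The update does not say which function was masked, so the following is only a minimal sketch of how a conflict like this could be inspected, assuming tidytext is installed. `find()` and `environmentName()` are base R; the variable names `defined_in` and `resolved_ns` are made up for illustration.

```r
library(tidytext)

# Every attached package (or environment) that defines a function with this
# name, in search-path order; a bare call uses the first entry.
defined_in <- find("unnest_tokens")
defined_in

# Namespace that owns the function a bare unnest_tokens() call resolves to.
resolved_ns <- environmentName(environment(unnest_tokens))
resolved_ns
```

If more than one entry shows up, calling the function with an explicit namespace, `tidytext::unnest_tokens(...)`, sidesteps whatever else is attached.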

r text-analysis tidytext

user5816847 Dec 20 '17 at 16:14

1 answer

user5468471 Dec 20 '17 at 18:48 · Answer 1

Hmm, I am not able to reproduce your problem.

library(tidytext)
library(dplyr)

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
president_tweets <- tibble(text = c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.", 
                                    "Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!", 
                                    "A  story in the @washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!", 
                                    "Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!", 
                                    "DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!", 
                                    "70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"))


unnest_tokens(tbl = president_tweets,
              output =  bigrams,
              input = text,
              token = "ngrams", 
              n = 2) 
#> # A tibble: 205 x 1
#>    bigrams      
#>    <chr>        
#>  1 the united   
#>  2 united states
#>  3 states senate
#>  4 senate just  
#>  5 just passed  
#>  6 passed the   
#>  7 the biggest  
#>  8 biggest in   
#>  9 in history   
#> 10 history tax  
#> # ... with 195 more rows

The current CRAN version of tidytext does not actually allow list-columns, but we have changed how columns are handled, so the development version on GitHub now supports list-columns. Are you sure you don't have any list-columns in your data frame? What are the data types of all of your columns? Are any of them of type list?
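To answer those questions, something like the following could be run against the data frame. It is a sketch in base R only; `df` and its `hashtags` list-column are hypothetical stand-ins for the real `president_tweets` data, which may look different.

```r
# Hypothetical data: one ordinary character column and one list-column
df <- data.frame(text = c("first tweet", "second tweet"),
                 stringsAsFactors = FALSE)
df$hashtags <- list(c("a", "b"), character(0))

# Class of every column
sapply(df, class)

# Names of the columns that are lists
is_list_col <- vapply(df, is.list, logical(1))
list_cols <- names(df)[is_list_col]
list_cols

# Keep only the atomic columns before tokenizing
df_atomic <- df[, !is_list_col, drop = FALSE]
```

Dropping (or flattening) the list-columns first is one way to get past the error on a tidytext version that does not accept them.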