How to extract trending words from a given dataset (Java)?

I want a result like Twitter Trends. From a given dataset, I want to get the most frequent words, as phrases of 2 or 3 words together.

That is exactly the result I am after.

So far I can get the most frequent single words from the dataset, listed in descending order of count. How can I change this code so that it gives me the most frequent two-word or three-word phrases? For example, with a news dataset I can get "Real" and "Madrid" separately, but I want to see "Real Madrid" together, or even a three-word phrase.

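import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Set;

// The class name WordCount is chosen for illustration only.
public class WordCount
{
    public static void main(String[] args) throws IOException
    {
        HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();

        BufferedReader reader = null;
        FileInputStream text = new FileInputStream("c:/news.txt");
        try
        {
            reader = new BufferedReader(new InputStreamReader(text, "UTF-8"));
            String currentLine = reader.readLine();
            while (currentLine != null)
            {
                String[] words = currentLine.toUpperCase().split(" ");
                for (String word : words)
                {
                    if (word.isEmpty())
                    {
                        continue; // blank lines and double spaces produce empty tokens
                    }
                    if (wordCountMap.containsKey(word))
                    {
                        wordCountMap.put(word, wordCountMap.get(word) + 1);
                    }
                    else // this "else" was missing, so every count was reset to 1
                    {
                        wordCountMap.put(word, 1);
                    }
                }
                currentLine = reader.readLine();
            }
            // Sort the entries by count, highest first.
            Set<Entry<String, Integer>> entrySet = wordCountMap.entrySet();
            List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(entrySet);
            Collections.sort(list, new Comparator<Entry<String, Integer>>()
            {
                @Override
                public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2)
                {
                    return e2.getValue().compareTo(e1.getValue());
                }
            });
            System.out.println("Most seen words :");
            for (Entry<String, Integer> entry : list)
            {
                if (entry.getValue() > 1)
                {
                    System.out.println(entry.getKey() + " : " + entry.getValue());
                }
            }
        }
        finally
        {
            text.close(); // release the file even if reading fails
        }
    }
}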
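
One direction I am considering, as a minimal sketch rather than a definitive solution: instead of counting single words, count every run of n consecutive words (an n-gram) within each line. The class `NGramCounter` and its methods `countNGrams` and `sortedByCount` are hypothetical names that are not part of my code above, and the sketch takes lines from an in-memory list instead of reading c:/news.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (assumed name): counts phrases of n consecutive words.
final class NGramCounter {

    // Counts every run of n consecutive words within each line.
    // Phrases never span two lines.
    static Map<String, Integer> countNGrams(List<String> lines, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // Upper-case as in the single-word version, but split on any whitespace.
            String[] words = line.trim().toUpperCase().split("\\s+");
            for (int i = 0; i + n <= words.length; i++) {
                String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                if (!phrase.isEmpty()) { // a blank line yields one empty token
                    counts.merge(phrase, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the entries sorted by count, highest first.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        list.sort((e1, e2) -> e2.getValue().compareTo(e1.getValue()));
        return list;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Tony Ball says that the landscape has shifted",
                "Tony Ball admitted he was surprised");
        // Print only the two-word phrases seen more than once.
        for (Map.Entry<String, Integer> entry : sortedByCount(countNGrams(lines, 2))) {
            if (entry.getValue() > 1) {
                System.out.println(entry.getKey() + " : " + entry.getValue());
            }
        }
    }
}
```

Calling `countNGrams(lines, 3)` would count three-word phrases the same way. The sketch does not strip punctuation, so "Ball," and "Ball" would still count as different words, and it does not filter out common phrases such as "OF THE", which would probably dominate a real dataset.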

Below is a small part of my sample dataset. Currently I read each line word by word, so I see, for example, that "Tony" appears 2 times and "Ball" appears 5 times. But if "Tony Ball" occurs often as a phrase, I want to see it counted as one item, for example "Tony Ball" 10 times. I want to get the most frequent phrases, like Twitter Trends.

Tony Ball says that the landscape of British broadcasting has shifted dramatically after BT bought a large slice of televised football rights, boosting the Premier League's next TV deal to a record £3bn over three years, a 71% increase.

This equates to at least £14m more per year for each football club, with the bottom team in the league from 2013-14 onwards likely to receive more than the £60.6m Manchester City earned this year for ending the season as champions. Each individual televised match will now cost the broadcasters £6.6m, up from £4.7m under the previous deal.

BSkyB, which has built its business over 20 years on the back of live top flight football, retained most of the rights, securing 116 matches per season from 2013-14 in exchange for £2.3bn over three years.

But BT sprung a huge surprise by winning the rights to 38 games, including almost half the "first pick" games on offer, in exchange for £738m over three years. Richard Scudamore, Premier League chief executive, said BT's securing 18 of the 38 coveted "first pick" matches would be a "game changer". "[BT chief executive] Ian Livingstone and his colleagues have hugely ambitious plans. They have not invested in all this fibre [optic cable] for nothing, they want to establish a direct relationship with consumers," he said.

BT – the latest challenger to Sky after Setanta and ESPN – is expected to launch a new sports channel, available on a variety of platforms. But BT will use the rights to push its high speed broadband service. Its matches will be shown at Saturday lunchtime and on midweek evenings.

Against a grim economic backdrop elsewhere, Tony Ball admitted he was "surprised" by the huge hike in income, which he said would allow clubs to continue to compete with their European rivals.

The huge increase in income is good news for club owners, players, their agents and luxury car dealerships and, on the evidence of previous deals, is likely to lead to another sharp rise in transfer fees. But despite the unprecedented riches that have flowed into the coffers of top flight clubs during the Premier League era, clubs made losses of £361m last year despite record income of £2.3bn.

Scudamore pleaded with clubs not to simply use the new deal to rack up losses and fuel wage inflation. While he said he wanted clubs to still invest in the best talent, he also made a plea to invest in infrastructure and youth development.

"We are entering a new era with financial fair play [the new Europe-wide regulations of club spending], I'm hoping it will get invested in things other than playing talent. It should also be able to achieve sustainability," he said.

The effect on fans is more uncertain. BT and Sky may have to charge more to cover their huge investment. When asked whether clubs would use the windfall to subsidise ticket prices, Scudamore would say only that it "gives them more choices".

Tony Ball, the former BSkyB chief executive who helped fuel the company's growth in the mid-1990s, is a non-executive director on the BT board and is likely to have advised it on its bidding strategy. ESPN, the US giant that entered the market when Setanta went bust trying to compete with Sky, has now been frozen out.

Any suggestions? Thanks.
