Count number of distinct words

Problem description

I am trying to count the number of distinct words in a text using Java.

A word can be a unigram, bigram or trigram noun. I have already extracted these using the Stanford POS tagger, but I can't work out how to find the words whose frequency is greater than or equal to one, two, three, four and five, along with their counts.

Recommended answer

I might not be understanding correctly, but if all you need to do is count the number of distinct words in a given text (however you end up getting the words you need to count), you could use a java.util.Scanner, add each word to an ArrayList only if it isn't already in the list, and then the size of the list is the number of distinct words, something like the example below:

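import java.util.ArrayList;
import java.util.Scanner;

public ArrayList<String> makeWordList(){
    //yourTextFileOrOtherTypeOfInput is a placeholder (a String, InputStream, File, ...);
    //a File argument also requires handling FileNotFoundException
    Scanner scan = new Scanner(yourTextFileOrOtherTypeOfInput);
    ArrayList<String> listOfWords = new ArrayList<String>();

    while(scan.hasNext()){ //loop over every token in the input
        String word = scan.next(); //Scanner splits on whitespace by default
        if(!listOfWords.contains(word)){ //add the word if it isn't added already
            listOfWords.add(word);
        }
    }
    scan.close();

    return listOfWords; //return the list you made of distinct words
}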

public int getDistinctWordCount(ArrayList<String> list){
    return list.size();
}
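
As a hedged alternative that is not part of the original answer: ArrayList.contains scans the whole list on every call, so for large texts a HashSet, which simply ignores duplicate adds, does the same job with average constant-time lookups. A minimal sketch, assuming the input is already available as a String; the class and method names (DistinctWords, distinctWords) are hypothetical:

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class DistinctWords {

    // Collects each whitespace-separated token once; HashSet.add() does
    // nothing for a word that is already present, so no contains() check is needed.
    static Set<String> distinctWords(String text) {
        Set<String> words = new HashSet<>();
        try (Scanner scan = new Scanner(text)) { // closes the Scanner automatically
            while (scan.hasNext()) {
                words.add(scan.next());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "the" and "cat" appear twice but are counted once
        Set<String> words = distinctWords("the cat saw the other cat");
        System.out.println(words.size()); // prints 4
    }
}
```

The set's size() plays the same role as getDistinctWordCount; note that, like the original, this treats "The" and "the" as different words.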

Now, if you actually have to count the number of characters in a word before adding it to the list, you just need a statement that checks the length of the word string first. For example:

if(word.length() <= someNumber){
//do whatever you need to
}
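
A minimal self-contained sketch of that length check, assuming the text is a String and that the limit (the hypothetical parameter maxLength, standing in for someNumber) is an upper bound on word length. It swaps the ArrayList for a HashSet to keep the words distinct; the class and method names are also hypothetical:

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class LengthFilteredWords {

    // Adds a token only if it has at most maxLength characters;
    // maxLength is a hypothetical stand-in for "someNumber".
    static Set<String> distinctWordsUpTo(String text, int maxLength) {
        Set<String> words = new HashSet<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                String word = scan.next();
                if (word.length() <= maxLength) { // check the length before adding
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "elephant" (8 characters) is skipped; "ant" and "bee" are kept once each
        System.out.println(distinctWordsUpTo("ant bee elephant ant", 3).size()); // prints 2
    }
}
```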

Sorry if I'm misunderstanding the question and just gave some crappy, unrelated answer =P, but I hope it helps in some way!

If you need to keep track of how often you see the same word, even though you only want to count it once, you could keep a separate list of frequency counts whose indexes match the indexes in the ArrayList, so you know which word each count belongs to. Better yet, use a HashMap where the key is the distinct word and the value is its frequency. This is basically the same code as above, but with a HashMap instead of an ArrayList and a variable to count the frequency:

import java.util.HashMap;
import java.util.Scanner;

public HashMap<String, Integer> makeWordList(){
    Scanner scan = new Scanner(yourTextFileOrOtherTypeOfInput); //placeholder input, as above
    HashMap<String, Integer> listOfWords = new HashMap<String, Integer>();
    while(scan.hasNext())
    {
        String word = scan.next(); //Scanner splits on whitespace by default
        if(!listOfWords.containsKey(word))
        {                             //add word if it isn't added already
            listOfWords.put(word, 1); //first occurrence of this word
        }
        else
        {
            int countWord = listOfWords.get(word) + 1; //get current count and increment
            listOfWords.put(word, countWord); //put() replaces the old value, so no remove() is needed
        }
    }
    scan.close();
    return listOfWords; //return the HashMap you made of distinct words
}

public int getDistinctWordCount(HashMap<String, Integer> list){
       return list.size();
}

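//get the frequency of the given word (0 if it never appeared)
public int getFrequencyForWord(String word, HashMap<String, Integer> list){
    return list.getOrDefault(word, 0); //plain get() returns null for a missing word, which throws when unboxed to int
}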
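
To get back to the original question (how many words occur at least one, two, three, four and five times), the frequency map can be summarised per threshold. This is a hedged illustration, not the answerer's code: it takes a hypothetical List<String> of terms that were already extracted (for example by the POS tagger), because Scanner splits on whitespace and would break bigrams and trigrams apart. It uses Map.merge (Java 8+) in place of the containsKey/get/put sequence, List.of requires Java 9+, and all class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequencyThresholds {

    // Builds term -> frequency; merge() stores 1 for a new term
    // or adds 1 to the count already stored for that term.
    static Map<String, Integer> countTerms(List<String> terms) {
        Map<String, Integer> freq = new HashMap<>();
        for (String term : terms) {
            freq.merge(term, 1, Integer::sum);
        }
        return freq;
    }

    // For each k in 1..maxK, counts how many distinct terms occur at least k times.
    // Index 0 is unused so that atLeast[k] lines up with threshold k.
    static int[] countAtLeast(Map<String, Integer> freq, int maxK) {
        int[] atLeast = new int[maxK + 1];
        for (int count : freq.values()) {
            for (int k = 1; k <= Math.min(count, maxK); k++) {
                atLeast[k]++;
            }
        }
        return atLeast;
    }

    public static void main(String[] args) {
        // Hypothetical pre-extracted noun terms; multi-word terms stay intact as single strings
        List<String> terms = List.of("data", "data mining", "data", "text",
                                     "data mining", "data", "part of speech");
        Map<String, Integer> freq = countTerms(terms);
        int[] atLeast = countAtLeast(freq, 5);
        System.out.println("distinct terms: " + freq.size());
        for (int k = 1; k <= 5; k++) {
            System.out.println("frequency >= " + k + ": " + atLeast[k] + " distinct terms");
        }
    }
}
```

For the sample list the frequencies are data=3, data mining=2, text=1 and part of speech=1, so the program reports 4 distinct terms and 4, 2, 1, 0 and 0 terms for thresholds 1 to 5. The per-term counts themselves are simply the entries of the map.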
