Count the number of times each word occurs in a file


Question

Hi, I am writing a program that counts the number of times each word occurs in a file. It then prints a list of the words with counts between 800 and 1000, sorted by count. I am stuck on keeping a counter that checks whether each word matches the next one until a new word appears. In main, I open the file, read it word by word, and call sort inside the while loop to sort the vector. Then, in the for loop, I go through all the words and, if the first word equals the second, do count++. I don't think that is how you keep a counter.

Here is the code:

#include <string>
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <set>

using namespace std;

vector<string> lines;
vector<int> second;
set<string> words;
multiset<string> multiwords;

void readLines(const char *filename)
{
    string line;
    ifstream infile;
    infile.open(filename);
    if (!infile)
    {       
        cerr << filename << " cannot open" << endl; 
          return; 
    }       
    getline(infile, line);
    while (!infile.eof())
    {
        lines.push_back(line);
        getline(infile, line);
    }  
    infile.close();
}

int binary_search(vector<string> &v, int size, int value)
{
    int from = 0;
    int to = size - 1;
    while (from <= to)
    {  
        int mid = (from + to) / 2;
        int mid_count = multiwords.count(v[mid]);
        if (value == mid_count) 
            return mid;
        if (value < mid_count) to = mid - 1;
        else from = mid + 1;
    }
   return from;
}

int main() 
{
    vector<string> words;
    string x;
    ifstream inFile;
    int count = 0;

    inFile.open("bible.txt");
    if (!inFile) 
    {
        cout << "Unable to open file";
        exit(1);
    }
    while (inFile >> x){
        sort(words.begin(), words.end());
    }

    for(int i = 0;i < second.size();i++)
    {
        if(x == x+1)
        {
            count++;
        }
        else
            return;
    }
    inFile.close();
}


Recommended answer

One solution is to define a letter_only locale facet so that the stream ignores punctuation and reads only valid English letters from the input. The facet classifies every other character as whitespace, so the stream treats "ways", "ways." and "ways!" as the same word, "ways", because punctuation such as "." and "!" is skipped like a space.

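#include <algorithm>
#include <locale>
#include <vector>

// ctype facet that classifies only A-Z and a-z as letters;
// every other character is treated as whitespace.
struct letter_only: std::ctype<char> 
{
    letter_only(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::space);

        // Fill the two ranges separately: in ASCII the characters between
        // 'Z' and 'a' ([ \ ] ^ _ `) are not letters and must stay whitespace.
        std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha);
        return &rc[0];
    }
};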
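To see the effect of such a facet without touching a file, it can be tried on an in-memory stream. The following is only a sketch: it repeats the facet under a different name (letters_only_facet) so that the snippet is self-contained, and split_words is a hypothetical helper that is not part of the original answer.

```cpp
#include <algorithm>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Same idea as letter_only, renamed so this sketch compiles on its own.
struct letters_only_facet : std::ctype<char>
{
    letters_only_facet() : std::ctype<char>(get_table()) {}

    static const std::ctype_base::mask* get_table()
    {
        // Start with every character classified as whitespace.
        static std::vector<std::ctype_base::mask>
            table(std::ctype<char>::table_size, std::ctype_base::space);
        // Then mark only A-Z and a-z as letters.
        std::fill(table.begin() + 'A', table.begin() + 'Z' + 1, std::ctype_base::alpha);
        std::fill(table.begin() + 'a', table.begin() + 'z' + 1, std::ctype_base::alpha);
        return table.data();
    }
};

// Hypothetical helper: splits text into words made of English letters only.
std::vector<std::string> split_words(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale(), new letters_only_facet));

    std::vector<std::string> words;
    std::string word;
    while (in >> word)   // operator>> skips whatever the facet calls whitespace
        words.push_back(word);
    return words;
}
```

Under these assumptions, split_words("ways, ways. ways!") yields three identical words, and digits or underscores act as separators as well.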

Then use it:

#include <cctype>
#include <fstream>
#include <iostream>
#include <locale>
#include <map>
#include <string>

int main()
{
     std::map<std::string, int> wordCount;
     std::ifstream input;

     // treat everything except English letters as whitespace
     input.imbue(std::locale(std::locale(), new letter_only())); 

     input.open("filename.txt");
     if (!input)
     {
         std::cerr << "Unable to open file" << std::endl;
         return 1;
     }
     std::string word;
     while(input >> word)
     {
         // uppercase in place so that "Ways" and "ways" share one counter
         for (std::string::size_type i = 0; i < word.size(); ++i)
             word[i] = static_cast<char>(
                 std::toupper(static_cast<unsigned char>(word[i])));
         ++wordCount[word];
     }
     for (std::map<std::string, int>::iterator it = wordCount.begin(); 
                                               it != wordCount.end(); 
                                               ++it)
     {
           std::cout << "word = "<< it->first 
                     <<" : count = "<< it->second << std::endl;
     }
}
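The question also asks for only the words whose counts lie between 800 and 1000, sorted by count, which the answer's loop does not do. One possible way to get there from the map is sketched below. The helper name words_in_range, the WordCount typedef, the inclusive bounds, and the ascending order are all assumptions, since the question does not pin them down.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> WordCount;

// Hypothetical helper: returns the entries with lo <= count <= hi,
// ordered by ascending count. Words with equal counts stay in
// alphabetical order because the map is iterated in key order and
// stable_sort preserves that relative order.
std::vector<WordCount> words_in_range(const std::map<std::string, int>& counts,
                                      int lo, int hi)
{
    std::vector<WordCount> selected;
    for (std::map<std::string, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it)
    {
        if (it->second >= lo && it->second <= hi)
            selected.push_back(*it);
    }

    std::stable_sort(selected.begin(), selected.end(),
                     [](const WordCount& a, const WordCount& b)
                     { return a.second < b.second; });
    return selected;
}
```

Calling words_in_range(wordCount, 800, 1000) after the reading loop and printing the returned pairs would produce the list described in the question. A std::map cannot be sorted by value directly, which is why the matching entries are copied into a vector first.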
