Counting Occurrences of Each Word in a Text File


Question

Given a large text file containing many strings, what is the most efficient way to read it and count the occurrences of each word in C++? The file's size is unknown, so I cannot just use a simple array. There is another catch: each line of the file starts with a category keyword, and the words that follow are the features of that category. I need to be able to count how many times each word occurs within each category.

For example:

colors red blue green yellow orange purple
sky blue high clouds air empty vast big
ocean wet water aquatic blue
colors brown black blue white blue blue

In this example, I need to count 4 occurrences of "blue" within the "colors" category, even though "blue" occurs 6 times in the file overall.

Recommended answer

I would use a stream to read and separate the words (stream extraction splits on whitespace) and store the counts in a dictionary; the standard C++ container for this is std::map.

Here is the code, with comments:

#include <cstdlib> // EXIT_SUCCESS and EXIT_FAILURE.
#include <iostream>
#include <map> // A map will be used to count the words.
#include <fstream> // Will be used to read from a file.
#include <string> // The map's key type.
using namespace std;


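// Will be used to print the map later.
template <class KTy, class Ty>
void PrintMap(const map<KTy, Ty>& m) // Pass by const reference to avoid copying the map.
{
    // 'typename' is required because the iterator type depends on the template parameters.
    typedef typename map<KTy, Ty>::const_iterator iterator;
    for (iterator p = m.begin(); p != m.end(); ++p)
        cout << p->first << ": " << p->second << endl;
}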

int main(void)
{
    static const char* fileName = "C:\\MyFile.txt";

    // Will store the word and count.
    map<string, unsigned int> wordsCount;


    {
        // Begin reading from file:
        ifstream fileStream(fileName);

        // Check if we've opened the file (as we should have).
        if (fileStream.is_open())
        {
            // Read whitespace-separated words. Testing the extraction itself
            // ends the loop at end of file without counting an empty word.
            string word;
            while (fileStream >> word)
                ++wordsCount[word]; // operator[] inserts a missing word with a count of 0.
        }
        else  // We couldn't open the file. Report the error in the error stream.
        {
            cerr << "Couldn't open the file." << endl;
            return EXIT_FAILURE;
        }

        // Print the words map.
        PrintMap(wordsCount);
    }

    return EXIT_SUCCESS;
}

Output:


air: 1
aquatic: 1
big: 1
black: 1
blue: 6
brown: 1
clouds: 1
colors: 2
empty: 1
green: 1
high: 1
ocean: 1
orange: 1
purple: 1
red: 1
sky: 1
vast: 1
water: 1
wet: 1
white: 1
yellow: 1
