Read text file and output number of words, distinct words, and most frequent word used


Problem description

I have to read all the words from a text file and output the total number of words, the number of distinct words, and the most frequently used word. I'm still a beginner, so any help is awesome.

When reading the words, hyphens, apostrophes, and other punctuation must be omitted, so O'connor would be the same word as Oconnor. I don't know how to go about doing that, so any help would be great.

This is what I have so far. When I try to compile it, I get a warning about strcpy saying I'm not using it properly. The total number of words is output correctly, but I get 0 for the number of distinct words and nothing for the most frequently used word.

Any help would be awesome, thanks!

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int number=0;
    int i;
    char *temp;
    char *temp2;
    char word[2500][50];
    int wordCount=0;
    int mostFreq=1;
    char mostFreqWord[2500][50];
    int frequentCount=0;
    int distinctCount=0;
    int j;
    char *p;
    FILE *fp;
    //reads file!
    fp= fopen("COEN12_LAB1.txt", "r");
    if(fp == NULL)                  // checks to see if file is empty
    {
            printf("File Missing!\n");
            return 0;
    }
    while(fscanf(fp,"%s", word) == 1)  //scans every word in the text file
            wordCount++;  //counts number of words
    while(fscanf(fp,"%s",word) == 1)
    {
            for(i=0;i<wordCount;i++)
            {
                    temp=word[i];
                    for(j=0;j<wordCount;j++)
                    {
                            temp2 = word[j];
                            if(strcmp(temp,temp2) == 0)  //check to see if word is repeated
                            {
                                    frequentCount++;
                                    if(frequentCount>mostFreq)
                                    {
                                            strcpy(mostFreqWord,word[i]);  //this doesn't work
                                    }
                            }
                            distinctCount++;
                    }
            } 
    }
    printf("Total number of words: %d\n", wordCount);
    printf("Total number of distinct words: %d\n", distinctCount);
    printf("The most frequently appeared word is: %s \n", &mostFreqWord);
    fclose(fp);
}

Solution

The problem with strcpy() is, as diagnosed by Beginner in their answer (http://stackoverflow.com/a/19106956/15168), that if you are copying to mostFreqWord, you need to subscript it, because it is a 2D array.
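
A minimal sketch of that fix, assuming only one word needs to be remembered at a time: the destination has to be a single row of the 2D array (for example mostFreqWord[0]) or a plain one-dimensional buffer. The helper copy_word and the constant MAX_WORD_LEN are hypothetical names introduced for illustration; they are not in the original code.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORD_LEN 50   /* assumption: same row width as word[2500][50] */

/* Hypothetical helper: copy src into one word-sized buffer,
 * truncating if needed and always terminating the result. */
static void copy_word(char dst[MAX_WORD_LEN], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, MAX_WORD_LEN - 1);
    dst[MAX_WORD_LEN - 1] = '\0';
}
```

With the original declarations, copy_word(mostFreqWord[0], word[i]) would compile cleanly, as would strcpy(mostFreqWord[0], word[i]). The final printf would then take mostFreqWord[0] rather than &mostFreqWord. Declaring mostFreqWord as a plain char[50] is probably simpler still, since only one word is ever stored.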

However, you have a more fundamental problem. Your word counting loop reads until EOF, and you don't rewind the file to start over. Further, rereading the file like that is not a particularly good algorithm (and wouldn't work at all if you were reading data piped in from another program).
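
The effect can be seen in a small sketch, assuming a counting loop like the one in the question. The function name count_tokens is hypothetical. Once the stream has hit EOF, a second pass reads nothing unless rewind() is called first.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count whitespace-separated tokens from the
 * current file position to EOF. The width 49 leaves room for the
 * terminating '\0' in the 50-byte buffer. */
static int count_tokens(FILE *fp)
{
    char buf[50];
    int n = 0;

    assert(fp != NULL);
    while (fscanf(fp, "%49s", buf) == 1)
        n++;
    return n;
}
```

Calling count_tokens(fp) twice in a row returns 0 the second time, which matches the symptom in the question. Calling rewind(fp) between the two calls makes the second pass start from the beginning again, but that only works on seekable files, not on piped input, so a single pass is the more robust design.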

You should combine the two loops. Count the words as they arrive, but also clean up the word (removing non-alphabetic characters — or is that non-alphanumeric characters, and does _ underscore count or not?), and then insert it into the word list if it does not already appear or increase the frequency count for the word if it does already appear.
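
The combined loop could be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: only alphabetic characters are kept, case is left as is, and tokens that clean down to nothing are skipped. The names word_table, clean_word, record_word and read_words are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 2500    /* assumption: same limits as the question */
#define MAX_WORD_LEN 50

struct word_table {
    char words[MAX_WORDS][MAX_WORD_LEN];
    int counts[MAX_WORDS];   /* counts[i] is the frequency of words[i] */
    int distinct;            /* number of rows in use */
};

/* Remove every non-alphabetic character from word, in place. */
static void clean_word(char *word)
{
    char *out = word;

    assert(word != NULL);
    for (const char *in = word; *in != '\0'; in++) {
        if (isalpha((unsigned char)*in))
            *out++ = *in;
    }
    *out = '\0';
}

/* Bump the count of word if it is already in the table, else add it.
 * Returns the word's index, or -1 if word is empty or the table is full. */
static int record_word(struct word_table *t, const char *word)
{
    if (word[0] == '\0')
        return -1;
    for (int i = 0; i < t->distinct; i++) {
        if (strcmp(t->words[i], word) == 0) {
            t->counts[i]++;
            return i;
        }
    }
    if (t->distinct == MAX_WORDS)
        return -1;
    strncpy(t->words[t->distinct], word, MAX_WORD_LEN - 1);
    t->words[t->distinct][MAX_WORD_LEN - 1] = '\0';
    t->counts[t->distinct] = 1;
    return t->distinct++;
}

/* Single pass over the input: clean each token and record it.
 * Returns the total number of non-empty words seen. */
static int read_words(FILE *in, struct word_table *t)
{
    char buf[MAX_WORD_LEN];
    int total = 0;

    while (fscanf(in, "%49s", buf) == 1) {   /* 49 = MAX_WORD_LEN - 1 */
        clean_word(buf);
        if (buf[0] != '\0') {
            record_word(t, buf);
            total++;
        }
    }
    return total;
}
```

The table is large, so it is probably best declared static or allocated on the heap and zero-initialized before the first call. The linear search in record_word is quadratic overall, which should be acceptable for a few thousand words. If "The" and "the" should count as the same word, clean_word could also apply tolower() to each kept character.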

When the input phase is done, you should have a count of the number of distinct words ready, and you'll be able to find the most frequent by scanning the list of frequencies to find the maximum (and the index number where the maximum appeared), and then reporting appropriately.
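
The final scan might look like the sketch below, assuming the frequencies are kept in an int array with one entry per distinct word. The function name most_frequent_index is hypothetical, and ties are resolved in favor of the earliest entry, which is a choice the question does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the largest value in counts[0..n-1],
 * or -1 when n is 0. On a tie the earliest index wins. */
static int most_frequent_index(const int counts[], int n)
{
    int best = -1;

    assert(n == 0 || counts != NULL);
    for (int i = 0; i < n; i++) {
        if (best < 0 || counts[i] > counts[best])
            best = i;
    }
    return best;
}
```

The number of distinct words is just the number of entries in the list, so it needs no extra work. The most frequent word is the entry of the word list at the returned index, provided the index is not -1 (an empty input).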
