C Program to count the word frequency in a text file


Problem description


I need to write a C program that reads a text file, counts how many times each word occurs, and prints each word with its count. My current code prints each word and how many times it occurs, but I need the output in alphabetical order and the counting to ignore case. For example, "It" and "it" should be counted as the same word. I'm not sure where in my code to make these changes. My code is below.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    if (argc == 1) {
        printf("The input file name has not been provided\n");
    }
    else if (argc == 2) {
        FILE *f = fopen(argv[1], "rb");
        fseek(f, 0, SEEK_END);
        long fsize = ftell(f);
        fseek(f, 0, SEEK_SET);

        char *str = malloc(fsize + 1);
        fread(str, fsize, 1, f);
        fclose(f);

        str[fsize] = 0;
        int count = 0, c = 0, i, j = 0, k, space = 0;
        char p[1000][512], str1[512], ptr1[1000][512];
        char *ptr;
        for (i = 0; i < strlen(str); i++)
        {
            if ((str[i] == ' ') || (str[i] == ',') || (str[i] == '.'))
            {
                space++;
            }
        }
        for (i = 0, j = 0, k = 0; j < strlen(str); j++)
        {
            if ((str[j] == ' ') || (str[j] == 44) || (str[j] == 46))
            {
                p[i][k] = '\0';
                i++;
                k = 0;
            }
            else
                p[i][k++] = str[j];
        }
        k = 0;
        for (i = 0; i <= space; i++)
        {
            for (j = 0; j <= space; j++)
            {
                if (i == j)
                {
                    strcpy(ptr1[k], p[i]);
                    k++;
                    count++;
                    break;
                }
                else
                {
                    if (strcmp(ptr1[j], p[i]) != 0)
                        continue;
                    else
                        break;
                }
            }
        }
        for (i = 0; i < count; i++)
        {
            for (j = 0; j <= space; j++)
            {
                if (strcmp(ptr1[i], p[j]) == 0)
                    c++;
            }
            printf("%s %d \n", ptr1[i], c);
            c = 0;
        }
    }
    return 0;
}

Solution

Here is a minimal proposal. Your code would probably benefit from being broken down into functions, but treat this as a rough draft. For the case-insensitive part, you can simply replace strcmp with strcasecmp (declared in <strings.h> on POSIX systems).
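As a small illustration of the case-insensitive comparison, here is a sketch. The helper name same_word is hypothetical and only wraps strcasecmp, which is a POSIX function rather than part of ISO C, so this assumes a POSIX-like system.

```c
#include <strings.h>  /* strcasecmp: POSIX, not part of ISO C */

/* Hypothetical helper: nonzero when two words match, ignoring case. */
int same_word(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}
```

With this, same_word("It", "it") is true, so both spellings would fall into the same bucket wherever the question's loops currently call strcmp.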

Then for sorting, you can use qsort: define a function for comparison like:

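/* Each element of ptr1 is a char[512], so a and b point directly at the
   strings. Comparing only the first characters would leave words that
   share an initial letter unordered. strcasecmp needs <strings.h>. */
int compar(const void *a, const void *b)
{
        return strcasecmp((const char *)a, (const char *)b);
}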

and apply it to your word array. As far as I understand, ptr1 seems to hold your words, so you may add

   qsort(ptr1, count, sizeof(ptr1[0]), compar);

before your last for loop.
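Once the words are sorted, identical words sit next to each other, so the frequencies can be collected in a single pass instead of a nested loop. The following is only a sketch of that idea, not the code from the question: sort_and_count, compare_rows and WORD_LEN are hypothetical names, the row width of 512 is taken from ptr1, and strcasecmp again assumes a POSIX system.

```c
#include <stdlib.h>   /* qsort */
#include <string.h>   /* memmove */
#include <strings.h>  /* strcasecmp (POSIX) */

#define WORD_LEN 512  /* row width, matching ptr1 in the question */

/* Each element passed to qsort is a char[WORD_LEN] row, so the
   pointers can be used as strings directly. */
static int compare_rows(const void *a, const void *b)
{
    return strcasecmp((const char *)a, (const char *)b);
}

/* Hypothetical helper: sorts n words, moves one copy of each distinct
   word to the front, stores its frequency in counts[], and returns
   the number of distinct words. */
size_t sort_and_count(char words[][WORD_LEN], size_t n, int counts[])
{
    size_t unique = 0;

    qsort(words, n, sizeof words[0], compare_rows);
    for (size_t i = 0; i < n; i++) {
        if (unique > 0 && strcasecmp(words[unique - 1], words[i]) == 0) {
            counts[unique - 1]++;          /* same word as the current run */
        } else {
            if (unique != i)
                memmove(words[unique], words[i], WORD_LEN);
            counts[unique++] = 1;          /* start a new run */
        }
    }
    return unique;
}
```

After the call, the first `unique` rows hold the distinct words in alphabetical order and counts[i] holds how often row i occurred. Which spelling survives when a word appears in several cases is not specified, because qsort does not promise a stable order for equal elements.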

Nevertheless, it seems to me that you need to fix your extraction loop, as valgrind reports some errors in your code.
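The answer does not say which errors valgrind reports. Plausible candidates in the posted code are that the last word is never terminated with '\0', that newlines are not treated as separators, that nothing checks the bounds of p, and that the de-duplication loop can compare against rows of ptr1 that were never written. A more defensive extraction loop might look like the sketch below. split_words and WORD_LEN are hypothetical names, and treating every non-letter as a separator is an assumption about what should count as a word.

```c
#include <ctype.h>   /* isalpha, tolower */
#include <stddef.h>  /* size_t */

#define WORD_LEN 512  /* row width, matching p in the question */

/* Hypothetical helper: copies each run of letters in text into words[],
   lowercased, and returns the number of words stored. Anything that is
   not a letter (spaces, newlines, punctuation) ends the current word.
   Words beyond max_words are dropped; over-long words are truncated. */
size_t split_words(const char *text, char words[][WORD_LEN], size_t max_words)
{
    size_t n = 0;   /* words stored so far */
    size_t k = 0;   /* length of the word being built */

    for (; *text != '\0' && n < max_words; text++) {
        unsigned char ch = (unsigned char)*text;
        if (isalpha(ch)) {
            if (k < WORD_LEN - 1)
                words[n][k++] = (char)tolower(ch);
        } else if (k > 0) {
            words[n++][k] = '\0';   /* terminate and move to the next row */
            k = 0;
        }
    }
    if (k > 0 && n < max_words)
        words[n++][k] = '\0';       /* final word with no trailing separator */
    return n;
}
```

Because the words are lowercased as they are stored, "It" and "it" end up as the same string, and a plain strcmp would then be enough in the later loops. Consecutive separators such as ". " no longer produce empty words, since a row is only closed when it contains at least one letter.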
