C - Counting words, characters and lines in file. Character count


Problem description


I have to write a C program that outputs the number of characters, lines and words in a given file. The task seems simple, but I'm really not sure what went wrong at this point.

So, here's the code:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main()
{
    FILE *file;
    char filename[256];
    char ch;
    char prevch;

    int lines=0;
    int words=0;
    int characters=0;

    printf("Enter your filename (don't forget about extension!):\n");
    scanf("%s", filename);

    file=fopen(filename, "r");
    if(file == NULL)
    {
        printf("Cannot open file %s \n", filename);
        exit(0);
    }
    else
    {

        while((ch=fgetc(file))!=EOF)
        {
            if(ch==' ' || ch=='\n' || ch=='\t')
            {
                if(isspace(prevch)==0)
                {
                    words++;
                }
            }
            if(ch=='\n')
            {
                lines++;
            }

            prevch=ch;
            characters++;
        }
    }

    fclose(file);

    if(isspace(prevch)==0)
    {
        words++;
    } 

    printf("Number of characters: %d\n", characters);
    printf("Number of words: %d\n", words);
    printf("Number of lines: %d\n", lines);

    return 0;
}

The idea of the task is that the output should be the same as the output of the wc command in Linux. But I have absolutely no idea why my loop is skipping some of the characters. The way I've written the code should count EVERY SINGLE character, even whitespace. Why, then, does my program report that the sample file contains 65 characters when wc shows 68? I thought that maybe some characters are skipped by fgetc, but that seems impossible: I used the same function before in a program that copies the content of one text file to another, and everything worked properly.

By the way, is my solution for the word count correct? The condition after the loop should make sure that the last word before EOF is counted. I've used isspace to make sure that there isn't just whitespace at the end.

Thanks!

Solution

"My program shows sample file contains 65 characters, when wc shows 68"

Are you working on Windows, and does your file have just three lines? If so, the problem is that when a file is opened in text mode on Windows, each CRLF line ending is translated to a single newline on input. The 3 CRLF pairs are therefore read as 3 LF-only line endings, which accounts for the discrepancy of 3. To fix this problem, open the file in binary mode ("rb" instead of "r").
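As a minimal sketch of the binary-mode fix, the function below counts every byte in a file opened with "rb". The name count_bytes and its -1 error convention are assumptions made for illustration, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: count every byte in the named file.
 * The file is opened in binary mode ("rb") so the C library does no
 * CRLF-to-LF translation. Returns -1 if the file cannot be opened. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long n = 0;
    int c;                          /* int, so EOF stays distinguishable */
    while ((c = fgetc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}
```

On Linux, "r" and "rb" behave the same, so the difference should only show up on platforms that translate line endings in text mode.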

Without having run your code, I think your code for counting words is OK. You could instead use an 'in-word' flag initially set to 0 (false): set it to true and count a new word when you detect something that's not white space while you're not in a word, and reset it to false whenever you read white space. Both work; they're slightly different.
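The 'in-word' flag approach could look roughly like this. It is a sketch, assuming the stream is already open; count_words is a hypothetical name.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count whitespace-separated words on an open stream.
 * A word is counted at its first non-whitespace byte. */
long count_words(FILE *fp)
{
    long words = 0;
    int in_word = 0;                /* 0 = between words, 1 = inside a word */
    int c;                          /* int, as returned by fgetc() */

    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;            /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;            /* first byte of a new word */
            words++;
        }
    }
    return words;
}
```

Because a word is counted when it starts rather than when it ends, no extra check is needed after the loop for a final word that is not followed by whitespace.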

Also, remember that fgetc() and relatives return an int, not a char. You cannot reliably detect EOF if you save the return value in a char, though the nature of the problem depends on whether plain char is signed or unsigned and the code set in use.
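For illustration, here is a reading loop that keeps the result of fgetc() in an int, so a data byte 0xFF is counted like any other byte instead of being confused with EOF. The struct counts type and the count_stream name are assumptions for this sketch.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct counts {
    long bytes;
    long lines;
};

/* Count bytes and newline characters on an open stream. */
struct counts count_stream(FILE *fp)
{
    struct counts result = {0, 0};
    int c;                          /* must be int: holds any byte value or EOF */

    while ((c = fgetc(fp)) != EOF) {
        result.bytes++;
        if (c == '\n')
            result.lines++;
    }
    return result;
}
```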

If plain char is an unsigned type, you can never detect EOF: the EOF value (typically -1) is truncated to 0xFF when stored in the char, and when that is converted back to int for comparison with EOF, it is positive. If plain char is signed and the input contains a byte 0xFF (in ISO 8859-1 and related code sets, that's ÿ, LATIN SMALL LETTER Y WITH DIAERESIS in Unicode terminology), you detect EOF early. However, valid UTF-8 can never contain a byte 0xFF (nor 0xC0, 0xC1, nor 0xF5..0xFF), so you shouldn't run into that misinterpretation problem with UTF-8 input. But then your code is counting bytes, not characters.
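The two failure modes can be made visible with a small sketch. Both function names are hypothetical. The signed case assumes the common situation where EOF is -1 and converting 0xFF to signed char yields -1; that conversion is implementation-defined, so other platforms may differ.

```c
#include <stdio.h>

/* Store a value in an unsigned char, widen it back to int, and report
 * whether it still compares equal to EOF. */
int unsigned_char_matches_eof(int value)
{
    unsigned char ch = (unsigned char)value;
    int widened = ch;               /* always 0..UCHAR_MAX, never negative */
    return widened == EOF;
}

/* The same round trip through a signed char. */
int signed_char_matches_eof(int value)
{
    signed char ch = (signed char)value;   /* 0xFF typically becomes -1 */
    int widened = ch;
    return widened == EOF;
}
```

With these, unsigned_char_matches_eof(EOF) is 0 (EOF is never seen), while on typical platforms signed_char_matches_eof(0xFF) is 1 (a data byte is mistaken for EOF).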
