Reading a big text file in C


Problem description

I'm working on a project written in C. I want to read a file that contains two lines of numbers.

Example

9
8 9 5456 32 2 45 34 98 5



I want to store the elements of the second line in an array. As you can see, the first line tells us how big the array needs to be. In the example above we want an array of size 9 to hold the 9 numbers.

I wrote code that does exactly that, but it's too complicated and I want to make it more flexible.

The logic of my code is the following:

I read the entire file in one go into a single char*, split it into an array of char*, and finally convert each piece to an int by calling atoi.


The function count_lines() counts the number of words and newlines. The function is_linebreak(p) checks whether the character is a space (' '), a newline (\n) or a carriage return (\r). If it is, count_lines() increments the row counter and eats the other breaks. Eating the other breaks means, for example, that if you had multiple spaces (' ') it would skip them all until the next non-break character.

So number_of_rows (in the main function) is not actually the number of rows but the number of words: each "line" holds one word.


I wrote the code just to store 2,000,000 numbers as fast as possible, so I want to keep the same logic. But if you have a better idea for storing so many numbers so fast, please explain it.

The problem is that in the second line every element is separated by a single space, so I don't need to skip runs of whitespace. Also, the file has only two lines.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <time.h>

int is_end(char* input) {
    return *input == 0;
}

int is_linebreak(char* input) {
    return *input == '\r' || *input == '\n' || *input == ' ';
}

char* eat_linebreaks(char* input) {
    while (is_linebreak(input))
        ++input;

    return input;
}

size_t count_lines(char* input) {
    char* p = input;
    size_t rows = 1;

    if (is_end(p))
        return 0;

    while (!is_end(p)) {
        if (is_linebreak(p)) {
            ++rows;
            p = eat_linebreaks(p);
        }
        else {
            ++p;
        }
    }
    return rows;
}

/* split string by lines */
char** get_lines(char* input, size_t line_count) {
    char* p = input;
    char* from = input;
    size_t length = 0;
    size_t line = 0;
    size_t i;
    char** lines = (char**)malloc(line_count * sizeof(char*));

    do {
        if (is_end(p) || is_linebreak(p)) {
            lines[line] = (char*)malloc(length + 1);
            for (i = 0; i < length; ++i)
                lines[line][i] = *(from + i);

            lines[line][length] = 0;
            length = 0;
            ++line;
            p = eat_linebreaks(p);
            from = p;

        }
        else {
            ++length;
            ++p;
        }
    } while (!is_end(p));

    // Copy the last line as well in case the input doesn't end in line-break
    lines[line] = (char*)malloc(length + 1);
    for (i = 0; i < length; ++i)
        lines[line][i] = *(from + i);

    lines[line][length] = 0;
    ++line;


    return lines;
}

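int main(int argc, char* argv[]) {
    clock_t start;
    double seconds;
    char** lines;
    size_t size;
    size_t number_of_rows;
    int count;
    int* my_array;
    FILE *stream;
    char *contents;
    long fileSize = 0;
    int i;

    if (argc < 2) {
        fprintf(stderr, "usage: program <file>\n");
        return 1;
    }

    start = clock();

    // Open file, find the size of it
    stream = fopen(argv[1], "rb");
    if (stream == NULL) {
        perror(argv[1]);
        return 1;
    }
    fseek(stream, 0L, SEEK_END);
    fileSize = ftell(stream);
    fseek(stream, 0L, SEEK_SET);
    if (fileSize < 0) {
        fclose(stream);
        return 1;
    }

    // Allocate space for the entire file content
    contents = (char*)malloc((size_t)fileSize + 1);
    if (contents == NULL) {
        fclose(stream);
        return 1;
    }

    // Stream file into memory
    size = fread(contents, 1, (size_t)fileSize, stream);
    contents[size] = 0;
    fclose(stream);

    // Count rows in content (really whitespace-separated words)
    number_of_rows = count_lines(contents);
    if (number_of_rows == 0) {
        fprintf(stderr, "empty input\n");
        return 1;
    }

    // Get array of char*, one for each word
    lines = get_lines(contents, number_of_rows);

    // Get the numbers out of the lines
    count = atoi(lines[0]); // First row has count
    if (count < 0 || (size_t)count + 1 > number_of_rows) {
        fprintf(stderr, "file holds fewer numbers than its first line claims\n");
        return 1;
    }
    my_array = (int*)malloc(count * sizeof(int));
    if (my_array == NULL && count > 0)
        return 1;
    for (i = 0; i < count; ++i) {
        my_array[i] = atoi(lines[i + 1]);
    }

    // clock() ticks are 1/CLOCKS_PER_SEC seconds, not necessarily microseconds
    seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("Took %fs\n", seconds);

    return 0;
}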

Recommended answer

Here's an example reading from stdin.


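#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 0;

    /* First line: how many numbers follow. */
    if (scanf("%zu", &n) != 1)
        return EXIT_FAILURE;

    int *pi = calloc(n, sizeof *pi);
    if (pi == NULL && n != 0)
        return EXIT_FAILURE;

    /* Second line: the numbers; %d skips the whitespace between them. */
    for (size_t i = 0; i < n; i++)
    {
        if (scanf("%d", &pi[i]) != 1)
        {
            free(pi);
            return EXIT_FAILURE;
        }
    }

    /* Echo the values to show they were stored. */
    for (size_t i = 0; i < n; i++)
        printf("%d\n", pi[i]);

    free(pi);
    return 0;
}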
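
If you would rather keep the read-everything-at-once approach from the question, one possible sketch is to walk the buffer with strtol instead of first splitting it into one small string per number. The helper names read_whole_file and parse_counted_ints are made up for illustration, and the sketch assumes the file has the "count, then values" layout shown in the question. It has not been benchmarked against the scanf version.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a NUL-terminated buffer.
   Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path)
{
    FILE *stream = fopen(path, "rb");
    if (stream == NULL)
        return NULL;

    char *contents = NULL;
    if (fseek(stream, 0L, SEEK_END) == 0) {
        long file_size = ftell(stream);
        if (file_size >= 0 && fseek(stream, 0L, SEEK_SET) == 0) {
            contents = malloc((size_t)file_size + 1);
            if (contents != NULL) {
                size_t got = fread(contents, 1, (size_t)file_size, stream);
                contents[got] = '\0';
            }
        }
    }
    fclose(stream);
    return contents;
}

/* Hypothetical helper: parse "<count> <v1> <v2> ..." from text.
   Returns a malloc'd array and stores its length in *out_count,
   or returns NULL if the text is malformed or memory runs out. */
int *parse_counted_ints(const char *text, size_t *out_count)
{
    char *end;

    errno = 0;
    long count = strtol(text, &end, 10);
    if (end == text || errno != 0 || count < 0)
        return NULL;
    if ((uintmax_t)count > SIZE_MAX / sizeof(int))
        return NULL;

    /* Allocate at least one element so a count of 0 still yields non-NULL. */
    int *values = malloc((count > 0 ? (size_t)count : 1) * sizeof *values);
    if (values == NULL)
        return NULL;

    const char *p = end;
    for (long i = 0; i < count; ++i) {
        errno = 0;
        long v = strtol(p, &end, 10);   /* strtol skips leading whitespace */
        if (end == p || errno != 0 || v < INT_MIN || v > INT_MAX) {
            free(values);               /* fewer numbers than promised, or overflow */
            return NULL;
        }
        values[i] = (int)v;
        p = end;
    }

    *out_count = (size_t)count;
    return values;
}
```

A caller would pass the buffer from read_whole_file(argv[1]) to parse_counted_ints and free both the buffer and the array afterwards. This way there is one allocation for the file and one for the array, rather than one malloc per number.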

