Issues filling a 2D array from a txt file (CSV)


Question

I am filling a 2D array by reading from a text file in which the items are separated by commas. I have tried two approaches and have run into problems with both.

First approach:

I am using strtok with a comma as the delimiter (I have read that strtok should be avoided, so I first strcpy the string that was read in into a second string). The first problem is that the program crashes unless I add extra spaces between the values I am reading in. With the spaces added it works: it reads everything, and when I print each value as it is added to the 2D array, it looks correct. But after the array has been filled, I print it with a nested for loop, and for some reason everything in the 2D array has been replaced by the last value read from the txt file. So my questions are: how can I make strtok work without the extra spaces, and why is the array being overwritten when it appeared to be filled correctly the first time I printed it?

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    FILE *fp;
    char text[20], *token;
    char word[20];
    const char delimiters[] = ",";
    char *table[8][8];
    int i = 0;
    int j = 0;

    fp = fopen("board.txt", "r");
    if (fp == NULL)
    {
        printf("Error opening");
    }
    printf("\n\n");
    while (fscanf(fp, "%15s", text) != EOF)
    {
        strcpy(word, text);
        token = strtok(word, delimiters);

        table[i][j] = token;
        //pritn table values as they get added
        printf("table[%d][%d] = %s ", i, j, table[i][j]);

        //ghetto nested for loop
        j++;
        if (j >= 8)
        {
            i++;
            j = 0;
            printf("\n");
        }
    }

    printf("\n\n\ntable[0][3] = %s|", table[0][3]);
    printf("\n");

    for (i = 0; i < 8; i++)
    {
        //printf("\n");
        for (j = 0; j < 8; j++)
        {
            printf("table[%d][%d] = %s|", i, j, table[i][j]);
        }
        printf("\n");
    }
    return 0;
}

This is the data I'm reading from the text file:

-4,-2,-3,-5,-6,-3,-2,-4
-1,-1,-1,-1,-1,-1,-1,-1
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
+1,+1,+1,+1,+1,+1,+1,+1
+4,+2,+3,+5,+6,+3,+2,+100

But it crashes unless I add spaces like this:

-4, -2, -3, -5, -6, -3, -2, -4
-1, -1, -1, -1, -1, -1, -1, -1
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
+1, +1, +1, +1, +1, +1, +1, +1
+4, +2, +3, +5, +6, +3, +2, +100

Second approach:

I read the txt file one character at a time. When a comma is detected, all the preceding characters are added as a string, then I move on to the next character and repeat until EOF. With this method I don't need the extra spaces, but whenever the code reaches the end of a row it adds 2 items instead of one, so everything after that point is shifted. This happens at the end of every row, so when it is all done I am missing nRows items.

With this approach I also have the same problem as with the first: everything seems to be overwritten with the last value read from the text file. One small additional issue: because a word is only recognized when the comma following it is detected, the last value in the file is not written to the array unless I add a trailing comma. I am working around that by adding the comma, but it is not part of the file, so I shouldn't rely on it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    FILE *fp;
    char text[20];
    char *table[8][8] = {0};
    char word[30];
    //char *table[8][8];
    int i = 0;
    int j = 0;

    fp = fopen("board.txt", "r");
    if (fp == NULL)
    {
        printf("Error opening");
    }

    int word_i = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
    {
        if (c == ',')
        {
            //separate words with commas
            if (word_i > 0)
            {                                        
                text[word_i] = '\0';

                // strcpy(word, text);
                // table[i][j] = word;

                table[i][j] = text;
                printf("table[%d][%d] = %s |\t", i, j, table[i][j]);
                j++;

                if (j >= 8)
                {
                    i++;
                    j = 0;
                }
            }
            word_i = 0;
        }
        else
        {
            text[word_i] = c;
            ++word_i;
        }
    }

    printf("\n\n");
    //want to check that i manually modified table[0][0]=124
    for (i = 0; i < 8; i++)
    {
        //printf("\n");
        for (j = 0; j < 8; j++)
        {
            printf("table[%d][%d] = %s|", i, j, table[i][j]);
        }
        printf("\n");
    }
    return 0;
}

With this code I have to add a comma at the end of the text file so that the last value is read:

-4,-2,-3,-5,-6,-3,-2,-4
-1,-1,-1,-1,-1,-1,-1,-1
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
+1,+1,+1,+1,+1,+1,+1,+1
+4,+2,+3,+5,+6,+3,+2,+100,

I can post the output I'm getting if it's needed.

Any help would be greatly appreciated, thank you.

Recommended answer

Continuing from the comment by @JonathanLeffler: using a line-oriented input function such as fgets() or POSIX getline() to read a line of data at a time ensures you consume one line of input with each read from your file. You then simply parse the comma-separated values from the buffer holding that line.

There are several ways to separate the comma-separated values (each with variants depending on whether you want to preserve or discard the whitespace surrounding a field). You can always use a start_pointer and an end_pointer: move end_pointer to locate the next ',', copy the characters (the token) between start_pointer and end_pointer, then set start_pointer = ++end_pointer and repeat until you reach the end of the buffer.

If you have no empty fields (meaning your data has no adjacent ',' delimiters, e.g. -4,-2,,-5,...), then strtok() is a simple way to split the buffer into tokens. If you do have empty fields, BSD strsep() will handle them if your C library provides it, or you can use a combination of strcspn() and strspn() (or, with a single ',' delimiter, strchr()) to automate walking a pair of pointers through the buffer.

A very simple implementation that uses strtok() to separate each line into tokens (reading your file from stdin) would be:

#include <stdio.h>
#include <string.h>

#define MAXC 1024

int main (void) {

    char buf[MAXC];                         /* buffer to hold each line */

    while (fgets (buf, MAXC, stdin)) {      /* read each line into buf */
        /* split buf into tokens using strtok */
        for (char *tok = strtok (buf, ","); tok; tok = strtok (NULL, ",")) {
            tok[strcspn (tok, "\n")] = 0;   /* trim '\n' from end tok */
            /* output board (space before if not 1st) */
            printf (tok != buf ? " %s" : "%s", tok);
        }
        putchar ('\n');
    }
}

(Note: in the printf call, a simple ternary operator puts a space before every field except the first; you can change the output formatting to anything you like. Also note that checking whether strlen(buf) + 1 == MAXC && buf[MAXC-2] != '\n', to validate that the entire line fit in buf, was intentionally omitted and is left for you to implement.)

The for loop above is just a condensed way of combining the call that gets the first token, where the first argument to strtok is the string itself, with the calls that get each subsequent token, where the first argument is NULL, while checking tok != NULL to validate that strtok returned a valid token. It can also be written as a while() loop if that is easier to read, e.g.

        /* split buf into tokens using strtok */
        char *tok = strtok (buf, ",");      /* separate 1st token */
        while (tok) {                       /* validate tok != NULL */
            tok[strcspn (tok, "\n")] = 0;   /* trim '\n' from end tok */
            /* output board (space before if not 1st) */
            printf (tok != buf ? " %s" : "%s", tok);
            tok = strtok (NULL, ",");       /* get next token */
        }

(Both loops are equivalent ways of separating the comma-separated tokens in buf.)

Example Input File

$ cat dat/board-8x8.txt
-4,-2,-3,-5,-6,-3,-2,-4
-1,-1,-1,-1,-1,-1,-1,-1
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
 0, 0, 0, 0, 0, 0, 0, 0
+1,+1,+1,+1,+1,+1,+1,+1
+4,+2,+3,+5,+6,+3,+2,+100

Example Use/Output

Outputting the data, simply separating each token with a space, yields:

$ ./bin/strtok_board_csv < dat/board-8x8.txt
-4 -2 -3 -5 -6 -3 -2 -4
-1 -1 -1 -1 -1 -1 -1 -1
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
+1 +1 +1 +1 +1 +1 +1 +1
+4 +2 +3 +5 +6 +3 +2 +100






Allocating Storage for Each Pointer in table

When you declare char *table[ROW][COL]; you have declared a 2D array of pointers to char. To use the pointers, you must either assign each one the address of a valid, existing block of memory, or allocate a new block of memory large enough to hold tok and assign the starting address of each such block to each pointer in turn. You can't simply assign, e.g., table[i][j] = tok; because tok points to an address within buf, which is overwritten with new content each time a new line is read.

Instead, you need to allocate enough memory to hold the contents of tok (e.g. strlen(tok) + 1 bytes), assign the address of the new block to your table[i][j] pointer, and then copy tok into that block. You can do that with something like:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROW     8       /* if you need a constant, #define one (or more) */
#define COL   ROW
#define MAXC 1024

int main (void) {

    char buf[MAXC],                         /* buffer to hold each line */
        *table[ROW][COL] = {{NULL}};        /* 2D array of pointers */
    size_t row = 0;
    while (row < ROW && fgets (buf, MAXC, stdin)) { /* read each line (at most ROW) */
        size_t col = 0;
        /* split buf into tokens using strtok */
        for (char *tok = strtok (buf, ","); tok; tok = strtok (NULL, ",")) {
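            size_t len;
            if (col == COL) {   /* guard against more than COL tokens in a line */
                fprintf (stderr, "error: too many columns, row %zu\n", row);
                exit (EXIT_FAILURE);
            }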
            tok[strcspn (tok, "\n")] = 0;   /* trim '\n' from end tok */
            len = strlen (tok);
            if (!(table[row][col] = malloc (len + 1))) {  /* allocate/validate */
                perror ("malloc-table[row][col]");
                exit (EXIT_FAILURE);
            }
            memcpy (table[row][col++], tok, len + 1);   /* copy tok to table */
        }
        if (col != COL) {   /* validate COL tokens read from buf */
            fprintf (stderr, "error: insufficient columns, row %zu\n", row);
            exit (EXIT_FAILURE);
        }
        row++;  /* increment row counter */
    }

    for (size_t i = 0; i < row; i++) {      /* loop rows */
        for (size_t j = 0; j < COL; j++) {  /* loop COLS */
            /* output board from table (space before if not 1st) */
            printf (j > 0 ? " %s" : "%s", table[i][j]);
            free (table[i][j]);             /* free allocated memory */
        }
        putchar ('\n');
    }
}

(The example input and output are the same as above.)

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure that you do not attempt to access memory or write beyond/outside the bounds of your allocated block, or attempt to read or base a conditional jump on an uninitialized value, and finally to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through the checker.

$ valgrind ./bin/strtok_board_table_csv < dat/board-8x8.txt
==3469== Memcheck, a memory error detector
==3469== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3469== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==3469== Command: ./bin/strtok_board_table_csv
==3469==
-4 -2 -3 -5 -6 -3 -2 -4
-1 -1 -1 -1 -1 -1 -1 -1
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
+1 +1 +1 +1 +1 +1 +1 +1
+4 +2 +3 +5 +6 +3 +2 +100
==3469==
==3469== HEAP SUMMARY:
==3469==     in use at exit: 0 bytes in 0 blocks
==3469==   total heap usage: 66 allocs, 66 frees, 5,314 bytes allocated
==3469==
==3469== All heap blocks were freed -- no leaks are possible
==3469==
==3469== For counts of detected and suppressed errors, rerun with: -v
==3469== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all the memory you have allocated and that there are no memory errors.

Let me know if you have any further questions.
