Reading and Saving Values from a CSV using C with Dynamic Allocation

Question

EDITS ON THE BOTTOM (I wrote this as an edit rather than its own question because it is so related):

I am writing a CSV reader that should save all the values as characters in a large, multidimensional array. In previous posts I have been warned about being too vague with my condensed code, so I am going to post more of it. I want to apologize for its length, as I am still trying to gauge what an appropriate length looks like on this site.

Eventually this program will become the centerpiece of a header file I am creating to perform data analysis. The header files I use for this program are:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

The problem I am having is that the reader appears to work within the function called read_csv(). I know this because of the printf() statements placed within that function. When I printf() the multidimensional, character array 'buffer' in main(), the data does not appear correctly.

In main(), the first column will print correctly, but the next three will not. Also, my goal is to be able to read any MxN CSV file with cells carrying a MAXIMUM of 20 Characters. Creating generalized code using malloc() is my next goal in this task.

My main() is structured like:

int main(){

  FILE *f;
  char fname[20];
  int i, j;

  printf("enter name of csv file : ") ;
  scanf("%s",fname) ;       

  f = fopen(fname, "r");

  //find row/col
  int find_c_r[2];
  int * pfrc = &find_c_r[0];
  pfrc = find_col_row(f);
  printf("Find_c_r[0] = %d \t Find_c_r[1] = %d\n", *pfrc, *(pfrc+1));

  int numCol = *pfrc;
  int numRow = *(pfrc+1);

  char buffer[50][50][20];// ideally size buffer[numCol][numRow][20]

  //sets all values to NULL
  for(j = 0 ; j < 50 ; j++){
    for(i = 0 ; i < 10 ; i++){   
      memset(buffer[i][j],'\0', 20);
    }
  }

  read_csv(f, numCol, numRow, buffer); 
  /////
  printf("\n\n");

 for(j = 0 ; j < numRow ; j++){
    for(i = 0 ; i < numCol; i++){      
      printf("[%d][%d]",i, j);
      printf("_%s_  ",buffer[i][j]);
    } 
    printf("\n");
  }

  printf("END OF PROGRAM.\n");
 }

One part of this is dynamic allocation of my array 'buffer'. I'm not quite sure how to malloc() in this format.
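
One possible way to get a buffer sized from the file, sketched under assumptions: the helper names cells_alloc, cell_at and cell_set and the CELL_LEN constant are hypothetical and do not come from the original post. The idea is to allocate all cells as one zeroed block and compute each cell's address by hand, so no fixed 50x50 bound is needed.

```c
#include <stdlib.h>
#include <string.h>

#define CELL_LEN 21  /* assumed: 20 characters plus the terminating '\0' */

/* Allocate cols*rows cells of CELL_LEN bytes as one zero-filled block.
   Returns NULL on failure; release the block with free(). */
char *cells_alloc(size_t cols, size_t rows)
{
    return calloc(cols * rows, CELL_LEN);
}

/* Address of the cell at (col, row) in a row-major block of `cols` columns. */
char *cell_at(char *cells, size_t cols, size_t col, size_t row)
{
    return cells + (row * cols + col) * CELL_LEN;
}

/* Copy text into a cell, truncating to CELL_LEN - 1 characters
   and always leaving the cell null-terminated. */
void cell_set(char *cells, size_t cols, size_t col, size_t row, const char *text)
{
    char *dst = cell_at(cells, cols, col, row);
    strncpy(dst, text, CELL_LEN - 1);
    dst[CELL_LEN - 1] = '\0';
}
```

With a compiler that supports C99 variable-length array types, an alternative that keeps the buffer[i][j] syntax is `char (*buffer)[numRow][20] = malloc(numCol * sizeof *buffer);`, released later with a single free(buffer).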

The first function main() calls is find_col_row(FILE *f). It is working without a problem, but folks have requested more of my code in questions. It returns a pointer to an int array which holds the number of Columns and Rows in the CSV File being read:

int * find_col_row(FILE *f){
 //Find numCol and numRow
  int numCol, numRow;
  char c;
  int new_line= 0;
  int comma = 0;
  int z = 0;
  numCol = 0;
  numRow = 0;
  while (c != EOF) {
    c = fgetc(f) ; 
    if(c == ','){ //WORDS MUST BE SEPARATED BY COMMAS
      comma++;
    }
    if(c == ';'){ //LINES MUST BE SEPARATED BY SEMI-COLONS
      new_line++;
      if(numCol == 0){
         numCol = comma + 1;
      }
    } 
  }


  numRow = new_line - 1;

  int a[2] = {numCol, numRow};
  int * pa = &a[0];

  return pa;
}
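
Note that the function above returns the address of the local array a, which no longer exists once the function returns, and it compares a char (uninitialized on the first test) against EOF. A hedged sketch of an alternative that hands the counts back through out-parameters follows. The name count_cols_rows and its parameters are illustrative, and unlike the original it counts every ';'-terminated row, including the header row.

```c
#include <stdio.h>

/* Count columns and rows in a stream where cells are separated by ','
   and every row ends with ';'. Returns 0 on success, -1 on bad arguments. */
int count_cols_rows(FILE *f, int *num_col, int *num_row)
{
    int c;                    /* int, not char, so EOF stays distinguishable */
    int commas = 0, rows = 0, cols = 0;

    if (f == NULL || num_col == NULL || num_row == NULL)
        return -1;

    while ((c = fgetc(f)) != EOF) {
        if (c == ',') {
            commas++;
        } else if (c == ';') {
            if (rows == 0)
                cols = commas + 1;   /* column count taken from the first row */
            rows++;
        }
    }

    *num_col = cols;
    *num_row = rows;
    return 0;
}
```

The caller would declare two plain ints and pass their addresses, for example count_cols_rows(f, &numCol, &numRow), then rewind the file before reading the cells.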

The second function being called is read_csv(...). The goal of this function is to 'read' the CSV file and 'save' values of each cell in the multidimensional, character array 'buffer':

void read_csv(FILE *f, int numCol, int numRow, char buffer[numCol][numRow][20])  //cells split by ',', row split by ';'
{
  char fname[100];
  int i = 0, j = 0;
  int c = 0,n = 0, z = 0;

  if (f == NULL) {
    printf("can't open file, %s\n", fname) ;
    exit(1) ;
  }

  n = 0 ;

  fseek(f, 0, SEEK_SET); //starts reading the file from the start
  c = fgetc(f) ;

  i = 0;
  j = 0;


  char temp[20];
  memset(temp, '\0', 20);
  int tc = 0; //temp counter
  int mv_temp = 0; //this aids in removing the first character if == ' '
  temp[tc] = c;
  while (c != EOF) {

    if(c == ','){
      if(temp[0] == ' '){
         for(mv_temp = 0 ; mv_temp < tc ; mv_temp++){
           temp[mv_temp] = temp[mv_temp + 1];
         }
      }
      strncpy(buffer[i][j], temp, 20);
      i++; 
      tc = 0;
      memset(temp, '\0', 20);
    }else if(c == ';'){
      if(temp[0] == ' '){
         for(mv_temp = 0 ; mv_temp < tc ; mv_temp++){
           temp[mv_temp] = temp[mv_temp + 1];
         }
      }
      strncpy(buffer[i][j], temp, 20);
      j++;
      i = 0;
      tc = 0;
      memset(temp, '\0', 20);
       c = fgetc(f);

    }else{
      temp[tc] = c;
      tc++;
    }
    c = fgetc(f);    
  }  /////while loop over


  for(j = 0 ; j < numRow ; j++){
    for(i = 0 ; i < numCol; i++){      
      printf("[%d][%d]",i, j);
      printf("_%s_  ",buffer[i][j]);
    } 
    printf("\n");
  }

}
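
A likely reason only the first column survives the trip back to main(): main() declares buffer as char[50][50][20], while read_csv() indexes it as char[numCol][numRow][20]. The two layouts place buffer[i][j] at different byte offsets whenever i > 0. The small helper below (cell_offset is a hypothetical name, used only to illustrate the arithmetic) computes those offsets.

```c
#include <stddef.h>

/* Byte offset of cell [i][j] inside an array declared as
   char a[...][rows][len], which is how C lays out such an array. */
size_t cell_offset(size_t rows, size_t len, size_t i, size_t j)
{
    return (i * rows + j) * len;
}
```

For the 50-row declaration in main(), cell [1][0] sits 1000 bytes into the block; for a 20-row view it sits at 400 bytes, so the two functions read and write different memory for every column but the first. Declaring the array in main() with the same dimensions that are passed to read_csv() keeps the layouts identical.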

Having not tried to run this program with any other CSV file, here is the CSV I use. When running the program, the first step will be to scanf() the name of the file. I call it

simp.csv

For reference, this data refers to basic American Football data: Formation, Formation Variation, Down, Distance. The file looks like:

OFF_FORM,FORM_VAR, DN, DIST;
DEUCE,RIGHT, 1, 10;
DEUCE,LEFT, 2, 7;
TRIO,RIGHT, 3, 3;
TREY,LEFT, 1, 10;
TRIO,RODDY, 1, 10;
TREY,LION, 2, 3;
DEEP,LEFT, 1, 10;
DEUCE,LION, 2, 15;
DEUCE,RIGHT, 3, 4;
DEEP,RODDY, 1, 10;
TREY,RIGHT, 1, 10;
TRIO,RAM, 2, 8;
TRIO,RAM, 3, 8;
DEEP,ROCK, 1, 10;
DEUCE,LION, 1, 10;
TRIO,LOUIE, 1, 10;
TRIO,RIGHT, 2,4;
DEUCE,RIGHT, 3, 6;
DEUCE,LION, 4, 2;
TREY,LION,1,10;

Again, I apologize for the length of the question. I hope I have enough information given to allow help to come. As a young/novice programmer, I am open to any and all feedback. If you can answer my question and point out ways to optimize my code to work more efficiently, I would greatly appreciate that feedback.

///////////////////////////////// //////////////////////////////

//////////////////////////////// ////////////////////////////////

The code that @BLUEPIXY shared in their answer worked perfectly. Now, I am just trying to turn it into a basic header file and I am not sure of how to amend some of the problems I am seeing. All I did to the code was change the name of the functions and transfer them into a header file.

#ifndef bp_csv_reader
#define bp_csv_reader    

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <ctype.h>

//https://tools.ietf.org/html/rfc4180
char *csv_get_field(FILE *fp, char separator, int *state)

char ***csv_read(const char *filename, size_t *rows, size_t *cols)

char *csv_trim(char *s)

#endif

csv.c looks like:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include "csv.h"

//https://tools.ietf.org/html/rfc4180
char *csv_get_field(FILE *fp, char separator, int *state){
    int ch = fgetc(fp);

    if(ch == EOF)
        return NULL;

    size_t size = 1, index = 0;
    char *field = malloc(size);
    bool quoted_in = false;

    for(;ch != EOF; ch = fgetc(fp)){
        if(ch == '"'){
            if(quoted_in){
                int prefetch = fgetc(fp);
                if(prefetch == '"'){
                    ch = prefetch;
                } else {
                    quoted_in = false;
                    ungetc(prefetch, fp);
                    continue;
                }
            } else {
                quoted_in = true;
                continue;
            }
        } else if(!quoted_in && (ch == separator || ch == '\n')){
            break;
        }
        field[index++] = ch;
        char *temp = realloc(field, ++size);
        if(!temp){
            perror("realloc:");
            free(field);
            exit(EXIT_FAILURE);
        }
        field = temp;
    }
    field[index] = 0;
    *state = ch;
    if(quoted_in){
        fprintf(stderr, "The quotes is not closed.\n");
        free(field);
        return NULL;
    }
    return field;
}

char ***csv_read(const char *filename, size_t *rows, size_t *cols){
    *rows = *cols = 0;

    FILE *fp = fopen(filename, "r");
    if(!fp){
        fprintf(stderr, "%s can't open in %s\n", filename, __func__);
        perror("fopen");
        return NULL;
    }


    char *field;
    int state;
    size_t r = 0, c = 0;
    char ***mat = NULL;
    void *temp;

    while(field = csv_get_field(fp, ',', &state)){
        if(c == 0){
            mat = realloc(mat, (r + 1)*sizeof(*mat));
            if(!mat){
                fprintf(stderr, "realloc failed in %s\n", __func__);
                exit(EXIT_FAILURE);
            }
            mat[r] = NULL;
        }
        mat[r] = realloc(mat[r], (c + 1)*sizeof(**mat));
        if(!mat[r]){
            fprintf(stderr, "realloc failed in %s\n", __func__);
            exit(EXIT_FAILURE);
        }
        mat[r][c++] = field;
        if(state == '\n' || state == EOF){
            if(*cols == 0){
                *cols = c;
            } else if(c != *cols){
                fprintf(stderr, "line %zu doesn't match number of columns in %s\n", r, filename);
                exit(EXIT_FAILURE);
            }
            c  = 0;
            *rows = ++r;
        }
    }
    fclose(fp);

    return mat;
}

#include <ctype.h>

char *csv_trim(char *s){
    if(!s || !*s)
        return s;

    char *from, *to;

    for(from = s; *from && isspace((unsigned char)*from); ++from);
    for(to = s; *from;){
        *to++ = *from++;
    }
    *to = 0;
    while(s != to && isspace((unsigned char)to[-1])){
        *--to = 0;
    }
    return s;
}

The code that calls it looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <csv.h>


int main(void){
    size_t rows, cols;


    char ***mat = csv_read("simp.csv", &rows, &cols);

    size_t r, c;

    for(r = 0; r < rows; ++r){
        for(c = 0; c < cols; ++c){
            if(c)
                putchar(',');
            printf("%s", csv_trim(mat[r][c]));
            free(mat[r][c]);
        }
        puts("");
        free(mat[r]);
    }
    free(mat);
    return 0;
}

I am not sure why I am running into these errors. The code I got ran perfectly in its own file. The problems didn't arise until I placed the functions into a header file. This is how I compile in the terminal:

 acom test_csv.c csv.c -I. csv.h

And these are the Errors I see.

In file included from test_cesv.c:5:0:
./csv.h: In function ‘csv_get_field’:
./csv.h:15:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘char’
 char *csv_trim(char *s)
 ^
test_cesv.c:142:1: error: expected ‘{’ at end of input
 }
 ^
In file included from csv.c:5:0:
csv.h: In function ‘csv_get_field’:
csv.h:15:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘char’
 char *csv_trim(char *s)
 ^
csv.c:55:67: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 char ***csv_read(const char *filename, size_t *rows, size_t *cols){
                                                                   ^
csv.c:105:24: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token
 char *csv_trim(char *s){
                        ^
csv.c:120:1: error: expected ‘{’ at end of input
 }
 ^
csv.h: In function ‘csv_get_field’:
csv.h:15:1: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘char’
 char *csv_trim(char *s)
 ^

Recommended answer

Try this:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

//https://tools.ietf.org/html/rfc4180
char *getCSVField(FILE *fp, char separator, int *state){
    int ch = fgetc(fp);

    if(ch == EOF)
        return NULL;

    size_t size = 1, index = 0;
    char *field = malloc(size);
    bool quoted_in = false;

    for(;ch != EOF; ch = fgetc(fp)){
        if(ch == '"'){
            if(quoted_in){
                int prefetch = fgetc(fp);
                if(prefetch == '"'){
                    ch = prefetch;
                } else {
                    quoted_in = false;
                    ungetc(prefetch, fp);
                    continue;
                }
            } else {
                quoted_in = true;
                continue;
            }
        } else if(!quoted_in && (ch == separator || ch == '\n')){
            break;
        }
        field[index++] = ch;
        char *temp = realloc(field, ++size);
        if(!temp){
            perror("realloc:");
            free(field);
            exit(EXIT_FAILURE);
        }
        field = temp;
    }
    field[index] = 0;
    *state = ch;
    if(quoted_in){
        fprintf(stderr, "The quotes is not closed.\n");
        free(field);
        return NULL;
    }
    return field;
}

char ***read_csv(const char *filename, size_t *rows, size_t *cols){
    *rows = *cols = 0;

    FILE *fp = fopen(filename, "r");
    if(!fp){
        fprintf(stderr, "%s can't open in %s\n", filename, __func__);
        perror("fopen");
        return NULL;
    }

    char *field;
    int state;
    size_t r = 0, c = 0;
    char ***mat = NULL;
    void *temp;

    while(field = getCSVField(fp, ',', &state)){
        if(c == 0){
            mat = realloc(mat, (r + 1)*sizeof(*mat));
            if(!mat){
                fprintf(stderr, "realloc failed in %s\n", __func__);
                exit(EXIT_FAILURE);
            }
            mat[r] = NULL;
        }
        mat[r] = realloc(mat[r], (c + 1)*sizeof(**mat));
        if(!mat[r]){
            fprintf(stderr, "realloc failed in %s\n", __func__);
            exit(EXIT_FAILURE);
        }
        mat[r][c++] = field;
        if(state == '\n' || state == EOF){
            if(*cols == 0){
                *cols = c;
            } else if(c != *cols){
                fprintf(stderr, "line %zu doesn't match number of columns in %s\n", r, filename);
                exit(EXIT_FAILURE);
            }
            c  = 0;
            *rows = ++r;
        }
    }
    fclose(fp);

    return mat;
}

#include <ctype.h>

char *trim(char *s){
    if(!s || !*s)
        return s;

    char *from, *to;

    for(from = s; *from && isspace((unsigned char)*from); ++from);
    for(to = s; *from;){
        *to++ = *from++;
    }
    *to = 0;
    while(s != to && isspace((unsigned char)to[-1])){
        *--to = 0;
    }
    return s;
}

int main(void){
    size_t rows, cols;
    char ***mat = read_csv("simp.csv", &rows, &cols);
    for(size_t r = 0; r < rows; ++r){
        for(size_t c = 0; c < cols; ++c){
            if(c)
                putchar(',');
            printf("%s", trim(mat[r][c]));
            free(mat[r][c]);
        }
        puts("");
        free(mat[r]);
    }
    free(mat);
    return 0;
}
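
Regarding the follow-up edit in the question: the diagnostics are what a C compiler typically reports when the prototypes in csv.h are missing their terminating semicolons, so each declaration runs into the next one. A minimal sketch of the header with that fixed is shown below. The guard macro name BP_CSV_READER_H is an arbitrary choice; the function names are the ones from the question.

```c
/* csv.h: declarations only; the definitions stay in csv.c */
#ifndef BP_CSV_READER_H
#define BP_CSV_READER_H

#include <stdio.h>   /* FILE */
#include <stddef.h>  /* size_t */

/* Each prototype ends with a semicolon. */
char *csv_get_field(FILE *fp, char separator, int *state);
char ***csv_read(const char *filename, size_t *rows, size_t *cols);
char *csv_trim(char *s);

#endif /* BP_CSV_READER_H */
```

Headers are pulled in by #include, so only the .c files need to appear on the compiler command line, for example `cc test_csv.c csv.c -I.` (cc here stands in for whatever compiler the acom command wraps, which is an assumption).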
