Reading large lists through stdin in C

Question

If my program is going to have large lists of numbers passed in through stdin, what would be the most efficient way of reading them in?

The input I'm going to be passing into the program will be in the following format:

3,5;6,7;8,9;11,4;; 

I need to process the input so that I can use the numbers between the semicolons (i.e. I want to be able to use 3 and 5, 6 and 7, etc.). The ;; indicates the end of the line.

I was thinking of using a buffered reader to read entire lines and then using parseInt.

Would this be the most efficient way of doing it?

Recommended answer

One other fairly elegant way to handle this is to let strtol drive the parsing: after each conversion, continue from the endptr that strtol returns. Combined with an array that is allocated and reallocated as needed, this handles lines of any length (until memory is exhausted). The example below uses a single array for the data. If you want to store multiple lines, each as a separate array, you can use the same approach but start with a pointer to pointer to int (i.e. int **numbers), allocate the pointers and then allocate each array. Let me know if you have questions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define NMAX 256

int main () {

    char *ln = NULL;                /* NULL forces getline to allocate  */
    size_t n = 0;                   /* buffer size, managed by getline  */
    ssize_t nchr = 0;               /* number of chars actually read    */
    int *numbers = NULL;            /* array to hold numbers            */
    size_t nmax = NMAX;             /* check for reallocation           */
    size_t idx = 0;                 /* numbers array index              */

    if (!(numbers = calloc (NMAX, sizeof *numbers))) {
        fprintf (stderr, "error: memory allocation failed.\n");
        return 1;
    }

    /* read each line from stdin - dynamically allocated    */
    while ((nchr = getline (&ln, &n, stdin)) != -1)
    {
        char *p = ln;       /* pointer for use with strtol  */
        char *ep = NULL;

        errno = 0;
        while (errno == 0)
        {
            /* parse/convert each number in the line */
            numbers[idx] = (int) strtol (p, &ep, 10);
            /* note: overflow/underflow checks omitted */
            /* if valid conversion to number */
            if (errno == 0 && p != ep)
            {
                idx++;              /* increment index      */

                /* reallocate numbers as soon as the array is full, so
                   the next store (even on a later line) is in bounds */
                if (idx == nmax)
                {
                    int *tmp = realloc (numbers, 2 * nmax * sizeof *numbers);
                    if (!tmp) {
                        fprintf (stderr, "error: numbers reallocation failed.\n");
                        exit (EXIT_FAILURE);
                    }
                    numbers = tmp;
                    memset (numbers + nmax, 0, nmax * sizeof *numbers);
                    nmax *= 2;
                }
            }

            /* skip delimiters/move pointer to next digit
               (a leading '-' is treated as a delimiter)    */
            while (*ep && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else
                break;
        }
    }

    /* free mem allocated by getline */
    if (ln) free (ln);

    /* show values stored in array   */
    size_t i = 0;
    for (i = 0; i < idx; i++)
        printf (" numbers[%2zu]  %d\n", i, numbers[i]);

    /* free mem allocated to numbers */
    if (numbers) free (numbers);

    return 0;
}

Output

$ echo "3,5;6,7;8,9;11,4;;" | ./bin/prsistdin
 numbers[ 0]  3
 numbers[ 1]  5
 numbers[ 2]  6
 numbers[ 3]  7
 numbers[ 4]  8
 numbers[ 5]  9
 numbers[ 6]  11
 numbers[ 7]  4

This also works when the string is stored in a file:

$ cat dat/numsemic.csv | ./bin/prsistdin
or
$ ./bin/prsistdin < dat/numsemic.csv


Using fgets and without size_t

It took a little reworking to come up with a revision I was happy with that eliminates getline and substitutes fgets. getline is far more flexible because it handles the allocation of space for you; with fgets that is up to you (not to mention that getline returns the actual number of chars read, without a call to strlen).

My goal here was to preserve the ability to read a line of any length, to meet your requirement. That meant either initially allocating some huge line buffer (wasteful) or coming up with a scheme that reallocates the input line buffer as needed when a line is longer than the space initially allocated to ln (this is what getline does so well). I'm reasonably happy with the results. Note: I put the reallocation code in functions to keep main reasonably clean. [footnote 2]

Take a look at the following code. Note that I have left the DEBUG preprocessor directives in place, so you can compile with the -DDEBUG flag if you want the program to report each reallocation. [footnote 1] You can compile the code with:

gcc -Wall -Wextra -o yourexename yourfilename.c

or, if you want the debugging output (e.g. after setting LMAX to 2 or something less than the line length), use the following:

gcc -Wall -Wextra -o yourexename yourfilename.c -DDEBUG

Let me know if you have questions:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

#define NMAX 256
#define LMAX 1024

char *realloc_char (char *sp, unsigned int *n); /* reallocate char array    */
int *realloc_int (int *sp, unsigned int *n);    /* reallocate int array     */
char *fixshortread (FILE *fp, char **s, unsigned int *n); /* read all stdin */

int main () {

    char *ln = NULL;                    /* dynamically allocated for fgets  */
    int *numbers = NULL;                /* array to hold numbers            */
    unsigned int nmax = NMAX;           /* numbers check for reallocation   */
    unsigned int lmax = LMAX;           /* ln check for reallocation        */
    unsigned int idx = 0;               /* numbers array index              */
    unsigned int i = 0;                 /* simple counter variable          */
    char *nl = NULL;

    /* initial allocation for numbers */
    if (!(numbers = calloc (NMAX, sizeof *numbers))) {
        fprintf (stderr, "error: memory allocation failed (numbers).\n");
        return 1;
    }

    /* initial allocation for ln */
    if (!(ln = calloc (LMAX, sizeof *ln))) {
        fprintf (stderr, "error: memory allocation failed (ln).\n");
        return 1;
    }

    /* read each line from stdin - dynamically allocated    */
    while (fgets (ln, lmax, stdin) != NULL)
    {
        /* provide a fallback to read remainder of line
        if the line length exceeds lmax */
        if (!(nl = strchr (ln, '\n')))
            fixshortread (stdin, &ln, &lmax); 
        else
            *nl = 0;

        char *p = ln;       /* pointer for use with strtol  */
        char *ep = NULL;

        errno = 0;
        while (errno == 0)
        {
            /* parse/convert each number in the line */
            numbers[idx] = (int) strtol (p, &ep, 10);
            /* note: overflow/underflow checks omitted */
            /* if valid conversion to number */
            if (errno == 0 && p != ep)
            {
                idx++;              /* increment index      */

                /* reallocate as soon as the array is full and keep
                   the pointer returned by realloc_int             */
                if (idx == nmax)
                    numbers = realloc_int (numbers, &nmax);
            }

            /* skip delimiters/move pointer to next digit   */
            while (*ep && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else
                break;
        }
    }

    /* free mem allocated for ln */
    if (ln) free (ln);

    /* show values stored in array   */
    for (i = 0; i < idx; i++)
        printf (" numbers[%2u]  %d\n", (unsigned int)i, numbers[i]);

    /* free mem allocated to numbers */
    if (numbers) free (numbers);

    return 0;
}

/* reallocate character pointer memory */
char *realloc_char (char *sp, unsigned int *n)
{
    char *tmp = realloc (sp, 2 * *n * sizeof *sp);
#ifdef DEBUG
    printf ("\n  reallocating %u to %u\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "Error: char pointer reallocation failure.\n");
        exit (EXIT_FAILURE);
    }
    sp = tmp;
    memset (sp + *n, 0, *n * sizeof *sp); /* memset new ptrs 0 */
    *n *= 2;

    return sp;
}

/* reallocate integer pointer memory */
int *realloc_int (int *sp, unsigned int *n)
{
    int *tmp = realloc (sp, 2 * *n * sizeof *sp);
#ifdef DEBUG
    printf ("\n  reallocating %u to %u\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "Error: int pointer reallocation failure.\n");
        exit (EXIT_FAILURE);
    }
    sp = tmp;
    memset (sp + *n, 0, *n * sizeof *sp); /* memset new ptrs 0 */
    *n *= 2;

    return sp;
}

/* if fgets fails to read entire line, read the remainder,
   growing *s as needed and storing the new pointer back in *s */
char *fixshortread (FILE *fp, char **s, unsigned int *n)
{
    unsigned int i = (unsigned int) strlen (*s);  /* index of the nul */
    int c = 0;

    while ((c = fgetc (fp)) != '\n' && c != EOF)
    {
        if (i + 1 == *n)    /* keep room for the terminating nul */
            *s = realloc_char (*s, n);
        (*s)[i++] = (char) c;
    }
    (*s)[i] = 0;

    return *s;
}

Footnote 1

There is nothing special about the choice of the word DEBUG (it could have been DOG, etc.). The point to take away is that if you want to conditionally include or exclude code, you can simply use preprocessor flags to do that. You just add -Dflagname to pass flagname to the compiler.

Footnote 2

You can combine the reallocation functions into a single void* function that accepts a void pointer as its argument along with the size of the type to be reallocated, and returns a void pointer to the reallocated space, but we will leave that for a later date.
