Removing trailing and leading spaces from a file


Problem description

I am trying to read lines from a text file of unknown length. Each line can have leading and trailing whitespace around the string. My first step is to read line by line and allocate memory for the strings, then remove all leading and trailing whitespace. After that I want to check whether the string contains any whitespace characters, which are invalid. For example, the string cannot look like "bad string" but can look like "goodstring". However, when I call the function that removes leading and trailing whitespace, it also removes characters before or after a whitespace character.

Could someone tell me what I am doing wrong?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define NCHAR 64

char *readline (FILE *fp, char **buffer);
char *strstrip(char *s);


int main (int argc, char **argv) {

    char *line = NULL;
    size_t idx = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (readline (fp, &line)) {  /* read each line in 'fp' */
        printf (" line[%2zu] : %s\n", idx++, line);
        free (line);
        line = NULL;
    }
    if (fp != stdin) fclose (fp);

    return  0;
}

/* read line from 'fp' allocate *buffer NCHAR in size
 * realloc as necessary. Returns a pointer to *buffer
 * on success, NULL otherwise.
 */
char *readline (FILE *fp, char **buffer) 
{
    int ch;
    size_t buflen = 0, nchar = NCHAR;
    size_t n;
    char *invalid_character = " ";

    *buffer = malloc (nchar);    /* allocate buffer nchar in length */
    if (!*buffer) {
        fprintf (stderr, "readline() error: virtual memory exhausted.\n");
        return NULL;
    }

    while ((ch = fgetc(fp)) != '\n' && ch != EOF) 
    {
        (*buffer)[buflen++] = ch;

        if (buflen + 1 >= nchar) {  /* realloc */
            char *tmp = realloc (*buffer, nchar * 2);
            if (!tmp) {
                fprintf (stderr, "error: realloc failed, "
                                "returning partial buffer.\n");
                (*buffer)[buflen] = 0;
                return *buffer;
            }
            *buffer = tmp;
            nchar *= 2;
        }
        strstrip(*buffer); //remove trailing/leading spaces
    }
    (*buffer)[buflen] = 0;           /* nul-terminate */


   if (invalid_character[n = strspn(invalid_character, *buffer)] == '\0') //check if a string has invalid character ' ' in it
    {
        puts(" invalid characters");

    } 

    if (buflen == 0 && ch == EOF) {  /* return NULL if nothing read */
        free (*buffer);
        *buffer = NULL;
    }

    return *buffer;
}
char *strstrip(char *s)
{
    size_t size;
    char *end;

    size = strlen(s);

    if (!size)
        return s;

    end = s + size - 1;
    while (end >= s && isspace(*end))
        end--;
    *(end + 1) = '\0';

    while (*s && isspace(*s))
        s++;

    return s;
}

Recommended answer

You do not need to worry about the length of the string passed to strstrip(); simply iterate over all characters in the string, removing whitespace characters. For example, the following version removes ALL whitespace from s:

/** remove ALL leading, interleaved and trailing whitespace, in place.
 *  the original start address is preserved but due to reindexing,
 *  the contents of the original are not preserved. returns pointer
 *  to 's'. (ctype.h required)
 */
char *strstrip (char *s)
{
    if (!s) return NULL;                     /* validate string not NULL */
    if (!*s) return s;                       /* handle empty string */

    char *p = s, *wp = s;                    /* pointer and write-pointer */

    while (*p) {                             /* loop over each character */
        while (isspace ((unsigned char)*p))  /* if whitespace advance ptr */
            p++;
        *wp++ = *p;                          /* use non-ws char */
        if (*p)
            p++;
    }
    *wp = 0;    /* nul-terminate */

    return s;
}

(note: if the argument to isspace() has type char, a cast to unsigned char is required; see the NOTES section of, e.g., man 3 isalpha)
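
The version above removes every whitespace character, so it would turn "bad string" into "badstring" and hide the embedded space the question wants to detect. It is also worth noting why the question's code misbehaves: readline() calls strstrip() inside the read loop, before the buffer is nul-terminated, and discards the return value. When the character just stored is a space, the trailing trim can write a '\0' over it while buflen keeps advancing, so later characters land behind a terminator. A minimal sketch of trimming only the two ends and then checking for interior whitespace might look like the following; trim_inplace and has_whitespace are hypothetical names, not part of the original answer, and the sketch assumes it is called once on a complete, nul-terminated line.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): trim leading and
 * trailing whitespace in place, leaving interior whitespace untouched.
 * The start address is preserved. Returns s, or NULL if s is NULL.
 */
char *trim_inplace (char *s)
{
    if (!s) return NULL;

    char *start = s;
    while (isspace ((unsigned char)*start))     /* skip leading ws */
        start++;

    size_t len = strlen (start);
    while (len > 0 && isspace ((unsigned char)start[len - 1]))
        len--;                                  /* drop trailing ws */

    memmove (s, start, len);    /* shift the kept text to the front */
    s[len] = 0;                 /* nul-terminate */

    return s;
}

/* Hypothetical helper: returns 1 if s contains any whitespace, else 0. */
int has_whitespace (const char *s)
{
    for (; *s; s++)
        if (isspace ((unsigned char)*s))
            return 1;
    return 0;
}
```

In readline(), a call such as trim_inplace (*buffer) would go after the line `(*buffer)[buflen] = 0;`, followed by has_whitespace (*buffer) to flag a line like "bad string". memmove is used rather than memcpy because the source and destination ranges overlap.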

Removing only excess whitespace

The following version removes leading and trailing whitespace and collapses each run of whitespace to a single whitespace character:

/** remove excess leading, interleaved and trailing whitespace, in place.
 *  the original start address is preserved but due to reindexing,
 *  the contents of the original are not preserved. returns pointer
 *  to 's'. (ctype.h required) NOTE: LATEST
 */
char *strstrip (char *s)
{
    if (!s) return NULL;                        /* validate string not NULL */
    if (!*s) return s;                               /* handle empty string */

    char *p = s, *wp = s;                      /* pointer and write-pointer */

    while (*p) {
        if (isspace((unsigned char)*p)) {                    /* test for ws */
            if (wp > s)                         /* ignore leading ws, while */
                *wp++ = *p;                   /* preserving 1 between words */
            while (*p && isspace ((unsigned char)*p))    /* skip remainder  */
                p++;
            if (!*p)                               /* bail on end-of-string */
                break;
        }
        if (*p == '.')                 /* handle space between word and '.' */
            while (wp > s && isspace ((unsigned char)*(wp - 1)))
                wp--;
        *wp++ = *p;                                      /* use non-ws char */
        p++;
    }
    while (wp > s && isspace ((unsigned char)*(wp - 1))) /* trim trailing ws */
        wp--;
    *wp = 0;    /* nul-terminate */

    return s;
}

(note: s must be mutable and therefore cannot be a string-literal)
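
Modifying a string literal is undefined behavior in C, so an in-place routine needs a writable buffer such as `char buf[] = " a b ";` or memory obtained from malloc. When the input may be a literal, one option is to work on a copy instead. The sketch below is an illustration only: strip_copy is a hypothetical helper, not part of the original answer, and it removes all whitespace while leaving the source untouched.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): return a
 * heap-allocated copy of src with all whitespace removed. Returns NULL
 * if src is NULL or allocation fails. The caller frees the result.
 */
char *strip_copy (const char *src)
{
    if (!src) return NULL;

    char *copy = malloc (strlen (src) + 1);     /* never longer than src */
    if (!copy) return NULL;

    char *wp = copy;                            /* write-pointer */
    for (const char *p = src; *p; p++)
        if (!isspace ((unsigned char)*p))       /* keep non-ws chars only */
            *wp++ = *p;
    *wp = 0;                                    /* nul-terminate */

    return copy;
}
```

Because the source is only read, strip_copy (" a b\tc ") is safe to call with a literal argument; the price is one allocation per call and a free() by the caller.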
