Delete carriage return and line feed on file using C?


Question

Hey friends,
I want to delete "carriage return" and "line feed" on my *file :)
Thanks

What I have tried:

I have not tried anything yet; I just want ideas.

Recommended answer

A simple solution that just removes every occurrence of CR or NL (LF) is:
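#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int c;
    /* Copy stdin to stdout, dropping every CR and LF byte. */
    while ((c = getchar()) != EOF) {
        /* Compare directly: strchr("\n\r", c) would also match c == 0,
           because strchr treats the terminating NUL as part of the string. */
        if (c != '\n' && c != '\r') putchar(c);
    }
    return EXIT_SUCCESS;
}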

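Keep in mind that this works well for ASCII encoded text as well as UTF-8 encoded text (in UTF-8, the bytes 0x0D and 0x0A never occur inside a multi-byte character), but not for encodings such as UTF-16 or UTF-32, where those byte values can be part of other characters.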
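
To make the encoding caveat concrete, here is a minimal sketch of the same byte-wise filter applied to a memory buffer. The function name `strip_cr_lf_bytes` is made up for this illustration and is not part of the answer above.

```c
#include <stddef.h>

/* Remove every CR (0x0D) and LF (0x0A) byte from buf in place.
   Returns the number of bytes kept. Hypothetical helper for illustration. */
size_t strip_cr_lf_bytes(unsigned char *buf, size_t len)
{
    size_t kept = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0x0A && buf[i] != 0x0D)
            buf[kept++] = buf[i];   /* keep this byte, compacting to the left */
    }
    return kept;
}
```

With UTF-8 input such as the bytes C3 A9 0D 0A ("é" followed by CR LF), only C3 A9 remains, which is still valid text. With UTF-16LE input such as 41 00 0A 00 ("A" followed by LF), the result is 41 00 00: three bytes, which is no longer valid UTF-16. A character like U+010A, encoded as 0A 01 in UTF-16LE, would lose one of its bytes in the same way.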

As PIEBALDconsult suggested, use it as a command-line filter: it consumes stdin and writes to stdout.

Alternatively, for files that are not too large, you could slurp the entire file into one memory buffer, process each character in that buffer, and finally dump the processed buffer back to the file. This works fine because the result is never larger than the original file, so you do not have to take care of allocating more space.

Cheers
Andi

PS: A sample version of the alternative solution mentioned above (with full-blown error handling) might look as follows. Have fun! :-)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *buffer;
    size_t capacity;
    size_t next_pos;
} pos_t;

pos_t slurp(const char *file_name)
{
    pos_t pos = { NULL, 0, 0 };
    FILE *file = fopen(file_name, "rb");
    if (!file) {
        perror("Failed to open file in read mode");
        return pos;
    }
    if (fseek(file, 0, SEEK_END)) {
        perror("Failed to go to the end of the file");
        fclose(file);
        return pos;
    }
    long size = ftell(file);
    if (size < 0) {
        perror("Failed to get the length of the file");
        fclose(file);
        return pos;
    }
    rewind(file);

    char *slurped = malloc(size);
    if (!slurped) {
        perror("Failed to allocate memory");
        fclose(file);
        return pos;
    }
    if (fread(slurped, 1, (size_t)size, file) != (size_t)size) { /* compare as size_t, not long vs. size_t */
        perror("Failed to slurp the file");
        free(slurped);
        fclose(file);
        return pos;
    }
    fclose(file);
    pos.buffer = slurped;
    pos.capacity = size;

    printf("slurp: %s = %ld bytes\n", file_name, size);

    return pos;
}

void dump(const char *file_name, pos_t *pos)
{
    FILE *file = fopen(file_name, "wb");
    if (!file) {
        perror("Failed to open file in write mode");
        return;
    }

    if (fwrite(pos->buffer, 1, pos->next_pos, file) != pos->next_pos) {
        perror("Failed to write to the file");
        fclose(file);
        return;
    }
    if (fclose(file)) {
        perror("Failed to close file");
        return;
    }

    printf("dump: %s = %zu bytes\n", file_name, pos->next_pos); /* next_pos is a size_t, so %zu */
}

int read_char(pos_t *pos)
{
    if (pos->capacity <= pos->next_pos) return EOF;
    return (unsigned char)pos->buffer[pos->next_pos++]; /* a signed char 0xFF must not compare equal to EOF */
}

void write_char(char c, pos_t *pos)
{
    if (pos->capacity <= pos->next_pos) return;
    pos->buffer[pos->next_pos++] = c;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Missing file name\n");
        return EXIT_FAILURE;
    }
    
    const char *file_name = argv[1];
    pos_t read = slurp(file_name);
    pos_t write = read; // share the buffer
    if (read.capacity == 0) {
        printf("Rejected to process file %s\n", file_name);
        return EXIT_FAILURE;
    }

    int c;
    while((c = read_char(&read)) != EOF) {
        /* Direct comparison: strchr("\n\r", c) would also drop NUL bytes. */
        if (c != '\n' && c != '\r') write_char(c, &write);
    }
    dump(file_name, &write);

    free(read.buffer);

    return EXIT_SUCCESS;
}


It's hard to do this in any simple way within a single file, for one simple reason. Let's say you have a big file and you remove one character at the very beginning. As the rest of the content is shifted, you have to rewrite the whole content of the file. This is exactly what happens when you remove ends of lines.

That should bring you to the next idea: if you have to rewrite the file anyway (except perhaps a small portion before the first end-of-line), you have to admit it and rewrite it. An adequate solution is: open the source file as read-only and a new temporary file as write-only, then write all the content of the source file into the temporary file, skipping just the end-of-line characters. You can do it in chunks of data (say, 1 MB), removing the unwanted characters from each chunk before writing it out. If you want simplicity, the chunk could be one character.

When this is done, close the original and temporary files and then rename (move) the temporary file to the original file name, which replaces the old version and leaves no temporary file behind.
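
The steps above could be sketched roughly like this. The function names, the ".tmp" suffix for the temporary file, and the 4096-byte chunk size are assumptions made for this illustration. Note that ISO C leaves `rename()` onto an existing file implementation-defined: POSIX replaces the target, while the Microsoft C runtime fails, so the sketch retries after removing the original.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy `in` to `out` in chunks, skipping every CR and LF byte.
   Returns 0 on success, -1 on a read or write error. */
static int copy_without_cr_lf(FILE *in, FILE *out)
{
    unsigned char chunk[4096];
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        size_t kept = 0;
        for (size_t i = 0; i < got; i++) {
            if (chunk[i] != '\n' && chunk[i] != '\r')
                chunk[kept++] = chunk[i];
        }
        if (fwrite(chunk, 1, kept, out) != kept)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Rewrite `path` without CR/LF through a temporary file named path + ".tmp"
   (hypothetical naming scheme). Returns 0 on success, -1 on failure. */
int strip_cr_lf_in_file(const char *path)
{
    size_t n = strlen(path) + sizeof ".tmp";   /* sizeof counts the NUL */
    char *tmp_path = malloc(n);
    if (!tmp_path) return -1;
    snprintf(tmp_path, n, "%s.tmp", path);

    int status = -1;
    int tmp_created = 0;
    FILE *in = fopen(path, "rb");
    if (in) {
        FILE *out = fopen(tmp_path, "wb");
        if (out) {
            tmp_created = 1;
            status = copy_without_cr_lf(in, out);
            if (fclose(out) != 0) status = -1;
        }
        fclose(in);
    }

    if (status != 0) {
        if (tmp_created) remove(tmp_path);     /* discard the partial copy */
    } else if (rename(tmp_path, path) != 0) {
        /* Retry after removing the original. If this also fails, the
           filtered data is still available in tmp_path. */
        remove(path);
        if (rename(tmp_path, path) != 0) status = -1;
    }
    free(tmp_path);
    return status;
}
```

A caller would pass the file name, for example `strip_cr_lf_in_file(argv[1])`, and check the return value. The remove-then-rename fallback is not atomic, so a crash between the two calls can leave only the temporary file behind.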

And now, one little problem: what is the "end of line"? Unfortunately, the ugly fact of nature is that this is a platform-dependent string. Please see the Wikipedia article "Newline".

In practice, you can ignore the possibility of having abnormally placed end-of-line characters and simply remove them all, which means removing all 0xA and 0xD characters. You may also want to remove the rarely used Unicode separator characters, such as U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
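
If the text is known to be UTF-8, removing the Unicode separators as well might look like the sketch below. The choice of separators (U+0085 NEL, U+2028, U+2029) and both function names are assumptions for this illustration. It works on a complete in-memory buffer, so a multi-byte separator is never split across two chunks.

```c
#include <stddef.h>

/* Length in bytes of the line separator starting at p, or 0 if there is
   none. `left` is the number of bytes available at p (at least 1). */
static size_t separator_length(const unsigned char *p, size_t left)
{
    if (p[0] == 0x0A || p[0] == 0x0D) return 1;               /* LF, CR */
    if (left >= 2 && p[0] == 0xC2 && p[1] == 0x85) return 2;  /* U+0085 NEL */
    if (left >= 3 && p[0] == 0xE2 && p[1] == 0x80 &&
        (p[2] == 0xA8 || p[2] == 0xA9)) return 3;             /* U+2028, U+2029 */
    return 0;
}

/* Remove CR, LF, and the UTF-8 encoded separators above from buf in place.
   Returns the number of bytes kept. */
size_t strip_line_separators_utf8(unsigned char *buf, size_t len)
{
    size_t kept = 0, i = 0;
    while (i < len) {
        size_t skip = separator_length(buf + i, len - i);
        if (skip)
            i += skip;                 /* drop the whole separator */
        else
            buf[kept++] = buf[i++];    /* keep any other byte unchanged */
    }
    return kept;
}
```

Matching on the lead bytes 0xC2 and 0xE2 is safe in valid UTF-8 because those values never appear as continuation bytes.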

—SA

