Replacing multiple new lines in a file with just one

Question

This function is supposed to search through a text file for the newline character. When it finds a newline, it increments the newLine counter, and when there are more than 2 consecutive blank lines, it is supposed to squeeze all of them into just one blank line.

In my code, if there are 2 new lines it is supposed to get rid of them and squeeze them into one. For testing purposes I also have it print "new line" when it reaches the newLine < 2 condition. Right now it prints "new line" for every newline, whether the line is blank or not, and it is not getting rid of the extra newlines. What am I doing wrong?

Here is my full code: http://pastebin.com/bsD3b38a

So basically the program is supposed to concatenate two files and then perform various operations on them, such as the one I'm attempting here: getting rid of multiple consecutive blank lines. To run it in Cygwin I do ./a -s file1 file2. It is supposed to concatenate file1 and file2 into a file called contents.txt, then get rid of the consecutive newlines and display the result on my Cygwin terminal (stdout). The -s option calls the function that gets rid of the consecutive lines. The third and fourth arguments (file1 and file2) are the two files to concatenate into contents.txt. The squeeze_lines function then reads contents.txt and is supposed to squeeze the newlines. See below for an example of the contents I put in file1.txt; file2.txt just has a bunch of words followed by empty lines.

int newLine = 1;
int c; 

if ((fileContents = fopen("fileContents.txt", "r")) == 0) 
{
    perror("fopen");
    return 1; 
}

while ((c = fgetc(fileContents)) != EOF)
{   
    if (c == '\n')
    {
        newLine++;
        if (newLine < 2) 
        {
            printf("new line");
            putchar(c); 
        }
    }
    else 
    {
        putchar(c); 
        newLine = 0;
    }
}

The program reads a .txt file with the contents below. It is supposed to read the file, get rid of the leading and consecutive newlines, and output the reformatted contents to stdout on my Cygwin terminal.

/* hello world program */


#include <stdio.h>

    tab
            2tabs

Recommended answer

Diagnosis

The logic looks correct if you have Unix line endings. If you have Windows CRLF line endings but are processing the file on Unix, there is a CR before each LF; the CR goes through the else branch and resets newLine to zero, so you get the message for each newline.
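One way to check this diagnosis is to count the CR LF pairs in the input. The following is only a sketch: count_crlf is a hypothetical helper (it is not part of the question's code), and it assumes the stream was opened in binary mode ("rb") so that the C library does not translate line endings before the loop sees them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts CR LF pairs in a stream.
 * Assumes fp was opened in binary mode so no translation happens. */
long count_crlf(FILE *fp)
{
    long pairs = 0;
    int prev = 0;   /* previous character read; 0 before the first one */
    int c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF)
    {
        if (prev == '\r' && c == '\n')
            pairs++;
        prev = c;
    }
    return pairs;
}
```

A non-zero result for contents.txt would suggest that the file has DOS line endings and that the CR characters are what keep resetting newLine.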

This would explain what you're seeing.

It would also explain why everyone else is saying your logic is correct (it is, provided that the lines end with just LF and not CRLF) but you are seeing an unexpected result.

How to fix it?

Fair question. One major option is to use dos2unix or an equivalent mechanism to convert the DOS file into a Unix file. There are many questions on the subject on SO.

If you don't need the CR ('\r' in C) characters at all, you can simply delete them (do not print them, and do not zero newLine when you see one).
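A minimal sketch of that option follows. The function name squeeze_drop_cr and its two stream parameters are assumptions made for illustration; the counter logic is the same as in the question, with one extra branch that skips CR without printing it and without touching newLine.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, dropping every CR and
 * suppressing blank lines (including leading ones). */
void squeeze_drop_cr(FILE *in, FILE *out)
{
    int newLine = 1;    /* starts at 1 so leading blank lines are dropped */
    int c;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF)
    {
        if (c == '\r')
            continue;   /* CR: not printed, and newLine is left alone */
        if (c == '\n')
        {
            /* Print only the first newline after real content */
            if (newLine++ == 0)
                putc(c, out);
        }
        else
        {
            putc(c, out);
            newLine = 0;
        }
    }
}
```

Calling squeeze_drop_cr(stdin, stdout) from main would turn this into a filter. Note that the output then has Unix line endings even if the input was a DOS file.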

If you need to preserve the CRLF line endings, you'll need to be a bit more careful. You'll have to record that you got a CR, then check that you get an LF, then print the pair, and then check whether you get any more CRLF sequences and suppress those, and so on.

This program only reads from standard input; this is more flexible than only reading from a fixed file name. Learn to avoid writing code which only works with one file name; it will save you lots of recompilation over time. The code handles Unix-style files with only a newline ("\n") at the end of each line; it also handles DOS files with CRLF ("\r\n") endings; and it also handles (old style) Mac (Mac OS 9 and earlier) files with CR ("\r") line endings. In fact, it handles arbitrary interleavings of the different line ending styles. If you want enforcement of a single mode, you have to do some work to decide which mode, and then use an appropriate subset of this code.

#include <stdio.h>

int main(void)
{
    FILE *fp = stdin;       // Instead of fopen()
    int newLine = 1;
    int c; 

    while ((c = fgetc(fp)) != EOF)
    {   
        if (c == '\n')
        {
            /* Unix NL line ending */
            if (newLine++ == 0)
                putchar(c); 
        }
        else if (c == '\r')
        {
            int c1 = fgetc(fp);
            if (c1 == '\n')
            {
                /* DOS CRLF line ending */
                if (newLine++ == 0)
                {
                    putchar(c);
                    putchar(c1);
                }
            }
            else
            {
                /* MAC CR line ending */
                if (newLine++ == 0)
                    putchar(c);
                if (c1 != EOF && c1 != '\r')
                    ungetc(c1, fp);     /* push back onto the stream we read from */
            }
        }
        else
        {
            putchar(c); 
            newLine = 0;
        }
    }

    return 0;
}

Example run: input and output

$ cat test.unx


data long enough to be seen 1 - Unix

data long enough to be seen 2 - Unix
data long enough to be seen 3 - Unix
data long enough to be seen 4 - Unix



data long enough to be seen 5 - Unix


$ sed 's/Unix/DOS/g' test.unx | ule -d > test.dos
$ cat test.dos


data long enough to be seen 1 - DOS

data long enough to be seen 2 - DOS
data long enough to be seen 3 - DOS
data long enough to be seen 4 - DOS



data long enough to be seen 5 - DOS


$ sed 's/Unix/Mac/g' test.unx | ule -m > test.mac
$ cat test.mac
$ ta long enough to be seen 5 - Mac
$ odx test.mac
0x0000: 0D 0D 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75   ..data long enou
0x0010: 67 68 20 74 6F 20 62 65 20 73 65 65 6E 20 31 20   gh to be seen 1 
0x0020: 2D 20 4D 61 63 0D 0D 64 61 74 61 20 6C 6F 6E 67   - Mac..data long
0x0030: 20 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 65    enough to be se
0x0040: 65 6E 20 32 20 2D 20 4D 61 63 0D 64 61 74 61 20   en 2 - Mac.data 
0x0050: 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 74 6F 20 62   long enough to b
0x0060: 65 20 73 65 65 6E 20 33 20 2D 20 4D 61 63 0D 64   e seen 3 - Mac.d
0x0070: 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 20   ata long enough 
0x0080: 74 6F 20 62 65 20 73 65 65 6E 20 34 20 2D 20 4D   to be seen 4 - M
0x0090: 61 63 0D 0D 0D 0D 64 61 74 61 20 6C 6F 6E 67 20   ac....data long 
0x00A0: 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 65 65   enough to be see
0x00B0: 6E 20 35 20 2D 20 4D 61 63 0D 0D 0D               n 5 - Mac...
0x00BC:
$ dupnl < test.unx
data long enough to be seen 1 - Unix
data long enough to be seen 2 - Unix
data long enough to be seen 3 - Unix
data long enough to be seen 4 - Unix
data long enough to be seen 5 - Unix
$ dupnl < test.dos
data long enough to be seen 1 - DOS
data long enough to be seen 2 - DOS
data long enough to be seen 3 - DOS
data long enough to be seen 4 - DOS
data long enough to be seen 5 - DOS
$ dupnl < test.mac
$ ta long enough to be seen 5 - Mac
$ dupnl < test.mac | odx
0x0000: 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68   data long enough
0x0010: 20 74 6F 20 62 65 20 73 65 65 6E 20 31 20 2D 20    to be seen 1 - 
0x0020: 4D 61 63 0D 64 61 74 61 20 6C 6F 6E 67 20 65 6E   Mac.data long en
0x0030: 6F 75 67 68 20 74 6F 20 62 65 20 73 65 65 6E 20   ough to be seen 
0x0040: 32 20 2D 20 4D 61 63 0D 64 61 74 61 20 6C 6F 6E   2 - Mac.data lon
0x0050: 67 20 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73   g enough to be s
0x0060: 65 65 6E 20 33 20 2D 20 4D 61 63 0D 64 61 74 61   een 3 - Mac.data
0x0070: 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 74 6F 20    long enough to 
0x0080: 62 65 20 73 65 65 6E 20 34 20 2D 20 4D 61 63 0D   be seen 4 - Mac.
0x0090: 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68   data long enough
0x00A0: 20 74 6F 20 62 65 20 73 65 65 6E 20 35 20 2D 20    to be seen 5 - 
0x00B0: 4D 61 63 0D                                       Mac.
0x00B4:
$

The lines starting $ ta are where the prompt overwrites the previous output (and the 'long enough to be seen' part is because my prompt is normally longer than just $).

odx is a hex dump program. ule is for 'uniform line endings' and analyzes or transforms data so it has uniform line endings.

Usage: ule [-cdhmnsuzV] [file ...]
  -c  Check line endings (default)
  -d  Convert to DOS (CRLF) line endings
  -h  Print this help and exit
  -m  Convert to MAC (CR) line endings
  -n  Ensure line ending at end of file
  -s  Write output to standard output (default)
  -u  Convert to Unix (LF) line endings
  -z  Check for zero (null) bytes
  -V  Print version information and exit
