C++ cutting off character(s) when reading lines from a file


Problem description

I know this has to do with the differences between the end-of-line designators in Windows and Linux, but I don't know how to fix it.

I did look at the post "Getting std::ifstream to handle LF, CR, and CRLF?", but when I used a simplified version of the code from that post (straight reads instead of buffered reads, knowing there is a performance penalty but wanting to keep it simple for now), it did not solve my problem, so I am hoping for some guidance here. I did test my modified version of that code, and it successfully found and replaced characters and a tab that I temporarily used for a test scenario, so the logic works, but I still have the problem.

I know I am missing something very basic here, and I will likely feel very stupid when someone helps me figure this out, so I would rather not admit my stupidity publicly. But I have been working on this for a week now and cannot solve it, so I am reaching out for help.

I am new to C++, so please be gentle in your answers if I am doing something really noobie here :-)

I have created the following one-file program to prototype what I want to do. It is a simple example, but I need to get it working before I can go further. This is NOT a homework problem; I really need to solve this to create an application.

The program (shown below):

  • compiles without errors or warnings and runs cleanly on a CentOS box;
  • cross-compiles without errors or warnings using mingw32 on a CentOS box and runs cleanly on Windows.

So yes, it has something to do with the different file formats on Linux and Windows, and it likely has to do with the newline codes, but I have tried to accommodate that and it does not work.

To make it more complicated, I have discovered that old Mac newline characters are different yet again:

  • linux = \n
  • Windows = \r\n
  • Mac = \r

Please help!

I want to:

  1. read in the contents of a txt file
  2. do some validation checks on the content (not done here; that comes next)
  3. output a report to another txt file

So I need to check the file, determine which newline character(s) it uses, and handle them accordingly.

Any suggestions?

My current (simplified) code (with no validation checks yet) is:

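[code]

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, getline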

int main(int argc, char** argv)
{
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    bool failure_flag = false;

    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    if ( ! rc_input_file_holder.is_open() )
    {
       std::cout << "Error - Could not open the input file" << std::endl;
       failure_flag = true;
    }
    else
    {
       std::ofstream rc_output_file_holder;
       rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

       if ( ! rc_output_file_holder.is_open() )
       {
          std::cout << "Error - Could not open or create the output file" << std::endl;
          failure_flag = true;
       }
       else
       {
          std::streampos char_num = 0;

          long int line_num = 0;
          long int starting_char_pos = 0;

          std::string file_line = "";
          while ( getline( rc_input_file_holder , file_line ) )
          {
             line_num = line_num + 1;
             long int file_line_length = file_line.length() +1 ;
             long int char_num = 0;
             for ( char_num = 0 ; char_num < file_line_length ;  char_num++ )
             {
                if ( file_line[ char_num ] == '\n' )
                {
                    if ( char_num == file_line_length - 1 )
                    {
                       file_line[ char_num ] = '-';
                    }
                    else
                    {
                       if ( file_line[ char_num + 1 ] == '\n' )
                       {
                          file_line[ char_num ] = ' ';
                       }
                       else
                       {
                          file_line[ char_num ] = ' ';
                       }
                    }
                }
             }

             int field_display_width = 4;
             std::cout << "Line " << std::setw( field_display_width ) << line_num << 
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos << 
                    ", contains " << file_line << "." << std::endl;

             starting_char_pos = rc_input_file_holder.tellg();

             rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
          }

          rc_input_file_holder.close();
          rc_output_file_holder.close();
          delete [] RC_INPUT_FILE_NAME;
          delete [] RC_OUTPUT_FILE_NAME;
       }
    }

    if ( failure_flag )
    {
       return EXIT_FAILURE;
    }
    else
    {
       return EXIT_SUCCESS;
    }
}

[/code]
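As a side note on the listing above, the `new char[]` / `strcpy` copies of the file names are not needed just to open the files: `open()` and the stream constructors accept the pointer returned by `c_str()` directly. A shorter version of the read-and-report loop, under that assumption, might look like the sketch below; the function name `number_lines` is hypothetical and this is not the asker's code.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies every line of in_name to out_name, prefixed
// with its line number. Returns false if either file cannot be opened.
bool number_lines(const std::string& in_name, const std::string& out_name)
{
    std::ifstream in(in_name.c_str());   // c_str() also works with pre-C++11 compilers
    if (!in) {
        return false;
    }
    std::ofstream out(out_name.c_str(), std::ios::out | std::ios::trunc);
    if (!out) {
        return false;
    }

    std::string line;
    long line_num = 0;
    while (std::getline(in, line)) {
        ++line_num;                       // first line is line 1
        out << "Line " << line_num << ": " << line << '\n';
    }
    return true;   // both streams are closed automatically when they go out of scope
}
```

A caller could then write `return number_lines( "rc_input_file.txt", "rc_output_file.txt" ) ? EXIT_SUCCESS : EXIT_FAILURE;`, with no manual `delete []` required.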

The same code, with lots of comments (for my benefit as a learning experience), is:

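[code]

#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cstring>   // strcpy
#include <fstream>   // std::ifstream, std::ofstream
#include <iomanip>   // std::setw
#include <iostream>  // std::cout
#include <string>    // std::string, getline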

/*
 * The main function, from which all else is accessed
 */
int main(int argc, char** argv)
{


    /*
    *Program to:
    *  1) read from a text file
    *  2) do some validation checks on the content of that text file
    *  3) output a report to another text file
    */

    // Set the filenames to be used in this file-handling program
    std::string rc_input_file_name = "rc_input_file.txt";
    std::string rc_output_file_name = "rc_output_file.txt";

    // Note that when the filenames are used in the .open statements below
    //   they have to be in a cstring format, not a string format
    //   so the conversion is done here once
    // Use the Capitalized form of the file name to indicate the converted value
    //   (remember, variable names are case-sensitive in C/C++ so NAME is different than name)
    // This conversion could be done 3 ways:
    // - done each time the cstring is needed: 
    //          file_holder_name.open( string_file_name.c_str() )
    // - done once and referred to each time
    //     simple method: 
    //          const char * converted_file_name = string_file_name.c_str()
    //     explicit method (2-step):              
    //          char * converted_file_name = new char[ string_file_name.length() + 1 ];
    //          strcpy( converted_file_name, string_file_name.c_str() );
    // This program uses the explicit method to do it once for each filename
    // because by doing so, the char array created has variable length
    // and you do not risk buffer overflow
    char * RC_INPUT_FILE_NAME = new char[ rc_input_file_name.length() + 1 ];
    strcpy( RC_INPUT_FILE_NAME, rc_input_file_name.c_str() );
    char * RC_OUTPUT_FILE_NAME = new char[ rc_output_file_name.length() + 1 ];
    strcpy( RC_OUTPUT_FILE_NAME, rc_output_file_name.c_str() );

    // This will be set to true if either the input or output file cannot be opened
    bool failure_flag = false;

    // Open the input file
    std::ifstream rc_input_file_holder;
    rc_input_file_holder.open( RC_INPUT_FILE_NAME , std::ios::in );

    // Validate that the input file was properly opened/created
    // If not, set failure flag
    if ( ! rc_input_file_holder.is_open() )
    {
       // Could not open the input file; set failure flag to true
       std::cout << "Error - Could not open the input file" << std::endl;
       failure_flag = true;
    }
    else
    {
       // Open the output file
       // Create one if none previously existed
       // Erase the contents if it already existed
       std::ofstream rc_output_file_holder;
       rc_output_file_holder.open( RC_OUTPUT_FILE_NAME , std::ios::out | std::ios::trunc );

       // Validate that the output file was properly opened/created
       // If not, set failure flag
       if ( ! rc_output_file_holder.is_open() )
       {
          // Could not open the output file; set failure flag to true
          std::cout << "Error - Could not open or create the output file" << std::endl;
          failure_flag = true;
       }
       else
       {
          // Get the current position where the character pointer is at
          // Get it before the getline is executed so it gives you where the current line starts
          std::streampos char_num = 0;

          // Initialize the line_number and starting character position to 0
          long int line_num = 0;
          long int starting_char_pos = 0;

          std::string file_line = "";
          while ( getline( rc_input_file_holder , file_line ) )
          {
             // Set the line number counter to the current line (first line is Line 1, not 0)
             line_num = line_num + 1;


             // Check if the new line designator uses the standard for:
             //   - linux (\n)
             //   - Windows (\r\n)
             //   - Old Mac (\r)
             // Convert any non-linux new line designator to linux new line designator (\n)
             long int file_line_length = file_line.length() +1 ;
             long int char_num = 0;
             for ( char_num = 0 ; char_num < file_line_length ;  char_num++ )
             {
                // If a \r character is found, decide what to do with it
                if ( file_line[ char_num ] == '\n' )
                {
                    // If the \r char is the last line character (before the null terminator)
                    //   the file uses the old Mac format to indicate new line
                    //   so replace the \r with \n
                    if ( char_num == file_line_length - 1 )
                    {
                       file_line[ char_num ] = '-';
                    }
                    else
                    // If the \r char is NOT the last line character (before the null terminator)
                    {
                       // If the next character is a \n, the file uses the Windows format to indicate new line
                       //   so replace the \r with space
                       if ( file_line[ char_num + 1 ] == '\n' )
                       {
                          file_line[ char_num ] = ' ';
                       }
                       // If the next char is NOT a \n (and the pointer is NOT at the last line character)
                       //   then for some reason, there is a \r in the interior of the string
                       // At this point, I do not know why this would be
                       //   but I don't want it left there, so replace it with a space
                       // Yes, I know this is the same as the above action,
                       //   but I left it separate to allow for future flexibility
                       else
                       {
                          file_line[ char_num ] = ' ';
                       }
                    }
                }
             }


             // Output the contents of the line just fetched
             // This is done in this prototype file as a placeholder
             // In the real program, this is where the validation check(s) for the line would occur
             //   and would likely be done in a function or class
             // The setw() function requires #include <iomanip>
             int field_display_width = 4;
             std::cout << "Line " << std::setw( field_display_width ) << line_num << 
                    ", starting at character position " << std::setw( field_display_width ) << starting_char_pos << 
                    ", contains " << file_line << "." << std::endl;

             // Reset the character pointer to the end of this line => start of next line
             starting_char_pos = rc_input_file_holder.tellg();

             // Output the (edited) contents of the line just fetched
             // This is done in this prototype file as a placeholder
             // In the real program, this is where the results of the validation checks would be recorded
             // You could put this in an if statement and record nothing if the line was valid
             rc_output_file_holder << "Line " << line_num << ": " << file_line << std::endl;
          }

          // Clean up by:
          //  - closing the files that were opened (input and output)
          //  - deleting the character arrays created
          rc_input_file_holder.close();
          rc_output_file_holder.close();
          delete [] RC_INPUT_FILE_NAME;
          delete [] RC_OUTPUT_FILE_NAME;
       }
    }

    // Check to see if all operations have successfully completed
    // If so exit this program with success indicated
    // If not,exit this program with failure indicated
    if ( failure_flag )
    {
       return EXIT_FAILURE;
    }
    else
    {
       return EXIT_SUCCESS;
    }
}

[/code]

I have all the proper includes, and no errors or warnings are generated when I compile for Linux or cross-compile for Windows.

The input file I am using has just 5 lines of (silly) text:

A new beginning
just in case
the file was corrupted
and the darn program was working fine ...
at least it was on linux

The output on Linux is, as expected:

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   16, contains just in case.
Line    3, starting at character position   29, contains the file was corrupted.
Line    4, starting at character position   52, contains and the darn program was working fine ....
Line    5, starting at character position   94, contains at least it was on linux.

The output on Windows is the same when I import the text file created on Linux, but when I use Notepad to manually recreate the same file on Windows, the output is:

Line    1, starting at character position    0, contains A new beginning.
Line    2, starting at character position   20, contains t in case.
Line    3, starting at character position   33, contains e file was corrupted.
Line    4, starting at character position   56, contains nd the darn program was working fine ....
Line    5, starting at character position   98, contains at least it was on linux.

Note the differences in the starting character position for lines 2, 3, 4 and 5. Note also the missing characters at the start of lines 2, 3 and 4:

  • Line 2 is missing 3 characters
  • Line 3 is missing 2 characters
  • Line 4 is missing 1 character
  • Line 5 is missing 0 characters
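One way to make the reported starting positions independent of how the runtime handles text mode is to compute them from the raw bytes instead of calling `tellg()` between reads. The sketch below is only an illustration of that idea, not the fix that was eventually accepted; `LineInfo` and `index_lines` are hypothetical names, and the input is assumed to be the whole file already loaded into a string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one line of the file.
struct LineInfo {
    std::size_t offset;  // byte offset of the first character of the line
    std::string text;    // line content without its terminator
};

// Splits raw file content into lines, accepting "\n", "\r\n" or "\r",
// and records the byte offset at which each line starts.
std::vector<LineInfo> index_lines(const std::string& data)
{
    std::vector<LineInfo> lines;
    std::size_t start = 0;
    std::size_t i = 0;
    while (i < data.size()) {
        if (data[i] == '\n' || data[i] == '\r') {
            LineInfo info = { start, data.substr(start, i - start) };
            lines.push_back(info);
            // Treat "\r\n" as a single two-byte terminator.
            if (data[i] == '\r' && i + 1 < data.size() && data[i + 1] == '\n') {
                ++i;
            }
            ++i;
            start = i;
        } else {
            ++i;
        }
    }
    if (start < data.size()) {           // final line without a terminator
        LineInfo info = { start, data.substr(start) };
        lines.push_back(info);
    }
    return lines;
}
```

With the sample input, the second line starts at offset 16 in the Linux-format file (matching the Linux output above) and at offset 17 when each line ends in `\r\n`. The raw content could be obtained by opening the file with `std::ios::binary` and streaming its `rdbuf()` into a `std::ostringstream`.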

Any and all ideas welcome ...

Recommended answer

Cross-compiler out of date

In short, the mingw cross-compiler installed via apt-get install was out of date. When I manually installed an updated cross-compiler and updated the settings to prevent some error messages, everything worked fine.
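To check which compiler version actually built a given binary, the program can report the compiler's predefined version macros. The sketch below is not part of the original answer; the function name `compiler_description` is hypothetical, and it relies on the `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__` macros that GCC-based toolchains (including mingw) predefine.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: describes the compiler that built this translation unit.
std::string compiler_description()
{
    std::ostringstream os;
#if defined(__GNUC__)
    // Predefined by GCC and GCC-compatible compilers.
    os << "GCC-compatible " << __GNUC__ << '.' << __GNUC_MINOR__
       << '.' << __GNUC_PATCHLEVEL__;
#else
    os << "unknown compiler";
#endif
    return os.str();
}
```

Printing this string at program start-up, or running the cross-compiler with `--version`, makes it easy to see whether an old toolchain is still being picked up.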
