In C#, how can I copy a file with arbitrary encoding, reading line by line, without adding or deleting a newline?

Question

I need to take a text file of unknown encoding (e.g., UTF-8, UTF-16, ...) and copy it line by line, making specific changes as I go. In this example I am changing the encoding, but there are other uses for this kind of processing.

What I can't figure out is how to determine whether the last line ends with a newline. Some programs care about the difference between a file with these records:

Rec1<newline>
Rec2<newline>

and a file with these records:

Rec1<newline>
Rec2

How can I tell the difference in my code so that I can take appropriate action?

using (StreamReader reader = new StreamReader(sourcePath))
using (StreamWriter writer = new StreamWriter(destinationPath, false, outputEncoding))
{
    bool isFirstLine = true;

    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();

        if (isFirstLine)
        {
            writer.Write(line);
            isFirstLine = false;
        }
        else
        {
            writer.Write("\r\n" + line);
        }
    }


    //if (LastLineHasNewline)
    //{
    //  writer.Write("\n");
    //}

    writer.Flush();
}

The commented-out code is what I want to be able to do, but I can't figure out how to set the LastLineHasNewline condition. Remember, I have no a priori knowledge of the input file encoding.

Recommended answer


"Remember, I have no a priori knowledge of the input file encoding."

This is the fundamental problem to solve.

If the file could be using any encoding, then there is no concept of reading "line by line", because you can't possibly tell what the line ending is.
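To illustrate the point, here is a minimal sketch that prints the bytes representing "\n" under three common encodings. NewlineBytes is a hypothetical helper name introduced for this example, and the code assumes C# 9 or later so that it can be written as top-level statements.

```csharp
using System;
using System.Text;

// Show how the same newline character is stored under different encodings.
foreach (Encoding enc in new[] { Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode })
{
    Console.WriteLine($"{enc.WebName}: {NewlineBytes(enc)}");
}

// Hypothetical helper: the bytes of "\n" in the given encoding, as hex pairs.
static string NewlineBytes(Encoding enc) => BitConverter.ToString(enc.GetBytes("\n"));
```

The newline is the single byte 0A in UTF-8, the pair 0A 00 in little-endian UTF-16, and 00 0A in big-endian UTF-16, so scanning raw bytes for a line ending only works once the encoding is known.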

I suggest you address this part first, and the rest will be easy. Without knowing the context, it's hard to say whether that means you should ask the user for the encoding, detect it heuristically, or do something else, but I wouldn't start trying to use the data before you can fully understand it.
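Once the encoding question is settled, one possible way to keep the final newline intact is to read each line together with its terminator, so nothing has to be re-added afterwards. The sketch below is only an illustration under stated assumptions: it relies on StreamReader detecting the encoding from a byte order mark and otherwise uses a fallback encoding supplied by the caller, which may simply be wrong for a file without a BOM. ReadLineWithTerminator and CopyPreservingNewlines are hypothetical names, not framework APIs, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;
using System.Text;

// Usage sketch: copy args[0] to args[1], falling back to UTF-8 when there is no BOM.
if (args.Length == 2)
{
    CopyPreservingNewlines(args[0], args[1], Encoding.UTF8, Encoding.UTF8);
}

// Hypothetical helper: reads one line INCLUDING its terminator ("\n", "\r\n" or "\r").
// Returns an empty string at end of input; a real line is never empty here.
static string ReadLineWithTerminator(TextReader reader)
{
    var sb = new StringBuilder();
    int c;
    while ((c = reader.Read()) != -1)
    {
        sb.Append((char)c);
        if (c == '\n') break;
        if (c == '\r')
        {
            // Treat "\r\n" as a single terminator.
            if (reader.Peek() == '\n') sb.Append((char)reader.Read());
            break;
        }
    }
    return sb.ToString();
}

// Hypothetical helper: copies the file line by line, writing each line back
// with exactly the terminator it had (or none, for an unterminated last line).
static void CopyPreservingNewlines(string sourcePath, string destinationPath,
                                   Encoding fallbackEncoding, Encoding outputEncoding)
{
    // 'true' asks the reader to switch to the BOM's encoding when a BOM is present;
    // fallbackEncoding is an assumption used only when no BOM is found.
    using (var reader = new StreamReader(sourcePath, fallbackEncoding, true))
    using (var writer = new StreamWriter(destinationPath, false, outputEncoding))
    {
        string line;
        while ((line = ReadLineWithTerminator(reader)).Length > 0)
        {
            // Per-line changes would go here, leaving the trailing terminator alone.
            writer.Write(line);
        }
    }
}
```

Because the terminator travels with the line, a source ending in "Rec2" stays unterminated and a source ending in "Rec2" plus a newline keeps it, without a separate flag for the last line.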
