UTF8 Beginning of File characters are breaking serializer & readers

Question

Okay, I'm trying to work with UTF8 text files. I'm constantly fighting the BOF chars that the writer drops in for UTF8, which blows up pretty much anything I need to use to read the file including serializers and other text readers.

I'm getting a leading six bytes of data:

0xEF
0xBB
0xBF
0xEF
0xBB
0xBF

(now that I'm looking at it, I realize there's two characters there. Is that the UTF8 BOF marker? Am I double encoding it)?

Notice the serializer encodes to UTF8, then the memory stream gets a string as UTF8, then I write the string to the file with UTF8... seems like a lot of redundancy. Thoughts?

//I'm storing this xml result to a database field. (this one includes the BOF chars)
using (MemoryStream ms = new MemoryStream())
{
    Utility.SerializeXml(ms, root);
    xml = Encoding.UTF8.GetString(ms.ToArray());

}


//later on, I would take that xml and then write it out to a file like this: 
File.WriteAllText(path, xml, Encoding.UTF8);



public static void SerializeXml(Stream output, object data)
{
    XmlSerializer xs = new XmlSerializer(data.GetType());
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.IndentChars = "\t";
    settings.Encoding = Encoding.UTF8;
    XmlWriter writer = XmlTextWriter.Create(output, settings);
    xs.Serialize(writer, data);
    writer.Flush();
    writer.Close();
}

Recommended answer

Yeah, that's two BOMs. You're encoding to UTF-8 twice and each time adds a pseudo-BOM, due to the extremely unfortunate fact that:

Encoding.UTF8

means "UTF-8 with a pointless, meaningless U+FEFF stuck to the front to screw up your applications". Try instead using

new UTF8Encoding(false)

That should give you a less troublesome version.
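As a minimal sketch of how that suggestion might be applied to the code in the question: the same BOM-less `UTF8Encoding` instance is used for the `XmlWriterSettings`, for decoding the `MemoryStream`, and for `File.WriteAllText`. The `Item` class, the `BomFreeXml` class name, and the temp file name are hypothetical and exist only to make the example self-contained; `SerializeXml` here returns a string rather than writing to a caller-supplied stream, which is a simplification and not the asker's original signature.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical payload type, used only for this illustration.
public class Item
{
    public string Name { get; set; }
}

public static class BomFreeXml
{
    // UTF-8 encoding whose preamble is empty, so no EF BB BF is emitted.
    private static readonly UTF8Encoding Utf8NoBom = new UTF8Encoding(false);

    public static string SerializeXml(object data)
    {
        XmlSerializer xs = new XmlSerializer(data.GetType());
        XmlWriterSettings settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "\t",
            Encoding = Utf8NoBom // instead of Encoding.UTF8
        };

        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                xs.Serialize(writer, data);
            } // disposing the writer flushes it into the stream

            return Utf8NoBom.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        string xml = SerializeXml(new Item { Name = "example" });

        // Hypothetical output location for the demonstration.
        string path = Path.Combine(Path.GetTempPath(), "bom-free-example.xml");
        File.WriteAllText(path, xml, Utf8NoBom);

        // Check the first three bytes of the file for the UTF-8 BOM.
        byte[] bytes = File.ReadAllBytes(path);
        bool startsWithBom = bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
        Console.WriteLine(startsWithBom ? "BOM present" : "no BOM");
    }
}
```

Note that XML already stored in the database by the old code would still begin with U+FEFF. Something like `xml.TrimStart('\uFEFF')` before writing could clean those values up, though whether that is appropriate depends on the data.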
