Encode text in C# as UTF-8 without a BOM

Problem description

I tried this, but it didn't work. I want to encode without a BOM, but even with the option set to false the output is still UTF-8 with a BOM.

Here is my code:

System.Text.Encoding outputEnc = new System.Text.UTF8Encoding(false);
                return File(outputEnc.GetBytes(" <?xml version=\"1.0\" encoding=\"utf-8\"?>" + xmlString), "application/xml", id);

Solution

This question is more than two years old, but I've found the answer. The reason you were seeing a BOM in the output is because there's a BOM in your input. What appears to be a space at the start of your XML declaration is actually a BOM followed by a space. To prove it, select the text " < from your XML encoding (the opening double-quote, the space following it, and the opening < character) and paste that into any tool that tells you Unicode codepoints. For example, pasting that text into http://www.babelstone.co.uk/Unicode/whatisit.html gave me the following result:

U+0022 : QUOTATION MARK
U+FEFF : ZERO WIDTH NO-BREAK SPACE [ZWNBSP] (alias BYTE ORDER MARK [BOM])
U+0020 : SPACE [SP]
U+003C : LESS-THAN SIGN

You can also copy and paste from the " < that I put in this answer: I copied those characters from your question, so they contain the invisible BOM immediately before the space character.
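If you would rather check in code than in a web tool, a minimal sketch along these lines lists the code points of a string. The names `CodePointDump` and `Describe` are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDump
{
    // Returns one "U+XXXX" entry per Unicode code point in the input.
    // char.ConvertToUtf32 throws ArgumentException on a malformed surrogate pair.
    public static List<string> Describe(string text)
    {
        var result = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint = char.ConvertToUtf32(text, i);
            if (char.IsHighSurrogate(text[i]))
            {
                i++; // the code point used two UTF-16 units; skip the low surrogate
            }
            result.Add($"U+{codePoint:X4}");
        }
        return result;
    }
}
```

Calling `CodePointDump.Describe("\"\uFEFF <")` should give the same four code points as the list above, with U+FEFF in second position.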

This is why I often refer to the BOM as a BOM(b) -- because it sits there silently, hidden, waiting to blow up on you when you least expect it. You were using System.Text.UTF8Encoding(false) correctly. It didn't add a BOM, but the source that you copied and pasted your XML from contained a BOM, so you got one in your output anyway because you had one in your input.
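To illustrate the point, here is a small sketch showing that a U+FEFF character inside the input string is encoded like any other character, and one possible way to strip it before encoding. The class `BomDemo` and its methods are hypothetical helpers, and trimming only a leading U+FEFF is an assumption that matches the string in the question.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Passing false means GetPreamble() returns an empty array, so writers
    // built on this encoding do not prepend a BOM. GetBytes never adds one itself.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Encodes the text exactly as given; a U+FEFF in the input becomes EF BB BF.
    public static byte[] Encode(string text) => Utf8NoBom.GetBytes(text);

    // Removes U+FEFF characters from the start of the string only.
    public static string StripLeadingBom(string text) => text.TrimStart('\uFEFF');
}
```

With this sketch, `BomDemo.Encode("\uFEFF <")` starts with the bytes EF BB BF even though the encoding was created with false, while `BomDemo.Encode(BomDemo.StripLeadingBom("\uFEFF <"))` contains only the bytes for the space and the < character.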

Personal rant: It's a very good idea to leave BOMs out of your UTF-8 encoded text. However, some broken tools (Microsoft, I'm looking at you since you're the ones who made most of them) will misinterpret text if it doesn't contain a BOM, so adding a BOM to UTF-8 encoded text is sometimes necessary. But it should really be avoided as much as possible. UTF-8 is now the de facto default encoding for the Internet, so any text file whose encoding is unknown should be parsed as UTF-8 first, falling back to "legacy" encodings like Windows-1252, Latin-1, etc. only if parsing the document as UTF-8 fails.
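The "UTF-8 first, legacy second" approach could be sketched roughly as follows. The class `TextDecoder` is a hypothetical helper, and using ISO-8859-1 (Latin-1) as the fallback is an assumption made to keep the example dependency-free; on .NET Core and later, Windows-1252 is only available after registering the code pages encoding provider.

```csharp
using System;
using System.Text;

public static class TextDecoder
{
    // Second argument true: throw DecoderFallbackException on invalid UTF-8
    // instead of silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Assumed legacy fallback; every byte value maps to a character,
    // so this decode step cannot fail.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static string Decode(byte[] bytes)
    {
        try
        {
            // GetString keeps a leading BOM as U+FEFF, so drop it explicitly.
            return StrictUtf8.GetString(bytes).TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: fall back to the legacy encoding.
            return Latin1.GetString(bytes);
        }
    }
}
```

For example, the bytes C3 A9 decode as UTF-8 to "é", while the single byte E9 is not valid UTF-8 and is decoded through the Latin-1 fallback, which also yields "é".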
