Encode text in C# as UTF-8 without a BOM

Views: 112 | Published: 2021/5/4 19:17:25 | Tags: c# encoding utf-8 byte-order-mark


Question

I tried this but it didn't work. I want to encode without a BOM, but even with the option set to false the output is still encoded as UTF-8 with a BOM.

Here is my code

System.Text.Encoding outputEnc = new System.Text.UTF8Encoding(false);
                return File(outputEnc.GetBytes(" <?xml version=\"1.0\" encoding=\"utf-8\"?>" + xmlString), "application/xml", id);

Solution

This question is more than two years old, but I've found the answer. The reason you were seeing a BOM in the output is because there's a BOM in your input. What appears to be a space at the start of your XML declaration is actually a BOM followed by a space. To prove it, select the text " < from your XML encoding (the opening double-quote, the space following it, and the opening < character) and paste that into any tool that tells you Unicode codepoints. For example, pasting that text into http://www.babelstone.co.uk/Unicode/whatisit.html gave me the following result:

U+0022 : QUOTATION MARK
U+FEFF : ZERO WIDTH NO-BREAK SPACE [ZWNBSP] (alias BYTE ORDER MARK [BOM])
U+0020 : SPACE [SP]
U+003C : LESS-THAN SIGN

You can also copy and paste from the " < that I put in this answer: I copied those characters from your question, so they contain the invisible BOM immediately before the space character.
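The same check can be done locally instead of with an online tool. Below is a minimal sketch, assuming a hypothetical helper class named CodePointDump (not part of .NET or of the original question), that lists the Unicode code points in a string so that an invisible U+FEFF becomes visible:

```csharp
using System;
using System.Collections.Generic;

static class CodePointDump
{
    // Hypothetical helper: returns one "U+XXXX" entry per code point in the text.
    public static List<string> Describe(string text)
    {
        var result = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // Combine a valid surrogate pair into a single code point.
                codePoint = char.ConvertToUtf32(text, i);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // BMP character (or a lone surrogate, reported as-is).
                codePoint = text[i];
            }
            result.Add("U+" + codePoint.ToString("X4"));
        }
        return result;
    }
}
```

For example, CodePointDump.Describe("\"\uFEFF <") yields U+0022, U+FEFF, U+0020, U+003C, which matches the list above.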

This is why I often refer to the BOM as a BOM(b) -- because it sits there silently, hidden, waiting to blow up on you when you least expect it. You were using System.Text.UTF8Encoding(false) correctly. It didn't add a BOM, but the source that you copied and pasted your XML from contained a BOM, so you got one in your output anyway because you had one in your input.
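To illustrate the point, here is a rough sketch of one way to guard against a BOM hiding in the input. BomFreeEncoder and EncodeWithoutBom are hypothetical names chosen for this example. It assumes that dropping every U+FEFF character is acceptable for your data, which is usually true for XML but is an assumption nonetheless:

```csharp
using System;
using System.Text;

static class BomFreeEncoder
{
    // Hypothetical helper: strips any U+FEFF already present in the text,
    // then encodes it as UTF-8. GetBytes itself never prepends a preamble;
    // it only encodes the characters it is given.
    public static byte[] EncodeWithoutBom(string text)
    {
        // string.Replace(string, string) performs an ordinal search,
        // so the invisible character is matched exactly.
        string cleaned = text.Replace("\uFEFF", string.Empty);
        return new UTF8Encoding(false).GetBytes(cleaned);
    }
}
```

Without the cleanup step, new UTF8Encoding(false).GetBytes("\uFEFF <") produces the bytes EF BB BF 20 3C, because the U+FEFF in the string is encoded like any other character. With the helper, the same input produces only 20 3C.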

Personal rant: It's a very good idea to leave BOMs out of your UTF-8 encoded text. However, some broken tools (Microsoft, I'm looking at you since you're the ones who made most of them) will misinterpret text if it doesn't contain a BOM, so adding a BOM to UTF-8 encoded text is sometimes necessary. But it should really be avoided as much as possible. UTF-8 is now the de facto default encoding for the Internet, so any text file whose encoding is unknown should be parsed as UTF-8 first, falling back to "legacy" encodings like Windows-1252, Latin-1, etc. only if parsing the document as UTF-8 fails.
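The "UTF-8 first, legacy encoding second" approach can be sketched roughly as follows. Utf8FirstDecoder is a hypothetical name, and the fallback here is Latin-1 (ISO-8859-1) only because it is available in .NET without extra setup; using Windows-1252 on .NET Core or later would additionally require registering the code pages encoding provider:

```csharp
using System;
using System.Text;

static class Utf8FirstDecoder
{
    // Hypothetical helper: tries strict UTF-8 first and falls back to
    // Latin-1 only when the bytes are not valid UTF-8.
    public static string Decode(byte[] bytes)
    {
        try
        {
            // throwOnInvalidBytes: true makes invalid sequences raise an
            // exception instead of being silently replaced with U+FFFD.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: Latin-1 is the legacy encoding to try next.
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }
}
```

With this sketch, the bytes 63 61 66 C3 A9 decode as UTF-8 to "café", and the bytes 63 61 66 E9, which are not valid UTF-8, fall back to Latin-1 and also decode to "café".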
