Strip Byte Order Mark from string in C#

Question

I've read similar posts on this and they don't answer my question.

In C#, I have a string that I'm obtaining from WebClient.DownloadString. I've tried setting client.Encoding to new UTF8Encoding(false), but that's made no difference - I still end up with a byte order mark for UTF-8 at the beginning of the result string. I need to remove this (to parse the resulting XML with LINQ), and want to do so in memory.

So I have a string that starts with \x00EF\x00BB\x00BF, and I want to remove that if it exists. Right now I'm using

if (xml.StartsWith(ByteOrderMarkUtf8))
{
    xml = xml.Remove(0, ByteOrderMarkUtf8.Length);
}

but that just feels wrong. I've tried all sorts of code with streams, GetBytes, and encodings, and nothing works. Can anyone provide the "right" algorithm to strip a BOM from a string?

Thank you!

Recommended answer

If the variable xml is of type string, you did something wrong already - in a character string, the BOM should not be represented as three separate characters, but as a single code point. Instead of using DownloadString, use DownloadData, and parse byte arrays instead. The XML parser should recognize the BOM itself, and skip it (except for auto-detecting the document encoding as UTF-8).
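
A minimal sketch of that approach, under some assumptions: the helper names `ParseXmlBytes` and `StripLeadingBom` are hypothetical, and the byte array built in `Main` stands in for what `WebClient.DownloadData` would return. It hands the raw bytes to `XDocument.Load` so the XML reader can deal with the BOM, and it also shows that a correctly decoded UTF-8 BOM is the single character U+FEFF. Three separate characters at the start of the string usually suggest the bytes were decoded with a single-byte encoding instead of UTF-8.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class BomSketch
{
    // Hypothetical helper: parse XML straight from raw bytes, such as the
    // result of WebClient.DownloadData(url). The XML reader inspects the
    // leading bytes, detects the encoding, and consumes the BOM itself.
    public static XDocument ParseXmlBytes(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        {
            return XDocument.Load(stream);
        }
    }

    // Hypothetical fallback for text that has already been decoded:
    // a correctly decoded BOM is one char, U+FEFF, at index 0.
    public static string StripLeadingBom(string text)
    {
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }

    public static void Main()
    {
        // Stand-in for downloaded data: UTF-8 BOM (EF BB BF) followed by XML.
        byte[] body = Encoding.UTF8.GetBytes("<root><item>1</item></root>");
        byte[] data = Encoding.UTF8.GetPreamble().Concat(body).ToArray();

        // Byte-based parsing: no manual BOM handling needed.
        XDocument doc = ParseXmlBytes(data);
        Console.WriteLine(doc.Root.Name);

        // Encoding.GetString does not drop the BOM, so it survives as U+FEFF.
        string decoded = new UTF8Encoding(false).GetString(data);
        Console.WriteLine(decoded[0] == '\uFEFF');

        // Ordinal comparison avoids culture rules that ignore U+FEFF.
        string cleaned = StripLeadingBom(decoded);
        Console.WriteLine(cleaned.StartsWith("<root>", StringComparison.Ordinal));
    }
}
```

If the string form is unavoidable, the fallback only removes one leading U+FEFF; it does not repair text that was decoded with the wrong encoding.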
