How do I convert a hex entity to its equivalent UTF-8 entity?
Problem description
I would really appreciate it if someone could give me a solution for converting a "HEX" entity to its equivalent "UTF-8".
For example:
I need to convert "&#x2019;" into "’"; here &#x2019; is the hex entity for "’", the single quotation mark.
Waiting for your valuable response.
If this is a Unicode code point, you can use the fact that its integer value, in the usual integer representation, is the same as the UTF-32LE representation of the given character. So you can use the function char.ConvertFromUtf32: https://msdn.microsoft.com/en-us/library/system.char.convertfromutf32%28v=vs.110%29.aspx
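To make this step concrete, here is a minimal sketch of getting from the entity text to a string. It assumes the input is a single entity of the form &#xHHHH;. The names HexEntityDecoder, ParseHexEntity and Decode are hypothetical helpers made up for illustration; only int.Parse and char.ConvertFromUtf32 come from .NET.

```csharp
using System;
using System.Globalization;

static class HexEntityDecoder
{
    // Hypothetical helper (not part of .NET): extracts the code point
    // from an entity written as "&#xHHHH;".
    public static int ParseHexEntity(string entity)
    {
        if (!entity.StartsWith("&#x", StringComparison.OrdinalIgnoreCase) ||
            !entity.EndsWith(";", StringComparison.Ordinal))
        {
            throw new FormatException("Expected an entity like &#x2019;");
        }

        // Drop the leading "&#x" (3 chars) and the trailing ";" (1 char).
        string hexDigits = entity.Substring(3, entity.Length - 4);
        return int.Parse(hexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    // Hypothetical helper: entity text -> string holding that one character.
    public static string Decode(string entity)
    {
        return char.ConvertFromUtf32(ParseHexEntity(entity));
    }

    static void Main()
    {
        int codePoint = ParseHexEntity("&#x2019;");
        Console.WriteLine(codePoint.ToString("X")); // 2019
        Console.WriteLine(Decode("&#x2019;").Length); // 1
    }
}
```

If the input is a whole HTML fragment rather than one entity, System.Net.WebUtility.HtmlDecode is probably a better fit than hand-written parsing, since it also decodes numeric entities.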
Interestingly, the result is not a character but a string. This is related to a peculiarity of the in-memory representation of strings in .NET: internally, UTF-16 is used, and characters beyond the BMP (Basic Multilingual Plane) are represented as surrogate pairs. Formally, .NET treats a surrogate pair as two characters (which can produce an invalid string if you try to manipulate such a string "manually"; never do that), even though, from the Unicode standpoint, this 4-byte fragment of the string is really one character. Be careful with such cases, and never manipulate such a string byte by byte.
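The surrogate-pair behaviour can be seen with a small sketch. SurrogatePairDemo and CountCodePoints are hypothetical names used only for this illustration, and U+1F600 is just an arbitrary example of a code point beyond the BMP; the char methods are the standard .NET ones.

```csharp
using System;

static class SurrogatePairDemo
{
    // Hypothetical helper: counts Unicode code points instead of
    // UTF-16 code units (which is what string.Length reports).
    public static int CountCodePoints(string text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }

    static void Main()
    {
        string bmp = char.ConvertFromUtf32(0x2019);     // inside the BMP
        string astral = char.ConvertFromUtf32(0x1F600); // beyond the BMP

        Console.WriteLine(bmp.Length);              // 1
        Console.WriteLine(astral.Length);           // 2 (a surrogate pair)
        Console.WriteLine(CountCodePoints(astral)); // 1
        Console.WriteLine(char.ConvertToUtf32(astral, 0).ToString("X")); // 1F600
    }
}
```

This is why the function returns a string: one code point may need two char values, and cutting the string between them leaves an invalid half of a pair.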
Once you have the string (let's call it someString), you can use it directly or represent it as UTF-8, which is always an array of bytes:
int codePoint = //...
string someString = char.ConvertFromUtf32(codePoint);
byte[] utf8 = System.Text.Encoding.UTF8.GetBytes(someString);

Please see:
https://msdn.microsoft.com/en-us/library/system.text.encoding.utf8(v=vs.110).aspx,
https://msdn.microsoft.com/en-us/library/ds4kkd55(v=vs.110).aspx,
https://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.110%29.aspx.
—SA
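As a worked instance of the answer above, the following sketch applies it to the code point from the question, U+2019. Utf8Demo and CodePointToUtf8 are hypothetical names chosen for this example; the calls inside are the ones the answer uses.

```csharp
using System;
using System.Text;

static class Utf8Demo
{
    // Hypothetical helper: code point -> string -> UTF-8 bytes,
    // following the two calls from the answer.
    public static byte[] CodePointToUtf8(int codePoint)
    {
        string someString = char.ConvertFromUtf32(codePoint);
        return Encoding.UTF8.GetBytes(someString);
    }

    static void Main()
    {
        byte[] utf8 = CodePointToUtf8(0x2019);

        Console.WriteLine(BitConverter.ToString(utf8)); // E2-80-99
        // Decoding the bytes as UTF-8 gives the original character back.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == "\u2019"); // True
    }
}
```

Note that the three bytes E2 80 99 show up as "â€™" when they are mistakenly read as Windows-1252 instead of UTF-8, so seeing that sequence usually points to a decoding mismatch rather than a wrong conversion.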