Convert character entities to their Unicode equivalents


Question

I have HTML-encoded strings in a database, but many of the character entities are not just the standard &amp; and &lt;. There are also entities like &ldquo; and &mdash;. Unfortunately, we need to feed this data into a Flash-based RSS reader, and Flash doesn't read these named entities, although it does read the numeric Unicode equivalents (e.g. &#8220;).

Using .NET 4.0, is there any utility method that will convert the HTML-encoded string to use numeric (Unicode code point) character entities instead?

Here is a better example of what I need. The database has HTML strings like: <p>John &amp; Sarah went to see &ldquo;Scream 4&rdquo;.</p> and what I need to output in the RSS/XML document within the <description> tag is: &lt;p&gt;John &amp;#38; Sarah went to see &amp;#8220;Scream 4&amp;#8221;.&lt;/p&gt;

I'm using an XmlTextWriter to create the XML document from the database records, similar to this example code: http://www.dotnettutorials.com/tutorials/advanced/rss-feed-asp-net-csharp.aspx

So I need to replace all of the character entities within the HTML string from the database with their numeric Unicode equivalents, because the Flash-based RSS reader doesn't recognize any entities beyond the most common ones, like &amp;.

Recommended answer

My first thought is: can your RSS reader accept the actual characters? If so, you can use HtmlDecode and feed the result in directly.
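As a rough illustration of that first option, the sketch below decodes the entities to real characters and lets an XmlWriter escape the markup when it writes the element. It uses System.Net.WebUtility.HtmlDecode in place of HttpUtility.HtmlDecode so it does not need a System.Web reference; the class name DecodeThenWrite and the method BuildDescription are made up for this example, and whether the Flash reader accepts literal characters is still an open assumption.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original question or answer.
public static class DecodeThenWrite
{
    // Decodes HTML entities to real characters, then lets XmlWriter
    // escape XML markup characters when writing the element text.
    public static string BuildDescription(string html)
    {
        // "&ldquo;" becomes U+201C, "&amp;" becomes "&", and so on.
        string decoded = WebUtility.HtmlDecode(html);

        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            // WriteElementString escapes the text content for XML.
            writer.WriteElementString("description", decoded);
        }
        return sb.ToString();
    }
}
```

One caveat with decoding the whole string: &amp; inside the HTML becomes a bare ampersand, so the embedded HTML is slightly less strict than what was stored, even though the XML around it stays well-formed.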

If you do need to convert it to the numeric representations, you could parse out each entity, HtmlDecode it, and then cast the resulting character to an int to get the base-10 Unicode value. Then re-insert that value into the string as a numeric entity.

Edit: Here's some code to demonstrate what I mean (it is untested, but gets the idea across):

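// Requires: using System.Text; using System.Web; (plus a reference to System.Web.dll)
string input = "Something with &mdash; or other character entities.";
StringBuilder output = new StringBuilder(input.Length);

for (int i = 0; i < input.Length; i++)
{
    if (input[i] == '&')
    {
        int startOfEntity = i; // just for easier reading
        int endOfEntity = input.IndexOf(';', startOfEntity);
        // +1 so the substring includes the ';' (HtmlDecode needs it to recognize the entity)
        string entity = endOfEntity < 0 ? "" : input.Substring(startOfEntity, endOfEntity - startOfEntity + 1);
        string decoded = HttpUtility.HtmlDecode(entity);
        if (decoded.Length == 1)
        {
            // A recognized entity decodes to a single character; emit its base-10 value
            int unicodeNumber = (int)decoded[0];
            output.Append("&#" + unicodeNumber + ";");
            i = endOfEntity; // continue parsing after the end of the entity
            continue;
        }
    }
    // Not an entity (or not a recognized one): copy the character unchanged
    output.Append(input[i]);
}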

I may have an off-by-one error somewhere in there, but it should be close.
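The same idea can also be sketched with a regular expression instead of a hand-written loop, which sidesteps the index arithmetic. This is only one possible variant, not the answer's own code: the class EntityConverter and method ToNumericEntities are hypothetical names, and it assumes System.Net.WebUtility.HtmlDecode knows every named entity that appears in the data. Names it does not recognize are left untouched.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the parse/decode/re-insert approach.
public static class EntityConverter
{
    // Matches named entities such as &mdash; or &amp;.
    // Numeric entities like &#8212; do not match and pass through as-is.
    private static readonly Regex NamedEntity =
        new Regex(@"&[A-Za-z][A-Za-z0-9]*;");

    public static string ToNumericEntities(string html)
    {
        return NamedEntity.Replace(html, match =>
        {
            string decoded = WebUtility.HtmlDecode(match.Value);

            // Unrecognized names come back unchanged; keep them as they were.
            if (decoded == match.Value)
            {
                return match.Value;
            }

            // ConvertToUtf32 gives the code point of the first decoded
            // character, including the case where it is a surrogate pair.
            return "&#" + char.ConvertToUtf32(decoded, 0) + ";";
        });
    }
}
```

For the string from the question, ToNumericEntities("<p>John &amp; Sarah went to see &ldquo;Scream 4&rdquo;.</p>") yields <p>John &#38; Sarah went to see &#8220;Scream 4&#8221;.</p>. Passing that result to the XmlTextWriter as element text then escapes the ampersands and angle brackets, giving the &amp;#8220; form shown in the question.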
