C# HtmlEncode - ISO-8859-1 Entity Names vs Numbers


Question

According to the following table for the ISO-8859-1 standard, there seems to be an entity name and an entity number associated with each reserved HTML character.

So, for example, for the character é:

Entity name: &eacute;

Entity number: &#233;

Similarly, for the character >:

Entity name: &gt;

Entity number: &#62;

For a given string, HttpUtility.HtmlEncode returns an HTML-encoded string, but I can't figure out how it works. Here is what I mean:

Console.WriteLine(HttpUtility.HtmlEncode("é>"));
// Outputs &#233;&gt;

It seems to be using the entity number for the é character but the entity name for the > character.

So does the HtmlEncode method really work with the ISO-8859-1 standard? If it does, is there a reason why it sometimes uses the entity name and other times the entity number? More importantly, can I force it to give me the entity name reliably?

EDIT : Thanks for the answers guys. I cannot decode the string before I perform the search though. Without getting into too many details, the text is stored in a SharePoint List and the "search" is done by SharePoint itself (using a CAML query). So basically, I can't.

I'm trying to think of a way to convert the entity numbers into names. Is there a function in .NET that does that? Or any other idea?

Recommended answer

That's how the method has been implemented. For a few known characters it writes the corresponding named entity, and for characters from U+00A0 up to U+00FF it writes a numeric reference containing the decimal character code. There is not much you can do to modify this behavior. Excerpt from the implementation of System.Net.WebUtility.HtmlEncode (as seen with Reflector):

...
if (ch <= '>')
{
    switch (ch)
    {
        case '&':
        {
            output.Write("&amp;");
            continue;
        }
        case '\'':
        {
            output.Write("&#39;");
            continue;
        }
        case '"':
        {
            output.Write("&quot;");
            continue;
        }
        case '<':
        {
            output.Write("&lt;");
            continue;
        }
        case '>':
        {
            output.Write("&gt;");
            continue;
        }
    }
    output.Write(ch);
    continue;
}
if ((ch >= '\x00a0') && (ch < '\x0100'))
{
    output.Write("&#");
    output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
    output.Write(';');
}
...
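
To see the two branches of the excerpt side by side, here is a minimal sketch that calls System.Net.WebUtility.HtmlEncode directly. The class name HtmlEncodeDemo is made up for illustration, and the outputs in the comments assume the runtime behaves like the decompiled code above; other framework versions may differ in detail.

```csharp
using System;
using System.Net;

// Hypothetical demo class, not part of the original answer.
static class HtmlEncodeDemo
{
    static void Main()
    {
        // '>' is one of the special-cased characters, so it becomes a named entity.
        // 'é' (U+00E9) lies between U+00A0 and U+00FF, so it becomes a decimal reference.
        Console.WriteLine(WebUtility.HtmlEncode("é>"));   // &#233;&gt;

        // The apostrophe is special-cased too, but written as a number.
        Console.WriteLine(WebUtility.HtmlEncode("it's")); // it&#39;s

        // Plain ASCII letters and digits pass through unchanged.
        Console.WriteLine(WebUtility.HtmlEncode("abc123")); // abc123
    }
}
```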

That being said, you shouldn't need to care, as this method will always produce valid, safe and correctly encoded HTML.
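
If named entities really are required (for example because the stored text uses them and cannot be decoded before searching), one possible workaround is to post-process the encoder's output. The sketch below is an assumption rather than a built-in .NET feature: NamedEntityEncoder, its Encode method and the small lookup table are all hypothetical, and the table lists only a handful of Latin-1 names, so it would need to be completed for real use.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: swaps decimal references for named entities where a name is known.
static class NamedEntityEncoder
{
    // Partial, illustrative table: character code -> Latin-1 entity name.
    private static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        { 160, "nbsp" },   { 169, "copy" },   { 174, "reg" },
        { 224, "agrave" }, { 231, "ccedil" }, { 232, "egrave" },
        { 233, "eacute" }, { 234, "ecirc" },  { 252, "uuml" },
    };

    // Matches the decimal references that HtmlEncode emits, e.g. "&#233;".
    private static readonly Regex DecimalReference = new Regex("&#([0-9]+);");

    public static string Encode(string text)
    {
        // Encode first, so any literal '&' in the input is already "&amp;"
        // and cannot be mistaken for the start of a reference.
        string encoded = WebUtility.HtmlEncode(text);

        return DecimalReference.Replace(encoded, match =>
        {
            int code = int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
            // Unknown codes (such as 39 for the apostrophe) keep their numeric form.
            return Names.TryGetValue(code, out string name)
                ? "&" + name + ";"
                : match.Value;
        });
    }
}
```

With this table, NamedEntityEncoder.Encode("é>") would yield &eacute;&gt;, while characters without an entry keep the numeric form that HtmlEncode produced.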
