Converting Arabic numerals to Arabic/Persian numbers in an HTML file


Question

I am trying to convert plain-text Arabic numerals into Eastern Arabic digits, that is, to turn 1 2 3... into ١ ٢ ٣.... My function converts every digit, including digits that are part of tag names such as H1.

        private void LoadHtmlFile(object sender, EventArgs e)
        {
            var htmlfile = "<html><body><h1>i was born in 1988</h1></body></html>".ToArabicNumber();
            webBrowser1.DocumentText = htmlfile;
        }


    }
    public static class StringHelper
    {
        public static string ToArabicNumber(this string str)
        {
            if (string.IsNullOrEmpty(str)) return "";
            char[] chars;
            chars = str.ToCharArray();
            for (int i = 0; i < str.Length; i++)
            {
                if (str[i] >= '0' && str[i] <= '9')
                {
                    chars[i] += (char)1728;
                }
            }
            return new string(chars);
        }
    }

I also tried targeting only the numbers in InnerText, but that did not work either. The code below changes the numbers in tags as well.

        private void LoadHtmlFile(object sender, EventArgs e)
        {
            var htmlfile = "<html><body><h1>i was born in 1988</h1></body></html>" ;
            webBrowser1.DocumentText=htmlfile;
        }

        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            webBrowser1.Document.Body.InnerText = webBrowser1.Document.Body.InnerText.ToArabicNumber();
        }

Any suggestions?

Recommended answer

You can use a regular expression to find the parts of the HTML that lie between '>' and '<' characters and operate only on those. This keeps the code from processing tag names and attributes (style, etc.).

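// Requires: using System.Linq; using System.Text.RegularExpressions;

// Convert all English (ASCII) digits in a string to their Arabic digit
// equivalents (the table below holds the Persian forms, U+06F0 to U+06F9)
public static string ToArabicNums(string src)
{
    const string digits = "۰۱۲۳۴۵۶۷۸۹";
    return string.Join("",
        src.Select(c => c >= '0' && c <= '9' ? digits[c - '0'] : c)
    );
}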

// Convert all English digits in the text segments of an HTML 
// document to Arabic digit equivalents
public static string ToArabicNumsHtml(string src)
{
    string res = src;

    Regex re = new Regex(@">(.*?)<");

    // get Regex matches 
    MatchCollection matches = re.Matches(res);

    // process in reverse in case transformation function returns 
    // a string of a different length
    for (int i = matches.Count - 1; i >= 0; --i)
    {
        Match nxt = matches[i];
        if (nxt.Groups.Count == 2 && nxt.Groups[1].Length > 0)
        {
            Group g = nxt.Groups[1];
            // splice the converted text in place of the original segment
            res = res.Substring(0, g.Index) + ToArabicNums(g.Value) +
                res.Substring(g.Index + g.Length);
        }
    }

    return res;
}

This isn't perfect. It does not check for HTML character references in the text, such as the construct &#<digits>; that specifies a character by its Unicode value (&#1777; for ۱, etc.), so it will replace the digits inside them. It also won't process any text before the first tag or after the last tag.
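
One possible way around both limitations is sketched below. It is not the code from the answer above: the names HtmlDigitSketch, ConvertDigits and ConvertHtmlText are made up for illustration, and the pattern is an assumption about what counts as a tag or a character reference. A single regular expression matches either a whole tag, a whole character reference, or a run of ASCII digits. Tags and references are returned unchanged, so only digit runs in ordinary text are converted, including text before the first tag and after the last one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HtmlDigitSketch
{
    // Persian digit forms, U+06F0 to U+06F9, indexed by ASCII digit value.
    private const string PersianDigits = "۰۱۲۳۴۵۶۷۸۹";

    // Alternatives are tried left to right at each position:
    //   <[^>]*>            a whole tag, including its attributes
    //   &#?[0-9A-Za-z]+;   a character reference such as &#1777; or &amp;
    //   [0-9]+             a run of ASCII digits in ordinary text
    // [0-9] is used instead of \d because \d also matches non-ASCII digits in .NET.
    private static readonly Regex Token =
        new Regex(@"<[^>]*>|&#?[0-9A-Za-z]+;|[0-9]+");

    // Replace each ASCII digit with the Persian digit of the same value.
    public static string ConvertDigits(string text)
    {
        return new string(text
            .Select(c => c >= '0' && c <= '9' ? PersianDigits[c - '0'] : c)
            .ToArray());
    }

    // Convert digits in text only; tags and character references pass through.
    public static string ConvertHtmlText(string html)
    {
        if (string.IsNullOrEmpty(html)) return "";
        return Token.Replace(html, m =>
            m.Value[0] == '<' || m.Value[0] == '&' ? m.Value : ConvertDigits(m.Value));
    }
}
```

Calling HtmlDigitSketch.ConvertHtmlText("12 <b class=\"c3\">&#1777; and 4</b> 5") should leave the tag, its attribute and the reference alone and convert only 12, 4 and 5. This is still a regular expression and not an HTML parser: comments, script and style contents, and attribute values that contain '>' are not handled, so a real parser is the safer choice for arbitrary documents.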

Example:

Calling: ToArabicNumsHtml("<html><body><h1>I was born in 1988</h1></body></html>")
Result: "<html><body><h1>I was born in ۱۹۸۸</h1></body></html>"

Use whatever code you prefer in ToArabicNums to do the actual transformation, or generalize it by passing in a transformation function.
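
The generalization could look roughly like the sketch below. The class and method names (HtmlTextTransformSketch, TransformTextSegments) are hypothetical, and the pattern is a simplified variant of the one in the answer: it uses [^<>]+ instead of .*? so that a text segment may span several lines, which the answer's pattern does not allow because '.' does not match a newline by default.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HtmlTextTransformSketch
{
    // Applies `transform` to every non-empty text run that sits between
    // a '>' and the next '<'. Tag names and attributes are never passed
    // to `transform`. Text before the first tag or after the last tag
    // is left untouched, as in the original answer.
    public static string TransformTextSegments(string html, Func<string, string> transform)
    {
        if (string.IsNullOrEmpty(html)) return "";
        return Regex.Replace(
            html,
            @">([^<>]+)<",
            m => ">" + transform(m.Groups[1].Value) + "<");
    }
}
```

For example, HtmlTextTransformSketch.TransformTextSegments(html, ToArabicNums) would apply the digit conversion from the answer, while passing s => s.ToUpperInvariant() would upper-case the text and leave the tags as they are.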
