How to Find the Language of a String in C#


Question

I need to find the language of a string in my standalone program.
I don't want to use Bing Translate or Google Translate.
Thanks.

In fact, I tried this, but I cannot do it for all languages, can I?

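  using System.Linq; // required for Any()

  public string FindLang(string text)
        {
            string result = "";
            // U+FB50..U+FEFC: mostly the Arabic Presentation Forms blocks
            if (text.Any(c => c >= 0xFB50 && c <= 0xFEFC))
            {
                result += "Arabic";
            }
            // U+0600..U+06FF: the Arabic block, shared by Arabic, Persian, Urdu and others
            if (text.Any(c => c >= 0x0600 && c <= 0x06FF))
            {
                result += "Persian";
            }
            // U+0020..U+007E: printable ASCII, including spaces, digits and punctuation
            if (text.Any(c => c >= 0x20 && c <= 0x7E))
            {
                result += "English";
            }
            // U+0530..U+058F: the Armenian block
            if (text.Any(c => c >= 0x0530 && c <= 0x058F))
            {
                result += "Armenian";
            }
            // U+2000..U+FA2D: a very wide range (punctuation, symbols, kana, Hangul, CJK ideographs)
            if (text.Any(c => c >= 0x2000 && c <= 0xFA2D))
            {
                result += "Chinese";
            }
            return result;
        }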

Recommended answer

If your string is long enough, you may attempt a heuristic approach: build a dictionary of the most frequent terms for each language your application supports, and then find the best match for the given string.
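
The frequent-terms heuristic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `StopWordLanguageGuesser`, the `CommonWords` table and the `Guess` method are hypothetical, and the word lists are tiny hand-picked samples rather than real frequency data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class StopWordLanguageGuesser
{
    // Assumption: a few very common words per language, for illustration only.
    // A real application would load much larger lists for every supported language.
    private static readonly Dictionary<string, HashSet<string>> CommonWords =
        new Dictionary<string, HashSet<string>>
        {
            ["English"] = new HashSet<string> { "the", "and", "is", "of", "to", "in", "that", "it" },
            ["Spanish"] = new HashSet<string> { "el", "la", "de", "que", "y", "en", "los", "es" },
            ["German"]  = new HashSet<string> { "der", "die", "und", "das", "ist", "nicht", "ein", "zu" },
        };

    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '¿', '¡', '"', '(', ')' };

    // Returns the language whose word list matches the most tokens,
    // or "Unknown" when no token matches any list.
    public static string Guess(string text)
    {
        string[] tokens = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        string best = "Unknown";
        int bestScore = 0;
        foreach (var entry in CommonWords)
        {
            int score = tokens.Count(t => entry.Value.Contains(t));
            if (score > bestScore)
            {
                best = entry.Key;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With these sample lists, `StopWordLanguageGuesser.Guess("¿Dónde está el baño de la casa?")` matches three Spanish words and none from the other lists. A very short string may match nothing at all, which is why the answer limits the approach to strings that are long enough.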


First of all, finding some specific characters from the Unicode charts in the text does not mean that we have identified the language. This can be an extra evaluation, but it cannot be the only one. It is one thing to estimate whether a text comes from a short list of languages (say, up to 10), and another to tell for any string which of all languages it is written in. The lower the word count and the wider the spread of candidate languages, the less accuracy you will have. Also take into account that languages are not disjoint sets, neither in terms of words nor in terms of characters. Estimates of the number of languages vary between around 6,000 and 7,000 (Wikipedia).
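
To illustrate why character ranges are only an extra signal, here is a minimal sketch that counts letters per Unicode block and reports scripts rather than languages. The names `ScriptTally`, `BlockOf` and `Count` are hypothetical, and the handful of ranges is an assumption chosen for the example, not a complete table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class ScriptTally
{
    // Maps a letter to a coarse script name based on a few Unicode blocks.
    // Assumption: only these blocks matter for the example.
    private static string BlockOf(char c)
    {
        if (c >= 0x0041 && c <= 0x024F) return "Latin";    // Basic Latin through Latin Extended-B
        if (c >= 0x0530 && c <= 0x058F) return "Armenian";
        if (c >= 0x0600 && c <= 0x06FF) return "Arabic";   // shared by Arabic, Persian, Urdu, ...
        if (c >= 0x3040 && c <= 0x30FF) return "Kana";     // Hiragana and Katakana
        if (c >= 0x4E00 && c <= 0x9FFF) return "Han";      // CJK Unified Ideographs
        return "Other";
    }

    // Counts the letters of the text per script; non-letters are ignored.
    public static Dictionary<string, int> Count(string text)
    {
        return text.Where(c => char.IsLetter(c))
                   .GroupBy(c => BlockOf(c))
                   .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

For the input `"سلام"`, a word used in both Arabic and Persian, the tally contains a single entry, `Arabic` with a count of 4. The script is identified, but the language is not, so a result like this can only narrow the candidates before a word-based check.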

I understand that you want a solution, but what you specified is a really hard signal processing task. If you read this article carefully (Detect a written text's language), you will get a good starting point (you can even use it directly), but you will also notice that what you specified is most likely not possible. I don't know what exactly you need, only what you wrote.

Good luck.


// Uses the Google AJAX Language API (google.language), which must be loaded on the page first.
var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
  if (!result.error) {
    var language = 'unknown';
    // Look up the readable language name for the detected language code.
    for (var l in google.language.Languages) {
      if (google.language.Languages[l] == result.language) {
        language = l;
        break;
      }
    }
    var container = document.getElementById("detection");
    container.innerHTML = text + " is: " + language;
  }
});



Reference: stackoverflow.com

