How does language detection work?

Question

I have been wondering for some time how Google Translate (or a hypothetical translator) detects the language of the string entered in the "from" field. The only approach I can think of is looking for words in the input string that are unique to one language. Another way might be to check sentence formation or other semantics in addition to keywords, but that seems very difficult given how many languages there are and how their semantics differ. After some research I found that there are approaches that use n-gram sequences together with statistical models to detect the language. A high-level answer would also be appreciated.

Recommended answer

You don't have to analyze the text deeply to get an idea of what language it is in. Statistics tells us that every language has characteristic character patterns and frequencies, and that is a pretty good first-order approximation. It gets worse when the text mixes several languages, but even then it is not extremely complex. Of course, if the text is too short (e.g. a single word or, worse, a single short word), statistics doesn't work and you need a dictionary.
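
To make the "character patterns and frequencies" idea concrete, here is a minimal sketch in Python. It is not how Google Translate works (the answer does not say). It is one simple instance of the n-gram approach mentioned in the question: build a character-trigram frequency profile per language and pick the language whose profile is most similar to the input. The names ngram_profile, cosine_similarity, detect_language and the tiny SAMPLES texts are all made up for illustration. A real detector would need far larger training texts.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Return the relative frequency of each character n-gram in text."""
    cleaned = " ".join(text.lower().split())   # normalize case and whitespace
    padded = f" {cleaned} "                    # mark word boundaries at both ends
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical, deliberately tiny training texts (assumption: illustration only).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and this is the kind "
               "of thing that people say when they want to show what the language looks like",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y esto es lo "
               "que la gente dice cuando quiere mostrar cómo se ve el idioma",
    "german": "der schnelle braune fuchs springt über den faulen hund und das ist "
              "es was die leute sagen wenn sie zeigen wollen wie die sprache aussieht",
}

PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}


def detect_language(text):
    """Return (best_language, scores) for text using trigram profile similarity."""
    profile = ngram_profile(text)
    scores = {lang: cosine_similarity(profile, ref) for lang, ref in PROFILES.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    label, scores = detect_language("where is the train station and what time does it leave")
    print(label, scores)
```

With so little reference text the scores are only indicative, and a one-word input yields just a handful of trigrams, which is the short-text case where the answer says a dictionary is needed instead.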