What is a relatively simple way to determine the probability that a sentence is in English?

Question

I have a number of strings (collections of characters) that represent sentences in different languages, say:

Hello, my name is George.

Das brot ist gut.

... etc.

I want to assign each of them a score (from 0 to 1) indicating the likelihood that it is an English sentence. Is there an accepted algorithm (or Python library) for doing this?

Note: I don't care if the grammar of the English sentence is perfect.

Recommended answer

A Bayesian classifier would be a good choice for this task, for example using the Reverend library:

>>> from reverend.thomas import Bayes
>>> g = Bayes()    # guesser
>>> g.train('french','La souris est rentrée dans son trou.')
>>> g.train('english','my tailor is rich.')
>>> g.train('french','Je ne sais pas si je viendrai demain.')
>>> g.train('english','I do not plan to update my website soon.')

>>> print g.guess('Jumping out of cliffs it not a good idea.')
[('english', 0.99990000000000001), ('french', 9.9999999999988987e-005)]

>>> print g.guess('Demain il fera très probablement chaud.')
[('french', 0.99990000000000001), ('english', 9.9999999999988987e-005)]
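
If the reverend package is not available in your environment (it is an older library, so this is worth checking), the same idea can be sketched with the standard library alone. The following is a minimal illustration, not Reverend's actual implementation: it assumes a multinomial naive Bayes model over character trigrams with add-one smoothing and equal priors for every language. The names `trigrams` and `NaiveBayesLanguageGuesser` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter, defaultdict


def trigrams(text):
    """Split text into overlapping character trigrams, padded with spaces."""
    padded = "  " + text.lower() + "  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class NaiveBayesLanguageGuesser:
    """Hypothetical sketch: naive Bayes over character trigrams."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # language -> trigram counts
        self.vocab = set()                  # every trigram seen in training

    def train(self, language, text):
        grams = trigrams(text)
        self.counts[language].update(grams)
        self.vocab.update(grams)

    def guess(self, text):
        """Return (language, probability) pairs, most likely first."""
        if not self.counts:
            return []
        grams = trigrams(text)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen trigrams
        log_scores = {}
        for language, counter in self.counts.items():
            total = sum(counter.values())
            # Add-one smoothing keeps unseen trigrams from zeroing the score.
            log_scores[language] = sum(
                math.log((counter[g] + 1) / (total + vocab_size)) for g in grams
            )
        # Normalise in log space (equal priors assumed) so scores sum to 1.
        top = max(log_scores.values())
        exp_scores = {lang: math.exp(s - top) for lang, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return sorted(
            ((lang, value / norm) for lang, value in exp_scores.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )


g = NaiveBayesLanguageGuesser()
g.train('french', 'La souris est rentrée dans son trou.')
g.train('english', 'my tailor is rich.')
g.train('french', 'Je ne sais pas si je viendrai demain.')
g.train('english', 'I do not plan to update my website soon.')

# The 0..1 score asked for in the question is the 'english' entry.
scores = dict(g.guess('Jumping out of cliffs it not a good idea.'))
print(scores['english'])
```

With only four training sentences the scores mean very little; a usable classifier would need a much larger corpus per language, and at least one non-English language to compare against.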
