Detecting whether or not text is English (in bulk)

Question

I'm looking for a simple way to detect whether a short excerpt of text, a few sentences long, is English or not. It seems to me that this problem is much easier than trying to detect an arbitrary language. Is there any software out there that can do this? I'm writing in Python and would prefer a Python library, but something else would be fine too. I tried Google, but then realized its TOS doesn't allow automated queries.

Recommended answer

I have read about a method that uses trigrams.

You can go over the text and find the most frequently used trigrams (three-letter sequences) within its words. If the most frequent ones match the most frequent trigrams among English words, the text is probably written in English.
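
A minimal Python sketch of that idea follows. It is not taken from the answer or from the linked project: the function names (`word_trigrams`, `english_score`, `looks_english`), the short reference set of common English trigrams, and the `threshold` and `top_n` defaults are all illustrative assumptions. A real detector would derive its trigram table from a large English corpus and tune the cutoff on sample data.

```python
import re
from collections import Counter

# Assumption: a small hand-picked set of trigrams that are commonly cited as
# frequent in English. It is illustrative only, not measured from a corpus.
COMMON_ENGLISH_TRIGRAMS = set(
    "the and ing ion tio ent ati for her ter hat tha ere ate his con res "
    "ver all ons nce men ith ted ers pro thi wit are ess not ive was ect "
    "rea com eve per int est sta cti ica ist ear ain one our iti rat".split()
)


def word_trigrams(text):
    """Yield every three-letter sequence found inside each word of the text."""
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - 2):
            yield word[i:i + 3]


def english_score(text, top_n=20):
    """Return the fraction of the text's most frequent trigrams that also
    appear in the reference set (0.0 if the text yields no trigrams)."""
    counts = Counter(word_trigrams(text))
    top = [trigram for trigram, _ in counts.most_common(top_n)]
    if not top:
        return 0.0
    hits = sum(1 for trigram in top if trigram in COMMON_ENGLISH_TRIGRAMS)
    return hits / len(top)


def looks_english(text, threshold=0.3, top_n=20):
    """Guess whether the text is English. The threshold is an assumption."""
    return english_score(text, top_n) >= threshold


if __name__ == "__main__":
    excerpts = [
        "This is the thing that they were interested in.",
        "zszywka krzyk wszystko",
    ]
    for excerpt in excerpts:
        print(looks_english(excerpt), excerpt)
```

For bulk use, `looks_english` can be mapped over a list of excerpts. Very short excerpts contain few trigrams, so their scores are noisy; text in a non-Latin script produces no trigrams at all and scores 0.0.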

Try looking at this Ruby project:

https://github.com/feedbackmine/language_detector
