Ruby Text Analysis

Question

Is there a Ruby gem or other tool for text analysis? I'm looking for word frequency, pattern detection and so forth, preferably with an understanding of French.

Recommended answer

The generalization of word frequencies is a language model, e.g. uni-grams (= frequency of single words), bi-grams (= frequency of word pairs), tri-grams (= frequency of word triples), ..., in general: n-grams.
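To make the terminology concrete, here is a minimal sketch of n-gram counting in plain Ruby. The helper names `tokenize` and `ngram_counts` and the sample sentence are made up for illustration, and the tokenizer (lowercase, keep runs of Unicode letters) is a simplifying assumption that splits French elisions such as "l'eau" into two tokens. It is no substitute for a real toolkit on a large corpus.

```ruby
# Split text into lowercase word tokens.
# \p{L} matches any Unicode letter, so accented characters (é, à, ç) are kept.
def tokenize(text)
  text.downcase.scan(/\p{L}+/)
end

# Count every run of n consecutive tokens.
# Keys are arrays of n words, values are occurrence counts.
def ngram_counts(tokens, n)
  counts = Hash.new(0)
  tokens.each_cons(n) { |gram| counts[gram] += 1 }
  counts
end

tokens   = tokenize("Le chat dort. Le chat mange. Le chien dort.")
unigrams = ngram_counts(tokens, 1)
bigrams  = ngram_counts(tokens, 2)

# Print the bigrams, most frequent first.
bigrams.sort_by { |_gram, count| -count }.each do |gram, count|
  puts "#{gram.join(' ')}: #{count}"
end
```

Note that this sketch counts across sentence boundaries (it produces the bigram "dort le"); language-model toolkits normally handle sentence boundaries explicitly.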

You should look for an existing language-model toolkit; re-inventing the wheel here is not a good idea.

There are a few standard toolkits available, e.g. from the CMU Sphinx team, and also HTK.

These toolkits are typically written in C (for speed, because you have to process huge corpora) and write their output as ARPA n-gram files, a standard format that is typically plain text.

Check the following thread, which contains more details and links:

Building an OpenEars-compatible language model

Once you have generated your language model with one of these toolkits, you will need either a Ruby gem that makes the language model accessible in Ruby, or you will need to convert the ARPA format into your own format.
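If you go the conversion route, the following is a rough sketch of reading an ARPA text file into nested Ruby hashes. It assumes the common layout: a `\data\` header, one `\N-grams:` section per order, and entry lines of the form "log10 probability, N words, optional back-off weight". The function name `parse_arpa` and the numbers in the sample are made up for illustration; real files from a given toolkit may differ in details, so treat this as a starting point only.

```ruby
# Parse ARPA-style text into { order => { [words] => { logprob:, backoff: } } }.
def parse_arpa(text)
  model = Hash.new { |hash, order| hash[order] = {} }
  order = nil # the n-gram section we are currently inside, if any

  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line =~ /\A\\(\d+)-grams:\z/
      order = $1.to_i              # e.g. "\2-grams:" starts the bigram section
    elsif line.start_with?("\\")
      order = nil                  # "\data\" or "\end\": not inside a section
    elsif order
      fields  = line.split
      words   = fields[1, order]
      backoff = fields[order + 1] ? Float(fields[order + 1]) : nil
      model[order][words] = { logprob: Float(fields[0]), backoff: backoff }
    end
    # Anything else (such as the "ngram 1=3" count lines) is skipped.
  end

  model
end

# Tiny hand-written sample; the probabilities are invented.
sample = <<~'ARPA'
  \data\
  ngram 1=3
  ngram 2=2

  \1-grams:
  -0.4771 le -0.3010
  -0.7782 chat -0.3010
  -0.7782 dort

  \2-grams:
  -0.1761 le chat
  -0.3010 chat dort

  \end\
ARPA

model = parse_arpa(sample)
puts model[2][["le", "chat"]][:logprob]
```

From here the hashes can be serialized to whatever format suits your application, for example JSON or a database table.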

adi92's post lists some more Ruby NLP resources.

You can also search Google for "ARPA Language Model" for more information.

Last but not least, check Google's online N-gram tool. They built n-grams from the books they digitized, and the data is also available in French and other languages.
