Word splitting statistical approach
Question
I want to solve the word-splitting problem (parsing the words out of a long string that contains no spaces). For example, we want to extract the words [some, long, word] from somelongword.
We can achieve this with a dynamic-programming approach over a dictionary, but then we run into another issue: parsing ambiguity. E.g. orcore can be split as or core or as orc ore (we don't take phrase meaning or part of speech into account). So I am thinking about using a statistical or machine-learning approach.
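The dictionary-based dynamic approach, and the ambiguity it produces, can be sketched as follows (a minimal illustration; the tiny dictionary below is made up for the example):

```python
from functools import lru_cache

def all_segmentations(s, dictionary):
    """Return every way to split s into dictionary words, via memoized recursion."""
    @lru_cache(maxsize=None)
    def split(i):
        # All segmentations of the suffix s[i:].
        if i == len(s):
            return [[]]
        results = []
        for j in range(i + 1, len(s) + 1):
            word = s[i:j]
            if word in dictionary:
                for rest in split(j):
                    results.append([word] + rest)
        return results
    return split(0)

dictionary = {"or", "orc", "core", "ore", "some", "long", "word"}
print(all_segmentations("orcore", dictionary))      # [['or', 'core'], ['orc', 'ore']]
print(all_segmentations("somelongword", dictionary))  # [['some', 'long', 'word']]
```

The dictionary alone cannot choose between the two parses of orcore; that choice is exactly what a statistical model has to make.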
I found that Naive Bayes and the Viterbi algorithm, given a training set, can be used to solve this. Can you point me to some information about applying these algorithms to the word-splitting problem?
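One common statistical formulation is a unigram model searched with a Viterbi-style recurrence, in the spirit of Norvig's segmenter: pick the split that maximizes the product of word probabilities (equivalently, the sum of log-probabilities). A minimal sketch, with made-up counts (a real system would use corpus frequencies):

```python
import math
from functools import lru_cache

# Hypothetical unigram counts for illustration only.
counts = {"or": 500, "core": 300, "orc": 5, "ore": 40,
          "some": 800, "long": 600, "word": 700}
total = sum(counts.values())

def logp(word):
    # Log-probability of a word; unseen words get a heavy per-character penalty.
    if word in counts:
        return math.log(counts[word] / total)
    return -10.0 * len(word)

@lru_cache(maxsize=None)
def segment(s):
    """Best split of s: maximize the sum of word log-probabilities."""
    if not s:
        return 0.0, []
    candidates = []
    for i in range(1, len(s) + 1):
        first, rest = s[:i], s[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((logp(first) + rest_score, [first] + rest_words))
    return max(candidates)

print(segment("orcore")[1])  # ['or', 'core'] beats ['orc', 'ore'] on frequency
```

Because "or" and "core" are far more frequent than "orc" and "ore" in these counts, the ambiguity from the previous example is resolved in favor of [or, core].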
UPD: I've implemented this method in Clojure, using some advice from Peter Norvig's code.
Answer
I think the slideshow by Peter Norvig and Sebastian Thrun is a good place to start. It presents real-world work done at Google.