Anyone know an example algorithm for word segmentation using dynamic programming?


Question



If you search Google for word segmentation, there really are no very good descriptions of it, and I'm just trying to fully understand the process a dynamic programming algorithm takes to find a segmentation of a string into individual words. Does anyone know a place where there is a good description of the word segmentation problem, or can anyone describe it?

Word segmentation is basically just taking a string of characters and deciding where to split it up into words; a dynamic programming approach would break this into some number of overlapping subproblems. This is pretty simple to do using recursion, but I haven't been able to find even a description of an iterative algorithm for this online, so if anyone has any examples or can give an algorithm, that would be great.

Thanks for any help.

Solution

The Python wordsegment module has such an algorithm. It uses recursion and memoization to implement dynamic programming.
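For reference, the same recursion-with-memoization pattern can be sketched in miniature with a toy dictionary and uniform scores (picking the split with the fewest words). The names `WORDS` and `segment_memo` are illustrative assumptions here, not part of the wordsegment module:

```python
# Minimal sketch of memoized recursive segmentation, assuming a toy
# dictionary with uniform word scores (fewest words wins) instead of
# the unigram probabilities the wordsegment module uses.
WORDS = {"this", "is", "a", "test"}

def segment_memo(text):
    """Return a best split of `text` into WORDS (fewest words), or None."""
    memo = {}

    def search(rest):
        if rest == "":
            return []
        if rest in memo:
            return memo[rest]
        best = None
        for i in range(1, len(rest) + 1):
            prefix, suffix = rest[:i], rest[i:]
            if prefix in WORDS:
                # Each suffix is solved once; memo makes this dynamic programming.
                tail = search(suffix)
                if tail is not None and (best is None or len(tail) + 1 < len(best)):
                    best = [prefix] + tail
        memo[rest] = best
        return best

    return search(text)
```

Without the memo this recursion would re-solve the same suffix exponentially many times; with it, each distinct suffix is solved once.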

The source is available on GitHub; here's the relevant snippet:

from math import log10

def segment(text):
    "Return a list of words that is the best segmentation of `text`."

    memo = dict()

    def search(text, prev='<s>'):
        if text == '':
            return 0.0, []

        def candidates():
            # divide() yields every (prefix, suffix) split of text, and
            # score() is the module's unigram/bigram probability estimate
            # for prefix conditioned on the previous word.
            for prefix, suffix in divide(text):
                prefix_score = log10(score(prefix, prev))

                pair = (suffix, prefix)
                if pair not in memo:
                    memo[pair] = search(suffix, prefix)
                suffix_score, suffix_words = memo[pair]

                yield (prefix_score + suffix_score, [prefix] + suffix_words)

        return max(candidates())

    # clean() is the module's input normalizer (lowercasing, stripping
    # non-alphanumeric characters).
    result_score, result_words = search(clean(text))

    return result_words

Note how `memo` caches calls to `search`, which in turn selects the max from `candidates`.
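Since the question also asked for an iterative formulation: the same dynamic program can be written bottom-up over prefixes of the string. This sketch again assumes a toy dictionary and hypothetical names (`WORDS`, `word_break`), not anything from the wordsegment module:

```python
# Bottom-up (iterative) version of the same dynamic program, assuming a
# toy dictionary; best[i] holds one valid segmentation of text[:i].
WORDS = {"this", "is", "a", "test"}

def word_break(text):
    """Return one split of `text` into WORDS, or None if none exists."""
    n = len(text)
    best = [None] * (n + 1)
    best[0] = []  # the empty prefix is trivially segmented
    for i in range(1, n + 1):
        for j in range(i):
            # text[j:i] is a candidate last word for the prefix text[:i].
            if best[j] is not None and text[j:i] in WORDS:
                best[i] = best[j] + [text[j:i]]
                break
    return best[n]
```

The table `best` plays the role of the memo: each prefix is solved exactly once, in order of increasing length, with no recursion at all.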
