Convert text prediction script [Markov Chain] from JavaScript to Python

Question

I've been trying for the last couple of days to convert this JS script to Python code.

My implementation so far (mostly a blind copy-paste, with some minor fixes here and there):

import random
class markov:
    memory = {}
    separator = ' '
    order = 2

    def getInitial(self):
        ret = []
        for i in range(0, self.order, 1):
            ret.append('')
        return ret

    def breakText(self, txt, cb):
        parts = txt.split(self.separator)
        prev = self.getInitial()
        def step(self):
            cb(prev, self.next)
            prev.shift()#Javascript function.
            prev.append(self.next)
        #parts.forEach(step) # - step is the function above.
        cb(prev, '')

    def learn(self, txt):
        mem = self.memory
        def learnPart(key, value):
            if not mem[key]:
                mem[key] = []
            mem[key] = value
            return mem
        self.breakText(txt, learnPart)

    def step(self, state, ret):
        nextAvailable = self.memory[state] or ['']
        self.next = nextAvailable[random.choice(nextAvailable.keys())]
        if not self.next:
            return ret
        ret.append(next)
        nextState = state.slice(1)
        return self.step(nextState, ret)

    def ask(self, seed):
        if not seed:
            seed = self.genInitial()
        seed = seed + self.step(seed, []).join(self.separator)
        return seed

Problems:

  1. I have absolutely no knowledge of JavaScript.

When I try to "learn" some text into a "markov" class object [e.g.: a=markov(); a.learn("sdfg");] I get the following error: "TypeError: unhashable type: 'list'", raised for the "mem" dictionary in the "learnPart" function, which is nested inside the "learn" function.

So my question so far is: why does this exception [TypeError for a list object, falsely referring to a dictionary object (which is hashable)] occur?

Thanks in advance for any suggestions, directions, pointers, or help in general :D

Recommended answer

Guy who wrote the article speaking. Glad you found it useful! Now, my first implementation of a Markov chain was actually in Python, so this answer will focus on how to write it in a more Pythonic way. I'll show how to go about making an order-2 Markov chain, since they're easy to talk about, but you can of course make it order-N with some modifications.

In JS, the two prominent data structures are the generic object and the array (which is an extension of the generic object). In Python, however, you have other options for finer-grained control. Here are the major differences between the two implementations:


  • A state in our chain is really a tuple: an immutable, ordered structure with a fixed number of elements. We always want n elements (in this case, n=2), and their order has meaning.

  • Manipulating the memory will be easier if we use a defaultdict wrapping a list, so we can skip the "checking if a state doesn't exist, and then doing X" and instead just do X.
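Both points can be seen in a few lines. This is only a minimal sketch: the names plain, memory and error are made up for illustration and are not part of the class. It shows that a list is rejected as a dictionary key (which is where the "unhashable type" error in the question comes from), while a tuple is accepted and a defaultdict(list) needs no existence check.

```python
from collections import defaultdict

# A list is mutable and therefore unhashable, so it cannot be a dict key.
plain = {}
try:
    plain[['', '']] = ['Mary']
    error = None
except TypeError as exc:
    error = str(exc)

# A tuple holding the same words is hashable, so it works as a state.
plain[('', '')] = ['Mary']

# defaultdict(list) creates an empty list the first time a key is touched,
# so no "if key not in memory" check is needed before appending.
memory = defaultdict(list)
memory[('', 'Mary')].append('had')
memory[('', 'Mary')].append('has')

print(error)         # the TypeError message raised for the list key
print(dict(memory))
```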

So, we stick a from collections import defaultdict at the top and change how markov.memory is defined:

memory = defaultdict(list)

Now we change markov.getInitial to return a tuple (remember, this is written for an order-2 chain):

def getInitial(self):
    return ('', '')

(If you want to expand it further, you can use a really neat Python trick: tuple([''] * 2) will return the same thing. Instead of empty strings, you can use None.)
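As a rough sketch of that trick for an arbitrary order, a helper could look like the one below. The function name get_initial and its order and filler parameters are assumptions made for this example, not part of the original class.

```python
def get_initial(order=2, filler=''):
    """Return the start state: a tuple of `order` filler values."""
    return tuple([filler] * order)


print(get_initial())         # ('', '')
print(get_initial(3, None))  # (None, None, None)
```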

We'll change what uses getInitial in a moment.

A powerful concept which doesn't exist in JS (yet) but does exist in Python is the yield keyword (see this question for great explanations).

Another feature of Python is its generic for loop. You can iterate over nearly anything quite easily, including generators (functions which use yield). Combining the two, we can redefine breakText:

def breakText(self, txt):
    #our very own (ε,ε)
    prev = self.getInitial()

    for word in txt.split(self.separator):
        yield prev, word
        #will be explained in the next paragraph
        prev = (prev[1], word)

    #end-of-sentence, prev->ε
    yield prev, ''

The magic part above, prev = (prev[1], word) can be explained best by example:

>>> a = (0, 1)
>>> a
(0, 1)
>>> a = (a[1], 2)
>>> a
(1, 2)
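The same shifting idea extends to an order-N chain by slicing off the oldest word and appending the newest. The following is a sketch only, written as a standalone function: break_text and its order and separator parameters are hypothetical names, not the method from the class.

```python
def break_text(txt, order=2, separator=' '):
    """Yield (state, next_word) pairs for a chain of the given order."""
    prev = ('',) * order
    for word in txt.split(separator):
        yield prev, word
        # Drop the oldest word and append the newest one.
        # For order=2 this is the same as (prev[1], word).
        prev = prev[1:] + (word,)
    # End of sentence: the final state leads to the empty string.
    yield prev, ''


pairs = list(break_text('Mary had a little lamb', order=3))
print(pairs[0])   # (('', '', ''), 'Mary')
print(pairs[-1])  # (('a', 'little', 'lamb'), '')
```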

That's how we advance through the word list. And now we move up to what uses breakText, to the redefinition of markov.learn:

def learn(self, txt):
    for part in self.breakText(txt):
        key = part[0]
        value = part[1]

        self.memory[key].append(value)

Because our memory is a defaultdict, we don't have to worry about the key not existing.

OK, we have half of the chain implemented, time to see it in action! What we have so far:

from collections import defaultdict

class Markov:
    memory = defaultdict(list)
    separator = ' '

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]

            self.memory[key].append(value)

    def breakText(self, txt):
        #our very own (ε,ε)
        prev = self.getInitial()

        for word in txt.split(self.separator):
            yield prev, word
            prev = (prev[1], word)

        #end-of-sentence, prev->ε
        yield (prev, '')

    def getInitial(self):
        return ('', '')

(I changed the class name from markov to Markov because I cringe every time a class begins with a lowercase letter). I saved it as brain.py and loaded up Python.

>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.memory
defaultdict(<class 'list'>, {('had', 'a'): ['little'], ('Mary', 'had'): ['a'], ('', ''): ['Mary'], ('little', 'lamb'): [''], ('a', 'little'): ['lamb'], ('', 'Mary'): ['had']})

Success! Let's look at the result more carefully, to see that we got it right:

{ ('', ''): ['Mary'],
  ('', 'Mary'): ['had'],
  ('Mary', 'had'): ['a'],
  ('a', 'little'): ['lamb'],
  ('had', 'a'): ['little'],
  ('little', 'lamb'): ['']}

(zips up) Ready to drive on? We still have to use this chain!

We've already met what we need to remake step. We have the defaultdict, so we can use random.choice right away, and I can cheat a bit because I know the order of the chain. We can also get rid of the recursion (with some sorrow), if we see it as a function which takes a single step through the chain (my bad in the original article - a badly named function).

def step(self, state):
    choice = random.choice(self.memory[state] or [''])

    if not choice:
        return None

    nextState = (state[1], choice)
    return choice, nextState

I regretfully added the or [''] because random.choice moans about empty lists. Finally, we move a larger portion of the logic to ask (the actual construction of the sentence):

def ask(self, seed=False):
    ret = []

    if not seed:
        seed = self.getInitial()

    while True:
        link = self.step(seed)

        if link is None:
            break

        ret.append(link[0])
        seed = link[1]

    return self.separator.join(ret)

Yes, a bit yucky. We could have given step a better name and made it a generator, but I'm late for a meeting with my pregnant wife who's about to give birth to a baby who left the stove on fire in my car that's being towed! I better hurry!
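For the curious, the generator variant mentioned above might look roughly like this. It is a sketch, not the article's code: walk is a hypothetical name, it is written as a standalone function over an order-2 memory, and it uses memory.get so that looking up an unknown state does not insert an empty list into the defaultdict.

```python
import random
from collections import defaultdict


def walk(memory, state=('', '')):
    """Yield words by following an order-2 chain until it reaches a dead end."""
    while True:
        # .get returns None for unknown states instead of creating a key.
        choices = memory.get(state)
        if not choices:
            return
        word = random.choice(choices)
        if not word:  # the empty string marks end-of-sentence
            return
        yield word
        state = (state[1], word)


# A tiny memory with one choice per state, so the result is deterministic.
memory = defaultdict(list)
memory[('', '')].append('Mary')
memory[('', 'Mary')].append('had')
memory[('Mary', 'had')].append('a')
memory[('had', 'a')].append('little')
memory[('a', 'little')].append('lamb')
memory[('little', 'lamb')].append('')

sentence = ' '.join(walk(memory))
print(sentence)  # Mary had a little lamb
```

With a generator like this, ask would reduce to joining whatever walk yields.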

But first, a talk with bob:

from collections import defaultdict
import random

class Markov:
    memory = defaultdict(list)
    separator = ' '

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]

            self.memory[key].append(value)

    def ask(self, seed=False):
        ret = []

        if not seed:
            seed = self.getInitial()

        while True:
            link = self.step(seed)

            if link is None:
                break

            ret.append(link[0])
            seed = link[1]

        return self.separator.join(ret)

    def breakText(self, txt):
        #our very own (ε,ε)
        prev = self.getInitial()

        for word in txt.split(self.separator):
            yield prev, word
            prev = (prev[1], word)

        #end-of-sentence, prev->ε
        yield (prev, '')

    def step(self, state):
        choice = random.choice(self.memory[state] or [''])

        if not choice:
            return None

        nextState = (state[1], choice)
        return choice, nextState

    def getInitial(self):
        return ('', '')

And loading it up:

>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.ask()
'Mary had a little lamb'
>>> bob.learn('Mary had a giant crab')
>>> bob.ask(('Mary', 'had'))
'a giant crab'
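One thing the session does not show: memory is defined as a class attribute, so every Markov instance created in the same process shares the same dictionary. If separate "brains" are wanted, one possible fix is to create the dictionary in __init__. The class names SharedMarkov and OwnMarkov below are made up purely to contrast the two layouts.

```python
from collections import defaultdict


class SharedMarkov:
    # Class attribute: one dictionary shared by every instance.
    memory = defaultdict(list)


class OwnMarkov:
    def __init__(self):
        # Instance attribute: each object gets its own dictionary.
        self.memory = defaultdict(list)


a, b = SharedMarkov(), SharedMarkov()
a.memory[('', '')].append('Mary')
shared_leak = b.memory[('', '')]

c, d = OwnMarkov(), OwnMarkov()
c.memory[('', '')].append('Mary')
own_leak = d.memory[('', '')]

print(shared_leak)  # ['Mary']
print(own_leak)     # []
```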



There is, of course, room for improvement and expanding on the concept. But it wouldn't be any fun if I just gave you the answer.

Hopefully this will still help after 4 months.
