Convert text prediction script [Markov Chain] from javascript to python
Question
I've been trying for the last couple of days to convert this JS script to Python code.
My implementation so far (mostly a blind copy-paste, with some minor fixes here and there):
    import random

    class markov:
        memory = {}
        separator = ' '
        order = 2

        def getInitial(self):
            ret = []
            for i in range(0, self.order, 1):
                ret.append('')
            return ret

        def breakText(self, txt, cb):
            parts = txt.split(self.separator)
            prev = self.getInitial()
            def step(self):
                cb(prev, self.next)
                prev.shift()#Javascript function.
                prev.append(self.next)
            #parts.forEach(step) # - step is the function above.
            cb(prev, '')

        def learn(self, txt):
            mem = self.memory
            def learnPart(key, value):
                if not mem[key]:
                    mem[key] = []
                mem[key] = value
                return mem
            self.breakText(txt, learnPart)

        def step(self, state, ret):
            nextAvailable = self.memory[state] or ['']
            self.next = nextAvailable[random.choice(nextAvailable.keys())]
            if not self.next:
                return ret
            ret.append(next)
            nextState = state.slice(1)
            return self.step(nextState, ret)

        def ask(self, seed):
            if not seed:
                seed = self.genInitial()
            seed = seed + self.step(seed, []).join(self.separator)
            return seed
Problems:
- I have absolutely no knowledge of JavaScript.
- When I try to "learn" some text into a "markov" class object [e.g. a = markov(); a.learn("sdfg")], I get the following error: "TypeError: unhashable type: 'list'", for the "mem" dictionary in the "learnPart" function, which is nested inside the "learn" function.
So my question so far is: why does this exception [a TypeError for a list object, falsely referring to a dictionary object (which is hashable)] occur?
Thanks in advance for any suggestions, directions, pointers, or help in general :D
Recommended answer
Guy who wrote the article speaking. Glad you found it useful! My first implementation of a Markov chain was actually in Python, so this answer will focus on how to write it in a more Pythonic way. I'll show how to build an order-2 Markov chain, since those are easy to talk about, but you can of course make it order-N with some modifications.
In JS, the two prominent data structures are the generic object and the array (which is an extension of the generic object). In Python, however, you have other options for more fine-grained control. Here are the major differences between the two implementations:
- A state in our chain is really a tuple: an immutable, ordered structure with a fixed number of elements. We always want n elements (in this case, n = 2), and their order has meaning.
- Manipulating the memory will be easier if we use a defaultdict wrapping a list, so we can skip the "check whether a state exists, and then do X" dance and instead just do X.
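The choice of a tuple also accounts for the TypeError in the question: dictionary keys must be hashable, and the list that the question's getInitial returns is not, whereas a tuple of strings is. A minimal sketch of both points (the variable names are illustrative and not taken from the original script):

```python
from collections import defaultdict

memory = {}
list_state = ['', '']      # mutable, therefore unhashable
tuple_state = ('', '')     # immutable, therefore hashable

try:
    memory[list_state] = ['Mary']
    list_error = None
except TypeError as exc:
    list_error = str(exc)  # the message mentions "unhashable type: 'list'"

memory[tuple_state] = ['Mary']  # a tuple works as a key

# defaultdict(list) creates the empty list on first access,
# so no "does this key exist yet?" check is needed
chain = defaultdict(list)
chain[('', 'Mary')].append('had')
```

So the exception is raised by the key (the list), not by the dictionary itself.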
So, we stick a from collections import defaultdict at the top and change how markov.memory is defined:

    memory = defaultdict(list)
Now we change markov.getInitial to return a tuple (remember, this walkthrough covers an order-2 chain):

    def getInitial(self):
        return ('', '')

(If you want to extend it further, you can use a really neat Python trick: tuple([''] * 2) returns the same thing. Instead of empty strings, you can also use None.)
We'll get to changing the code that uses getInitial in a bit.
A strong concept that didn't exist in JS at the time of writing (generator functions arrived later, in ES2015) but does exist in Python is the yield keyword (see this question for great explanations).
Another feature of Python is its generic for loop. You can iterate over nearly anything quite easily, including generators (functions that use yield). Combining the two, we can redefine breakText:
    def breakText(self, txt):
        #our very own (ε,ε)
        prev = self.getInitial()
        for word in txt.split(self.separator):
            yield prev, word
            #will be explained in the next paragraph
            prev = (prev[1], word)
        #end-of-sentence, prev->ε
        yield prev, ''
The magic part above, prev = (prev[1], word), is best explained by example:

    >>> a = (0, 1)
    >>> a
    (0, 1)
    >>> a = (a[1], 2)
    >>> a
    (1, 2)
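For a chain of order n, the same shift can be written with slicing. This is a sketch of one possible generalisation rather than code from the original article, and the helper name shift is hypothetical:

```python
def shift(state, word):
    """Drop the oldest element of the state tuple and append the new word."""
    return state[1:] + (word,)

state = ('', '', '')  # initial state of an order-3 chain
for word in ['Mary', 'had', 'a', 'little']:
    state = shift(state, word)
# state is now ('had', 'a', 'little')
```

For n = 2, shift(prev, word) gives the same result as (prev[1], word).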
That's how we advance through the word list. Now we move up to what uses breakText: the redefinition of markov.learn:

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]
            self.memory[key].append(value)
Because our memory is a defaultdict, we don't have to worry about the key not existing.
OK, we have half of the chain implemented, so it's time to see it in action! What we have so far:

    from collections import defaultdict

    class Markov:
        memory = defaultdict(list)
        separator = ' '

        def learn(self, txt):
            for part in self.breakText(txt):
                key = part[0]
                value = part[1]
                self.memory[key].append(value)

        def breakText(self, txt):
            #our very own (ε,ε)
            prev = self.getInitial()
            for word in txt.split(self.separator):
                yield prev, word
                prev = (prev[1], word)
            #end-of-sentence, prev->ε
            yield (prev, '')

        def getInitial(self):
            return ('', '')
(I changed the class name from markov to Markov because I cringe every time a class name begins with a lowercase letter.) I saved it as brain.py and loaded up Python.

    >>> import brain
    >>> bob = brain.Markov()
    >>> bob.learn('Mary had a little lamb')
    >>> bob.memory
    defaultdict(<class 'list'>, {('had', 'a'): ['little'], ('Mary', 'had'): ['a'], ('', ''): ['Mary'], ('little', 'lamb'): [''], ('a', 'little'): ['lamb'], ('', 'Mary'): ['had']})
Success! Let's look at the result more carefully to check that we got it right:

    {('', ''): ['Mary'],
     ('', 'Mary'): ['had'],
     ('Mary', 'had'): ['a'],
     ('a', 'little'): ['lamb'],
     ('had', 'a'): ['little'],
     ('little', 'lamb'): ['']}
*zips up* Ready to drive on? We still have to use this chain!
We've already met everything we need to rewrite step. We have the defaultdict, so we can use random.choice right away, and I can cheat a bit because I know the order of the chain. We can also get rid of the recursion (with some sorrow) if we see step as a function that takes a single step through the chain (my bad in the original article: a badly named function).

    def step(self, state):
        choice = random.choice(self.memory[state] or [''])
        if not choice:
            return None
        nextState = (state[1], choice)
        return choice, nextState
I regretfully added the or [''] because random.choice complains about empty lists (it raises an IndexError). Finally, we move a larger portion of the logic into ask (the actual construction of the sentence):

    def ask(self, seed=False):
        ret = []
        if not seed:
            seed = self.getInitial()
        while True:
            link = self.step(seed)
            if link is None:
                break
            ret.append(link[0])
            seed = link[1]
        return self.separator.join(ret)
Yes, a bit yucky. We could have given step a better name and made it a generator, but I'm late for a meeting with my pregnant wife who's about to give birth to a baby who left the stove on fire in my car that's being towed! I better hurry!
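For reference, a generator variant along those lines might look something like the following. It is only a sketch: the function name walk and its standalone form (taking the memory dictionary as an argument instead of living on the class) are assumptions made for illustration, not part of the original answer.

```python
import random
from collections import defaultdict

def walk(memory, state=('', '')):
    """Yield words by stepping through an order-2 chain until the end marker ''."""
    while True:
        # .get avoids inserting new keys into a defaultdict for unseen states
        choice = random.choice(memory.get(state) or [''])
        if not choice:
            return
        yield choice
        state = (state[1], choice)

# A hand-built memory, equivalent to learning 'Mary had a little lamb'
memory = defaultdict(list, {
    ('', ''): ['Mary'],
    ('', 'Mary'): ['had'],
    ('Mary', 'had'): ['a'],
    ('had', 'a'): ['little'],
    ('a', 'little'): ['lamb'],
    ('little', 'lamb'): [''],
})

# Only one path exists in this memory, so the result is deterministic
sentence = ' '.join(walk(memory))
```

With this shape, the loop in ask would shrink to a single join over the generator.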
But first, a talk with bob:

    from collections import defaultdict
    import random

    class Markov:
        memory = defaultdict(list)
        separator = ' '

        def learn(self, txt):
            for part in self.breakText(txt):
                key = part[0]
                value = part[1]
                self.memory[key].append(value)

        def ask(self, seed=False):
            ret = []
            if not seed:
                seed = self.getInitial()
            while True:
                link = self.step(seed)
                if link is None:
                    break
                ret.append(link[0])
                seed = link[1]
            return self.separator.join(ret)

        def breakText(self, txt):
            #our very own (ε,ε)
            prev = self.getInitial()
            for word in txt.split(self.separator):
                yield prev, word
                prev = (prev[1], word)
            #end-of-sentence, prev->ε
            yield (prev, '')

        def step(self, state):
            choice = random.choice(self.memory[state] or [''])
            if not choice:
                return None
            nextState = (state[1], choice)
            return choice, nextState

        def getInitial(self):
            return ('', '')
And loading it up:

    >>> import brain
    >>> bob = brain.Markov()
    >>> bob.learn('Mary had a little lamb')
    >>> bob.ask()
    'Mary had a little lamb'
    >>> bob.learn('Mary had a giant crab')
    >>> bob.ask(('Mary', 'had'))
    'a giant crab'
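One thing to be aware of in the class as written: memory is a class attribute, so every Markov instance shares the same dictionary. If separate chains are wanted, one option is to create the dictionary per instance in __init__. The sketch below is not part of the original answer; the class names SharedMarkov and IsolatedMarkov are hypothetical and the classes are trimmed to just the attribute in question:

```python
from collections import defaultdict

class SharedMarkov:
    memory = defaultdict(list)           # one dict shared by all instances

class IsolatedMarkov:
    def __init__(self):
        self.memory = defaultdict(list)  # a fresh dict for each instance

a, b = SharedMarkov(), SharedMarkov()
a.memory[('', '')].append('Mary')
shared_leak = b.memory[('', '')]         # ['Mary']: b sees what a learned

c, d = IsolatedMarkov(), IsolatedMarkov()
c.memory[('', '')].append('Mary')
isolated = d.memory[('', '')]            # []: d has its own memory
```

Whether sharing is a problem depends on the use: a single bob is unaffected, but a second bot created in the same session would inherit everything bob has learned.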
There is, of course, room for improvement and for expanding on the concept. But it wouldn't be any fun if I just gave you the answer.
Hopefully this will still help after 4 months.