Using Markov chains (or something similar) to produce an IRC bot

Question

I tried Google and found little that I could understand.

I understand Markov chains to a very basic level: it's a mathematical model that depends only on the previous input to change states, so it's sort of an FSM with weighted random chances instead of different criteria?
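
To make the "FSM with weighted random chances" idea concrete, here is a minimal Python sketch. The two weather states and their probabilities are made up purely for illustration; the point is that only the current state decides the odds of where the walk goes next.

```python
import random

# Toy two-state chain (states and probabilities are made up for illustration).
# Each state lists its possible next states with their transition weights.
TRANSITIONS = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state, rng=random):
    """Choose the next state at random, weighted by the current state's transition weights.
    Nothing but the current state is consulted: that is the Markov property."""
    next_states, weights = zip(*TRANSITIONS[state])
    return rng.choices(next_states, weights=weights, k=1)[0]

def walk(start, n, rng=random):
    """Take n steps from `start` and return all visited states (n + 1 of them)."""
    states = [start]
    for _ in range(n):
        states.append(step(states[-1], rng))
    return states

if __name__ == "__main__":
    print(walk("sunny", 10, random.Random(42)))
```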

I've heard that you can use them to generate semi-intelligent nonsense, given sentences of existing words to use as a dictionary of sorts.

I can't think of search terms to find this, so can anyone link me or explain how I could produce something that gives a semi-intelligent answer? (For example, if you asked it about pie, it would not start going on about the Vietnam War it had heard about.)

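My plan is to:

  • let this bot idle in an IRC channel for a while
  • strip all usernames out of the strings and store them as sentences or whatever
  • build on this over time.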
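
For the username-stripping step, a minimal sketch might look like the following. It assumes a hypothetical log format of `[HH:MM] <nick> message` (adjust `LINE_RE` to whatever your client or bot actually writes), learns nicknames from who spoke, and then drops any whole-word nickname, including `nick:` and `nick,` address prefixes, from every message.

```python
import re

# Assumed log format (hypothetical): "[HH:MM] <nick> message text".
# "<@nick>" and "<+nick>" (op/voice prefixes) are also accepted.
LINE_RE = re.compile(r"^\[\d{2}:\d{2}\]\s+<[@+]?([^>\s]+)>\s+(.*)$")

def parse(line):
    """Return (nick, text) for a chat message, or None for joins, parts, actions, etc."""
    m = LINE_RE.match(line)
    return (m.group(1), m.group(2)) if m else None

def strip_nicks(text, nicks):
    """Drop whitespace-separated tokens that are a known nick, optionally followed by ':' or ','."""
    return " ".join(w for w in text.split() if w.rstrip(":,") not in nicks)

def sentences_from_log(lines):
    """Two passes: collect every speaker's nick, then strip those nicks from all messages."""
    parsed = [p for p in map(parse, lines) if p]
    nicks = {nick for nick, _ in parsed}
    out = []
    for _, text in parsed:
        cleaned = strip_nicks(text, nicks)
        if cleaned:
            out.append(cleaned)
    return out
```

Learning nicknames from the speakers themselves avoids hard-coding a user list, at the cost of also deleting ordinary words that happen to match someone's nick.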

Recommended answer

Yes, a Markov chain is a finite-state machine with probabilistic state transitions. To generate random text with a simple, first-order Markov chain:

  1. Collect bigram (adjacent word pair) statistics from a corpus (collection of text).
  2. Make a Markov chain with one state per word. Reserve a special state for end-of-text.
  3. The probability of jumping from state/word x to y is the probability of the word y immediately following x, estimated from relative bigram frequencies in the training corpus.
  4. Start with a random word x (perhaps determined by how often that word occurs as the first word of a sentence in the corpus). Then pick a state/word y to jump to randomly, taking into account the probability of y following x (the state transition probability). Repeat until you hit end-of-text.
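
The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: whitespace tokenization, the `<s>`/`</s>` marker names and the `max_words` cap are assumptions of this sketch. Starting every walk from a special start-of-sentence state is one way to realize step 4's suggestion that the first word be weighted by how often it opens a sentence.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical start-of-sentence and end-of-text states

def train(sentences):
    """Step 1: count bigrams. counts[x][y] = how often word y directly follows word x."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = [START] + sentence.split() + [END]
        for x, y in zip(words, words[1:]):
            counts[x][y] += 1
    return counts

def generate(counts, rng=random, max_words=50):
    """Steps 3-4: from START, repeatedly jump to a successor chosen with probability
    proportional to its bigram count (the relative frequency), until END is reached."""
    word, out = START, []
    while len(out) < max_words:
        successors = counts[word]
        word = rng.choices(list(successors), weights=list(successors.values()), k=1)[0]
        if word == END:
            break
        out.append(word)
    return " ".join(out)
```

For example, trained on `["i like pie", "i like cake", "you like pie"]`, it only ever produces three-word sentences, and it can produce "you like cake", which never appeared in the corpus: "like" is followed by "pie" two times out of three and by "cake" one time out of three, regardless of what came before "like".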

If you want to get something semi-intelligent out of this, then your best shot is to train it on lots of carefully collected texts. The "lots" part makes it produce proper sentences (or plausible IRC speak) with high probability; the "carefully collected" part means you control what it talks about. Introducing higher-order Markov chains also helps in both areas, but requires more storage for the necessary statistics. You may also look into things like statistical smoothing.
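
A higher-order chain uses the last n words as its state instead of just one. Below is a minimal sketch of an order-n variant, under the same assumptions as a plain bigram generator (whitespace tokenization, hypothetical `<s>`/`</s>` markers); smoothing is not shown. States are tuples of words, so the number of distinct states, and with it the storage cost, grows with `order`.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical start/end markers

def train(sentences, order=2):
    """Count which word follows each run of `order` consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with `order` start markers so the first real word also has a full-length state.
        words = [START] * order + sentence.split() + [END]
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            counts[state][words[i + order]] += 1
    return counts

def generate(counts, order=2, rng=random, max_words=50):
    """Walk the chain: sample the next word given the last `order` words, then slide the window."""
    state, out = (START,) * order, []
    while len(out) < max_words:
        successors = counts[state]
        word = rng.choices(list(successors), weights=list(successors.values()), k=1)[0]
        if word == END:
            break
        out.append(word)
        state = state[1:] + (word,)
    return " ".join(out)
```

With `order=2` and the corpus `["the cat sat on the mat", "the dog sat on the log"]`, the output is always "the cat/dog sat on the mat/log": the two-word state ("on", "the") is distinct from the sentence-initial state, so the chain cannot loop back into "the cat sat on the cat ..." the way a first-order chain can.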

However, having your IRC bot actually respond to what is said to it takes a lot more than Markov chains. One way is to run text categorization (aka topic spotting) on what is said, then pick a domain-specific Markov chain for text generation. Naïve Bayes is a popular model for topic spotting.
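
A minimal sketch of that pipeline, assuming you have hand-labelled some messages by topic: a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, computed in log space to avoid underflow, plus a hypothetical `respond` helper that hands off to a per-topic generator (for instance a Markov chain trained only on that topic's text). The topic names and the generator callables are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # topic -> number of training messages
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.vocab = set()

    def fit(self, labelled):
        """`labelled` is an iterable of (text, topic) pairs."""
        for text, topic in labelled:
            words = text.lower().split()
            self.doc_counts[topic] += 1
            self.word_counts[topic].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        """Return the topic maximizing log P(topic) + sum of log P(word | topic)."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            n_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for w in words:
                # +1 smoothing: unseen words get a small, nonzero probability.
                score += math.log((counts[w] + 1) / (n_words + v))
            if score > best_score:
                best, best_score = topic, score
        return best

def respond(message, classifier, generators):
    """Spot the topic, then call that topic's generator (hypothetical: topic -> zero-arg callable)."""
    return generators[classifier.predict(message)]()
```

For example, after training on a couple of pie messages labelled "food" and a couple of war messages labelled "history", a question about pie is routed to the "food" generator, so the bot no longer starts going on about the war.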

Kernighan and Pike explore various implementation strategies for Markov chain algorithms in The Practice of Programming. These, and natural language generation in general, are covered in great depth by Jurafsky and Martin in Speech and Language Processing.
