How do I determine if a random string sounds like English?
Question
I have an algorithm that generates strings based on a list of input words. How do I keep only the strings that sound like English words? I.e., discard RDLO while keeping LORD.
EDIT: To clarify, they do not need to be actual dictionary words; they just need to sound like English. For example, KEAL would be accepted.
Answer
You can build a Markov chain from a large body of English text.
Afterwards, you can feed candidate words into the Markov chain and check how likely each word is under it, i.e. how English-like it is.
See here: http://en.wikipedia.org/wiki/Markov_chain
At the bottom of the page you can see a Markov text generator. What you want is exactly the reverse of it: instead of generating text from the chain, you use the chain to score a given string.
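A minimal sketch of that reverse direction, assuming a first-order chain (one character of context), lowercase letters a-z, and add-one smoothing so unseen transitions do not get probability zero. The ten-word `TOY_WORDS` list and the function names `train` and `score` are hypothetical; in practice you would train on a large English word list or corpus. The score is averaged per transition so that longer strings are not penalized just for being longer.

```python
import math
from collections import defaultdict

START, END = "^", "$"  # boundary markers, so first and last letters are scored too
ALPHABET_SIZE = 27     # assumption: 26 letters a-z plus the END marker

# Hypothetical toy corpus; replace with a large English word list or text.
TOY_WORDS = ["load", "word", "lore", "cold", "old",
             "real", "deal", "keel", "door", "role"]


def train(words):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START] + list(word.lower()) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def score(counts, word):
    """Average log probability per transition; higher means more English-like."""
    chars = [START] + list(word.lower()) + [END]
    total = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = counts.get(prev, {})
        # Add-one (Laplace) smoothing keeps unseen transitions above zero.
        p = (row.get(nxt, 0) + 1) / (sum(row.values()) + ALPHABET_SIZE)
        total += math.log(p)
    return total / (len(chars) - 1)


if __name__ == "__main__":
    model = train(TOY_WORDS)
    for candidate in ["LORD", "RDLO", "KEAL"]:
        print(candidate, round(score(model, candidate), 3))
```

Neither LORD nor KEAL is in the toy list, yet both score above RDLO, because their transitions (such as l→o, o→r, e→a) occur in the training words while RDLO's d→l and o→end do not. Where to draw the line between "English-like" and "not" has to be chosen empirically, for example by scoring known words against shuffled letters.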
In a nutshell: for each character, the Markov chain stores the probability of each possible next character. If you have enough memory, you can extend this idea to contexts of two or three characters.
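Extending the context to two or three characters can be sketched with a hypothetical `CharNgramModel` class, where `order` is the number of preceding characters used as context. It assumes lowercase a-z input, add-one smoothing, and a toy word list; the `threshold` in `sounds_english` is an assumed tuning parameter, not something the answer specifies. The number of possible contexts grows exponentially with `order`, which is where the memory cost comes from.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; replace with a large English word list or text.
TOY_WORDS = ["load", "word", "lore", "cold", "old",
             "real", "deal", "keel", "door", "role"]


class CharNgramModel:
    """P(next char | previous `order` chars), with add-one smoothing."""

    def __init__(self, order=2, alphabet_size=26):
        self.order = order
        self.vocab = alphabet_size + 1  # possible next symbols: letters + end marker
        self.counts = defaultdict(Counter)

    def _pad(self, word):
        # `order` start markers give the first letters a full-length context.
        return "^" * self.order + word.lower() + "$"

    def fit(self, words):
        for word in words:
            s = self._pad(word)
            for i in range(self.order, len(s)):
                self.counts[s[i - self.order:i]][s[i]] += 1
        return self

    def score(self, word):
        """Average log probability per predicted character (higher = more English-like)."""
        s = self._pad(word)
        total = 0.0
        for i in range(self.order, len(s)):
            row = self.counts.get(s[i - self.order:i], Counter())
            p = (row[s[i]] + 1) / (sum(row.values()) + self.vocab)
            total += math.log(p)
        return total / (len(s) - self.order)

    def sounds_english(self, word, threshold):
        # `threshold` is an assumed cut-off, tuned on real words vs. random strings.
        return self.score(word) >= threshold


if __name__ == "__main__":
    model = CharNgramModel(order=2).fit(TOY_WORDS)
    for candidate in ["LORD", "RDLO", "KEAL"]:
        print(candidate, round(model.score(candidate), 3))
```

With `order=1` this reduces to a plain character-to-character chain. Higher orders capture more of English spelling (for example, which letter pairs can start a word) but need far more training text, since contexts never seen in training fall back to the uniform smoothed estimate.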