How do I determine if a random string sounds like English?
Problem description
I have an algorithm that generates strings based on a list of input words. How do I keep only the strings that sound like English words? I.e., discard RDLO while keeping LORD.
EDIT: To clarify, they do not need to be actual words in the dictionary. They just need to sound like English. For example, KEAL would be accepted.
You can build a Markov chain from a large body of English text.
Afterwards, you can feed a word into the Markov chain and check how probable its sequence of characters is under the model. A high probability suggests the word sounds like English.
See here: http://en.wikipedia.org/wiki/Markov_chain
At the bottom of that page you can see the Markov text generator. What you want is exactly the reverse of it: scoring an existing string instead of generating a new one.
In a nutshell: for each character, the Markov chain stores the probability of each possible next character. If you have enough memory, you can extend this idea to condition on the previous two or three characters instead of one.
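As a rough illustration of the idea, here is a minimal sketch in Python of a first-order (one previous character) chain. Everything in it is an assumption made for the example rather than part of the answer above: the names `train` and `score`, the `^` and `$` boundary markers, the add-one smoothing, the averaging per transition so that long and short words are comparable, and the tiny word list. In practice you would train on a large English word list or text and pick an acceptance threshold by experiment.

```python
import math
from collections import defaultdict

# Hypothetical boundary markers so the model also learns which
# characters tend to start and end a word.
START, END = "^", "$"


def train(words):
    """Count how often each character is followed by each next character."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = [START] + list(word.lower()) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def score(word, counts, alphabet_size=27, alpha=1.0):
    """Return the average log-probability per character transition.

    Add-alpha smoothing keeps unseen transitions from producing log(0).
    alphabet_size=27 assumes 26 letters plus the END marker.
    """
    chars = [START] + list(word.lower()) + [END]
    total = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = counts.get(prev, {})
        seen = row.get(nxt, 0)
        row_total = sum(row.values())
        total += math.log((seen + alpha) / (row_total + alpha * alphabet_size))
    return total / (len(chars) - 1)


# Toy training data for illustration only; far too small for real use.
corpus = ["lord", "word", "load", "real", "keen",
          "deal", "lore", "order", "old", "road"]
model = train(corpus)

for candidate in ("lord", "keal", "rdlo"):
    print(candidate, score(candidate, model))
```

Scores closer to zero mean the string looks more like the training text, so you would keep strings whose score is above some cutoff. Conditioning on two or three previous characters works the same way, with the key of `counts` being the preceding pair or triple instead of a single character, at the cost of a larger table.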