Making a meaningful sentence from a given set of words
Question
I am working on a program that needs to create a grammatically correct sentence from a given set of words. I will pass a list of strings as input, and the output should be a meaningful sentence built from those words plus a few other words that are necessary. E.g.:
Input: {'You' , 'House' , 'Beautiful'}
Output: 'Your house is beautiful' (or) 'you house is beautiful'
Input: {'Father' , 'Love' , 'Child'}
Output: 'The father loves the child'
How do I implement this with NLTK and/or machine learning?
Any suggestions as to how I should go about this? I'm open to even the wildest ideas. Thanks! :)
Answer
In this case you can apply an n-gram model. The idea is that a sentence such as
I like NLP very much.
yields the following 3-grams (<s> and </s> mark the start and end of the sentence):
- <s> I like
- I like NLP
- like NLP very
- NLP very much
- very much </s>
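A minimal sketch of how the 3-grams listed above can be produced in plain Python. The function name `ngrams` is hypothetical, and the sketch assumes simple whitespace tokenization with the trailing period dropped. It adds one start and one end marker to match the example; it is also common to pad with n - 1 markers on each side instead.

```python
def ngrams(sentence, n=3):
    """Split a sentence into word n-grams, marking start and end with <s>/</s>."""
    # Assumption: whitespace tokenization, trailing period removed,
    # one <s> and one </s> marker as in the list above.
    tokens = ["<s>"] + sentence.rstrip(".").split() + ["</s>"]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

for gram in ngrams("I like NLP very much."):
    print(" ".join(gram))
# <s> I like
# I like NLP
# like NLP very
# NLP very much
# very much </s>
```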
Then you treat it as a probability model, P(word3 | word1 word2): the probability of a word given the two words that precede it.
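One simple way to estimate P(word3 | word1 word2) from counts is the maximum-likelihood estimate C(word1 word2 word3) / C(word1 word2). The sketch below assumes a tiny made-up corpus, lowercased whitespace tokens, and two start markers per sentence so the first word also has a two-word history; the helper names `pad`, `train` and `prob` are hypothetical.

```python
from collections import Counter

def pad(sentence):
    # Assumption: two <s> markers so the first word has a full two-word history.
    return ["<s>", "<s>"] + sentence.rstrip(".").lower().split() + ["</s>"]

def train(corpus):
    """Count trigrams and the two-word contexts (word1 word2) they start with."""
    tri, ctx = Counter(), Counter()
    for s in corpus:
        t = pad(s)
        for i in range(len(t) - 2):
            tri[tuple(t[i:i + 3])] += 1
            ctx[tuple(t[i:i + 2])] += 1
    return tri, ctx

def prob(tri, ctx, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2)."""
    c = ctx[(w1, w2)]
    return tri[(w1, w2, w3)] / c if c else 0.0

# Hypothetical toy corpus; real data would come from a large n-gram collection.
corpus = [
    "The father loves the child.",
    "The child loves the house.",
    "The child loves the father.",
]
tri, ctx = train(corpus)
print(prob(tri, ctx, "father", "loves", "the"))  # 1.0
print(prob(tri, ctx, "loves", "the", "child"))   # 0.3333333333333333
```

Unseen contexts get probability 0 here; the back-off mentioned in the notes is one way around that.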
So your job would be:
- Get a lot of n-gram data, i.e. sequences of n consecutive words (e.g. I think https://books.google.com/ngrams has a download option)
- For a given set of words, find all n-grams that contain only those words
- Find the most likely combination
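The three steps above can be sketched roughly as follows, under several assumptions that go beyond the answer: a tiny hypothetical corpus stands in for real n-gram data, a hypothetical list of filler words (`"the"`, `"is"`, `"a"`) is allowed in addition to the input words, input words must already match the corpus word forms (no inflection, so "Love" would not become "loves"), and the search is an exhaustive depth-first search capped by `max_len`, which grows exponentially and would need a beam search or similar on real data.

```python
from collections import Counter, defaultdict

def pad(sentence):
    # Assumption: two <s> markers so the first word has a trigram context.
    return ["<s>", "<s>"] + sentence.rstrip(".").lower().split() + ["</s>"]

def train(corpus):
    tri, ctx = Counter(), Counter()
    for s in corpus:
        t = pad(s)
        for i in range(len(t) - 2):
            tri[tuple(t[i:i + 3])] += 1
            ctx[tuple(t[i:i + 2])] += 1
    return tri, ctx

def best_sentence(words, tri, ctx, fillers=("the", "is", "a"), max_len=10):
    required = {w.lower() for w in words}
    allowed = required | set(fillers) | {"<s>", "</s>"}
    # Step 2: keep only trigrams built entirely from allowed words,
    # indexed by their two-word context, with P(w3 | w1 w2) attached.
    nexts = defaultdict(list)
    for (w1, w2, w3), c in tri.items():
        if {w1, w2, w3} <= allowed:
            nexts[(w1, w2)].append((w3, c / ctx[(w1, w2)]))
    best_p, best_seq = 0.0, None

    def extend(seq, p):
        # Step 3: chain trigrams depth-first and keep the most probable
        # complete sentence that contains every required word.
        nonlocal best_p, best_seq
        if seq[-1] == "</s>":
            if required <= set(seq) and p > best_p:
                best_p, best_seq = p, seq
            return
        if len(seq) >= max_len:  # bound the search, since chains can cycle
            return
        for w3, prob in nexts[(seq[-2], seq[-1])]:
            extend(seq + [w3], p * prob)

    extend(["<s>", "<s>"], 1.0)
    return " ".join(best_seq[2:-1]) if best_seq else None

corpus = [
    "The father loves the child.",
    "The child loves the house.",
    "Your house is beautiful.",
    "The house is beautiful.",
]
tri, ctx = train(corpus)
print(best_sentence({"Father", "loves", "Child"}, tri, ctx))  # the father loves the child
print(best_sentence({"House", "Beautiful"}, tri, ctx))        # the house is beautiful
```

If no chain of known trigrams covers all input words, the function returns `None`.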
Notes:
- n should be at least 3
- the bigger n gets, the more likely it is that you will have to "back off" to a shorter n-gram because you have no data for a particular n-gram (even though it might exist and make sense)
- even n = 5 is already a very large amount of data
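The answer does not say which back-off scheme to use. As one simple example, the sketch below uses "stupid backoff" with the factor 0.4 commonly used for it: if a trigram was never seen, score the bigram instead (times 0.4), and if that was never seen either, the unigram (times 0.4 again). This is a swapped-in technique, not necessarily what the answer had in mind; its scores are not normalized probabilities. The corpus and helper names are hypothetical.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty commonly used with stupid backoff

def count_ngrams(corpus, max_n=3):
    """Count every n-gram of length 1..max_n in a padded, lowercased corpus."""
    counts = Counter()
    for s in corpus:
        t = ["<s>", "<s>"] + s.rstrip(".").lower().split() + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(len(t) - n + 1):
                counts[tuple(t[i:i + n])] += 1
    return counts

def score(counts, total, history, word):
    """Stupid-backoff score S(word | history); not a normalized probability."""
    history = tuple(history)
    if not history:
        # Unigram level: relative frequency of the word in the corpus.
        return counts[(word,)] / total
    ngram = history + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[history]
    # Unseen n-gram: drop the oldest history word and apply the penalty.
    return ALPHA * score(counts, total, history[1:], word)

corpus = ["The father loves the child.", "The child loves the house."]
counts = count_ngrams(corpus)
total = sum(c for g, c in counts.items() if len(g) == 1)
print(score(counts, total, ("father", "loves"), "the"))  # seen trigram: 1.0
print(score(counts, total, ("house", "loves"), "the"))   # backs off to "loves the": 0.4
```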