What does generate() do when using NLTK in Python?

Problem description

I've been working with NLTK for the past three days to get familiar with it, and I've been reading the "Natural Language Processing" book to understand what's going on. I'm curious whether someone could clarify the following passage for me:

Note that the first time you run this command, it is slow because it gathers statistics about word sequences. Each time you run it, you will get different output text. Now try generating random text in the style of an inaugural address or an Internet chat room. Although the text is random, it re-uses common words and phrases from the source text and gives us a sense of its style and content. (What is lacking in this randomly generated text?)

This part of the text, from chapter 1, says only that the command "gathers statistics" and that you will get "different output text".

What specifically does generate() do, and how does it work?

This example of generate() uses text3, which is the Book of Genesis:

In the beginning , between me and thee and in the garden thou mayest come in unto Noah into the ark , and Mibsam , And said , Is there yet any portion or inheritance for us , and make thee as Ephraim and as the sand of the dukes that came with her ; and they were come . Also he sent forth the dove out of thee , with tabret , and wept upon them greatly ; and she conceived , and called their names , by their names after the end of the womb ? And he

Here, the generate() function seems to simply output phrases created by cutting the text at punctuation and randomly reassembling the pieces, yet the result is somewhat readable.

Solution

type(text3) will tell you that text3 is of type nltk.text.Text.

To cite the documentation of Text.generate():

Print random text, generated using a trigram language model.

That means that NLTK has built an N-gram model of the Genesis text, counting each occurrence of every sequence of three words so that it can predict likely successors of any given two words in this text. Because the next word is chosen at random from those successors, the output differs on every run. N-gram models are explained in more detail in chapter 5 of the NLTK book.
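To make the trigram idea concrete, here is a minimal sketch in plain Python. It is not NLTK's actual implementation: the helper names build_trigram_model and generate_text, the tiny sample sentence, and the choice to sample successors in proportion to their counts are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_trigram_model(tokens):
    """Map each pair of adjacent words to a Counter of the words seen after it."""
    model = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        model[(w1, w2)][w3] += 1
    return model


def generate_text(model, seed_pair, length=20, rng=None):
    """Extend seed_pair one word at a time, sampling each next word
    in proportion to how often it followed the previous two words."""
    rng = rng or random.Random()
    w1, w2 = seed_pair
    output = [w1, w2]
    while len(output) < length:
        followers = model.get((w1, w2))
        if not followers:
            break  # this word pair was never followed by anything in the source
        words = list(followers)
        weights = list(followers.values())
        w3 = rng.choices(words, weights=weights, k=1)[0]
        output.append(w3)
        w1, w2 = w2, w3
    return output


# Hypothetical toy corpus; a real run would use the tokens of a whole text.
tokens = (
    "In the beginning God created the heaven and the earth . "
    "And the earth was without form , and void ; "
    "and darkness was upon the face of the deep ."
).split()

model = build_trigram_model(tokens)
print(" ".join(generate_text(model, ("In", "the"), length=15)))
```

Every three-word window in the output also occurs somewhere in the source, which is why the generated text reuses familiar phrases and reads as locally plausible while lacking any overall meaning.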

See also the answers to this question.
