Python code flow does not work as expected?

Question

I am trying to process various texts with regular expressions and NLTK in Python, following the book at http://www.nltk.org/book. I am trying to create a random text generator and have run into a slight problem. First, here is my code flow:

  1. Enter a sentence as input (this is called the trigger string and is assigned to a variable).

  2. Get the longest word in the trigger string.

  3. Search the whole Project Gutenberg corpus for sentences that contain this word, regardless of upper or lower case.

  4. Return the longest sentence that contains the word mentioned in step 3.

  5. Append the sentence from step 4 to the sentence from step 1.

  6. Assign the sentence from step 4 as the new trigger sentence and repeat the process. Note that I then have to get the longest word in that second sentence, and so on.

So far, I have been able to do this only once. When I try to keep the process going, the program just keeps printing the first sentence my search yields. It should instead look for the longest word in this new sentence and keep applying the code flow described above.

Below is my code along with a sample input/output:

Sample input

"Thane of code"

Sample output

"Thane of code Norway himselfe , with terrible numbers , Assisted by that most disloyall Traytor , The Thane of Cawdor , began a dismall Conflict , Till that Bellona ' s Bridegroome , lapt in proofe , Confronted him with selfe - comparisons , Point against Point , rebellious Arme ' gainst Arme , Curbing his lauish spirit : and to conclude , The Victorie fell on vs"

Now this should actually take the sentence that starts with 'Norway himselfe....', look for the longest word in it, repeat the steps above, and so on, but it doesn't. Any suggestions? Thanks.

import nltk
from nltk.corpus import gutenberg

triggerSentence = raw_input("Please enter the trigger sentence: ")  # get input str
split_str = triggerSentence.split()  # split the sentence into words
longestLength = 0
longestString = ""
montyPython = 1

while montyPython:
    #code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)


    listOfSents = gutenberg.sents() #all sentences of gutenberg are assigned -list of list format-

    listOfWords = gutenberg.words()# all words in gutenberg books -list format-
    # I tip my hat to Mr.Alex Martelli for this part, which helps me find the longest sentence
    lt = longestString.lower() #this line tells you whether word list has the longest word in a case-insensitive way. 

    longestSentence = max((listOfWords for listOfWords in listOfSents if any(lt == word.lower() for word in listOfWords)), key = len)
    #get longest sentence -list format with every word of sentence being an actual element-

    longestSent=[longestSentence]

    for word in longestSent:#convert the list longestSentence to an actual string
        sstr = " ".join(word)
    print triggerSentence + " "+ sstr
    triggerSentence = sstr

Recommended answer

Mr. Hankin's answer is more elegant, but the following is more in keeping with the approach you began with:

import sys
import string
import nltk
from nltk.corpus import gutenberg

def longest_element(p):
    """return the first element of p which has the greatest len()"""
    max_len = 0
    elem = None
    for e in p:
        if len(e) > max_len:
            elem = e
            max_len = len(e)
    return elem

def downcase(p):
    """returns a list of words in p shifted to lower case"""
    return map(string.lower, p)


def unique_words():
    """it turns out unique_words was never referenced so this is here
       for pedagogy"""
    # there are 2.6 million words in the Gutenberg corpus but only ~42k unique
    # ignoring case, let's pare that down a bit
    words = set()  # collects each lower-cased word once
    for word in gutenberg.words():
        words.add(word.lower())
    print 'gutenberg.words() has', len(words), 'unique caseless words'
    return words

print 'loading gutenburg corpus...'
sentences = []
for sentence in gutenberg.sents():
    sentences.append(downcase(sentence))

trigger = sys.argv[1:]
target = longest_element(trigger).lower()
last_target = None

while target != last_target:
    matched_sentences = []
    for sentence in sentences:
        if target in sentence:
            matched_sentences.append(sentence)

    print '===', target, 'matched', len(matched_sentences), 'sentences'
    longestSentence = longest_element(matched_sentences)
    print ' '.join(longestSentence)

    trigger = longestSentence
    last_target = target
    target = longest_element(trigger).lower()

Given your sample sentence, though, it reaches a fixed point in two cycles:

$ python nltkgut.py Thane of code
loading gutenburg corpus...
=== target thane matched 24 sentences
norway himselfe , with terrible numbers , assisted by that most disloyall traytor , the thane of cawdor , began a dismall conflict , till that bellona ' s bridegroome , lapt in proofe , confronted him with selfe - comparisons , point against point , rebellious arme ' gainst arme , curbing his lauish spirit : and to conclude , the victorie fell on vs
=== target bridegroome matched 1 sentences
norway himselfe , with terrible numbers , assisted by that most disloyall traytor , the thane of cawdor , began a dismall conflict , till that bellona ' s bridegroome , lapt in proofe , confronted him with selfe - comparisons , point against point , rebellious arme ' gainst arme , curbing his lauish spirit : and to conclude , the victorie fell on vs
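
To see why the loop settles so quickly, here is a minimal sketch of the same loop run over a tiny in-memory corpus instead of the Gutenberg one. It is written for Python 3 (the listing above is Python 2), and the `toy_sentences` data and the `chain` helper are made up for illustration, not part of NLTK. Once the longest word of the longest matching sentence is the target itself, the target stops changing and the loop ends.

```python
def longest_element(items):
    """Return the first element of items with the greatest len(), or None."""
    best = None
    for item in items:
        if best is None or len(item) > len(best):
            best = item
    return best


def chain(trigger_words, sentences):
    """Follow longest-word -> longest-sentence links until the target repeats.

    Returns the list of (target, sentence) pairs visited.
    """
    visited = []
    target = longest_element(trigger_words).lower()
    last_target = None
    while target != last_target:
        matches = [s for s in sentences if target in s]
        if not matches:  # nothing in the corpus contains the target word
            break
        sentence = longest_element(matches)
        visited.append((target, sentence))
        last_target = target
        target = longest_element(sentence)
    return visited


# Hypothetical, already lower-cased and tokenised stand-in for the corpus.
toy_sentences = [
    ["the", "thane", "of", "cawdor", "lives"],
    ["the", "thane", "met", "the", "bridegroome", "at", "dawn"],
    ["a", "short", "one"],
]

for target, sentence in chain(["Thane", "of", "code"], toy_sentences):
    print("===", target, "->", " ".join(sentence))
```

In this toy data "thane" leads to the seven-token sentence, whose longest word is "bridegroome"; that word only occurs in the same sentence, so the next target is "bridegroome" again and the loop stops, mirroring the two cycles in the run above.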

Part of the trouble with the response to your previous question is that it did what you asked, but you asked a more specific question than the one you wanted answered. As a result, the response got bogged down in some rather complicated list expressions that I'm not sure you understood. I suggest making more liberal use of print statements and not importing code if you don't know what it does. While unwrapping the list expressions I found (as noted) that you never used the corpus word list. Functions help, too.
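
As an illustration of the print-statement advice, the sketch below recomputes the longest word from the current trigger on every pass and prints it. It is Python 3, and the helper names `longest_word` and `next_sentence` and the two-sentence corpus are hypothetical stand-ins. In the question's code the split into words and the longest-word variables are set up once before the `while` loop, so reassigning `triggerSentence` at the bottom never changes the word being searched for.

```python
def longest_word(sentence):
    """Return the first longest whitespace-separated word of sentence."""
    return max(sentence.split(), key=len)


def next_sentence(word, corpus):
    """Return the longest (by character count) corpus sentence that
    contains word, compared case-insensitively, or None if none does."""
    word = word.lower()
    matches = [s for s in corpus if word in s.lower().split()]
    return max(matches, key=len) if matches else None


# Made-up corpus of plain strings, standing in for gutenberg.sents().
corpus = [
    "The thane of Cawdor began a dismall conflict",
    "A conflict is seldom resolved by argument alone",
]

trigger = "Thane of code"
for step in range(3):  # bounded, so the sketch cannot loop forever
    word = longest_word(trigger)  # recomputed from the *current* trigger
    found = next_sentence(word, corpus)
    print("step", step, "| longest word:", word, "| found:", found)
    if found is None or found == trigger:
        break
    trigger = found
```

Printing the word and the sentence on each pass makes it obvious whether the trigger is really changing between iterations, which is the check the original loop was missing.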
