Lambda Functions in Python


Problem description

I am using the NLTK toolkit and trying to filter results with a lambda function.

I have a text_file and a terms_file.

I am using likelihood_ratio in NLTK to rank the multi-word terms in terms_file. However, the input has to be the lemma of each multi-word term, so I wrote a function that extracts the lemma of each multi-word term, which I then want to use inside the lambda function.

So it looks like this:

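text_file = myfile   # placeholder: the text to analyze
terms_file = myfile  # placeholder: the file listing the multi-word terms

def lem(file):
    # Pseudocode: return the lemma of each term in the file.
    ...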

Here is my problem:

How can I call this function in the filter? When I do the following, it does not work:

finder = BigramCollocationFinder.from_words(text_file)
finder.apply_ngram_filter(lambda *w: w not in lem(terms_file))
finder.score_ngrams(BigramAssocMeasures.likelihood_ratio)
print(finder)

Iterating over the result does not work either:

finder.apply_ngram_filter(lambda *w: w not in [x for x in lem(terms_file)])

Recommended answer

(This is something of a wild guess, but I'm pretty confident that this is the cause of your problem.)

Judging from your pseudo-code, the lem function operates on a file handle and reads some information from that file. A file handle is an iterator, and it is exhausted once it has been iterated over. The lambda calls lem every time it runs, so the first call to lem works as expected, but by then the file has been read to the end and every further call yields no results.
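The exhaustion can be reproduced without NLTK. The following is a minimal sketch, assuming a hypothetical lem that returns one stripped line per term; io.StringIO stands in for a real file handle and is not part of the original question.

```python
import io

def lem(file):
    # Hypothetical stand-in for the asker's lem: one "lemma" per line.
    return [line.strip() for line in file]

# io.StringIO behaves like an open text file for iteration purposes.
terms_file = io.StringIO("machine learning\nneural network\n")

first = lem(terms_file)   # iterates over the whole handle
second = lem(terms_file)  # the handle is already at end of file

print(first)   # ['machine learning', 'neural network']
print(second)  # []
```

Calling terms_file.seek(0) before each read would also avoid the empty result, but it re-reads the file every time, which is the cost the answer tries to avoid.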

I therefore suggest storing the result of lem in a list. This should also be much faster than reading the file again and again. Try something like this:

all_lemma = lem(terms_file) # temporary variable holding the result of `lem`
finder.apply_ngram_filter(lambda *w: w not in all_lemma)

Your line finder.apply_ngram_filter(lambda *w: w not in [x for x in lem(terms_file)]) does not work either: it does build a list from the result of lem, but it does so every time the lambda is executed, so you end up with the same problem.
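That the body of a lambda is evaluated on every call can be checked with a small counter. This is only a sketch: the lem below is a hypothetical stand-in that records each invocation, and the names rebuild and cached are introduced purely for illustration.

```python
calls = []  # one entry is appended per invocation of lem

def lem(terms):
    # Hypothetical stand-in: records the call, returns terms as word tuples.
    calls.append(1)
    return [tuple(term.split()) for term in terms]

terms = ["machine learning", "neural network"]

# The list comprehension lives inside the lambda body,
# so lem runs again on every call.
rebuild = lambda *w: w not in [x for x in lem(terms)]
for _ in range(3):
    rebuild("machine", "learning")
calls_after_rebuild = len(calls)

# lem runs once here; the lambda only looks the tuple up.
all_lemma = lem(terms)
cached = lambda *w: w not in all_lemma
for _ in range(3):
    cached("machine", "learning")
calls_after_cached = len(calls)

print(calls_after_rebuild, calls_after_cached)  # 3 4
```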

(I'm not sure what apply_ngram_filter does, so there might be more problems after that.)

Update: Judging from your other question, it seems that lem itself is a generator function. In that case, you have to convert the result to a list explicitly; otherwise you will run into exactly the same problem once the generator is exhausted.

all_lemma = list(lem(terms_file))
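A generator behaves like the file handle here: a membership test consumes it. The sketch below assumes a hypothetical generator version of lem that yields each term as a tuple of words; the real lem from the other question may differ.

```python
def lem(terms):
    # Hypothetical generator version of lem: yields each term as a word tuple.
    for term in terms:
        yield tuple(term.split())

terms = ["machine learning", "neural network"]

gen = lem(terms)
first_check = ("machine", "learning") in gen   # consumes items up to the match
second_check = ("machine", "learning") in gen  # continues from where it stopped

all_lemma = list(lem(terms))  # materialized once, can be searched repeatedly
third_check = ("machine", "learning") in all_lemma
fourth_check = ("machine", "learning") in all_lemma

print(first_check, second_check)   # True False
print(third_check, fourth_check)   # True True
```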

If the elements yielded by lem are hashable, you can also create a set instead of a list, i.e. all_lemma = set(lem(terms_file)); this makes the lookup in the filter much faster.
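Here is a sketch of the set-based lookup in plain Python. It does not call NLTK: the list comprehension only imitates a filter that drops every bigram for which the function returns True, and lem is again a hypothetical stand-in. Note that lambda *w collects its arguments into a tuple, so the set has to contain tuples of words for the membership test to match.

```python
def lem(terms):
    # Hypothetical stand-in: yields each term as a tuple of words (hashable).
    for term in terms:
        yield tuple(term.split())

all_lemma = set(lem(["machine learning", "neural network"]))

words = "a neural network is a machine learning model".split()
bigrams = list(zip(words, words[1:]))

# Same shape as the filter in the answer: True means "drop this bigram".
drop = lambda *w: w not in all_lemma
kept = [bigram for bigram in bigrams if not drop(*bigram)]

print(kept)  # [('neural', 'network'), ('machine', 'learning')]

# A set of plain strings would never match, because w is a tuple.
print(("machine", "learning") in {"machine learning"})  # False
```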
