Computing frequencies in a nested list


Question

I'm trying to compute word frequencies with a dictionary over a nested list. Each inner list is a sentence split into words. I also want to remove proper nouns and lowercase the words at the beginning of each sentence. Is it even possible to get rid of proper nouns?

x = [["Hey", "Kyle", "are", "you", "doing"], ["I", "am", "doing", "fine"], ["Kyle", "what", "time", "is", "it"]]

from collections import Counter
def computeFrequencies(x):
    count = Counter()
    for listofWords in x:
        for word in x:
            count[word] += 1
    return count

It raises an error: TypeError: unhashable type: 'list'

I want to return exactly this, as a plain dictionary without the Counter() wrapper:

{"hey": 1, "how": 1, "are": 1, "you": 1, "doing": 2, "i": 1, "am": 1, "fine": 1, "what": 1, "time": 1, "is": 1, "it": 1}

Recommended answer

Since your data is nested, you can flatten it with itertools.chain.from_iterable and pass the result to Counter, like this:

from itertools import chain
from collections import Counter
print(Counter(chain.from_iterable(x)))
# Counter({'Kyle': 2, 'doing': 2, 'Hey': 1, 'are': 1, 'you': 1, 'I': 1, 'am': 1, 'fine': 1, 'what': 1, 'time': 1, 'is': 1, 'it': 1})

If you prefer a generator expression, you can do:

from collections import Counter
print(Counter(item for items in x for item in items))
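For comparison, the question's own function raises unhashable type: 'list' because its inner loop iterates over the outer list again instead of over listofWords. Each word is then a whole sentence (a list), and lists cannot be dictionary keys. Below is a minimal corrected sketch of that function. Returning dict(count) at the end is an addition here, not part of the original answer, so that the result is a plain dict as the question asks.

```python
from collections import Counter

def computeFrequencies(x):
    count = Counter()
    for listofWords in x:          # each sentence: a list of words
        for word in listofWords:   # each word: a str, which is hashable
            count[word] += 1
    return dict(count)             # plain dict instead of a Counter

x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]
print(computeFrequencies(x))
```

The only real fix is iterating over listofWords in the inner loop. Everything else is the question's code unchanged.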

If you want to do this without Counter, you can use a plain dictionary with dict.get, like this:

my_counter = {}
for line in x:
    for word in line:
        my_counter[word] = my_counter.get(word, 0) + 1
print(my_counter)

You can also use collections.defaultdict, like this:

from collections import defaultdict
my_counter = defaultdict(int)
for line in x:
    for word in line:
        my_counter[word] += 1

print(my_counter)

If you simply want to convert the Counter object to a plain dict object, you can use bsoist's suggestion. I believe this is unnecessary, because Counter is a subclass of dict: you can access, iterate over, delete, and update its key-value pairs just like a normal dictionary.

print(dict(Counter(chain.from_iterable(x))))
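The answer does not address the other part of the question: lowercasing sentence-initial words and dropping proper nouns. Detecting proper nouns reliably needs a part-of-speech tagger. The sketch below instead assumes a crude heuristic that is not from the answer: any word that appears capitalized somewhere other than the start of a sentence is treated as a proper noun and dropped everywhere, and every remaining word is lowercased. The helper name count_words is hypothetical.

```python
from collections import Counter
from itertools import chain

def count_words(sentences):
    # Assumed heuristic: a word that is capitalized at any position other
    # than 0 in some sentence is treated as a proper noun.
    proper_nouns = {
        word
        for sentence in sentences
        for word in sentence[1:]
        if word[:1].isupper()
    }
    counts = Counter(
        word.lower()                             # lowercase what remains
        for word in chain.from_iterable(sentences)
        if word not in proper_nouns              # drop proper nouns, even at sentence start
    )
    return dict(counts)                          # plain dict, as the question asks

x = [["Hey", "Kyle", "are", "you", "doing"],
     ["I", "am", "doing", "fine"],
     ["Kyle", "what", "time", "is", "it"]]
print(count_words(x))
# {'hey': 1, 'are': 1, 'you': 1, 'doing': 2, 'i': 1, 'am': 1, 'fine': 1, 'what': 1, 'time': 1, 'is': 1, 'it': 1}
```

The heuristic has known gaps. A proper noun that only ever appears at the start of a sentence is kept (lowercased), and a mid-sentence "I" would be misclassified as a proper noun. A part-of-speech tagger such as NLTK's pos_tag, which labels proper nouns NNP/NNPS, is the more robust route.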
