Map list elements to keys in a dictionary for decimal values in Python


Question

I have a list of words as follows.

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat','lorry']

I also have a list of lists, one inner list per essay in the dataset, holding a value for each word in 'mylist' (i.e., if a 'mylist' word appears in the essay, it gets a value between 0 and 1; otherwise it gets 0).

[[0,0.7,0,0,0,0.3,0,0.6], [0.2,0,0,0,0,0,0.8,0]]

In other words,

[0,0.7,0,0,0,0.3,0,0.6] says that this essay only has values for 'yellow', 'jeep', and 'lorry'.
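
To make the positional correspondence concrete, here is a minimal sketch that pairs each word in mylist with its score and keeps the words with a nonzero value. The name `present` is only illustrative and does not come from the question.

```python
mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
essay = [0, 0.7, 0, 0, 0, 0.3, 0, 0.6]

# zip pairs the i-th word with the i-th score; a score of 0 is falsy,
# so the filter keeps only the words that actually occur in the essay.
present = [word for word, score in zip(mylist, essay) if score]
print(present)  # ['yellow', 'jeep', 'lorry']
```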

Now I have a dictionary of categories as below.

mydictionary = {'colour': ['red', 'yellow', 'green'], 'animal': ['rat','cat'], 
'vehicle': ['car', 'jeep']}

Now, using the keys of 'mydictionary', I want to transform the list of lists as follows (that is, if one or more 'mylist' words in a category has a nonzero value, I score that key with the average of those nonzero values).

[[0.7, 0, 0.45], [0, 0.5, 0]]

In other words,

[0.7, 0, 0.45] says that:
0.7 - average value for elements in 'colour' = 0.7/1 = 0.7
0 - no elements in 'animal'
0.45 - average value for elements in 'vehicle' = (0.3+0.6)/2 = 0.45

So my output should be a list of lists as mentioned above -> [[0.7, 0, 0.45], [0, 0.5, 0]]

I am interested in knowing whether this can be done using pandas DataFrames.

Recommended answer

You should really reconsider your data structures. One problem you will face is that dicts are not guaranteed to preserve order (they only do so from Python 3.7 onward). So first, enforce the order by putting the values in an ordered container (a list works fine):

>>> vals = [mydictionary['colour'], mydictionary['animal'], mydictionary['vehicle']]

Now the essays:

>>> essays = [[0,0.7,0,0,0,0.3,0,0.6], [0.2,0,0,0,0,0,0.8,0]]

Then a simple loop builds a mapping from each word in mylist to that essay's weight, and uses the statistics module for a mean function:

>>> import statistics as stats
>>> result = []
>>> for essay in essays:
...     map = dict(zip(mylist, essay))
...     result.append([stats.mean(map[e] for e in v) for v in vals])
...
>>> result
[[0.2333333333333333, 0, 0.15], [0, 0.5, 0]]

Honestly, I am not sure pandas is the best tool for this, but I suppose you could use a DataFrame like this:

>>> import pandas as pd
>>> df = pd.DataFrame({'essay{}'.format(i):essay for i, essay in enumerate(essays)}, index=mylist)
>>> df
        essay0  essay1
cat        0.0     0.2
yellow     0.7     0.0
car        0.0     0.0
red        0.0     0.0
green      0.0     0.0
jeep       0.3     0.0
rat        0.0     0.8
lorry      0.6     0.0

Then, make a grouper mapping:

>>> grouper  = {v: k for k, vv in mydictionary.items() for v in vv}

Then use pd.DataFrame.groupby:

>>> df.groupby(grouper).mean()
           essay0  essay1
animal   0.000000     0.5
colour   0.233333     0.0
vehicle  0.150000     0.0
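
The grouped frame has one row per category (sorted alphabetically) and one column per essay, whereas the question asks for a list of lists with one row per essay. A hedged sketch of one way to get back to that shape is shown below; the `order` list and the `.loc[...]`/`.T` steps are assumptions added here, not part of the original answer. Note that 'lorry' does not appear in the grouper mapping, so groupby drops that row.

```python
import pandas as pd

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}

df = pd.DataFrame({'essay{}'.format(i): essay for i, essay in enumerate(essays)},
                  index=mylist)

# Invert the category dictionary: word -> category.
grouper = {word: cat for cat, words in mydictionary.items() for word in words}
means = df.groupby(grouper).mean()

# Assumed category order for the output columns (hypothetical name).
order = ['colour', 'animal', 'vehicle']

# Reorder the rows, transpose so each row is an essay, then convert to lists.
result = means.loc[order].T.values.tolist()
print(result)  # roughly [[0.233, 0.0, 0.15], [0.0, 0.5, 0.0]]
```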

Edit

After the comment, the fix is quite simple: materialize the weights into a list, filtering out the zeros like so: [map[e] for e in v if map[e]], and then take the mean of that list. However, you have to take care that the list is not empty, because statistics.mean raises StatisticsError on empty input. Simply define a helper function that checks for this and returns a default of 0:

>>> def mean_default(seq):
...     if seq:
...         return stats.mean(seq)
...     else:
...         return 0
...

Then simply:

>>> result = []
>>> for essay in essays:
...     map = dict(zip(mylist, essay))
...     result.append([mean_default([map[e] for e in v if map[e]]) for v in vals])
...
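
For convenience, the steps of the edit can be collected into one self-contained script. This is a sketch rather than the answerer's exact code: the names `nonzero_mean`, `score_essays`, `categories`, and `weights` are illustrative choices made here (the last one avoids shadowing the built-in `map`). With the dictionary exactly as given in the question, 'lorry' belongs to no category, so the first essay's vehicle score is 0.3; if 'lorry' were added to 'vehicle', it would come out at roughly 0.45, as the question expects.

```python
import statistics as stats

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}

# Explicit category order, so the output columns are predictable.
categories = ['colour', 'animal', 'vehicle']


def nonzero_mean(scores):
    """Return the mean of the nonzero scores, or 0 when there are none."""
    nonzero = [s for s in scores if s]
    return stats.mean(nonzero) if nonzero else 0


def score_essays(essays, words, groups, order):
    """Return one list of per-category scores for each essay."""
    result = []
    for essay in essays:
        # Map each word to its weight in this essay.
        weights = dict(zip(words, essay))
        result.append([nonzero_mean(weights[w] for w in groups[c]) for c in order])
    return result


result = score_essays(essays, mylist, mydictionary, categories)
print(result)  # [[0.7, 0, 0.3], [0, 0.5, 0]]
```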

For pandas, as @IanS showed, simply replace 0 with np.nan.
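
The referenced pandas code is not reproduced in this thread, so the following is only a sketch of what such a replacement might look like. It relies on the fact that a groupby mean skips NaN values; the trailing `fillna(0)`, which turns categories with no nonzero words back into 0, is an assumption added here to match the output format the question asks for.

```python
import numpy as np
import pandas as pd

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}

df = pd.DataFrame({'essay{}'.format(i): essay for i, essay in enumerate(essays)},
                  index=mylist)
grouper = {word: cat for cat, words in mydictionary.items() for word in words}

# Zeros become NaN, so mean() averages only the nonzero weights.
# A category whose words are all NaN yields NaN, which fillna(0) resets to 0.
means = df.replace(0, np.nan).groupby(grouper).mean().fillna(0)
print(means)

# Same list-of-lists shape as in the question, in an assumed category order.
result = means.loc[['colour', 'animal', 'vehicle']].T.values.tolist()
```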
