Map list elements to keys in dictionary for decimal values in python
Problem description
I have a list of words as follows.
mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat','lorry']
I also have a list of lists, one per essay in the dataset, containing a value for each word in 'mylist' as in the example below (i.e., if a 'mylist' word appears in the essay, it gets a value between 0 and 1).
[[0,0.7,0,0,0,0.3,0,0.6], [0.2,0,0,0,0,0,0.8,0]]
In other words,
[0,0.7,0,0,0,0.3,0,0.6] says that this essay only has values for 'yellow', 'jeep' and 'lorry'.
Now I have a dictionary of categories as below.
mydictionary = {'colour': ['red', 'yellow', 'green'], 'animal': ['rat','cat'],
'vehicle': ['car', 'jeep']}
Now, using the 'mydictionary' keys, I want to transform the list of lists as follows (that is, if one or more of a category's 'mylist' values are non-zero, I mark the key with the average of those scores).
[[0.7, 0, 0.45], [0, 0.5, 0]]
In other words,
[0.7, 0, 0.45] says that:
0.7 - average value for elements in 'colour' = 0.7/1 = 0.7
0 - no elements in 'animal'
0.45 - average value for elements in 'vehicle' = (0.3+0.6)/2 = 0.45
So my output should be a list of lists as mentioned above -> [[0.7, 0, 0.45], [0, 0.5, 0]]
I am interested in knowing if this is possible to do using pandas dataframes.
Recommended answer
You should really reconsider your data structures. One problem you will face is that dicts are inherently unordered (insertion order is only guaranteed from Python 3.7 onward). So first, enforce the order by putting the values in an ordered container (a list works fine):
>>> vals = [mydictionary['colour'], mydictionary['animal'], mydictionary['vehicle']]
Now the essays:
>>> essays = [[0,0.7,0,0,0,0.3,0,0.6], [0.2,0,0,0,0,0,0.8,0]]
Then, a simple loop, building a map from mylist to each essay's weights, and using the statistics package for a mean function:
>>> import statistics as stats
>>> result = []
>>> for essay in essays:
...     map = dict(zip(mylist, essay))
...     result.append([stats.mean(map[e] for e in v) for v in vals])
...
>>> result
[[0.2333333333333333, 0, 0.15], [0, 0.5, 0]]
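The loop above can also be written as a small standalone script. This is only a sketch of the same idea: the helper name category_means and the explicit order list are assumptions introduced for illustration, and zeros are still included in each average, as in the session above.

```python
import statistics

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]

# Hypothetical helper (not from the original answer): averages every word of a
# category, zeros included, in the category order given by `order`.
def category_means(essay, words, categories, order):
    weights = dict(zip(words, essay))  # word -> weight for this essay
    return [statistics.mean(weights[w] for w in categories[name])
            for name in order]

order = ['colour', 'animal', 'vehicle']  # explicit output order
result = [category_means(essay, mylist, mydictionary, order) for essay in essays]
print(result)
```

Passing the order explicitly keeps the output columns stable regardless of how the dictionary was built.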
Honestly, I am not sure pandas is the best tool for this, but I suppose you could use a DataFrame like this:
>>> import pandas as pd
>>> df = pd.DataFrame({'essay{}'.format(i):essay for i, essay in enumerate(essays)}, index=mylist)
>>> df
        essay0  essay1
cat        0.0     0.2
yellow     0.7     0.0
car        0.0     0.0
red        0.0     0.0
green      0.0     0.0
jeep       0.3     0.0
rat        0.0     0.8
lorry      0.6     0.0
Then, make a grouper-mapping:
>>> grouper = {v: k for k, vv in mydictionary.items() for v in vv}
Then use pd.DataFrame.groupby:
>>> df.groupby(grouper).mean()
           essay0  essay1
animal   0.000000     0.5
colour   0.233333     0.0
vehicle  0.150000     0.0
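To get from the grouped DataFrame back to the list-of-lists shape the question asks for, something like the following might work. It is a sketch, not part of the original answer: the names word_to_category and order are assumptions, and words that belong to no category (here 'lorry') are dropped by the groupby.

```python
import pandas as pd

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]

# One column per essay, one row per word.
df = pd.DataFrame({'essay{}'.format(i): essay for i, essay in enumerate(essays)},
                  index=mylist)

# Invert the category dictionary: word -> category name.
word_to_category = {w: cat for cat, ws in mydictionary.items() for w in ws}

# Mean per category (zeros included); rows are categories, columns are essays.
means = df.groupby(word_to_category).mean()

# Reorder the categories explicitly, then transpose to one row per essay.
order = ['colour', 'animal', 'vehicle']
result = means.loc[order].T.values.tolist()
print(result)
```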
Edit
After the comment, the fix is quite simple: you just materialize the weights into a list, filtering out the zeros like so: [map[e] for e in v if map[e]], and then take the mean of that list. However, you have to take care that the list is not empty. Simply define a helper function that checks for this and returns a default of 0:
>>> def mean_default(seq):
...     if seq:
...         return stats.mean(seq)
...     else:
...         return 0
...
Then simply:
>>> result = []
>>> for essay in essays:
...     map = dict(zip(mylist, essay))
...     result.append([mean_default([map[e] for e in v if map[e]]) for v in vals])
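Put together as a standalone script, the non-zero averaging might look like the sketch below. The names mean_nonzero and non_zero_means are assumptions made for illustration. Note that with the dictionary exactly as given in the question, 'lorry' belongs to no category, so the first essay's vehicle score comes out as 0.3; the 0.45 in the question's expected output only appears if 'lorry' is added to 'vehicle', which the last lines try under that assumption.

```python
import statistics

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]
order = ['colour', 'animal', 'vehicle']

# Hypothetical helper: mean of the non-zero values, or `default` if none exist.
def mean_nonzero(values, default=0):
    nonzero = [v for v in values if v]
    return statistics.mean(nonzero) if nonzero else default

# Hypothetical helper: one row per essay, one non-zero mean per category.
def non_zero_means(categories):
    rows = []
    for essay in essays:
        weights = dict(zip(mylist, essay))  # word -> weight
        rows.append([mean_nonzero(weights[w] for w in categories[name])
                     for name in order])
    return rows

result = non_zero_means(mydictionary)
print(result)

# Assumption: 'lorry' was meant to be a vehicle, as the question's 0.45 suggests.
with_lorry = dict(mydictionary, vehicle=['car', 'jeep', 'lorry'])
result_with_lorry = non_zero_means(with_lorry)
print(result_with_lorry)  # first essay's vehicle score is approximately 0.45
```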
For pandas, as @IanS showed, simply replace 0 with np.nan.
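One way to read that suggestion is sketched below; it is not necessarily what @IanS wrote. Replacing zeros with NaN makes the group mean skip them, since pandas ignores NaN when averaging. The final fillna(0), which turns all-zero categories back into 0, and the name word_to_category are assumptions added here.

```python
import numpy as np
import pandas as pd

mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat'],
                'vehicle': ['car', 'jeep']}
essays = [[0, 0.7, 0, 0, 0, 0.3, 0, 0.6], [0.2, 0, 0, 0, 0, 0, 0.8, 0]]

df = pd.DataFrame({'essay{}'.format(i): essay for i, essay in enumerate(essays)},
                  index=mylist)
word_to_category = {w: cat for cat, ws in mydictionary.items() for w in ws}

# Zeros become NaN so that mean() skips them; a category with no non-zero
# weights ends up as NaN and is set back to 0 afterwards (assumed default).
means = df.replace(0, np.nan).groupby(word_to_category).mean().fillna(0)

order = ['colour', 'animal', 'vehicle']
result = means.loc[order].T.values.tolist()
print(result)
```

As with the plain-Python version, 'lorry' is in no category of the dictionary as given, so it does not contribute to 'vehicle'.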