Python Pandas: How to split a sorted dictionary in a column of a dataframe
Question
I have a DataFrame like this:
id  asn    orgs
0   3320   {'Deutsche Telekom AG': 2288}
1   47886  {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}
2   47601  {'fusion services': 1024, 'GCE Global Maritime': 16859}
3   33438  {'Highwinds Network Group': 893}
I would like to sort the 'orgs' column, which actually holds dictionaries, and then extract the (key, value) pair with the highest value into two separate columns, like this:
id  asn    org                        value
0   3320   'Deutsche Telekom AG'      2288
1   47886  'Joyent'                   16
2   47601  'GCE Global Maritime'      16859
3   33438  'Highwinds Network Group'  893
Currently I am running this code, but it does not sort properly, and I am not sure how to extract the pair with the highest value.
import operator

df.orgs.apply(lambda x: sorted(x.items(), key=operator.itemgetter(1), reverse=True))
This gave me a list like this:
id  asn    orgs
0   3320   [('Deutsche Telekom AG', 2288)]
1   47886  [('Joyent', 16), ('Equinix (Netherlands) B.V.', 7)]
2   47601  [('GCE Global Maritime', 16859), ('fusion services', 1024)]
3   33438  [('Highwinds Network Group', 893)]
Now how can I put the key and the value of the highest pair into two separate columns? Can anybody help?
Recommended answer
Another approach is to define a function that calls max on the dict and returns a Series, so you can assign the result to multiple columns (function body taken from @Alex Martelli's answer). The question asks for the pair with the highest value, so max is used here rather than min:
In [17]:
import pandas as pd

def func(x):
    # key of the entry with the highest value
    k = max(x, key=x.get)
    return pd.Series([k, x[k]])

df[['orgs', 'value']] = df['orgs'].apply(func)
df
Out[17]:
     asn  id                     orgs  value
0   3320   0      Deutsche Telekom AG   2288
1  47886   1                   Joyent     16
2  47601   2      GCE Global Maritime  16859
3  33438   3  Highwinds Network Group    893
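As a rough alternative sketch, the same result can be built without creating one Series per row: take the largest (key, value) pair from each dict and construct both columns from the resulting list of tuples. Since the question asks for the highest value, this sketch uses max. The helper name `top_pair` and the column names `org` and `value` are illustrative choices, and the sketch assumes every dict is non-empty.

```python
import operator

import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'asn': [3320, 47886, 47601, 33438],
    'orgs': [{'Deutsche Telekom AG': 2288},
             {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
             {'fusion services': 1024, 'GCE Global Maritime': 16859},
             {'Highwinds Network Group': 893}],
})


def top_pair(d):
    # Hypothetical helper: return the (key, value) pair with the largest value.
    # Assumes d is a non-empty dict; max() raises ValueError on an empty one.
    return max(d.items(), key=operator.itemgetter(1))


# Turn the Series of (key, value) tuples into a two-column DataFrame,
# keeping the original index so the join lines up row by row.
pairs = pd.DataFrame(df['orgs'].map(top_pair).tolist(),
                     columns=['org', 'value'], index=df.index)
result = df[['id', 'asn']].join(pairs)
print(result)
```

Building the columns from a list of tuples avoids the per-row Series construction, which tends to matter only on large frames.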
Edit
If your data has empty dicts, you can just test the len:
In [34]:
df = pd.DataFrame({'id': [0, 1, 2, 3, 4],
                   'asn': [3320, 47886, 47601, 33438, 56],
                   'orgs': [{'Deutsche Telekom AG': 2288},
                            {'Joyent': 16, 'Equinix (Netherlands) B.V.': 7},
                            {'fusion services': 1024, 'GCE Global Maritime': 16859},
                            {'Highwinds Network Group': 893}, {}]})
df
Out[34]:
     asn  id  orgs
0   3320   0  {'Deutsche Telekom AG': 2288}
1  47886   1  {'Equinix (Netherlands) B.V.': 7, 'Joyent': 16}
2  47601   2  {'GCE Global Maritime': 16859, 'fusion service...
3  33438   3  {'Highwinds Network Group': 893}
4     56   4  {}
In [36]:
import numpy as np

def func(x):
    if len(x) > 0:
        # key of the entry with the highest value
        k = max(x, key=x.get)
        return pd.Series([k, x[k]])
    # empty dict: nothing to pick, so return missing values
    return pd.Series([np.nan, np.nan])

df[['orgs', 'value']] = df['orgs'].apply(func)
df
Out[36]:
     asn  id                     orgs  value
0   3320   0      Deutsche Telekom AG   2288
1  47886   1                   Joyent     16
2  47601   2      GCE Global Maritime  16859
3  33438   3  Highwinds Network Group    893
4     56   4                      NaN    NaN
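A possible variation on the empty-dict check, shown only as a sketch: the built-in max accepts a `default` argument that is returned when the iterable is empty, so the length test can be folded into the call. The helper name `top_pair_or_nan` and the small sample Series are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_pair_or_nan(d):
    # Hypothetical helper: pick the (key, value) pair with the largest value,
    # or fall back to a (NaN, NaN) pair when the dict is empty.
    return max(d.items(), key=lambda kv: kv[1], default=(np.nan, np.nan))


# Illustrative data: one populated dict and one empty dict.
orgs = pd.Series([{'Joyent': 16, 'Equinix (Netherlands) B.V.': 7}, {}])

pairs = pd.DataFrame(orgs.map(top_pair_or_nan).tolist(),
                     columns=['org', 'value'], index=orgs.index)
print(pairs)
```

Note that a NaN in the value column makes pandas store that column as floating point, so integer counts are shown as floats in this variant.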