Create dataframes from unique value pairs by filtering across multiple columns
I want to filter values across multiple columns creating dataframes for the unique value combinations. Any help would be appreciated.
Here is my code that is failing (given dataframe df):
dd = defaultdict(dict)  # create blank default dictionary
values_col1 = df.col1.unique()  # get the unique values from column 1 of df
for i in values_col1:
    dd[i] = df[(df['col1']==i)]  # for each unique value create a sorted df and put it in a dictionary
    values_col2 = dd[i].col2.unique()  # get the unique values from column 2 of df
    for m in values_col2:
        dd[i][m] = dd[i][(dd[i]['col2']==m)]  # for each unique column 2 value create a sub dictionary
When I run it I get a very long error message. I won't insert the whole thing here, but here is some of it:
C:\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:
...
ValueError: Wrong number of items passed 6, placement implies 1
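A likely reading of this error: after `dd[i] = df[...]`, the value stored under `dd[i]` is a DataFrame, so `dd[i][m] = ...` is treated as assigning a column named `m` on that DataFrame rather than adding a key to a nested dictionary, and pandas cannot fit a multi-column frame into a single column. A minimal sketch of the same loop that keeps the dictionary and the filtered frame separate, assuming a small made-up `df` (the sample data and the name `subset` are illustrative, not from the question):

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data; the question does not show its df.
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [0, 1, 0, 1],
                   'col3': [10, 11, 12, 13]})

dd = defaultdict(dict)
for i in df['col1'].unique():
    # Keep the filtered frame in a local variable instead of storing it
    # in dd[i], so dd[i] stays a plain dict.
    subset = df[df['col1'] == i]
    for m in subset['col2'].unique():
        dd[i][m] = subset[subset['col2'] == m]
```

With this change `dd['A'][0]` is the frame of rows where `col1 == 'A'` and `col2 == 0`.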
Use pandas groupby functionality to extract the unique indices and the corresponding rows of your dataframe.
import numpy as np
import pandas as pd
from collections import defaultdict

df = pd.DataFrame({'col1': ['A']*4 + ['B']*4,
                   'col2': [0,1]*4,
                   'col3': np.arange(8),
                   'col4': np.arange(10, 18)})

dd = defaultdict(dict)
grouped = df.groupby(['col1', 'col2'])
# each group key is a (col1, col2) pair; g is the matching sub-dataframe
for (c1, c2), g in grouped:
    dd[c1][c2] = g
This is the generated df:
col1 col2 col3 col4
0 A 0 0 10
1 A 1 1 11
2 A 0 2 12
3 A 1 3 13
4 B 0 4 14
5 B 1 5 15
6 B 0 6 16
7 B 1 7 17
And this is the extracted dd (well, dict(dd) really):
{'B': {0: col1 col2 col3 col4
4 B 0 4 14
6 B 0 6 16,
1: col1 col2 col3 col4
5 B 1 5 15
7 B 1 7 17},
'A': {0: col1 col2 col3 col4
0 A 0 0 10
2 A 0 2 12,
1: col1 col2 col3 col4
1 A 1 1 11
3 A 1 3 13}}
(I don't know what your use case for this is, but you may be better off not parsing the groupby object to a dictionary anyway).
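If the dictionary is only needed to look up one sub-dataframe at a time, the groupby object can probably serve that purpose directly. A sketch using `get_group`, rebuilding the sample df from above (the names `a0` and `keys` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A']*4 + ['B']*4,
                   'col2': [0, 1]*4,
                   'col3': np.arange(8),
                   'col4': np.arange(10, 18)})

grouped = df.groupby(['col1', 'col2'])

# Fetch the rows for one (col1, col2) pair without building a dict first.
a0 = grouped.get_group(('A', 0))

# grouped.groups maps each key pair to the row labels in that group.
keys = list(grouped.groups)
```

Here `a0` holds the same rows as `dd['A'][0]` in the dictionary version.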