Create dataframes from unique value pairs by filtering across multiple columns

Views: 975 | Published: 2017/3/26 4:23:18 | Tags: python, pandas, dataframe

Problem description

I want to filter values across multiple columns creating dataframes for the unique value combinations. Any help would be appreciated.

Here is my code that is failing (given dataframe df):

dd = defaultdict(dict)  # create a blank default dictionary
values_col1 = df.col1.unique()   # get the unique values from column 1 of df
for i in values_col1:
    dd[i] = df[(df['col1']==i)]    # for each unique value, create a filtered df and put it in the dictionary
    values_col2 = dd[i].col2.unique() # get the unique values from column 2 of the filtered df
    for m in values_col2:
        dd[i][m] = dd[i][(dd[i]['col2']==m)]  # for each unique column 2 value, create a sub-dictionary entry

When I run it I get a very long error message. I won't insert the whole thing here, but here is some of it:

C:\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

...

ValueError: Wrong number of items passed 6, placement implies 1
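
A likely reading of this error, offered as an interpretation and not something the traceback states outright: after dd[i] = df[...], dd[i] is a DataFrame and no longer a dict, so dd[i][m] = ... is treated by pandas as assigning a whole DataFrame to a single column named m. A minimal sketch of the same loop that keeps the filtered frame in a separate variable instead (the name sub and the sample data are illustrative assumptions, not part of the original question):

```python
from collections import defaultdict

import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'],
                   'col2': [0, 1, 0, 1],
                   'col3': [10, 11, 12, 13]})

dd = defaultdict(dict)
for i in df['col1'].unique():
    sub = df[df['col1'] == i]              # rows for this col1 value; not stored in dd[i]
    for m in sub['col2'].unique():
        dd[i][m] = sub[sub['col2'] == m]   # dd[i] is still a plain dict here
```

Because dd[i] is never overwritten with a DataFrame, the inner assignment is an ordinary dictionary insert.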

Solution

Use pandas groupby functionality to extract the unique indices and the corresponding rows of your dataframe.

import numpy as np
import pandas as pd
from collections import defaultdict

# Example data: two distinct values in col1 and two in col2
df = pd.DataFrame({'col1': ['A']*4 + ['B']*4,
                   'col2': [0,1]*4,
                   'col3': np.arange(8),
                   'col4': np.arange(10, 18)})

dd = defaultdict(dict)
grouped = df.groupby(['col1', 'col2'])
for (c1, c2), g in grouped:
    dd[c1][c2] = g  # g holds the rows where col1 == c1 and col2 == c2

This is the generated df:

  col1  col2  col3  col4
0    A     0     0    10
1    A     1     1    11
2    A     0     2    12
3    A     1     3    13
4    B     0     4    14
5    B     1     5    15
6    B     0     6    16
7    B     1     7    17

And this is the extracted dd (well, dict(dd) really)

{'B': {0:   col1  col2  col3  col4
          4    B     0     4    14
          6    B     0     6    16,
       1:   col1  col2  col3  col4
          5    B     1     5    15
          7    B     1     7    17},
 'A': {0:   col1  col2  col3  col4
          0    A     0     0    10
          2    A     0     2    12,
       1:   col1  col2  col3  col4
          1    A     1     1    11
          3    A     1     3    13}}

(I don't know what your use case for this is, but you may be better off not parsing the groupby object to a dictionary anyway).
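
As a sketch of what skipping the nested dictionary could look like: a groupby object can return one group on request via get_group, and a flat dict keyed by (col1, col2) tuples can be built in a single comprehension. The sample frame repeats the one above; the names a0 and flat are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A'] * 4 + ['B'] * 4,
                   'col2': [0, 1] * 4,
                   'col3': np.arange(8),
                   'col4': np.arange(10, 18)})

grouped = df.groupby(['col1', 'col2'])

# Pull a single group on demand; the key is a tuple of (col1, col2) values.
a0 = grouped.get_group(('A', 0))

# Or materialize every group into a flat dict keyed by those tuples.
flat = {key: g for key, g in grouped}
```

Whether the flat form or the nested form is more convenient depends on how the sub-frames are looked up later.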
