Python pandas: from a 0/1 DataFrame to an itemset list


Problem description


What is the most efficient way to go from a 0/1 pandas/numpy dataframe of this form::

>>> dd
{'a': {0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1},
 'b': {0: 1, 1: 1, 2: 0, 3: 0, 4: 1, 5: 1},
 'c': {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 1},
 'd': {0: 0, 1: 1, 2: 1, 3: 1, 4: 0, 5: 1},
 'e': {0: 0, 1: 0, 2: 1, 3: 0, 4: 0, 5: 0}}
>>> df = pd.DataFrame(dd)
>>> df 
   a  b  c  d  e
0  1  1  0  0  0
1  0  1  1  1  0
2  1  0  1  1  1
3  0  0  0  1  0
4  1  1  1  0  0
5  1  1  1  1  0
>>>


to an itemset list, i.e. a list of lists?

itemset = [['a', 'b'],
           ['b', 'c', 'd'],
           ['a', 'c', 'd', 'e'],
           ['d'],
           ['a', 'b', 'c'],
           ['a', 'b', 'c', 'd']]
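To pin down exactly what is being asked, here is a deliberately simple reference version: for each row, keep the column names whose value is 1. It is a sketch for checking results on the small example only; `to_itemsets_naive` is a hypothetical helper name, and the row-by-row `iterrows` loop would be far too slow at the real data size.

```python
import pandas as pd

# The example frame from the question, built column by column.
df = pd.DataFrame({
    "a": [1, 0, 1, 0, 1, 1],
    "b": [1, 1, 0, 0, 1, 1],
    "c": [0, 1, 1, 0, 1, 1],
    "d": [0, 1, 1, 1, 0, 1],
    "e": [0, 0, 1, 0, 0, 0],
})


def to_itemsets_naive(frame):
    """Hypothetical reference helper: per row, the column names holding a 1.

    Iterates rows in pure Python, so it only serves to define the expected
    output, not to be fast on a large frame.
    """
    return [
        [col for col in frame.columns if row[col] == 1]
        for _, row in frame.iterrows()
    ]


itemset = to_itemsets_naive(df)
print(itemset)
```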


The real data is much larger: df.shape ~ (1e6, 500), i.e. about a million rows and 500 columns.

Recommended answer


You can first multiply the DataFrame by its column names with mul, then convert the result to a NumPy array with values:

print (df.mul(df.columns.to_series()).values)
[['a' 'b' '' '' '']
 ['' 'b' 'c' 'd' '']
 ['a' '' 'c' 'd' 'e']
 ['' '' '' 'd' '']
 ['a' 'b' 'c' '' '']
 ['a' 'b' 'c' 'd' '']]
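The trick relies on Python's `int * str` repetition rule: `1 * 'a'` is `'a'` and `0 * 'a'` is `''`. As a rough sketch of the same idea without pandas (assuming the column names are held in an object-dtype array), plain NumPy broadcasting produces the same labelled grid:

```python
import numpy as np

# The 0/1 matrix from the question (rows 0-5, columns a-e).
values = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
])
# Object dtype keeps the names as ordinary Python str objects.
names = np.array(["a", "b", "c", "d", "e"], dtype=object)

# Broadcasting (6, 5) against (5,) multiplies every cell by its column name.
# In an object array each product is evaluated as Python int * str,
# so a 1 keeps the name and a 0 yields the empty string.
labelled = values * names
print(labelled)
```

Note that at df.shape ~ (1e6, 500) this intermediate grid has 5e8 object cells; with 8-byte pointers on a 64-bit build that is about 4 GB of pointers alone, before the final lists are built.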


Then remove the empty strings with a nested list comprehension:

print ([[y for y in x if y != ''] for x in df.mul(df.columns.to_series()).values])
[['a', 'b'], 
 ['b', 'c', 'd'],
 ['a', 'c', 'd', 'e'], 
 ['d'], 
 ['a', 'b', 'c'], 
 ['a', 'b', 'c', 'd']]
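A variant that skips the intermediate string grid is to turn each row into a boolean mask and use it to index the array of column labels directly. This is a swapped-in technique, not part of the answer above, and `to_itemsets_mask` is a hypothetical helper name. It still loops over rows in Python, so any speedup at (1e6, 500) should be measured rather than assumed.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "a": [1, 0, 1, 0, 1, 1],
    "b": [1, 1, 0, 0, 1, 1],
    "c": [0, 1, 1, 0, 1, 1],
    "d": [0, 1, 1, 1, 0, 1],
    "e": [0, 0, 1, 0, 0, 0],
})


def to_itemsets_mask(frame):
    """Hypothetical helper: select column labels with a per-row boolean mask."""
    cols = frame.columns.to_numpy()  # 1-D array of column labels
    mask = frame.to_numpy() == 1     # 2-D boolean array, True where the cell is 1
    # Boolean indexing picks the labels for each row; tolist() gives plain lists.
    return [cols[row].tolist() for row in mask]


itemset = to_itemsets_mask(df)
print(itemset)
```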
