Pandas: Pivot to True/False, drop column
Question
I'm trying to create what I think is a simple pivot table, but I'm running into serious problems. There are two things I can't do:
- Get rid of the "partner" column at the end.
- Set each value to True or False depending on whether the company has that partner.
Setup:
import pandas as pd
df = pd.DataFrame({'company':['a','b','c','b'], 'partner':['x','x','y','y'], 'str':['just','some','random','words']})
Desired output:
company x y
a True False
b True True
c False True
I started with:
df = df.pivot(values = 'partner', columns = 'partner', index = 'company').reset_index()
which gets me close, but when I try to get rid of the "partner" column, I can't even reference it, and it's not the index either.
For the second issue, I can use:
df.fillna(False, inplace = True)
df.loc[~(df['x'] == False), 'x'] = True
df.loc[~(df['y'] == False), 'y'] = True
but that seems incredibly hacky. Any help would be appreciated.
Answer
Option 1
df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool)
partner x y
company
a True False
b True True
c False True
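To get rid of the name on the columns object (axis is passed by keyword, since recent pandas versions no longer accept it positionally):
df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool) \
.rename_axis(None, axis=1).reset_index()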
company x y
0 a True False
1 b True True
2 c False True
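The reason "partner" cannot be referenced in the question is that it is the name of the columns Index, not a column label. Here is a minimal sketch of Option 1 split into steps, using the setup data from the question; the intermediate variable names (`counts`, `wide`, `flat`) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
    'str': ['just', 'some', 'random', 'words'],
})

# Count rows per (company, partner) pair, then move partner into the columns.
counts = df.groupby(['company', 'partner']).size().unstack(fill_value=0)

# Any non-zero count means the company has that partner.
wide = counts.astype(bool)

# "partner" is the name of the columns Index, not one of its labels.
columns_name = wide.columns.name
has_partner_column = 'partner' in wide.columns

# Clearing that name and resetting the index gives a flat frame.
flat = wide.rename_axis(None, axis=1).reset_index()
print(flat)
```

Because "partner" is only a label on the columns axis, dropping it with `drop` or `del` has nothing to act on; clearing the axis name is what removes it from the display.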
Option 2
pd.crosstab(df.company, df.partner).astype(bool)
partner x y
company
a True False
b True True
c False True
pd.crosstab(df.company, df.partner).astype(bool) \
.rename_axis(None, axis=1).reset_index()
company x y
0 a True False
1 b True True
2 c False True
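Option 3
import numpy as np

# Integer codes and unique labels for each column
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size
# Count each (company, partner) pair by its position in a flattened n-by-m grid
b = np.bincount(f1 * m + f2)
# Pad with zeros so the counts cover the whole grid
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)
pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))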
company x y
0 a True False
1 b True True
2 c False True
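Option 3 works by mapping each (company, partner) pair to one cell of an n-by-m grid flattened row by row. The sketch below is a variant of that idea, not the answer's exact code: it assumes `np.bincount`'s `minlength` argument in place of the manual zero padding, and it builds the frame with `DataFrame.insert` so the boolean columns keep a bool dtype instead of becoming object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
    'str': ['just', 'some', 'random', 'words'],
})

# Integer codes (f1, f2) and unique labels (u1, u2) for each column.
f1, u1 = pd.factorize(df['company'].to_numpy())
f2, u2 = pd.factorize(df['partner'].to_numpy())
n, m = u1.size, u2.size

# Cell index in the flattened grid: company_code * m + partner_code.
flat_index = f1 * m + f2

# minlength pads the counts with zeros up to n * m cells.
counts = np.bincount(flat_index, minlength=n * m).reshape(n, m)

# Build the frame from the boolean grid, then prepend the company labels.
result = pd.DataFrame(counts.astype(bool), columns=u2)
result.insert(0, 'company', u1)
print(result)
```

For the sample data the flattened indices are 0, 2, 5 and 3, so cells 1 and 4 (a/y and c/x) stay at zero and become False.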
Timing
Small data
%timeit df.groupby(['company', 'partner']).size().unstack(fill_value=0).astype(bool).rename_axis(None, axis=1).reset_index()
%timeit pd.crosstab(df.company, df.partner).astype(bool).rename_axis(None, axis=1).reset_index()
%%timeit
f1, u1 = pd.factorize(df.company.values)
f2, u2 = pd.factorize(df.partner.values)
n, m = u1.size, u2.size
b = np.bincount(f1 * m + f2)
pad = np.zeros(n * m - b.size, dtype=int)
b = np.append(b, pad)
v = b.reshape(n, m).astype(bool)
pd.DataFrame(np.column_stack([u1, v]), columns=np.append('company', u2))
1000 loops, best of 3: 1.67 ms per loop
100 loops, best of 3: 5.97 ms per loop
1000 loops, best of 3: 301 µs per loop
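The `%timeit` magics above only run inside IPython or Jupyter. As a rough sketch for reproducing the comparison in a plain script, the standard-library `timeit` module can be used; the function names `option1` to `option3` and the loop counts are arbitrary choices, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'company': ['a', 'b', 'c', 'b'],
    'partner': ['x', 'x', 'y', 'y'],
    'str': ['just', 'some', 'random', 'words'],
})

def option1():
    # groupby + size + unstack
    return (df.groupby(['company', 'partner']).size()
              .unstack(fill_value=0).astype(bool)
              .rename_axis(None, axis=1).reset_index())

def option2():
    # crosstab
    return (pd.crosstab(df['company'], df['partner']).astype(bool)
              .rename_axis(None, axis=1).reset_index())

def option3():
    # factorize + bincount on a flattened grid
    f1, u1 = pd.factorize(df['company'].to_numpy())
    f2, u2 = pd.factorize(df['partner'].to_numpy())
    n, m = u1.size, u2.size
    b = np.bincount(f1 * m + f2, minlength=n * m)
    v = b.reshape(n, m).astype(bool)
    return pd.DataFrame(np.column_stack([u1, v]),
                        columns=np.append('company', u2))

loops = 20
timings = {}
for func in (option1, option2, option3):
    # Best of three repeats, converted to seconds per call.
    best = min(timeit.repeat(func, number=loops, repeat=3))
    timings[func.__name__] = best / loops

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.0f} µs per loop")
```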