How to append columns based on other column values to pandas dataframe
I have the following problem: I want to append columns to a dataframe. These columns are the unique values found in another column of this dataframe, and each one is filled with the number of occurrences of that value in the given row. It looks like this:
df:
   Column1 Column2
0        1   a,b,c
1        2     a,e
2        3       a
3        4     c,f
4        5     c,f
What I am trying to get is:
   Column1 Column2  a  b  c  e  f
0        1   a,b,c  1  1  1
1        2     a,e  1        1
2        3       a  1
3        4     c,f        1     1
4        5     c,f        1     1
(the empty spaces can be nan or 0, it matters not.)
I have now written some code to achieve this, but instead of appending columns, it appends rows, so that my output looks like this:
   Column1 Column2
0        1   a,b,c
1        2     a,e
2        3       a
3        4     c,f
4        5     c,f
a        1       1
b        1       1
c        1       1
e        1       1
f        1       1
The code looks like this:
def NewCols(x):
    for i, value in df['Column2'].iteritems():
        listi=value.split(',')
        for value in listi:
            string = value
            x[string]=listi.count(string)
    return x
df1=df.apply(NewCols)
What I am trying to do here is to iterate through each row of the dataframe and split the string (a,b,c) contained in Column2 at the commas, so that the variable listi is a list containing the separated string values. For each of these values I then want to make a new column and fill it with the number of occurrences of that value in listi. I am confused why the code appends rows instead of columns. Does somebody know why and how I can correct that?
While we could do this using get_dummies, we can also cheat and use pd.value_counts directly:
>>> df = pd.DataFrame({'Column1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Column2': {0: 'a,b,c', 1: 'a,e', 2: 'a', 3: 'c,f', 4: 'c,f'}})
>>> df.join(df.Column2.str.split(",").apply(pd.value_counts).fillna(0))
   Column1 Column2  a  b  c  e  f
0        1   a,b,c  1  1  1  0  0
1        2     a,e  1  0  0  1  0
2        3       a  1  0  0  0  0
3        4     c,f  0  0  1  0  1
4        5     c,f  0  0  1  0  1
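For comparison, the get_dummies route mentioned above might look like the sketch below. It assumes the same example frame and uses Series.str.get_dummies with a comma separator. Note that this yields 0/1 indicators rather than counts, so it only matches the value_counts result when no value repeats within a row, as is the case here.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

# Split each string on "," and build one 0/1 indicator column per distinct token.
dummies = df["Column2"].str.get_dummies(sep=",")

# Attach the indicator columns to the original frame (aligned on the index).
result = df.join(dummies)
print(result)
```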
Step-by-step, we have
>>> df.Column2.str.split(",")
0    [a, b, c]
1       [a, e]
2          [a]
3       [c, f]
4       [c, f]
dtype: object
>>> df.Column2.str.split(",").apply(pd.value_counts)
     a    b    c    e    f
0    1    1    1  NaN  NaN
1    1  NaN  NaN    1  NaN
2    1  NaN  NaN  NaN  NaN
3  NaN  NaN    1  NaN    1
4  NaN  NaN    1  NaN    1
>>> df.Column2.str.split(",").apply(pd.value_counts).fillna(0)
   a  b  c  e  f
0  1  1  1  0  0
1  1  0  0  1  0
2  1  0  0  0  0
3  0  0  1  0  1
4  0  0  1  0  1
>>> df.join(df.Column2.str.split(",").apply(pd.value_counts).fillna(0))
   Column1 Column2  a  b  c  e  f
0        1   a,b,c  1  1  1  0  0
1        2     a,e  1  0  0  1  0
2        3       a  1  0  0  0  0
3        4     c,f  0  0  1  0  1
4        5     c,f  0  0  1  0  1
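As for why the original function appended rows: DataFrame.apply defaults to axis=0, so the function receives each column as a Series indexed by the row labels, and assigning x[string] adds a new label to that column, which shows up as an extra row. A minimal sketch of a row-wise variant follows. The helper name count_tokens is made up for illustration, and this is only one possible way to restructure the loop, not the only fix.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

def count_tokens(row):
    # With axis=1 (passed below), `row` is one row of df, indexed by column names.
    tokens = row["Column2"].split(",")
    # Return a Series whose labels become the new column names.
    return pd.Series({token: tokens.count(token) for token in tokens})

# Rows that lack a given token get NaN after alignment; replace with 0 and cast to int.
counts = df.apply(count_tokens, axis=1).fillna(0).astype(int)
result = df.join(counts)
print(result)
```

Because the function now returns only the per-row counts, the original columns are left untouched and the counts are attached with join, just as in the value_counts version. The sketch also avoids Series.iteritems, which was removed in pandas 2.0.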