How to append columns based on other column values to pandas dataframe

Views: 173  Posted: 2016/5/25 21:38:50  Tags: python pandas append dataframe

Question

I have the following problem: I want to append columns to a dataframe. The new columns are the unique values found in another column of this dataframe, and each one is filled with the number of occurrences of that value in the corresponding row. It looks like this:

df:

   Column1  Column2
0     1       a,b,c
1     2       a,e
2     3       a
3     4       c,f
4     5       c,f

What I would like to get is:

    Column1  Column2  a  b  c  e  f
0     1       a,b,c   1  1  1
1     2       a,e     1        1
2     3       a       1
3     4       c,f           1     1
4     5       c,f           1     1

(The empty cells can be NaN or 0; it does not matter.)

I have now written some code to achieve this, but instead of appending columns it appends rows, so that my output looks like this:

        Column1  Column2
    0     1       a,b,c
    1     2       a,e
    2     3       a
    3     4       c,f
    4     5       c,f
    a     1        1
    b     1        1
    c     1        1
    e     1        1
    f     1        1

The code looks like this:

def NewCols(x):
    for i, value in df['Column2'].iteritems():
        listi=value.split(',')
        for value in listi:
            string = value
            x[string]=list.count(string)
    return x

df1=df.apply(NewCols)

What I am trying to do here is iterate through each row of the dataframe and split the string in Column2 (for example a,b,c) at the commas, so that the variable listi is a list of the separated string values. For each of these values I then want to make a new column and fill it with the number of occurrences of that value in listi. I am confused about why the code appends rows instead of columns. Does anybody know why, and how I can correct it?
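On the "why": a likely explanation, sketched here rather than taken from the original answer, is that `DataFrame.apply` defaults to `axis=0`, so the function receives each column as a Series indexed by the row labels. Assigning to a label that does not exist yet enlarges that Series, which shows up as an extra row in the result. The frame below rebuilds the example data; `add_label` is a hypothetical helper, and the closing loop is only one possible repair, writing into the frame with `.loc` instead of into the Series passed by `apply`.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

def add_label(col):
    # With the default axis=0, `col` is one whole column (a Series)
    # whose index holds the row labels 0..4.
    col = col.copy()
    col["a"] = 1  # "a" is not an existing label, so the Series grows by one row
    return col

result = df.apply(add_label)
print(result)  # both columns now carry an extra row labelled "a"

# One possible repair (an assumption, not the asker's code): loop over the
# rows and assign into the frame itself, so new labels become columns.
fixed = df.copy()
for i, value in fixed["Column2"].items():
    tokens = value.split(",")
    for token in tokens:
        fixed.loc[i, token] = tokens.count(token)
print(fixed)  # cells that were never assigned stay NaN
```

The loop is easy to follow but slow on large frames, which is one reason the vectorised approach in the answer is usually preferable.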

Recommended answer

While we could do this using get_dummies, we can also cheat and use pd.value_counts directly:

>>> import pandas as pd
>>> df = pd.DataFrame({'Column1': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Column2': {0: 'a,b,c', 1: 'a,e', 2: 'a', 3: 'c,f', 4: 'c,f'}})
>>> df.join(df.Column2.str.split(",").apply(pd.value_counts).fillna(0))
   Column1 Column2  a  b  c  e  f
0        1   a,b,c  1  1  1  0  0
1        2     a,e  1  0  0  1  0
2        3       a  1  0  0  0  0
3        4     c,f  0  0  1  0  1
4        5     c,f  0  0  1  0  1
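For comparison, the get_dummies route mentioned in the answer could look roughly like the sketch below. It assumes `Series.str.get_dummies` with `sep=","` and the same example frame. Note that it yields 0/1 presence indicators rather than counts, which matches the counts only as long as no value repeats within a single row.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

# Split each string on "," and build one 0/1 indicator column per distinct token.
indicators = df["Column2"].str.get_dummies(sep=",")

# Attach the indicator columns to the original frame (aligned on the index).
result = df.join(indicators)
print(result)
```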

Step by step, the value_counts approach works as follows:

>>> df.Column2.str.split(",")
0    [a, b, c]
1       [a, e]
2          [a]
3       [c, f]
4       [c, f]
dtype: object
>>> df.Column2.str.split(",").apply(pd.value_counts)
    a   b   c   e   f
0   1   1   1 NaN NaN
1   1 NaN NaN   1 NaN
2   1 NaN NaN NaN NaN
3 NaN NaN   1 NaN   1
4 NaN NaN   1 NaN   1
>>> df.Column2.str.split(",").apply(pd.value_counts).fillna(0)
   a  b  c  e  f
0  1  1  1  0  0
1  1  0  0  1  0
2  1  0  0  0  0
3  0  0  1  0  1
4  0  0  1  0  1
>>> df.join(df.Column2.str.split(",").apply(pd.value_counts).fillna(0))
   Column1 Column2  a  b  c  e  f
0        1   a,b,c  1  1  1  0  0
1        2     a,e  1  0  0  1  0
2        3       a  1  0  0  0  0
3        4     c,f  0  0  1  0  1
4        5     c,f  0  0  1  0  1
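The session above dates from 2016. On recent pandas releases the top-level `pd.value_counts` is deprecated (since pandas 2.1, as far as I know), and `Series.iteritems`, used in the question, was removed in pandas 2.0 in favour of `Series.items`. The sketch below is one version-independent alternative, not the answerer's method: it counts tokens per row with `collections.Counter` from the standard library and lets the `DataFrame` constructor line the counts up into columns.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": ["a,b,c", "a,e", "a", "c,f", "c,f"],
})

# One Counter per row: maps each token to how often it occurs in that row.
row_counts = [Counter(value.split(",")) for value in df["Column2"]]

# A list of dict-like objects becomes one column per key; tokens missing
# from a row come out as NaN, which is then replaced by 0.
counts = pd.DataFrame(row_counts, index=df.index).fillna(0).astype(int)

result = df.join(counts)
print(result)
```

Unlike the indicator approach, this keeps true counts, so a row such as `a,a,b` would give 2 in column `a`.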
