pandas :用一些numpy数组填充一列 [英] pandas: fill a column with some numpy arrays

查看:124
本文介绍了 pandas :用一些numpy数组填充一列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用python2.7和pandas 0.11.0.

I am using python2.7 and pandas 0.11.0.

我尝试使用DataFrame.apply(func)填充数据框的一列. func()函数应该返回一个numpy数组(1x3).

I try to fill a column of a dataframe using DataFrame.apply(func). The func() function is supposed to return a numpy array (1x3).

import pandas as pd
import numpy as np

df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
print(df)

              A         B         C
    0  0.910142  0.788300  0.114164
    1 -0.603282 -0.625895  2.843130
    2  1.823752 -0.091736 -0.107781
    3  0.447743 -0.163605  0.514052

用于测试目的的功能:

def test(row):
   # some complex calc here 
   # based on the values from different columns 
   return np.array((1,2,3))

df['D'] = df.apply(test, axis=1)

[...]
ValueError: Wrong number of items passed 1, indices imply 3

有趣的是,当我从头开始创建数据框时,它工作得很好,并且可以按预期返回:

The funny is that when I create the dataframe from scratch, it works pretty well, and returns as expected:

dic = {'A': {0: 0.9, 1: -0.6, 2: 1.8, 3: 0.4}, 
     'C': {0: 0.1, 1: 2.8, 2: -0.1, 3: 0.5}, 
     'B': {0: 0.7, 1: -0.6, 2: -0.1, 3: -0.1},
     'D': {0:np.array((1,2,3)), 
          1:np.array((1,2,3)), 
          2:np.array((1,2,3)), 
          3:np.array((1,2,3))}}

df= pd.DataFrame(dic)
print(df)
         A    B    C          D
    0  0.9  0.7  0.1  [1, 2, 3]
    1 -0.6 -0.6  2.8  [1, 2, 3]
    2  1.8 -0.1 -0.1  [1, 2, 3]
    3  0.4 -0.1  0.5  [1, 2, 3]

预先感谢

推荐答案

如果您尝试从传递给apply的函数中返回多个值,并且调用了apply的DataFrame具有相同数量的沿着轴上的项目(在本例中为列)作为您返回的值的数量,Pandas将根据返回值创建一个与原始DataFrame相同标签的DataFrame.如果您只是这样做,就可以看到以下内容:

If you try to return multiple values from the function that is passed to apply, and the DataFrame you call the apply on has the same number of item along the axis (in this case columns) as the number of values you returned, Pandas will create a DataFrame from the return values with the same labels as the original DataFrame. You can see this if you just do:

>>> def test(row):
        return [1, 2, 3]
>>> df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3

这就是为什么出现错误的原因,因为您无法将DataFrame分配给DataFrame列.

And that is why you get the error, since you cannot assign a DataFrame to DataFrame column.

如果返回任何其他数量的值,它将仅返回可以分配的一系列对象:

If you return any other number of values, it will return just a series object, that can be assigned:

>>> def test(row):
       return [1, 2]
>>> df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
>>> df['D'] = df.apply(test, axis=1)
>>> df
          A         B         C       D
0  0.333535  0.209745 -0.972413  [1, 2]
1  0.469590  0.107491 -1.248670  [1, 2]
2  0.234444  0.093290 -0.853348  [1, 2]
3  1.021356  0.092704 -0.406727  [1, 2]

我不确定为什么熊猫会这样做,以及为什么仅在返回值是listndarray时才这样做,因为如果返回tuple,它将不会这样做:

I'm not sure why Pandas does this, and why it does it only when the return value is a list or an ndarray, since it won't do it if you return a tuple:

>>> def test(row):
        return (1, 2, 3)
>>> df= pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df['D'] = df.apply(test, axis=1)
>>> df
          A         B         C          D
0  0.121136  0.541198 -0.281972  (1, 2, 3)
1  0.569091  0.944344  0.861057  (1, 2, 3)
2 -1.742484 -0.077317  0.181656  (1, 2, 3)
3 -1.541244  0.174428  0.660123  (1, 2, 3)

这篇关于 pandas :用一些numpy数组填充一列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆