Vectorized update to pandas DataFrame?

Question

I have a DataFrame in which I'd like to update a column with some values from an array. The array is a different length from the DataFrame, but I have the indices of the DataFrame rows that I'd like to update.

I can do this by looping over the rows (below), but I expect there is a much more efficient, vectorized way to do it. I can't seem to get the syntax right.

In the example below I fill the column with NaN and then use the indices directly in a loop.

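import numpy as np

# Start with a column of NaN, then fill in the selected rows one at a time.
df['newcol'] = np.nan

# update_idx holds the row labels to update; new_values holds one value per label.
# df.loc avoids the chained assignment df['newcol'][i] = ..., which may write to a copy.
for i, value in zip(update_idx, new_values):
    df.loc[i, 'newcol'] = value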

Recommended answer

If you already have a list of index labels, you can use loc to select those rows by label and pass the new column name in the same call. Rows that are not selected are assigned NaN in the new column:

df.loc[update_idx, 'newcol'] = new_values

Example:

In [4]:
df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)}, index = list('abcde'))
df

Out[4]:
   a         b
a  0  1.800300
b  1  0.351843
c  2  0.278122
d  3  1.387417
e  4  1.202503

In [5]:    
idx_list = ['b','d','e']
df.loc[idx_list, 'c'] = np.arange(3)
df

Out[5]:
   a         b   c
a  0  1.800300 NaN
b  1  0.351843   0
c  2  0.278122 NaN
d  3  1.387417   1
e  4  1.202503   2
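
As a self-contained sketch of the same idea, the script below applies a single loc assignment in place of the row-by-row loop. The data, the values in new_values, and the column name newcol_by_pos are made up for illustration. The second assignment is an assumption about a related case: if the indices you hold are integer positions rather than labels, one option is to translate them to labels through df.index first.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df, update_idx and new_values.
df = pd.DataFrame({"a": np.arange(5)}, index=list("abcde"))
update_idx = ["b", "d", "e"]                # row labels to update
new_values = np.array([10.0, 20.0, 30.0])   # one value per label, in the same order

# Label-based: one assignment creates the column; unselected rows get NaN.
df.loc[update_idx, "newcol"] = new_values

# Position-based variant (assumption: you hold integer positions, not labels).
# df.index[positions] converts the positions to labels so loc can still be used.
positions = [1, 3, 4]
df.loc[df.index[positions], "newcol_by_pos"] = new_values

print(df)
```

The values are matched to the selected rows in order, so new_values must have the same length as update_idx.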
