Vectorized update to pandas DataFrame?
Question
I have a DataFrame in which I'd like to update a column with values from an array. The array is a different length from the DataFrame, but I have the indices of the DataFrame rows that I'd like to update.
I can do this by looping over the rows (below), but I expect there is a much more efficient, vectorized way to do it, and I can't seem to get the syntax right.
In the example below I just fill the column with NaN and then use the indices directly in a loop.
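# df, update_idx and new_values are assumed to exist already; np is numpy
df['newcol'] = np.nan
j = 0
for i in update_idx:
    # .loc sets the value in place; df['newcol'][i] is chained assignment
    df.loc[i, 'newcol'] = new_values[j]
    j += 1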
Recommended answer
If you already have a list of indices, you can use loc to perform label (row) selection and pass the new column name in the same call. Rows that are not selected will be assigned NaN:
df.loc[update_idx, 'newcol'] = new_values
Example:
In [4]:
df = pd.DataFrame({'a':np.arange(5), 'b':np.random.randn(5)}, index = list('abcde'))
df
Out[4]:
   a         b
a  0  1.800300
b  1  0.351843
c  2  0.278122
d  3  1.387417
e  4  1.202503
In [5]:
idx_list = ['b','d','e']
df.loc[idx_list, 'c'] = np.arange(3)
df
Out[5]:
   a         b    c
a  0  1.800300  NaN
b  1  0.351843    0
c  2  0.278122  NaN
d  3  1.387417    1
e  4  1.202503    2
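For reference, here is a minimal, self-contained sketch of the same loc assignment using the names from the question. The data, the labels in update_idx, the numbers in new_values and the positions list are all made up for illustration. It assumes new_values has exactly one entry per index in update_idx, in the same order. The last step is an assumption about a related case: if the indices are integer positions rather than labels, one option is to translate them to labels through df.index first.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df, update_idx and new_values.
df = pd.DataFrame({'a': np.arange(5)}, index=list('abcde'))
update_idx = ['b', 'd', 'e']                 # row labels to update
new_values = np.array([10.0, 20.0, 30.0])    # one value per label, same order

# Label-based assignment: creates 'newcol' and leaves unselected rows as NaN.
df.loc[update_idx, 'newcol'] = new_values

# Position-based variant (assumption: the indices are integer positions).
# Convert positions to labels via df.index, then assign with loc as before.
positions = [0]
df.loc[df.index[positions], 'newcol'] = [-1.0]

print(df)
```

Because loc aligns on labels, the order of update_idx decides which value lands in which row, so it has to match the order of new_values.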