Change first element of each group in pandas DataFrame
Question
I want to ensure that the first value of val2 corresponding to each vintage is NaN. Currently two of them are already NaN, but I want to ensure that 0.53 also changes to NaN.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})
Here's what I've tried so far:
df.groupby('vintage').first().val2 #This gives the first non-NaN values, as shown below
vintage
2017-01-01 0.66
2017-02-01 0.53
2017-03-01 NaN
df.groupby('vintage').first().val2 = np.nan #This doesn't change anything
df.val2
0 NaN
1 0.66
2 0.81
3 0.53
4 0.62
5 NaN
Recommended answer
You can't assign to the result of an aggregation, and first also skips existing NaN values. What you can do instead is call head(1), which returns the first row of each group, and pass its index to loc to select those rows in the original df and overwrite the column values:
In[91]:
df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
df
Out[91]:
         date  val1  val2     vintage
0  2017-01-01  0.59   NaN  2017-01-01
1  2017-02-01  0.68  0.66  2017-01-01
2  2017-03-01  0.80  0.81  2017-01-01
3  2017-02-01  0.54   NaN  2017-02-01
4  2017-03-01  0.61  0.62  2017-02-01
5  2017-03-01  0.60   NaN  2017-03-01
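For reference, the head(1) approach can be written as a self-contained sketch. The helper name blank_first_per_group is hypothetical (it is not part of pandas), and the cumcount() variant is an alternative swapped in for comparison rather than something taken from the answer. Both assume the rows are already in the order you care about within each group.

```python
import numpy as np
import pandas as pd


def blank_first_per_group(df, group_col, value_col):
    """Return a copy of df with value_col set to NaN on the first row of each group.

    Hypothetical helper for illustration only; not a pandas API.
    """
    out = df.copy()
    # head(1) keeps the original row labels, so its index can be passed to loc.
    first_rows = out.groupby(group_col)[value_col].head(1).index
    out.loc[first_rows, value_col] = np.nan
    return out


df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

result = blank_first_per_group(df, 'vintage', 'val2')

# Alternative: cumcount() numbers the rows within each group starting at 0,
# so comparing against 0 gives a boolean mask for the first row of each group.
mask = df.groupby('vintage').cumcount() == 0
alt = df.copy()
alt.loc[mask, 'val2'] = np.nan

print(result['val2'].tolist())
print(alt['val2'].tolist())
```

The head(1).index version selects rows by label, so it relies on the index labels being unique; the boolean-mask version should be less sensitive to that, which is the main reason to consider it.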
Here you can see that head(1) returns the first row of each group:
In[94]:
df.groupby('vintage')['val2'].head(1)
Out[94]:
0 NaN
3 0.53
5 NaN
Name: val2, dtype: float64
Contrast this with first, which returns the first non-NaN value in each group unless the group contains only NaN values:
In[95]:
df.groupby('vintage')['val2'].first()
Out[95]:
vintage
2017-01-01 0.66
2017-02-01 0.53
2017-03-01 NaN
Name: val2, dtype: float64
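The difference between the two calls can also be checked side by side. This is a small sketch on the question's original val2 data (before any overwrite); the variable names first_rows and first_valid are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

grouped = df.groupby('vintage')['val2']

# head(1) acts as a filter: it returns the first row of each group as-is,
# NaN included, and keeps the original row labels.
first_rows = grouped.head(1)

# first() is an aggregation: it returns one value per group, indexed by the
# group key, and skips NaN when the group has a non-NaN value.
first_valid = grouped.first()

print(list(first_rows.index))
print(list(first_valid.index))
```

Because first() produces a new object indexed by vintage rather than by the original row labels, assigning to it cannot reach back into df, whereas the index returned by head(1) points at rows of df directly.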