Change first element of each group in pandas DataFrame


Question

I want to ensure that the first value of val2 for each vintage is NaN. Two of them are already NaN, but I want the 0.53 to become NaN as well.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01', '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan]
})

Here's what I've tried so far:

df.groupby('vintage').first().val2 #This gives the first non-NaN values, as shown below

vintage
2017-01-01    0.66
2017-02-01    0.53
2017-03-01     NaN

df.groupby('vintage').first().val2 = np.nan #This doesn't change anything
df.val2

0     NaN
1    0.66
2    0.81
3    0.53
4    0.62
5     NaN

Recommended answer

You can't assign to the result of an aggregation: it is a new object, so the assignment never reaches df. Also, first skips existing NaN values. Instead, call head(1), which returns the first row of each group with its original index, then pass that index to loc to select those rows in the original df and overwrite val2:

In[91]:
df.loc[df.groupby('vintage')['val2'].head(1).index, 'val2'] = np.nan
df

Out[91]: 
         date  val1  val2     vintage
0  2017-01-01  0.59   NaN  2017-01-01
1  2017-02-01  0.68  0.66  2017-01-01
2  2017-03-01  0.80  0.81  2017-01-01
3  2017-02-01  0.54   NaN  2017-02-01
4  2017-03-01  0.61  0.62  2017-02-01
5  2017-03-01  0.60   NaN  2017-03-01
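
The same result can also be reached with a boolean mask rather than an index lookup. This is a minimal sketch and not part of the original answer: it assumes the rows are already ordered so that the first row of each vintage is the one to blank out, and it relies on groupby(...).cumcount(), which numbers the rows within each group starting at 0. The name is_first is an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-01-01',
                '2017-02-01', '2017-02-01', '2017-03-01'],
    'date': ['2017-01-01', '2017-02-01', '2017-03-01',
             '2017-02-01', '2017-03-01', '2017-03-01'],
    'val1': [0.59, 0.68, 0.8, 0.54, 0.61, 0.6],
    'val2': [np.nan, 0.66, 0.81, 0.53, 0.62, np.nan],
})

# cumcount() numbers the rows inside each group from 0,
# so a value of 0 marks the first row of every vintage.
is_first = df.groupby('vintage').cumcount() == 0

# Overwrite val2 only on those rows; all other rows are untouched.
df.loc[is_first, 'val2'] = np.nan

print(df)
```

Whether the mask or the head(1).index form reads better is largely a matter of taste; both modify df in place through loc.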

Here you can see that head(1) returns the first row of each group:

In[94]:
df.groupby('vintage')['val2'].head(1)
Out[94]: 
0     NaN
3    0.53
5     NaN
Name: val2, dtype: float64

Contrast this with first, which returns the first non-NaN value unless the group contains only NaN values:

In[95]:
df.groupby('vintage')['val2'].first()

Out[95]: 
vintage
2017-01-01    0.66
2017-02-01    0.53
2017-03-01     NaN
Name: val2, dtype: float64
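
The difference between the two calls can be checked directly. The following is a small sketch on a reduced, hypothetical three-row frame (the names grouped, head_rows and first_vals are illustrative): head(1) keeps the original row labels, which is what makes its index usable with loc, whereas first() is indexed by the group key and skips NaN.

```python
import numpy as np
import pandas as pd

# Reduced example frame: the first group starts with NaN.
df = pd.DataFrame({
    'vintage': ['2017-01-01', '2017-01-01', '2017-02-01'],
    'val2': [np.nan, 0.66, 0.53],
})

grouped = df.groupby('vintage')['val2']

# head(1): first row of each group, labelled with the original row index.
head_rows = grouped.head(1)

# first(): one value per group, labelled with the group key, NaN skipped.
first_vals = grouped.first()

print(head_rows.index.tolist())   # row labels from df
print(first_vals.index.tolist())  # vintage keys
```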
