Retain a few NAs and drop the rest of the NAs during a stack operation in Python


Question

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'person_id': [1],
                    'H1_date': ['2006-10-30 00:00:00'], 'H1': [2.3],
                    'H2_date': ['2016-10-30 00:00:00'], 'H2': [12.3],
                    'H3_date': ['2026-11-30 00:00:00'], 'H3': [22.3],
                    'H4_date': ['2106-10-30 00:00:00'], 'H4': [42.3],
                    'H5_date': [np.nan], 'H5': [np.nan],
                    'H6_date': ['2006-10-30 00:00:00'], 'H6': [2.3],
                    'H7_date': [np.nan], 'H7': [2.3],
                    'H8_date': ['2006-10-30 00:00:00'], 'H8': [np.nan]})

As shown above, my source dataframe (df2) contains a few NAs.

When I call df2.stack(), I lose all the NAs from the data.

However, I would like to retain the NAs for H7_date and H8, because the other half of each value/date pair is present: for H7_date I have a valid value H7, and for H8 I have its corresponding H8_date.

I would like to drop records only when both values of a pair (H5_date, H5) are NA.

Please note that I have only a few columns here; my real data has more than 150 columns, and the column names aren't known in advance.
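To make the drop condition concrete while the data is still wide, here is a minimal sketch that finds the pairs in which both cells are NA without hard-coding any H-column name. It assumes (this is my assumption, not something stated in the question) that every value column X has a partner named X_date and that person_id is the only identifier column; the three-pair frame is a cut-down stand-in for df2.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (three pairs instead of eight).
df2 = pd.DataFrame({
    'person_id': [1],
    'H5_date': [np.nan], 'H5': [np.nan],                 # both NA: drop
    'H7_date': [np.nan], 'H7': [2.3],                    # only the date is NA: keep
    'H8_date': ['2006-10-30 00:00:00'], 'H8': [np.nan],  # only the value is NA: keep
})

# Assumption: every value column X has a partner column named X_date.
value_cols = [c for c in df2.columns
              if c != 'person_id' and not c.endswith('_date')]

# One boolean column per pair: True where value and date are both missing.
both_na = pd.DataFrame(
    {c: df2[c].isna() & df2[c + '_date'].isna() for c in value_cols},
    index=df2.index,
)
print(both_na)
```

Only the H5 column of both_na is True for this row, so H5_date/H5 is the only pair that should disappear from the reshaped output.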

I expect an output that omits H5_date and H5, since both are NA, but keeps the remaining NAs.

Recommended answer

Try pd.DataFrame.melt

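# Reshape to long form; unlike the stack() call in the question, melt keeps NaN cells
df = pd.melt(df2, id_vars='person_id', var_name='col', value_name='dates')
# Pair key: 'H1_date' and 'H1' both become 'H1'
df['col2'] = df['col'].str.split("_").str[0]
# Number of non-NA cells in each pair
df['count'] = df.groupby(['col2'])['dates'].transform('count')
# Drop a pair only when both of its cells are NA
df = df[df['count'] != 0]
# Reassign instead of inplace=True to avoid SettingWithCopyWarning on the filtered frame
df = df.drop(['col2', 'count'], axis=1)
print(df)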

    person_id      col                dates
0           1  H1_date  2006-10-30 00:00:00
1           1       H1                  2.3
2           1  H2_date  2016-10-30 00:00:00
3           1       H2                 12.3
4           1  H3_date  2026-11-30 00:00:00
5           1       H3                 22.3
6           1  H4_date  2106-10-30 00:00:00
7           1       H4                 42.3
10          1  H6_date  2006-10-30 00:00:00
11          1       H6                  2.3
12          1  H7_date                  NaN
13          1       H7                  2.3
14          1  H8_date  2006-10-30 00:00:00
15          1       H8                  NaN
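The sample has a single person, so grouping by the pair key alone is enough there. With several rows, a pair that is empty for one person would survive if any other person has data for it. The sketch below is one possible variant, not part of the original answer: it groups by the id column as well. The function name melt_keep_partial_pairs, its parameters, and the two-person test frame are all made up for illustration.

```python
import numpy as np
import pandas as pd

def melt_keep_partial_pairs(wide, id_col='person_id', suffix='_date'):
    """Melt to long form, dropping a pair only when all of its cells are NA."""
    long_df = wide.melt(id_vars=id_col, var_name='col', value_name='value')
    # 'H1_date' and 'H1' both map to the pair key 'H1'.
    pair = long_df['col'].str.replace(suffix + '$', '', regex=True)
    # Count non-NA cells per person and pair; 0 means the whole pair is empty.
    n_valid = long_df.groupby([long_df[id_col], pair])['value'].transform('count')
    return long_df[n_valid > 0]

# Hypothetical two-person frame: person 2 has an empty H1 pair.
wide = pd.DataFrame({
    'person_id': [1, 2],
    'H1_date': ['2006-10-30', np.nan], 'H1': [2.3, np.nan],
    'H2_date': [np.nan, '2016-10-30'], 'H2': [12.3, np.nan],
})
result = melt_keep_partial_pairs(wide)
print(result)
```

Stripping a trailing _date suffix rather than splitting on the first underscore is a design choice: it keeps the pairing correct if a column name itself contains an underscore, at the cost of assuming the suffix is known.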
