Pandas dataframe merging rows to remove NaN
Question
I have a dataframe with some NaNs:
hostname period Teff
51 Peg 4.2293 5773
51 Peg 4.231 NaN
51 Peg 4.23077 NaN
55 Cnc 44.3787 NaN
55 Cnc 44.373 NaN
55 Cnc 44.4175 NaN
55 Cnc NaN 5234
61 Vir NaN 5577
61 Vir 38.021 NaN
61 Vir 123.01 NaN
Rows with the same "hostname" all refer to the same object, but as you can see, some entries have NaN in various columns. I'd like to merge all rows that share a hostname so that I keep the first finite value in each column (and drop the row if all values are NaN). The result should look like this:
hostname period Teff
51 Peg 4.2293 5773
55 Cnc 44.3787 5234
61 Vir 38.021 5577
How would you do this?
Recommended answer
Use groupby.first; it takes the first non-NA value of each column within every group:
df.groupby('hostname')[['period', 'Teff']].first().reset_index()
# hostname period Teff
#0 Cnc 44.3787 5234
#1 Peg 4.2293 5773
#2 Vir 38.0210 5577
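For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the question's table by hand and assumes the hostnames are stored as full strings such as "51 Peg" (the variable name `merged` is only for illustration). Because groupby sorts the group keys by default, the rows come out ordered by the full hostname.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; hostnames kept as full strings (assumption)
df = pd.DataFrame({
    "hostname": ["51 Peg"] * 3 + ["55 Cnc"] * 4 + ["61 Vir"] * 3,
    "period": [4.2293, 4.231, 4.23077, 44.3787, 44.373, 44.4175,
               np.nan, np.nan, 38.021, 123.01],
    "Teff": [5773, np.nan, np.nan, np.nan, np.nan, np.nan,
             5234, 5577, np.nan, np.nan],
})

# first() skips NaN and returns the first valid value per column in each group
merged = df.groupby("hostname")[["period", "Teff"]].first().reset_index()
print(merged)
```

Since the Teff column contains NaN it is stored as float, so the merged values print as floats as well.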
Or do this manually with a custom aggregation function:
df.groupby('hostname')[['period', 'Teff']].agg(lambda x: x.dropna().iat[0]).reset_index()
This requires each group to have at least one non-NA value in every selected column; otherwise .iat[0] is called on an empty Series and raises an IndexError.
Write your own function to handle that edge case:
import numpy as np

def first_(g):
    # Return the first non-NA value in the group, or NaN if there is none
    non_na = g.dropna()
    return non_na.iat[0] if len(non_na) > 0 else np.nan

df.groupby('hostname')[['period', 'Teff']].agg(first_).reset_index()
# hostname period Teff
#0 Cnc 44.3787 5234
#1 Peg 4.2293 5773
#2 Vir 38.0210 5577
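The question also asks to drop a row when all of its values are NaN, which the sample data never triggers. The sketch below shows one way this could be handled, using made-up data: the host "HD 1" and the names `first_valid` and `merged` are hypothetical and only illustrate the case of a group with no valid value in any column.

```python
import numpy as np
import pandas as pd

def first_valid(s):
    """Return the first non-NA value of a Series, or NaN if there is none."""
    non_na = s.dropna()
    return non_na.iat[0] if len(non_na) > 0 else np.nan

# Hypothetical data: "HD 1" has no valid value in either column
df = pd.DataFrame({
    "hostname": ["51 Peg", "51 Peg", "HD 1", "HD 1"],
    "period": [np.nan, 4.231, np.nan, np.nan],
    "Teff": [5773, np.nan, np.nan, np.nan],
})

merged = df.groupby("hostname")[["period", "Teff"]].agg(first_valid)
# Drop hosts for which every aggregated column is still NaN
merged = merged.dropna(how="all").reset_index()
print(merged)
```

Only "51 Peg" survives here, with its two partial rows merged into one.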