How to merge two rows in a pandas DataFrame


Question

I have a dataframe with two rows and I'd like to merge them into one row. The df looks as follows:

              PC           Rating CY   Rating PY    HT
0             DE101           NaN            AA     GV
0             DE101           AA+           NaN     GV

I have tried to create two separate dataframes and combine them with df.merge(df2), without success. The result should be the following:

              PC           Rating CY   Rating PY    HT
0             DE101           AA+            AA     GV

Any ideas? Thanks in advance. Could df.update be a possible solution?

EDIT:

df.head(1).combine_first(df.tail(1))

This works for the example above. However, for columns containing numerical values, this approach doesn't yield the desired output, e.g. for

              PC           Rating CY   Rating PY    HT    MV1   MV2
0             DE101           NaN            AA     GV    0     20 
0             DE101           AA+           NaN     GV    10    0

The output should be:

              PC           Rating CY   Rating PY    HT   MV1    MV2
0             DE101           AA+            AA     GV   10     20

The formula above doesn't sum up the values in the last two columns, but takes the values in the first row of the dataframe.

              PC           Rating CY   Rating PY    HT   MV1    MV2
0             DE101           AA+            AA     GV   0     20

How could this problem be fixed?

Solution

You can use the DataFrame.combine_first() method after splitting the DF into two parts. The null values in the first part are replaced with the non-null values from the other part, while the first part's existing non-null values are left untouched:

df.head(1).combine_first(df.tail(1))
# In practice this is the same as df.head(1).fillna(df.tail(1))
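
As a quick check, the one-liner can be run against the sample data from the question. This is a minimal sketch: the frame is rebuilt by hand from the question's table, and it assumes both rows carry the index label 0, as displayed there. The variable name `merged` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; both rows share index label 0 (assumption
# taken from how the table is displayed).
df = pd.DataFrame(
    {
        "PC": ["DE101", "DE101"],
        "Rating CY": [np.nan, "AA+"],
        "Rating PY": ["AA", np.nan],
        "HT": ["GV", "GV"],
    },
    index=[0, 0],
)

# Keep the first row and fill its nulls with the values from the last row.
merged = df.head(1).combine_first(df.tail(1))
print(merged)
```

The two slices are aligned on their index labels, so this only collapses to a single row because both rows are labelled 0. If the rows had different labels (for example a default 0, 1 index), the result would keep both rows, and the labels would need to be made equal first.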


In case there are columns of mixed dtypes, you can partition the DF into its constituent dtype groups, apply a different operation to each group, and chain the results back together:

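# Split the columns by dtype: non-numeric labels vs. numeric values
obj_df = df.select_dtypes(exclude="number")
num_df = df.select_dtypes(include="number")

# Fill nulls in the label columns, sum the numeric columns, then join on the index
obj_df.head(1).combine_first(obj_df.tail(1)).join(num_df.head(1).add(num_df.tail(1)))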
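
The split-by-dtype approach can be sketched end to end on the extended example from the question (with MV1 and MV2). The frame is again rebuilt by hand and assumes both rows share index label 0. The names `text_df`, `text_row`, `num_row` and `merged` are illustrative, and the sketch selects columns with the `"number"` selector of `select_dtypes` rather than by the object dtype.

```python
import numpy as np
import pandas as pd

# Extended sample data rebuilt from the question; both rows share index label 0.
df = pd.DataFrame(
    {
        "PC": ["DE101", "DE101"],
        "Rating CY": [np.nan, "AA+"],
        "Rating PY": ["AA", np.nan],
        "HT": ["GV", "GV"],
        "MV1": [0, 10],
        "MV2": [20, 0],
    },
    index=[0, 0],
)

# Partition the columns: everything non-numeric vs. numeric.
text_df = df.select_dtypes(exclude="number")
num_df = df.select_dtypes(include="number")

# Non-numeric columns: fill the nulls of the first row from the last row.
text_row = text_df.head(1).combine_first(text_df.tail(1))

# Numeric columns: add the two rows element-wise (aligned on index label 0).
num_row = num_df.head(1).add(num_df.tail(1))

# Stitch the two single-row frames back together on the shared index.
merged = text_row.join(num_row)
print(merged)
```

For this input the merged row keeps AA+ and AA in the rating columns, while MV1 and MV2 come out as 10 and 20, which matches the output requested in the question.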
