Concatenate multiple pandas series efficiently

Question

I understand that I can use combine_first to merge two Series:

import pandas as pd

series1 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series2 = pd.Series([1,2,3,4,5],index=['f','g','h','i','j'])
series3 = pd.Series([1,2,3,4,5],index=['k','l','m','n','o'])

# combine_first fills labels missing from series1 with values from series2
Combine1 = series1.combine_first(series2)
print(Combine1)

Output:

a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
f    1.0
g    2.0
h    3.0
i    4.0
j    5.0
dtype: float64

What if I need to merge 3 or more Series?

I understand that using the following code: print(series1 + series2 + series3) yields:

a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
f   NaN
...
dtype: float64

Can I merge multiple Series efficiently without calling combine_first multiple times?

Thanks

Recommended answer

Combining Series with non-overlapping indices

To combine Series vertically, use pd.concat.

# Setup
series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('fghij')),
    pd.Series(range(1, 6), index=list('klmno'))
]

pd.concat(series_list)

a    1
b    2
c    3
d    4
e    5
f    1
g    2
h    3
i    4
j    5
k    1
l    2
m    3
n    4
o    5
dtype: int64
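
If you want pd.concat to confirm that the indices really do not overlap, its verify_integrity flag raises an error when the combined index contains a repeated label. The following is a minimal sketch reusing the data above; the `overlapping` list and the `raised` flag are made up for illustration.

```python
import pandas as pd

series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('fghij')),
    pd.Series(range(1, 6), index=list('klmno')),
]

# verify_integrity=True makes concat raise ValueError if any index label repeats
combined = pd.concat(series_list, verify_integrity=True)

# Hypothetical overlapping input: label 'e' appears in both Series
overlapping = [series_list[0], pd.Series([10, 20], index=['e', 'z'])]
try:
    pd.concat(overlapping, verify_integrity=True)
    raised = False
except ValueError:
    raised = True

print(len(combined), raised)
```

Unlike the combine_first call in the question, concatenating integer Series this way should not need to upcast the values to float, because no missing values are introduced along the way.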


Combining Series with overlapping indices

series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('kbmdf'))
]

If the Series have overlapping indices, you can either combine (add) the values that share a key:

pd.concat(series_list, axis=1, sort=False).sum(axis=1)

a     2.0
b     6.0
c     6.0
d    12.0
e    10.0
k     1.0
m     3.0
f     5.0
dtype: float64
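
As a possible alternative to the column-wise sum, you could stack the Series vertically and group on the index labels. This is only a sketch of that variation, not part of the original answer; the name `summed` is an assumption. Because no NaN values are created during alignment, the integer dtype should survive.

```python
import pandas as pd

series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('kbmdf')),
]

# Stack vertically, then sum the values that share an index label.
# sort=False keeps the labels in order of first appearance.
summed = pd.concat(series_list).groupby(level=0, sort=False).sum()
print(summed)
```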

Alternatively, if you only want the first or last value for each duplicated label, drop the duplicate labels from the index.

res = pd.concat(series_list, axis=0)
# keep first value
res[~res.index.duplicated(keep='first')]
# keep last value
res[~res.index.duplicated(keep='last')]
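
To relate this back to the question: for data without missing values, keeping the first occurrence of each label should give the same values as chaining combine_first over the whole list. The sketch below checks that with functools.reduce; the names `first`, `chained`, and `same` are illustrative only. The comparison goes through float dictionaries because the dtype and label order returned by combine_first may differ from the concat result.

```python
from functools import reduce

import pandas as pd

series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('kbmdf')),
]

# One concat, then keep the first value seen for each label
res = pd.concat(series_list, axis=0)
first = res[~res.index.duplicated(keep='first')]

# Repeated combine_first: earlier Series take priority over later ones
chained = reduce(lambda left, right: left.combine_first(right), series_list)

# Compare label -> value mappings only, ignoring dtype and label order
same = first.astype(float).to_dict() == chained.astype(float).to_dict()
print(same)
```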
