Merge and then sort columns of a dataframe based on the columns of the merging dataframe


Question

I have two dataframes, both indexed by timestamps. When I merge them, I would like to preserve the column order of the first dataframe.

For example:

#required packages
import pandas as pd
import numpy as np

# defining stuff
num_periods_1 = 11
num_periods_2 = 4

# create sample time series
dates1 = pd.date_range('1/1/2000 00:00:00', periods=num_periods_1, freq='10min')
dates2 = pd.date_range('1/1/2000 01:30:00', periods=num_periods_2, freq='10min')

column_names_1 = ['C', 'B', 'A']
column_names_2 = ['B', 'C', 'D']

df1 = pd.DataFrame(np.random.randn(num_periods_1, len(column_names_1)), index=dates1, columns=column_names_1)
df2 = pd.DataFrame(np.random.randn(num_periods_2, len(column_names_2)), index=dates2, columns=column_names_2)

df3 = df1.merge(df2, how='outer', left_index=True, right_index=True, suffixes=['_1', '_2'])
print("\nData Frame Three:\n", df3)

The code above generates two dataframes: the first has columns C, B, and A; the second has columns B, C, and D. The merged output currently has its columns in the order C_1, B_1, A, B_2, C_2, D. What I want is C_1, C_2, B_1, B_2, A_1, D_2: the column order of the first dataframe is preserved, and each matching column from the second dataframe is placed next to its counterpart.

Is there a setting in merge for this, or can I use sort_index to do it?

Maybe a better way to describe the ordering would be to call it uncollated: the columns with the same name from each dataframe are put together, then the next pair, and so on.

Recommended answer

Using an OrderedDict, as you suggested:

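from collections import OrderedDict
from itertools import chain

c = df3.columns.tolist()
o = OrderedDict()

# group column names by the part before the first underscore,
# keeping the groups in the order their prefix first appears
for x in c:
    o.setdefault(x.split('_')[0], []).append(x)

# flatten the groups back into a single list of column names
c = list(chain.from_iterable(o.values()))
df3 = df3[c]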
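The grouping loop above can also be wrapped in a small helper. The following is only a sketch: the function name `collate_columns`, its `sep` parameter, and the fixed sample values are made up for illustration and are not part of pandas. It relies on plain dicts preserving insertion order (guaranteed since Python 3.7), so an OrderedDict is not strictly required.

```python
import pandas as pd


def collate_columns(df, sep="_"):
    """Return df with columns sharing a prefix placed side by side.

    Hypothetical helper: prefixes keep the order in which they first
    appear, and columns within a prefix keep their original order.
    """
    groups = {}  # plain dicts preserve insertion order on Python 3.7+
    for name in df.columns:
        prefix = name.split(sep)[0]
        groups.setdefault(prefix, []).append(name)
    ordered = [name for names in groups.values() for name in names]
    return df[ordered]


# Fixed sample values standing in for the merged frame df3 from the question.
merged = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["C_1", "B_1", "A", "B_2", "C_2", "D"],
)
result = collate_columns(merged)
print(list(result.columns))  # ['C_1', 'C_2', 'B_1', 'B_2', 'A', 'D']
```

Note that A and D keep their original names here, since merge only adds suffixes to column names that appear in both frames.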


An alternative that extracts the prefixes and then calls sorted on the list of column names:

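# https://stackoverflow.com/a/46839182/4909087
c = df3.columns.tolist()
# the first character of each column name serves as its prefix
p = [s[0] for s in c]
# sort by where the prefix first appears, then by the full name
c = sorted(c, key=lambda x: (p.index(x[0]), x))
df3 = df3[c]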
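The sort above keys on the first character of each column name, which works for single-letter names like those in the question. For longer names, a variant along the following lines might be used. This is a sketch under assumptions: the column names are invented for illustration, and it assumes an underscore appears in a name only in front of a merge suffix (a column such as `wind_speed` would be split incorrectly).

```python
import pandas as pd

# Hypothetical merged frame with multi-character column names.
merged = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["temp_1", "pres_1", "hum", "pres_2", "temp_2", "wind"],
)


def prefix_of(name):
    # Text before the last underscore; the whole name if there is none.
    return name.rsplit("_", 1)[0]


cols = merged.columns.tolist()

# Record the position at which each prefix first appears.
first_seen = {}
for position, name in enumerate(cols):
    first_seen.setdefault(prefix_of(name), position)

# Sort by first appearance of the prefix, then by the full column name.
ordered = sorted(cols, key=lambda name: (first_seen[prefix_of(name)], name))
reordered = merged[ordered]
print(ordered)  # ['temp_1', 'temp_2', 'pres_1', 'pres_2', 'hum', 'wind']
```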
