Append any further columns to the first three columns and indicate the triple of columns each block comes from


Problem description

This is a follow-up question to "Append any further columns to the first three columns".

I start out with about 120 columns. Every three consecutive columns belong together. Instead of sitting side by side as 120 columns, these triples should be stacked on top of each other, so that we end up with three columns. This has already been solved (see link above).

Sample data:

import numpy as np
import pandas as pd

# Note: np.random.choice coerces np.nan to the string "nan" when it is mixed with strings
df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice(["A", "B", "C", np.nan], 5),
    "3": np.random.choice([np.nan, 1], 5),

    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice(["A", "B", "C", np.nan], 5),
    "6": np.random.choice([np.nan, 1], 5)
})

Working solution for the initial question, as suggested by Jezrael:

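# Label each column with (triple number, position within the triple)
arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

# Move the triple level into the rows, order by triple and then by original row
df = df.stack(0).sort_index(level=[1, 0]).reset_index(drop=True)
df.columns = ['A','B','C']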

This turns this:

           1    2    3          4  5    6
0  960189042    B  NaN  991581392  A  1.0
1  977655199  nan  1.0  964195250  A  1.0
2  961771966    A  NaN  969007327  B  1.0
3  955308022    C  1.0  973316485  A  NaN
4  933277976    A  1.0  976749175  A  NaN

into this:

           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
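The relabelling step in the solution above relies on integer division and modulo of the column positions. A minimal sketch of just that step, assuming six columns as in the sample data (the names `triple_id` and `position` are illustrative and do not appear in the original):

```python
import numpy as np

# Positions of six columns: 0..5
arr = np.arange(6)

# Integer division by 3 gives the triple each column belongs to
triple_id = arr // 3
# Modulo 3 gives the column's position inside its triple
position = arr % 3

print(triple_id)  # [0 0 0 1 1 1]
print(position)   # [0 1 2 0 1 2]
```

Passing these two arrays as `df.columns = [triple_id, position]` builds a two-level column index, which is what lets `stack(0)` move the triple number from the columns into the rows.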

Follow-up question: Now, if I need an indicator showing which triple each block comes from, how can this be done? The result could look like:

           A    B    C D
0  960189042    B  NaN 0
1  977655199  nan  1.0 0
2  961771966    A  NaN 0
3  955308022    C  1.0 0
4  933277976    A  1.0 0
5  991581392    A  1.0 1
6  964195250    A  1.0 1
7  969007327    B  1.0 1
8  973316485    A  NaN 1
9  976749175    A  NaN 1

The blocks can have different lengths, so I cannot simply add a counter.

Recommended answer

Use reset_index to drop only the first level, so that the second level of the MultiIndex is converted to a column:

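# Label each column with (triple number, position within the triple)
arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

# Drop only the original row level; the remaining triple level becomes a column
df = df.stack(0).sort_index(level=[1, 0]).reset_index(level=0, drop=True).reset_index()
df.columns = ['D','A','B','C']
print(df)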
   D          A    B    C
0  0  960189042    B  NaN
1  0  977655199  nan  1.0
2  0  961771966    A  NaN
3  0  955308022    C  1.0
4  0  933277976    A  1.0
5  1  991581392    A  1.0
6  1  964195250    A  1.0
7  1  969007327    B  1.0
8  1  973316485    A  NaN
9  1  976749175    A  NaN
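The question mentions that blocks can differ in length. The following self-contained sketch applies the same approach to such a case. It assumes that a shorter block is padded with NaN in all three of its columns, which is a guess about the real data; the values and the variable name `out` are made up for illustration. The explicit `dropna(how="all")` removes the padding rows regardless of how the installed pandas version handles missing values in `stack`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second triple has only two real rows,
# its third row is NaN padding in all three columns.
df = pd.DataFrame({
    "1": [11, 12, 13],
    "2": ["A", "B", "C"],
    "3": [1.0, np.nan, 1.0],
    "4": [21, 22, np.nan],
    "5": ["A", "B", np.nan],
    "6": [np.nan, 1.0, np.nan],
})

# Label each column with (triple number, position within the triple)
arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

out = (
    df.stack(0)                        # triple number moves into the row index
      .dropna(how="all")               # drop rows that are pure padding
      .sort_index(level=[1, 0])        # order by triple, then by original row
      .reset_index(level=0, drop=True) # discard the original row number
      .reset_index()                   # turn the triple number into a column
)
out.columns = ["D", "A", "B", "C"]
print(out)
```

Because the indicator is derived from the column position rather than from a row counter, the second block contributes only its two real rows, both labelled with `D` equal to 1.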

Then, if you need to change the order of the columns:

# Move the first column (D) to the end
cols = df.columns[1:].tolist() + df.columns[:1].tolist()
df = df[cols]
print(df)
           A    B    C  D
0  960189042    B  NaN  0
1  977655199  nan  1.0  0
2  961771966    A  NaN  0
3  955308022    C  1.0  0
4  933277976    A  1.0  0
5  991581392    A  1.0  1
6  964195250    A  1.0  1
7  969007327    B  1.0  1
8  973316485    A  NaN  1
9  976749175    A  NaN  1
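When the column names are known in advance, the reordering can also be written more directly. The sketch below shows two possible alternatives on a small made-up frame; the data and the names `reordered` and `moved` are illustrative and not part of the original answer.

```python
import pandas as pd

# Small stand-in for the stacked result, with the indicator column first
df = pd.DataFrame({
    "D": [0, 0, 1],
    "A": [10, 11, 20],
    "B": ["x", "y", "z"],
    "C": [1.0, None, 1.0],
})

# Option 1: select the columns explicitly in the desired order
reordered = df[["A", "B", "C", "D"]]

# Option 2: pop the column and assign it back, which appends it at the end
moved = df.copy()
moved["D"] = moved.pop("D")

print(reordered.columns.tolist())
print(moved.columns.tolist())
```

The slicing approach from the answer has the advantage of not hard-coding names, so it still works if the columns are renamed later.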
