Concatenate several columns across more than one row in pandas


Question

I have a pandas DataFrame. I would like to concatenate several columns, grouped by one column that holds an identifier. The same identifier may appear on more than one row, and all of those rows need to be concatenated. I am working with strings only.

For instance, I have a dataset that looks like this:

 Identifier     Op1 Op2 Op3
 A     str_1    str_2   str_3
 B     str_4    str_5   str_6
 B     str_7    str_8   str_9
 B     str_10   str_11  str_12
 C     str_13   str_14  str_15 
 C     str_16   str_17  str_18

I need the values in Op1, Op2, and Op3 concatenated for every row. If the same identifier appears on more than one row, the Op1, Op2, and Op3 values of each additional row need to be concatenated as well and then appended to the string built from the first row.

So my end result should look like this:

 Identifier Ops
 A  str_1 str_2 str_3
 B  str_4 str_5 str_6 str_7 str_8 str_9 str_10 str_11 str_12
 C  str_13 str_14 str_15 str_16 str_17 str_18

There should also be a space between each value, so 'str_8 str_9' rather than 'str_8str_9'.

I also have this table in sqlite3, if that is easier to work with than pandas.

How do I do this?

Thanks

Recommended answer

After turning your input data into a CSV file, I did the following, and it works well.

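import pandas as pd

DF = pd.read_csv("CombinerData.csv")

print(DF)
print()

def combine_Columns_Into_New_Column(DF, columns_To_Combine, new_Column_Name):
    # Start from an empty string column, then append each source column plus a space.
    DF[new_Column_Name] = ''
    for Col in columns_To_Combine:
        DF[new_Column_Name] += DF[Col].map(str) + ' '
    # Drop the source columns so only the identifier and the combined column remain.
    DF = DF.drop(columns_To_Combine, axis=1)
    # Summing strings concatenates them, so this joins the rows of each identifier.
    DF = DF.groupby(by=['Identifier']).sum()

    return DF

DF = combine_Columns_Into_New_Column(DF, ['Op1','Op2','Op3'],'Ops')

print(DF)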

Output:

                                                          Ops
Identifier                                                   
A                                          str_1 str_2 str_3 
B           str_4 str_5 str_6 str_7 str_8 str_9 str_10 str...
C                 str_13 str_14 str_15  str_16 str_17 str_18 
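Note that each combined string in the output above ends with a trailing space, and the row for C contains a double space because the input value 'str_15 ' carries its own trailing space. The following is a minimal sketch of one possible variant, not the answer's method: it joins with ' '.join instead of string summation and strips stray whitespace first. The function name combine_ops and its parameters are hypothetical, and the data is inlined so the sketch runs without the CSV file.

```python
import pandas as pd

# Same rows as the example input, inlined so the sketch is self-contained.
# Note the trailing space in "str_15 ", as in the input file.
df = pd.DataFrame(
    {
        "Identifier": ["A", "B", "B", "B", "C", "C"],
        "Op1": ["str_1", "str_4", "str_7", "str_10", "str_13", "str_16"],
        "Op2": ["str_2", "str_5", "str_8", "str_11", "str_14", "str_17"],
        "Op3": ["str_3", "str_6", "str_9", "str_12", "str_15 ", "str_18"],
    }
)


def combine_ops(df, columns, new_name="Ops", key="Identifier"):
    """Hypothetical helper: one space-separated string per identifier."""
    # Join the chosen columns within each row, stripping stray whitespace.
    per_row = df[columns].astype(str).apply(
        lambda row: " ".join(value.strip() for value in row), axis=1
    )
    # Join the per-row strings within each identifier group, keeping row order.
    combined = per_row.groupby(df[key], sort=False).agg(" ".join)
    return combined.rename(new_name).reset_index()


result = combine_ops(df, ["Op1", "Op2", "Op3"])

# to_string() is used so that long strings are not cut off with "...".
print(result.to_string(index=False))
```

Unlike the original function, this variant leaves the input DataFrame unmodified and returns Identifier as a regular column rather than as the index.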

Input file:

Identifier,Op1,Op2,Op3
A,str_1,str_2,str_3
B,str_4,str_5,str_6
B,str_7,str_8,str_9
B,str_10,str_11,str_12
C,str_13,str_14,str_15 
C,str_16,str_17,str_18
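Since the question mentions that the table also exists in sqlite3, the same result could probably be produced in SQL with SQLite's group_concat aggregate. This is only a sketch under assumptions: the table name ops and the in-memory database are hypothetical stand-ins for the real table, and SQLite does not guarantee the order in which group_concat joins the rows of a group, so the order of rows within an identifier may need checking.

```python
import sqlite3

# Hypothetical in-memory table standing in for the asker's sqlite3 table.
rows = [
    ("A", "str_1", "str_2", "str_3"),
    ("B", "str_4", "str_5", "str_6"),
    ("B", "str_7", "str_8", "str_9"),
    ("B", "str_10", "str_11", "str_12"),
    ("C", "str_13", "str_14", "str_15 "),
    ("C", "str_16", "str_17", "str_18"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ops (Identifier TEXT, Op1 TEXT, Op2 TEXT, Op3 TEXT)")
conn.executemany("INSERT INTO ops VALUES (?, ?, ?, ?)", rows)

# Build one string per row with ||, then join the rows of each identifier
# with group_concat, using a single space as the separator.
query = """
SELECT Identifier,
       group_concat(trim(Op1) || ' ' || trim(Op2) || ' ' || trim(Op3), ' ') AS Ops
FROM ops
GROUP BY Identifier
ORDER BY Identifier
"""

combined = dict(conn.execute(query).fetchall())
for identifier, ops in combined.items():
    print(identifier, ops)

conn.close()
```

If the result is needed back in pandas, the same query could be passed to pd.read_sql_query together with the open connection.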
