How to row-wise concatenate several columns containing strings?


Question

I have a specific series of datasets which come in the following general form:

import pandas as pd
import random
# range() replaces Python 2's xrange(), which no longer exists in Python 3
df = pd.DataFrame({'n': random.sample(range(1000), 3), 't0': ['a', 'b', 'c'], 't1': ['d', 'e', 'f'], 't2': ['g', 'h', 'i'], 't3': ['i', 'j', 'k']})

The number of tn columns (t0, t1, t2 ... tn) varies depending on the dataset, but is always <30. My aim is to merge content of the tn columns for each row so that I achieve this result (note that for readability I need to keep the whitespace between elements):

df['result'] = df.t0 + ' ' + df.t1 + ' ' + df.t2 + ' ' + df.t3

So far so good. This code may be simple, but it becomes clumsy and inflexible as soon as I receive another dataset where the number of tn columns goes up. This is where my question comes in:

Is there any other syntax to merge the content across multiple columns? Something agnostic to the number of columns, akin to:

df['result'] = ' '.join(df.ix[:,1:])

Basically I want to achieve the same as the OP in the link below, but with whitespace between the strings: R - concatenate row-wise across specific columns of dataframe

Recommended answer

The key to operating on columns (Series) of strings en masse is the Series.str accessor.

I can think of two .str methods to do what you want.

The first is str.cat. You have to start from a Series, but you can pass a list of Series to concatenate with an optional separator. (When this answer was written you could not pass a DataFrame; recent pandas versions also accept one.) Using your example:

column_names = df.columns[1:]  # skipping the first, numeric, column
series_list = [df[c] for c in column_names]  # one Series per text column, t0 included
# concatenate the first Series with all the others:
df['result'] = series_list[0].str.cat(series_list[1:], sep=' ')

Or, in one line:

df['result'] = df[df.columns[1]].str.cat([df[c] for c in df.columns[2:]], sep=' ')
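The str.cat approach can be tried end to end with fixed data. The following is a minimal sketch, assuming the numeric column comes first and every remaining column holds strings. The values in 'n' and the name text_columns are placeholders chosen for illustration.

```python
import pandas as pd

# Fixed sample data so the result is reproducible
# (the numbers in 'n' are arbitrary placeholders).
df = pd.DataFrame({
    'n': [10, 20, 30],
    't0': ['a', 'b', 'c'],
    't1': ['d', 'e', 'f'],
    't2': ['g', 'h', 'i'],
    't3': ['i', 'j', 'k'],
})

# Assumption: everything after the first column is a text column.
text_columns = df.columns[1:]
first, rest = text_columns[0], text_columns[1:]

# Start from the first text Series and append the others, space-separated.
df['result'] = df[first].str.cat([df[c] for c in rest], sep=' ')

print(df['result'].tolist())  # ['a d g i', 'b e h j', 'c f i k']
```

Because the columns are picked by position rather than by name, the same lines work unchanged when a dataset has more or fewer tn columns.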

str.join()

The second is the .str.join() method, which works like Python's built-in str.join() but requires a column (Series) of iterables, for example a column of tuples. We can get one by applying tuple row-wise to a sub-DataFrame of the columns you're interested in:

tuple_series = df[column_names].apply(tuple, axis=1)
df['result'] = tuple_series.str.join(' ')

Or, in one line:

df['result'] = df[df.columns[1:]].apply(tuple, axis=1).str.join(' ')
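To show that this pattern does not depend on the number of tn columns, here is a small sketch that wraps it in a function and runs it on two frames of different width. The helper name join_text_columns and both sample frames are hypothetical and not part of the original answer. It also assumes the first column is the only non-text column.

```python
import pandas as pd

def join_text_columns(frame, sep=' '):
    """Hypothetical helper: join every column after the first, row by row."""
    text_columns = frame.columns[1:]
    # Build one tuple per row, then join each tuple's strings with `sep`.
    return frame[text_columns].apply(tuple, axis=1).str.join(sep)

# Two made-up datasets with different numbers of tn columns.
small = pd.DataFrame({'n': [1, 2], 't0': ['a', 'b'], 't1': ['d', 'e']})
wide = pd.DataFrame({
    'n': [1, 2],
    't0': ['a', 'b'], 't1': ['d', 'e'],
    't2': ['g', 'h'], 't3': ['i', 'j'],
})

small['result'] = join_text_columns(small)
wide['result'] = join_text_columns(wide)

print(small['result'].tolist())  # ['a d', 'b e']
print(wide['result'].tolist())   # ['a d g i', 'b e h j']
```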

By the way, don't try the above with list instead of tuple. As of pandas 0.20.1, if the function passed to DataFrame.apply() returns a list with the same number of entries as the (sub-)DataFrame has columns, DataFrame.apply() returns a DataFrame instead of a Series.
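Since the caveat above is tied to a specific pandas release, it may be worth checking what your installed version actually returns. The sketch below only inspects the result types. It makes no claim about what list produces on your version, and the two-column frame is made up for illustration.

```python
import pandas as pd

# Tiny made-up frame: two text columns, two rows.
df = pd.DataFrame({'t0': ['a', 'b'], 't1': ['d', 'e']})

# tuple is the variant the answer recommends: one tuple per row, in a Series.
as_tuples = df.apply(tuple, axis=1)

# list is the variant the answer warns about; what it returns
# depends on the installed pandas version.
as_lists = df.apply(list, axis=1)

print(pd.__version__)
print(type(as_tuples).__name__)
print(type(as_lists).__name__)
```

If the last line does not print Series, .str.join() will not be available on that result, which is the reason to stay with tuple.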
