How to split a column based on several string indices using pandas

Problem description

I would like to split each row into new columns at several fixed character positions, turning the first string below into the second:

6ABCDE0218594STRING

6 ABCDE 021 8594 STRING

This seems like it would have been asked at least once before, but I keep finding only variations on the question: separating by a delimiter, as in "pandas: How do I split text in a column into multiple rows?", or separating into new rows rather than new columns, again with a delimiter, as in "Split pandas dataframe string entry to separate rows".

I apologize in advance if this is a duplicate!

Recommended answer

One way is to use a regex and str.extract to pull out the columns:

In [10]: import pandas as pd

In [11]: df = pd.DataFrame([['6ABCDE0218594STRING']])

You could do it purely by position, with one fixed-width capture group per output column, so something like this:

In [12]: df[0].str.extract('(.)(.{5})(.{3})(.{4})(.*)')
Out[12]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING
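
Since the question asks about splitting at string indices, the same result can probably also be reached without a regex by slicing each string at fixed offsets. The following is a minimal sketch, not part of the original answer; the `cuts` list and the `parts` name are assumptions chosen to match the sample row.

```python
import pandas as pd

df = pd.DataFrame([['6ABCDE0218594STRING']])

# Hypothetical cut points: the character offset where each new column starts.
cuts = [0, 1, 6, 9, 13]
# Pair each start with the next start; the last piece runs to the end (stop=None).
bounds = list(zip(cuts, cuts[1:] + [None]))

# Series.str.slice(start, stop) applies the same slice to every string in the column.
parts = pd.DataFrame(
    {i: df[0].str.slice(start, stop) for i, (start, stop) in enumerate(bounds)}
)
print(parts)
```

Unlike the regex, slicing never fails to match: a string that is too short simply yields shorter or empty pieces, so it suits data that is known to be fixed-width.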

Or you could be a bit more cautious and ensure each column has the expected form, for example digits where digits are expected:

In [13]: df[0].str.extract(r'(\d)(.{5})(\d{3})(\d{4})(.*)')
Out[13]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING
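
One consequence of the stricter pattern is that rows which do not fit it come back as NaN instead of being split incorrectly, which makes malformed rows easy to find. A small sketch of that behaviour, where the second value is an invented malformed record and the names `s`, `strict` and `bad_rows` are illustrative:

```python
import pandas as pd

# The second entry is a made-up malformed value containing no digits at all.
s = pd.Series(['6ABCDE0218594STRING', 'not-a-record'])

# Same stricter pattern as above, written as a raw string.
strict = s.str.extract(r'(\d)(.{5})(\d{3})(\d{4})(.*)')

# Rows where the pattern found no match have NaN in every extracted column.
bad_rows = strict[0].isna()
print(strict)
print(bad_rows)
```

Note that str.extract searches anywhere in the string, so a leading `^` may be worth adding to the pattern if the record must start at the first character.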

Note: you can also use named groups (see ...)
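
As a sketch of what named groups look like with str.extract: each `(?P<name>...)` group becomes a column labelled with that name. The group names used here (`kind`, `code`, `batch`, `serial`, `label`) are invented for illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame([['6ABCDE0218594STRING']])

# Hypothetical group names; each one becomes a column label in the result.
pattern = (
    r'(?P<kind>\d)'
    r'(?P<code>.{5})'
    r'(?P<batch>\d{3})'
    r'(?P<serial>\d{4})'
    r'(?P<label>.*)'
)
named = df[0].str.extract(pattern)
print(named)
```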
