pandas 将逗号分隔的字符串拆分为2个单独的列,并向下爆炸行 [英] Pandas split comma separated string into 2 separate columns and explode rows down

查看:96
本文介绍了 pandas 将逗号分隔的字符串拆分为2个单独的列,并向下爆炸行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据当前采用这种格式

My data is currently in this format

Uid     Postcodelist
1       NE11 7HS,NE5 8MN,NE1 7UJ,NE14 8YU
2       LS6 8PJ
3       M6 7JH,M14 1HF
4       B17 8KA,LE5 7UZ,LE9 9GF

我有一个问题,我需要将列表分为两列,并采用字符串的以下顺序位置(例如uid = 1)

I have a problem where i need to split the list into two columns and take the following ordinal positions of the strings (for example uid=1)

[1,2,3,4]

[1,2,3,4]

将被拆分为

[1],[2]

[3],[4]

这是我需要的预期输出

Uid     P1           P2
1       NE11 7HS     NE5 8MN
1       NE1 7UJ      NE14 8YU
2       LS6 8PJ      Null
3       M6 7JH       M14 1HF
4       B17 8KA      LE5 7UZ
4       LE9 9GF      Null

如果有4个以上的项目,则需要拆分并进一步爆炸(我认为不会有定义的上限,但是对于字符串列表中的每2个项目,它都需要爆炸)

If there were more than 4 items it would need to split and explode further (i don't think there would be a defined upper limit but it would need to explode for every 2 items in the string list)

我尝试了一段时间,有人回答并给出了以下解决方案,这使我有所了解.现在,越来越需要这种数据具有上述格式.下面提供了一些方法的代码. (我正在使用此处介绍的爆炸功能. Split(爆炸)熊猫数据框字符串条目以分隔行

I tried this a while back and someone answered and gave the following solution which got me some of the way there. There is an increased need now for this data to be in the above described format. The code which got me some of the way there is below. (I'm using the explode function described here. Split (explode) pandas dataframe string entry to separate rows

df[['P1','P2']] = df.pop('PreviousPostCodes').str.split(',\s*', n=1, expand=True)
df['P2'] = df['P2'].fillna('').str.split(',\s*', expand=False)
df = explode(df, lst_cols='P2')

这让我(例如uid = 1)

this gets me (again for example uid=1)

[1,2,3,4]

[1,2,3,4]

已转换为

[1],[2]

[1],[3]

[1],[4]

谢谢.

推荐答案

您可以使用:

df = df.set_index('Uid').pop('PreviousPostCodes').str.split(',\s*', expand=True)
df.columns = [df.columns % 2 + 1, df.columns // 2]
df = df.stack().add_prefix('P').reset_index(level=1, drop=True).reset_index()
print (df)
   Uid        P1        P2
0    1  NE11 7HS   NE5 8MN
1    1   NE1 7UJ  NE14 8YU
2    2   LS6 8PJ      None
3    3    M6 7JH   M14 1HF
4    4   B17 8KA   LE5 7UZ
5    4   LE9 9GF      None

说明:

  1. 通过 set_index split DataFrame
  2. 按楼层和模除数在列中创建MultiIndex
  3. 通过 stack
  4. 通过 add_suffix 更改列名称a>
  5. 最后一个 reset_index index
  6. 中删除第一级的MultiIndex和列
  1. Create index from column Uid by set_index and split to DataFrame
  2. Create MultiIndex in columns by floor and modulo division
  3. Reshape by stack
  4. Change columns names by add_suffix
  5. Last reset_index for remove first level of MultiIndex and column from index

这篇关于 pandas 将逗号分隔的字符串拆分为2个单独的列,并向下爆炸行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆