pandas 将逗号分隔的字符串拆分为2个单独的列,并向下爆炸行 [英] Pandas split comma separated string into 2 separate columns and explode rows down
问题描述
我的数据当前采用这种格式
My data is currently in this format
Uid Postcodelist
1 NE11 7HS,NE5 8MN,NE1 7UJ,NE14 8YU
2 LS6 8PJ
3 M6 7JH,M14 1HF
4 B17 8KA,LE5 7UZ,LE9 9GF
我有一个问题,我需要将列表分为两列,并采用字符串的以下顺序位置(例如uid = 1)
I have a problem where i need to split the list into two columns and take the following ordinal positions of the strings (for example uid=1)
[1,2,3,4]
[1,2,3,4]
将被拆分为
[1],[2]
[3],[4]
这是我需要的预期输出
Uid P1 P2
1 NE11 7HS NE5 8MN
1 NE1 7UJ NE14 8YU
2 LS6 8PJ Null
3 M6 7JH M14 1HF
4 B17 8KA LE5 7UZ
4 LE9 9GF Null
如果有4个以上的项目,则需要拆分并进一步爆炸(我认为不会有定义的上限,但是对于字符串列表中的每2个项目,它都需要爆炸)
If there were more than 4 items it would need to split and explode further (i don't think there would be a defined upper limit but it would need to explode for every 2 items in the string list)
我尝试了一段时间,有人回答并给出了以下解决方案,这使我有所了解.现在,越来越需要这种数据具有上述格式.下面提供了一些方法的代码. (我正在使用此处介绍的爆炸功能. Split(爆炸)熊猫数据框字符串条目以分隔行
I tried this a while back and someone answered and gave the following solution which got me some of the way there. There is an increased need now for this data to be in the above described format. The code which got me some of the way there is below. (I'm using the explode function described here. Split (explode) pandas dataframe string entry to separate rows
df[['P1','P2']] = df.pop('PreviousPostCodes').str.split(',\s*', n=1, expand=True)
df['P2'] = df['P2'].fillna('').str.split(',\s*', expand=False)
df = explode(df, lst_cols='P2')
这让我(例如uid = 1)
this gets me (again for example uid=1)
[1,2,3,4]
[1,2,3,4]
已转换为
[1],[2]
[1],[3]
[1],[4]
谢谢.
推荐答案
您可以使用:
df = df.set_index('Uid').pop('PreviousPostCodes').str.split(',\s*', expand=True)
df.columns = [df.columns % 2 + 1, df.columns // 2]
df = df.stack().add_prefix('P').reset_index(level=1, drop=True).reset_index()
print (df)
Uid P1 P2
0 1 NE11 7HS NE5 8MN
1 1 NE1 7UJ NE14 8YU
2 2 LS6 8PJ None
3 3 M6 7JH M14 1HF
4 4 B17 8KA LE5 7UZ
5 4 LE9 9GF None
说明:
- 通过
set_index
和split
到DataFrame
- 按楼层和模除数在列中创建
MultiIndex
- 通过
stack
- 通过
add_suffix
更改列名称a> - 最后一个
reset_index
从index
中删除第一级的
MultiIndex
和列
- Create index from column
Uid
byset_index
andsplit
toDataFrame
- Create
MultiIndex
in columns by floor and modulo division - Reshape by
stack
- Change columns names by
add_suffix
- Last
reset_index
for remove first level ofMultiIndex
and column fromindex
这篇关于 pandas 将逗号分隔的字符串拆分为2个单独的列,并向下爆炸行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!