Keep elements with pattern in pandas series without converting them to list
Question
I have the following dataframe:
df = pd.DataFrame(["Air type:1, Space kind:2, water", "something, Space blu:3, somethingelse"], columns = ['A'])
and I want to create a new column that contains, for each row, all the elements that have a ":" in them. So for example, for the first row I want to return "type:1, kind:2" and for the second row I want "blu:3". I managed this with a list comprehension in the following way:
df['new'] = [[y for y in x if ":" in y] for x in df['A'].str.split(",")]
But my issue is that the new column contains list elements.
A new
0 Air type:1, Space kind:2, water [Air type:1, Space kind:2]
1 something at the start:4, Space blu:3, somethingelse [something at the start:4, Space blu:3]
I have not used Python a lot, so I am not 100% sure whether I am missing a more pandas-specific way to do this. If there is one, I am more than happy to learn about it and use it. If this is a correct approach, how can I convert the elements back into strings in order to run regexes on them? I tried "How to concatenate items in a list to a single string?" but it does not work the way I would like.
Recommended answer
You can use pd.Series.str.findall here.
df['new'] = df['A'].str.findall(r'\w+:\w+')
A new
0 type:1, kind:2, water [type:1, kind:2]
1 something, blu:3, somethingelse [blu:3]
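As the output above shows, findall still yields one list per row. A minimal sketch of turning those lists into plain strings, assuming the sample dataframe from the question and a ", " separator (the `matches` variable name is only illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    ["Air type:1, Space kind:2, water", "something, Space blu:3, somethingelse"],
    columns=["A"],
)

# findall returns, for each row, a list of all non-overlapping regex matches
matches = df["A"].str.findall(r"\w+:\w+")

# str.join concatenates the list elements of each row into a single string
df["new"] = matches.str.join(", ")

print(df["new"].tolist())
```

Note that `\w+` does not match spaces, so only the single word directly before each ":" is kept (for example "type:1" rather than "Air type:1").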
Edit:
When there are multiple words, try
df['new'] = df['A'].str.findall(r'[^\s,][^:,]+:[^:,]+').str.join(', ')
A new
0 Air type:1, Space kind:2, water Air type:1, Space kind:2
1 something, Space blu:3, somethingelse Space blu:3
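For comparison, the split-and-filter idea from the question can also produce strings directly, without a regex. This is only a sketch under the assumption that elements are always separated by commas; `keep_colon_parts` is a hypothetical helper name, not something from pandas:

```python
import pandas as pd

df = pd.DataFrame(
    ["Air type:1, Space kind:2, water", "something, Space blu:3, somethingelse"],
    columns=["A"],
)

def keep_colon_parts(text: str) -> str:
    """Hypothetical helper: keep comma-separated parts that contain ':'."""
    # Split on commas and trim the whitespace around each part
    parts = [part.strip() for part in text.split(",")]
    # Keep only the parts containing ':' and join them back into one string
    return ", ".join(part for part in parts if ":" in part)

df["new"] = df["A"].apply(keep_colon_parts)

print(df["new"].tolist())
```

On this sample data it gives the same strings as the findall version above. The regex version stays within pandas string methods, while the helper may be easier to read if the filtering rule grows more complex.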