Finding all regex matches from a pandas dataframe column

Published: 2020/5/24 2:23:34. Tags: python, regex, pandas


Problem description

I am trying to extract some data from a dataframe, but the following query extracts only the first match and ignores the rest. For example, if the entire data is:

df['value']=
           0   123 blah blah blah, 456 blah blah blah, 129kfj blah blah
           1   237 blah blah blah, 438 blah blah blah, 365kfj blah blah 
           ...

and the regex call is:

df['newCol'] = df['value'].str.extract(r"([0-9]{3})")

I want the result to be a new column named "newCol", like this:

newCol
------
123,456,129
237,438,365
...

But the actual result I get is only the first number in each row:

newCol
------
123
237

What is going wrong here?

Thanks.

Update:

Thanks to MaxU I found the solution; just a couple of suggestions. First, I had pandas 0.18.1, and extractall didn't work for me until I updated pandas to 0.19, so remember to check your pandas version if you have issues with extractall. Second, apply(','.join) didn't work for me because I had some non-string values (null values) that it couldn't handle, so I used a lambda, and it finally worked with a small modification of MaxU's solution:

x['value'].str.extractall(r'(\d{3})').unstack().apply(lambda x:','.join(x.dropna()), axis=1)
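To make the null-handling point concrete, here is a minimal sketch on made-up data (the frame `df` and its contents are assumptions for illustration, not the asker's real data). Rows with fewer matches are padded with NaN by unstack(), which a plain ','.join cannot handle, and rows with no match at all are missing from the extractall result, so the sketch reindexes back to the original index at the end.

```python
import pandas as pd

# Hypothetical sample data: the second row has only two 3-digit runs,
# and the third row is a null value.
df = pd.DataFrame({
    "value": [
        "123 blah, 456 blah, 129kfj blah",
        "237 blah, 438 blah",
        None,
    ]
})

# One row per match, indexed by (original row, match number).
matches = df["value"].str.extractall(r"(\d{3})")

# Pivot the match numbers into columns; shorter rows are padded with NaN.
wide = matches.unstack()

# Drop the NaN padding before joining, as in the one-liner above.
joined = wide.apply(lambda row: ",".join(row.dropna()), axis=1)

# Rows without any match are absent from `joined`; reindex restores them as NaN.
df["newCol"] = joined.reindex(df.index)
print(df["newCol"].tolist())
```

Whether the no-match rows should end up as NaN or as an empty string depends on the use case; chaining .fillna("") after the reindex would give the latter.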

Recommended answer

You can use str.extractall:

In [77]: x
Out[77]:
                                                      value
0  123 blah blah blah, 456 blah blah blah, 129kfj blah blah
1  237 blah blah blah, 438 blah blah blah, 365kfj blah blah

In [78]: x['value'].str.extractall(r'(\d{3})').unstack().apply(','.join, 1)
Out[78]:
0    123,456,129
1    237,438,365
dtype: object
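As an aside, the same result can also be reached without unstack(). The sketch below is an alternative, not the method from the answer above: it assumes the same two sample rows and uses Series.str.findall to collect every match per row, then Series.str.join to concatenate each row's list.

```python
import pandas as pd

# Same sample rows as in the session above.
x = pd.DataFrame({
    "value": [
        "123 blah blah blah, 456 blah blah blah, 129kfj blah blah",
        "237 blah blah blah, 438 blah blah blah, 365kfj blah blah",
    ]
})

# str.findall returns, for each row, a list of all non-overlapping matches.
found = x["value"].str.findall(r"\d{3}")

# str.join concatenates each row's list using the given separator.
x["newCol"] = found.str.join(",")
print(x["newCol"].tolist())
```

Because findall keeps one list per input row, the result stays aligned with the original index and no reindexing step is needed.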
