Extracting rows with substring containing whitespace after + in pandas df
Question
I want to get all the rows in df whose path column (named a in the example below) contains the substring "new+ folder". The question "Select by partial string from a pandas DataFrame" and the answer by cs95 were very helpful for substrings like "new+" or "fol", but the results are not correct when I search for "new+ folder".
>>> import pandas
>>> dft = pandas.DataFrame([['/new+folder/'], ['/new+ folder/']], columns=['a'])
>>> dft
               a
0   /new+folder/
1  /new+ folder/
Now using query:
>>> print(dft.query('a.str.contains("new+")', engine='python').head())
               a
0   /new+folder/
1  /new+ folder/
>>> print(dft.query('a.str.contains("new+ ")', engine='python').head())
Empty DataFrame
Columns: [a]
Index: []
>>> print(dft.query('a.str.contains("new+ f")', engine='python').head())
Empty DataFrame
Columns: [a]
Index: []
Testing with str.contains and boolean indexing:
>>> dft[dft['a'].str.contains('new+')]
               a
0   /new+folder/
1  /new+ folder/
>>> dft[dft['a'].str.contains('new+ ')]
Empty DataFrame
Columns: [a]
Index: []
>>> dft[dft['a'].str.contains('new+ f')]
Empty DataFrame
Columns: [a]
Index: []
How can I fix the incorrect result that appears when there is a space after a +, or, as I suspect, after other special characters?
Pandas 0.24.2, Python 3.7.3 64-bit
Recommended answer
Yes, + is a special regex character (it means "one or more of the preceding character"), so it has to be escaped if you need a working solution with query:
# The raw string keeps the backslash intact without an invalid-escape warning.
print(dft.query(r'a.str.contains("new\+ ")', engine='python').head())
               a
1  /new+ folder/
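To see why the unescaped pattern returns nothing, it may help to try the same patterns with the standard re module, since str.contains treats its pattern as a regular expression by default. This is only a minimal sketch; the helper name matching and the paths list are made up for illustration.

```python
import re

# The same two values as in the question's DataFrame.
paths = ['/new+folder/', '/new+ folder/']

def matching(pattern):
    """Return the paths in which the regex pattern is found (hypothetical helper)."""
    return [p for p in paths if re.search(pattern, p)]

# "w+" means "one or more w", so this only looks for the text "new".
unescaped = matching('new+')

# Here a space must follow the run of "w" directly, but both paths have "+" there.
unescaped_space = matching('new+ ')

# "\+" is a literal plus sign, so only the path with "+ " matches.
escaped_space = matching(r'new\+ ')

print(unescaped)
print(unescaped_space)
print(escaped_space)
```

The second call finds nothing for the same reason the query in the question returns an empty DataFrame: the plus sign is consumed as a quantifier rather than matched as a character.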
The solution with regex=False does not work here:
print(dft.query('a.str.contains("new+ ", regex=False)', engine='python').head())
AttributeError: 'dict' object has no attribute 'append'
If you filter by boolean indexing instead, both solutions work.
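A short sketch of the boolean indexing variants, assuming the same dft as in the question. The variable names are illustrative, and the re.escape variant is an addition that is not part of the original answer; it is handy when the search text comes from user input.

```python
import re
import pandas as pd

dft = pd.DataFrame([['/new+folder/'], ['/new+ folder/']], columns=['a'])

# Variant 1: escape the plus sign by hand so it is matched literally.
escaped = dft[dft['a'].str.contains(r'new\+ ')]

# Variant 2: disable regex handling, so the pattern is a plain substring.
literal = dft[dft['a'].str.contains('new+ ', regex=False)]

# Variant 3 (an addition, not from the answer): let re.escape do the escaping.
auto_escaped = dft[dft['a'].str.contains(re.escape('new+ '))]

print(escaped)
print(literal)
print(auto_escaped)
```

All three select only the row holding '/new+ folder/'. Using regex=False is probably the simplest choice when no regex features are needed at all.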