How to filter rows based on Unix-style regular expressions passed as an input argument to a data frame column
Problem description
I have the following data frame:
import pandas as pd
csvFile = "csv.csv"
csvDelim = '@@@'
# A multi-character delimiter is only supported by the python parser engine
df = pd.read_csv(csvFile, engine="python", index_col=False, delimiter=csvDelim)
df.head()
ID col_1
0 ACLKB
1 CLKAA
2 AACLK
3 BBBCLK
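To reproduce the problem without the original file, here is a minimal sketch that builds the same frame from an in-memory string. The file contents are an assumption reconstructed from the df.head() output above; the real csv.csv may contain more rows or columns.

```python
import io

import pandas as pd

# Hypothetical file contents, assumed from the df.head() output
raw = "ID@@@col_1\n0@@@ACLKB\n1@@@CLKAA\n2@@@AACLK\n3@@@BBBCLK\n"

# Separators longer than one character are interpreted as regular expressions
# and require the python engine
df = pd.read_csv(io.StringIO(raw), engine="python", index_col=False, sep="@@@")
print(df)
```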
The regular expression to be passed is *CLK* and the column name is 'col_1'.
text = '*CLK*'
findtext = 'r'+text+".*"
colName = 'Signal'
df[colName].str.match(text)
I am getting the following results, which are incorrect:
0 False
1 False
2 False
3 False
4 False
The expected output is
0 True
1 True
2 True
3 True
4 True
Can someone help me filter rows based on a regular expression passed as above? Calling str.match with findtext raises the following error:
error Traceback (most recent call last)
<ipython-input-110-8d1c1b6b2d15> in <module>()
----> 1 df['Signal'].str.match(findtext)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py in match(self, pat, case, flags, na, as_indexer)
1571 def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None):
1572 result = str_match(self._data, pat, case=case, flags=flags, na=na,
-> 1573 as_indexer=as_indexer)
1574 return self._wrap_result(result)
1575
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py in str_match(arr, pat, case, flags, na, as_indexer)
495 flags |= re.IGNORECASE
496
--> 497 regex = re.compile(pat, flags=flags)
498
499 if (as_indexer is False) and (regex.groups > 0):
~\AppData\Local\Continuum\anaconda3\lib\re.py in compile(pattern, flags)
231 def compile(pattern, flags=0):
232 "Compile a regular expression pattern, returning a pattern object."
--> 233 return _compile(pattern, flags)
234
235 def purge():
~\AppData\Local\Continuum\anaconda3\lib\re.py in _compile(pattern, flags)
299 if not sre_compile.isstring(pattern):
300 raise TypeError("first argument must be string or compiled pattern")
--> 301 p = sre_compile.compile(pattern, flags)
302 if not (flags & DEBUG):
303 if len(_cache) >= _MAXCACHE:
~\AppData\Local\Continuum\anaconda3\lib\sre_compile.py in compile(p, flags)
560 if isstring(p):
561 pattern = p
--> 562 p = sre_parse.parse(p, flags)
563 else:
564 pattern = None
~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in parse(str, flags, pattern)
853
854 try:
--> 855 p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
856 except Verbose:
857 # the VERBOSE flag was switched on inside the pattern. to be
~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in _parse_sub(source, state, verbose, nested)
414 while True:
415 itemsappend(_parse(source, state, verbose, nested + 1,
--> 416 not nested and not items))
417 if not sourcematch("|"):
418 break
~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in _parse(source, state, verbose, nested, first)
614 if not item or (_len(item) == 1 and item[0][0] is AT):
615 raise source.error("nothing to repeat",
--> 616 source.tell() - here + len(this))
617 if item[0][0] in _REPEATCODES:
618 raise source.error("multiple repeat",
error: nothing to repeat at position 0
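The traceback comes from Python's re module rather than from pandas. A minimal standard-library sketch shows two separate problems in the code above: a pattern that starts with * is not a valid regular expression, and 'r' + text does not create a raw string, it simply prepends the letter r to the pattern.

```python
import re

pattern = "*CLK*"  # the glob-style input from the question

error_message = None
try:
    re.compile(pattern)
except re.error as exc:
    # A leading '*' has no preceding token to repeat
    error_message = str(exc)
print("re.error:", error_message)

# 'r' + text does NOT make a raw string: it prepends the literal letter 'r'
findtext = "r" + "CLK" + ".*"
print(findtext)  # rCLK.*

# The raw-string prefix only exists on string literals written in source code
print(r"\d+" == "\\d+")  # True
```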
Also, the regular expression could be ^CLK, ?CLK, or any other pattern. What is a generic solution that works when any regular-expression string is passed in?
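Patterns such as *CLK* and ?CLK look like shell (glob) wildcards rather than regular expressions. If the input is meant to be a glob, one option (a swapped-in technique, not part of the answer below) is to convert it with the standard library's fnmatch.translate before handing it to pandas. The helper name glob_mask is hypothetical, and the sample data is assumed from the df.head() output.

```python
import fnmatch

import pandas as pd

df = pd.DataFrame({"ID": [0, 1, 2, 3], "col_1": ["ACLKB", "CLKAA", "AACLK", "BBBCLK"]})


def glob_mask(series, pattern, case=True):
    """Hypothetical helper: boolean mask for shell-style (glob) patterns."""
    # fnmatch.translate turns the glob into a regex anchored at the end;
    # str.match anchors it at the start, so the whole value must match
    regex = fnmatch.translate(pattern)
    return series.str.match(regex, case=case)


print(df[glob_mask(df["col_1"], "*CLK*")])  # all four rows
print(df[glob_mask(df["col_1"], "CLK*")])   # only CLKAA
print(df[glob_mask(df["col_1"], "?CLK*")])  # only ACLKB
```

Note that ^CLK is regex syntax, not glob syntax: fnmatch treats ^ as a literal character, so a pattern that is already a regular expression should be passed to pandas directly instead.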
Recommended answer
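Remove the asterisks (*) and use the .str.contains method instead of .str.match. A leading * has nothing to repeat, which is why re raises the error, and .str.match only tests for a match at the start of each string, whereas .str.contains searches anywhere in it. Use case=False to match both upper- and lowercase letters.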
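A small sketch on the sample values from the question illustrates the difference between the two methods; it assumes the column holds plain strings with no missing values.

```python
import pandas as pd

s = pd.Series(["ACLKB", "CLKAA", "AACLK", "BBBCLK"], name="col_1")

# match() anchors the pattern at the start of each string (like re.match)
starts = s.str.match("CLK")
# contains() searches anywhere in the string (like re.search); case=False ignores case
anywhere = s.str.contains("clk", case=False)

print(starts.tolist())    # [False, True, False, False]
print(anywhere.tolist())  # [True, True, True, True]
```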
See this code:
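text = 'CLK'
colName = 'Signal'
# .str.contains searches anywhere in each value; case=False ignores letter case
mask = df[colName].str.contains(text, case=False)
df[mask]  # keep only the rows where mask is True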
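To filter rows with user-supplied patterns, as the question title asks, the boolean mask can index the frame directly. The sketch below wraps this in a hypothetical helper, filter_rows, and assumes the input is already a valid regular expression, so anchors such as ^CLK and CLK$ work as usual; na=False treats missing values as non-matches. The sample data is assumed from the df.head() output.

```python
import pandas as pd

df = pd.DataFrame({"ID": [0, 1, 2, 3], "col_1": ["ACLKB", "CLKAA", "AACLK", "BBBCLK"]})


def filter_rows(frame, col, pattern, case=False, regex=True):
    """Hypothetical helper: keep rows whose `col` contains `pattern`."""
    mask = frame[col].str.contains(pattern, case=case, regex=regex, na=False)
    return frame[mask]


print(filter_rows(df, "col_1", "CLK"))   # all four rows
print(filter_rows(df, "col_1", "^CLK"))  # only CLKAA
print(filter_rows(df, "col_1", "CLK$"))  # AACLK and BBBCLK
```

If users may pass arbitrary literal text rather than regular expressions, regex=False (or re.escape on the input) avoids errors such as "nothing to repeat".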