How to filter rows based on Unix-based regular expressions passed as an input argument to a data frame column


Question

I have the following data frame:

import pandas as pd

csvFile = "csv.csv"
csvDelim = '@@@'
# A multi-character delimiter such as '@@@' is only supported by the Python parsing engine
df = pd.read_csv(csvFile, engine="python", index_col=False, delimiter=csvDelim)
df.head()


ID  col_1   
0   ACLKB
1   CLKAA
2   AACLK
3   BBBCLK
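If `csv.csv` is not at hand, the read step can be tried against an in-memory file. This is only a sketch. The file contents below are hypothetical and were rebuilt from the `df.head()` output above. The column is named `Signal` (an assumption) to match the `colName` used in the question's code, even though the printed header says `col_1`.

```python
import io

import pandas as pd

# Hypothetical file contents rebuilt from the df.head() output; the real csv.csv may differ.
# The column is called 'Signal' (assumption) to match colName in the question's code.
sample = io.StringIO(
    "ID@@@Signal\n"
    "0@@@ACLKB\n"
    "1@@@CLKAA\n"
    "2@@@AACLK\n"
    "3@@@BBBCLK\n"
)

# Separators longer than one character are treated as regular expressions,
# which only the Python parsing engine supports.
df = pd.read_csv(sample, delimiter="@@@", engine="python", index_col=False)
print(df)
```

`delimiter` is an alias for `sep` in `read_csv`, so either keyword works here.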


The regular expression to be passed is CLK and the column name is 'col_1'

text = '*CLK*'
findtext = 'r'+text+".*"
colName = 'Signal'

df[colName].str.match(text)


I am getting the following results which are incorrect.

0     False
1     False
2     False
3     False
4     False

The expected output is:

0     True
1     True
2     True
3     True
4     True

Can someone help me filter rows based on a regular expression passed as above? Calling str.match with findtext raises the following error:
error                                     Traceback (most recent call last)
<ipython-input-110-8d1c1b6b2d15> in <module>()
----> 1 df['Signal'].str.match(findtext)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py in match(self, pat, case, flags, na, as_indexer)
   1571     def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None):
   1572         result = str_match(self._data, pat, case=case, flags=flags, na=na,
-> 1573                            as_indexer=as_indexer)
   1574         return self._wrap_result(result)
   1575 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\strings.py in str_match(arr, pat, case, flags, na, as_indexer)
    495         flags |= re.IGNORECASE
    496 
--> 497     regex = re.compile(pat, flags=flags)
    498 
    499     if (as_indexer is False) and (regex.groups > 0):

~\AppData\Local\Continuum\anaconda3\lib\re.py in compile(pattern, flags)
    231 def compile(pattern, flags=0):
    232     "Compile a regular expression pattern, returning a pattern object."
--> 233     return _compile(pattern, flags)
    234 
    235 def purge():

~\AppData\Local\Continuum\anaconda3\lib\re.py in _compile(pattern, flags)
    299     if not sre_compile.isstring(pattern):
    300         raise TypeError("first argument must be string or compiled pattern")
--> 301     p = sre_compile.compile(pattern, flags)
    302     if not (flags & DEBUG):
    303         if len(_cache) >= _MAXCACHE:

~\AppData\Local\Continuum\anaconda3\lib\sre_compile.py in compile(p, flags)
    560     if isstring(p):
    561         pattern = p
--> 562         p = sre_parse.parse(p, flags)
    563     else:
    564         pattern = None

~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in parse(str, flags, pattern)
    853 
    854     try:
--> 855         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    856     except Verbose:
    857         # the VERBOSE flag was switched on inside the pattern.  to be

~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in _parse_sub(source, state, verbose, nested)
    414     while True:
    415         itemsappend(_parse(source, state, verbose, nested + 1,
--> 416                            not nested and not items))
    417         if not sourcematch("|"):
    418             break

~\AppData\Local\Continuum\anaconda3\lib\sre_parse.py in _parse(source, state, verbose, nested, first)
    614             if not item or (_len(item) == 1 and item[0][0] is AT):
    615                 raise source.error("nothing to repeat",
--> 616                                    source.tell() - here + len(this))
    617             if item[0][0] in _REPEATCODES:
    618                 raise source.error("multiple repeat",

error: nothing to repeat at position 0
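The last line of the traceback can be reproduced with the standard `re` module alone. The sketch below uses a small hypothetical helper, `try_compile`, to show two things. A pattern that begins with a quantifier such as `*` or `?` cannot compile. The regex `.*`, which means "any run of characters", compiles fine.

```python
import re


def try_compile(pattern):
    """Hypothetical helper: return None if `pattern` compiles, else the re.error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# '*' repeats the item before it, so a leading '*' has nothing to repeat.
print(try_compile("*CLK*"))
# '?' is also a quantifier, so '?CLK' fails in the same way.
print(try_compile("?CLK"))
# '.*' (any run of characters) is the regex spelling of the shell-style '*'.
print(try_compile(".*CLK.*"))
```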


Also, the regular expression could be ^CLK, ?CLK, or any other pattern. What is a generic solution that works whenever a string containing a regular expression is passed?

Recommended Answer


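Remove the asterisks (*) and use the .contains method instead of the .match method. In a regular expression, * is a quantifier that repeats the preceding item, so a pattern that starts with * fails with "nothing to repeat". The .match method only matches at the beginning of each string (like re.match), whereas .contains finds the pattern anywhere in the string (like re.search). Use case=False to match both upper- and lowercase letters.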
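The difference between matching at the start and searching anywhere can be seen with plain `re` on the question's sample values. Series.str.match applies re.match to each element, and Series.str.contains applies re.search. So even after the asterisks are removed, `.match("CLK")` would only accept values that begin with CLK.

```python
import re

# Values from the question's sample data.
values = ["ACLKB", "CLKAA", "AACLK", "BBBCLK"]

# re.match only tries the pattern at position 0 (what Series.str.match does).
anchored = [bool(re.match("CLK", v)) for v in values]
# re.search scans the whole string (what Series.str.contains does).
anywhere = [bool(re.search("CLK", v)) for v in values]
# re.IGNORECASE is the flag that case=False turns on.
ignoring_case = [bool(re.search("clk", v, flags=re.IGNORECASE)) for v in values]

print(anchored)       # [False, True, False, False]
print(anywhere)       # [True, True, True, True]
print(ignoring_case)  # [True, True, True, True]
```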

See the following code:

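text = 'CLK'
colName = 'Signal'

# str.contains searches anywhere in each value (re.search); case=False ignores case
mask = df[colName].str.contains(text, case=False)
# Boolean indexing keeps only the matching rows
df[mask]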
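The answer handles a literal substring. The question also asks about inputs like `*CLK*`, `?CLK` and `^CLK`. These mix two syntaxes: `*` and `?` at the start read like Unix shell wildcards (globs), while `^CLK` is a regular expression. One possible approach is sketched below. It is not the answerer's method and it assumes the caller knows which kind of pattern it is passing. Globs are converted with the standard library's `fnmatch.translate`, and genuine regexes go to `str.contains`. The helper name `filter_rows` and its `kind` parameter are hypothetical.

```python
import fnmatch

import pandas as pd


def filter_rows(df, col, pattern, kind="regex", case=True):
    """Hypothetical helper: keep rows of `df` whose `col` value matches `pattern`.

    kind="glob"  treats `pattern` as a Unix shell wildcard (*, ?, [seq]).
    kind="regex" treats `pattern` as a Python regular expression.
    """
    if kind == "glob":
        # fnmatch.translate returns a regex anchored at the end of the string;
        # str.match anchors it at the start, so the whole value must match.
        regex = fnmatch.translate(pattern)
        mask = df[col].str.match(regex, case=case, na=False)
    else:
        # str.contains uses re.search, so the regex may match anywhere in the value.
        mask = df[col].str.contains(pattern, case=case, regex=True, na=False)
    return df[mask]


# Sample data rebuilt from the question; the column name 'Signal' is an assumption.
signals = pd.DataFrame({"ID": [0, 1, 2, 3],
                        "Signal": ["ACLKB", "CLKAA", "AACLK", "BBBCLK"]})

print(filter_rows(signals, "Signal", "*CLK*", kind="glob"))  # all four rows
print(filter_rows(signals, "Signal", "?CLK*", kind="glob"))  # only ACLKB
print(filter_rows(signals, "Signal", "^CLK"))                # only CLKAA
```

`na=False` treats missing values as non-matches, which keeps the mask strictly boolean. A regex that is invalid on its own, such as `?CLK` with `kind="regex"`, still raises `re.error`. That error is a signal that the input was probably meant as a glob.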
