Python Regex ignore specific string to find next example
Problem description
I have the following code that runs through the current column, strips the data, and creates a secondary column containing just the code in parentheses. This works well for examples 2 and 3. However, in example 1 the date is picked up because it is also in parentheses. Is there a way to rework the code so that it ignores anything in parentheses that contains a date stamp and keeps looking for something else within that record? For example, in scenario 1 it should scan record one, ignore (2018-03) and select (256). The datasets we work with have 3, 4, 5 and various other record codes, but this date format is unique and can be removed.
Code:
df1['Doc ID'] = df['Folder Path'].str.extract(r'.*\((.*)\).*', expand=True)
Data table:
   current column                                      new column
1  /reports/support + admin. (256)/ Global (2018-03)   (2018-03)
2  /reports/limit/sector(139)/2017                     (139)
3  /reports/sector/region(147,189 and 132)/2018        (147,189 and 132)
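Why the date wins: the leading greedy `.*` first consumes the whole string and then backtracks only until it finds a `(` that lets the rest of the pattern match, which for these paths is the last one. So the last parenthesised group is always captured. pandas' `str.extract` takes the groups from the first match in each row, so it behaves the same way. A minimal sketch with the standard `re` module reproducing the table above (`greedy` and `paths` are illustrative names):

```python
import re

# The question's pattern: the greedy .* before \( makes the engine backtrack
# from the end of the string to the LAST "(", so the last (...) pair wins.
greedy = re.compile(r'.*\((.*)\).*')

paths = [
    "/reports/support + admin. (256)/ Global (2018-03)",
    "/reports/limit/sector(139)/2017",
    "/reports/sector/region(147,189 and 132)/2018",
]

for path in paths:
    print(greedy.search(path).group(1))
# 2018-03
# 139
# 147,189 and 132
```

Rows 2 and 3 only work because they contain a single pair of parentheses.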
Recommended answer
You can use
df['Folder Path'].str.extract(r'\((?!\d{4}-\d{2}\)|Data Only\))([^()]*)\)',expand=True)
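Applied to the sample rows from the question, the pattern skips the date and returns the code. A sketch assuming pandas is installed; the last two rows are hypothetical, added to show a date that appears before the code and a path that contains only a date:

```python
import pandas as pd

# Sample paths from the question, plus two hypothetical rows:
# a date that comes BEFORE the code, and a path containing only a date.
df = pd.DataFrame({"Folder Path": [
    "/reports/support + admin. (256)/ Global (2018-03)",
    "/reports/limit/sector(139)/2017",
    "/reports/sector/region(147,189 and 132)/2018",
    "/reports/archive (2018-03)/team (731)",  # hypothetical
    "/reports/archive (2018-03)/team",        # hypothetical
]})

# First (...) whose content is not YYYY-MM or "Data Only"; group 1 is the text inside.
pattern = r'\((?!\d{4}-\d{2}\)|Data Only\))([^()]*)\)'

# expand=False returns a Series when there is one capture group,
# which maps directly onto a single new column.
df["Doc ID"] = df["Folder Path"].str.extract(pattern, expand=False)
print(df)
```

The first four rows come back as 256, 139, 147,189 and 132, and 731; the date-only row has no match and gets a missing value. Note that the captured text excludes the parentheses themselves.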
The regex matches:
\( - an open parenthesis
(?!\d{4}-\d{2}\)|Data Only\)) - a negative lookahead that fails the match if, immediately to the right, there is
    \d{4}-\d{2}\) - 4 digits, a hyphen, 2 digits and )
    | - or
    Data Only\) - a Data Only) substring
([^()]*) - Group 1: any 0 or more chars other than open/close parentheses
\) - a close parenthesis
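The breakdown above maps one-to-one onto a `re.VERBOSE` version of the same pattern, which may be easier to maintain if more exclusions are added later. A sketch using only the standard library; `COMPACT`, `DOC_ID` and `doc_id` are illustrative names, and the last two sample paths are hypothetical. VERBOSE mode ignores unescaped whitespace, so the space in `Data Only` has to be written as `\ `:

```python
import re

# Compact pattern from the answer.
COMPACT = re.compile(r'\((?!\d{4}-\d{2}\)|Data Only\))([^()]*)\)')

# The same pattern in VERBOSE mode, one comment per part of the breakdown.
# Unescaped whitespace is ignored here, so the space in "Data Only" is escaped.
DOC_ID = re.compile(r"""
    \(                  # an open parenthesis
    (?!                 # negative lookahead: fail at this "(" if it is followed by
        \d{4}-\d{2}\)   #   4 digits, a hyphen, 2 digits and a close parenthesis
      |                 #   or
        Data\ Only\)    #   the substring "Data Only)"
    )
    ([^()]*)            # group 1: zero or more chars other than ( and )
    \)                  # a close parenthesis
""", re.VERBOSE)


def doc_id(path):
    """Return the text inside the first non-date, non-'Data Only' (...) pair, or None."""
    match = DOC_ID.search(path)
    return match.group(1) if match else None


samples = [
    "/reports/support + admin. (256)/ Global (2018-03)",
    "/reports/archive (Data Only)/team (118)",  # hypothetical path
    "/reports/archive (2018-03)",               # hypothetical: only a date
]

for path in samples:
    # Both compilations should agree on every input.
    compact = COMPACT.search(path)
    assert (compact.group(1) if compact else None) == doc_id(path)
    print(doc_id(path))
# 256
# 118
# None
```

Because the lookahead is checked at every `(`, an excluded group anywhere in the path is simply skipped and the search moves on to the next pair.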
See the regex demo.