Python str.strip() with regex filtering unexpected characters

Problem description

I'm running into an issue that I hope is simple, but I've hit a wall trying to figure it out. I'm trying to strip the DateTime timestamp from the beginning of each line in a file, but the result is missing some of the characters I want to keep. I'm fairly sure my regex is OK, and the group() output looks right. Lines containing the letters "c" and "e" seem to get those characters trimmed off, while other lines work as expected.

Python 2.7.6 (default, Jun 22 2015, 17:58:13)

[GCC 4.8.2] on linux2

>>> import re
>>>
>>> line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'
>>> a = re.match(r'(\[[A-Za-z]{3}\s)?([A-Za-z]{3})(\s+)([0-9]{1,4})(\s+)([0-9]{2})(:)([0-9]{2})(:)([0-9]{2})(\s[0-9]{1,4})?(\])?', line2, re.I)
>>> a.group()
'[Wed Dec 01 10:24:24 2010]'
>>> a.groups()
('[Wed ', 'Dec', ' ', '01', ' ', '10', ':', '24', ':', '24', ' 2010', ']')
>>> b = a.group()
>>> b
'[Wed Dec 01 10:24:24 2010]'
>>> c = line2.strip(b)
>>> c
'st'
>>>

I expect c to be "ceeeeest"

OR

>>> line = '[Wed Dec 01 10:24:24 2010] testc'
>>> a = re.match(r'(\[[A-Za-z]{3}\s)?([A-Za-z]{3})(\s+)([0-9]{1,4})(\s+)([0-9]{2})(:)([0-9]{2})(:)([0-9]{2})(\s[0-9]{1,4})?(\])?', line, re.I)
>>> a.group()
'[Wed Dec 01 10:24:24 2010]'
>>> a.groups()
('[Wed ', 'Dec', ' ', '01', ' ', '10', ':', '24', ':', '24', ' 2010', ']')
>>> b = a.group()
>>> c = line.strip(b)
>>> c
'test'
>>>

I expect c to be "testc"

Is there something very basic I am missing here? Please enlighten me. Thank you.

Solution

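str.strip does not remove a prefix or a substring. It treats its argument as a set of characters and removes every leading and trailing character that is in that set, stopping at the first character that is not. Your timestamp contains "c" (from Dec), "e" (from Wed and Dec), and a space, so " ceeeeest" is eaten down to "st", and the trailing "c" of "testc" is removed as well. You probably want to use str.replace instead.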

>>> line = '[Wed Dec 01 10:24:24 2010] testc'
>>> line.replace('[Wed Dec 01 10:24:24 2010]', '')
' testc'

You can get rid of the leading whitespace with str.lstrip, or use str.strip if you also want to remove trailing whitespace (called with no argument, both strip whitespace).
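
Since you already have a match object, another option is to slice the line at the end of the match instead of passing the matched text to strip or replace. The following is only a sketch: it reuses the pattern from the question as-is, and the names TIMESTAMP and strip_timestamp are made up for illustration.

```python
import re

# Pattern copied from the question; TIMESTAMP is a hypothetical name.
TIMESTAMP = re.compile(
    r'(\[[A-Za-z]{3}\s)?([A-Za-z]{3})(\s+)([0-9]{1,4})(\s+)'
    r'([0-9]{2})(:)([0-9]{2})(:)([0-9]{2})(\s[0-9]{1,4})?(\])?',
    re.I,
)


def strip_timestamp(line):
    """Return line without its leading timestamp (hypothetical helper).

    Lines that do not start with a timestamp are returned unchanged.
    """
    m = TIMESTAMP.match(line)
    if m is None:
        return line
    # Cut at the end of the match, then drop the separating whitespace.
    return line[m.end():].lstrip()


line2 = '[Wed Dec 01 10:24:24 2010] ceeeeest'
stamp = '[Wed Dec 01 10:24:24 2010]'

# strip() uses the argument as a set of characters, so 'c' and 'e' are eaten.
stripped_as_set = line2.strip(stamp)   # 'st'

# Slicing at the match end removes only the timestamp itself.
sliced = strip_timestamp(line2)        # 'ceeeeest'
```

Unlike str.replace, slicing only touches the start of the line, so a second timestamp later in the same line would be left alone.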
