Regex End of Line and Specific Characters
Question
I'm writing a Python program that reads lines of serial data and compares them to a dictionary of line codes to figure out which specific lines are being transmitted. I am trying to use a regular expression to filter out the extra garbage that each line read from the serial port carries, but I'm having a bit of an issue.
Every code in my dictionary looks like this: T12F8B0A22**F8. The asterisks stand for the two alphanumeric characters that differentiate each code.
This is what I have so far as my regex: '/^T12F8B0A22[A-Z0-9]{2}F8$/'
I am getting a few errors with this, however. The first is that there are some characters at the end of the string that I still need to get rid of, which is odd because I thought $/ denoted the end of the line in regex. However, when I step through my code in the debugger, I notice that after running the following code:
#regexString contains the serial read line data
regexString = re.sub('/^T12F8B0A22[A-Z0-9]{2}F8$/', '', regexString)
My string looks something like this: 'T12F8B0A2200F8\\r'
I need to get rid of the \\r.
If for some reason I can't get rid of this with regex, how do you send specific string characters through an argument in Python? In this case I suppose it would be length - 3?
Recommended answer
There are three problems here:
1) Your string contains an extra \r (carriage return) before the \n (newline). This is common on Windows and in network communication protocols. It is probably best to remove any trailing whitespace from your string:
regexString = regexString.rstrip()
2) As Wiktor Stribiżew mentioned, your regexp is unnecessarily surrounded by / characters. Some languages, like Perl, define a regexp as a string delimited by / characters, but Python is not one of them. In Python those slashes are literal characters that the input would have to contain.
3) Your re.sub call actually replaces the matching part of regexString with an empty string. I believe this is the exact opposite of what you want (you want to keep the match and remove everything else, right?). That is why fixing the regexp makes things "even worse".
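The three points above can be checked with a short experiment. This is only a sketch: the sample line and its CR LF ending are assumptions, not data from the question.

```python
import re

# Assumed sample line, ending in carriage return + newline.
raw = 'T12F8B0A2200F8\r\n'

# Python treats the slashes as literal characters, so this pattern
# never matches and re.sub returns the input unchanged.
with_slashes = re.sub('/^T12F8B0A22[A-Z0-9]{2}F8$/', '', raw)

# Stripping trailing whitespace removes the \r\n.
stripped = raw.rstrip()

# Without the slashes the pattern matches the stripped line,
# but re.sub then deletes the code itself.
without_slashes = re.sub('^T12F8B0A22[A-Z0-9]{2}F8$', '', stripped)

print(repr(with_slashes))
print(repr(stripped))
print(repr(without_slashes))
```

Printing with repr makes control characters such as \r visible, which helps when inspecting serial input.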
To summarize, I think you should use this instead of your current code:
m = re.match('T12F8B0A22[A-Z0-9]{2}F8', regexString)
if m:  # re.match returns None when the line does not start with a valid code
    regexString = m.group(0)
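For completeness, here is one way the extraction and the dictionary lookup might fit together. It is a sketch under several assumptions: the line has already been decoded to a str, the helper name extract_code and the LINE_CODES dictionary are hypothetical, and it uses re.fullmatch rather than re.match so that lines with extra characters after the code are rejected instead of truncated.

```python
import re

# Compiled once; fullmatch below requires the whole stripped line to be a code.
CODE_PATTERN = re.compile(r'T12F8B0A22[A-Z0-9]{2}F8')


def extract_code(line):
    """Return the line code in `line`, or None if the line is not a valid code."""
    m = CODE_PATTERN.fullmatch(line.strip())
    return m.group(0) if m else None


# Hypothetical dictionary of line codes; the values are made up for illustration.
LINE_CODES = {
    'T12F8B0A2200F8': 'line 0',
    'T12F8B0A22A1F8': 'line A1',
}

for raw in ['T12F8B0A2200F8\r\n', 'garbage\r\n']:
    code = extract_code(raw)
    print(repr(raw), '->', LINE_CODES.get(code, 'unknown'))
```

If truncating trailing garbage is actually the desired behaviour, re.match as in the answer above is the closer fit.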