Why is the Python regex wildcard only matching a newline?

Question

I am writing a program to parse log messages using Python regular expressions. I have everything matching up to the message part of each log entry. The message can contain any mix of characters, so I assumed the .* wildcard would be the best fit for it. It matches everything except a newline.

However, when I use the wildcard, the only thing returned in this case is the newline. Any ideas? Here is the code and the output:

import os
import re
#Change to and print correct file path
os.chdir('/Users/MacUser/Desktop/regExPython')
print(os.getcwd())

#Iterate and read from syslogexample.txt then print results
line_number = 0
with open('syslogexample.txt', 'r') as syslog:
    log_lines = syslog.readlines()
    for line in log_lines:
        line_number += 1
        print('{:>4} {}'.format(line_number, line.rstrip()))


#Build regex to parse through the data
DATE_RE = r'(\w{3}\s+\d{2})'
TIME_RE = r'(\d{2}:\d{2}:\d{2})'
DEVICE_RE = r'(\S+)'
PROCESS_RE = r'(\S+\s+\S+:)'
MESSAGE_RE = r'(.*)'
CD_RE = r'(\s+)'

Syslog_RE = DATE_RE + CD_RE + \
            TIME_RE + CD_RE + \
            DEVICE_RE + CD_RE + \
            PROCESS_RE + CD_RE + \
            MESSAGE_RE

#Use regex to parse through the data
for line in log_lines:
    m = re.match(Syslog_RE, line)
    if m:
        print(m.groups())

#Printed log Files
   1 apr 29 08:22:13 mac-users-macbook-8 syslogd[49]: asl sender statistics
   2 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):
   3 service "com.apple.emond.aslmanager" tried to hijack endpoint "com.apple.aslmanager" from owner:
   4 com.apple.aslmanager
   5 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):
   6 service "com.apple.emond.aslmanager" tried to hijack endpoint
   7 "com.apple.activity_tracing.cache-delete" from owner: com.apple.aslmanager
   8 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):
   9 endpoint has been activated through legacy launch(3) apis. please switch to xpc or
  10 bootstrap_check_in(): com.apple.bsd.dirhelper
  11 apr 29 08:22:19 mac-users-macbook-8 com.apple.xpc.launchd[1]
  12 (com.apple.imfoundation.imremoteurlconnectionagent): unknown key for integer:
  13 _dirtyjetsammemorylimit

Parsed Log Files
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):', '\n', '')
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):', '\n', '')
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):', '\n', '')

Process finished with exit code 0

As you can see at the end, where the MESSAGE_RE group should be, the only characters printed are the \n newline characters, which I thought would not be captured at all.

Thanks, everyone!

Recommended answer

On http://www.regex101.com the regex does not work as intended because .* only captures up to the newline character, so it stops matching at the line break, e.g. between lines 3 and 4. Maybe try re.compile() to compile the regex before calling re.match(). Python's re module also has a DOTALL flag that makes . match newline characters as well: http://docs.python.org/2/library/re.html
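
The behaviour can be reproduced with a minimal sketch, assuming the patterns from the question and one wrapped entry copied from the printed log. The names SYSLOG_RE, lines, per_line, compiled, whole and message are illustrative and not from the original post. Matched line by line, the last (\s+) consumes the trailing newline and (.*) has nothing left to capture. Matched against the joined entry with re.DOTALL, the message group spans the continuation lines.

```python
import re

# Patterns from the question, combined into one expression.
SYSLOG_RE = (
    r'(\w{3}\s+\d{2})(\s+)'      # date, separator
    r'(\d{2}:\d{2}:\d{2})(\s+)'  # time, separator
    r'(\S+)(\s+)'                # device, separator
    r'(\S+\s+\S+:)(\s+)'         # process, separator
    r'(.*)'                      # message
)

# One wrapped log entry, as readlines() would return it
# (sample data taken from the printed log in the question).
lines = [
    'apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] '
    '(com.apple.xpc.launchd.domain.system):\n',
    'service "com.apple.emond.aslmanager" tried to hijack endpoint '
    '"com.apple.aslmanager" from owner:\n',
    'com.apple.aslmanager\n',
]

# Line by line: the last (\s+) swallows the trailing '\n',
# so (.*) can only match the empty string.
per_line = re.match(SYSLOG_RE, lines[0])
print(per_line.groups()[-2:])

# Whole entry at once: with re.DOTALL, '.' also matches '\n',
# so the message group runs across the continuation lines.
compiled = re.compile(SYSLOG_RE, re.DOTALL)
whole = compiled.match(''.join(lines))
message = whole.group(9)
print(message)
```

Note that a greedy .* under DOTALL runs to the end of whatever string it is given, so on a whole file the entries would still need to be split apart before matching.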
