Whatsapp chat log parsing with regex

Published: 2020/7/1 4:52:30 | Tags: python regex regex-lookarounds regex-group python-regex

Problem description

I'm trying to parse a WhatsApp chat log using regex. I have a solution that works for most cases, but I'd like to improve it and don't know how, since I'm quite new to regex.

The chat.txt file looks like this:

[06.12.16, 16:46:19] Person One: Wow thats amazing
[06.12.16, 16:47:13] Person Two: Good morning and this goes over multiple
lines as it is a very long message
[06.12.16, 16:47:22] Person Two: ::

My solution parses most of these messages correctly, but I have a few hundred cases where the message starts with a colon, like the last example above. These produce the unwanted sender value "Person Two: :".
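A simplified reproduction may help show where the extra ": :" comes from. The sketch below keeps only the sender part of the question's pattern, a greedy .* followed by :\s, and assumes the "::" line is followed by a line break, as it is when another message comes after it in chat.txt. Because \s also matches that newline, the greedy .* only has to give back one character, so the captured sender stops just before the last colon. A negated character class such as [^:\n]* cannot run past the first colon.

```python
import re

# Simplified sender part of the question's pattern: greedy ".*" then ":\s".
# The trailing "\n" stands for the line break before the next message.
text = "Person Two: ::\n"

# Greedy ".*" backtracks only until ":" + whitespace (here the newline) fits.
greedy = re.match(r"(.*):\s", text)
print(greedy.group(1))

# A negated class stops at the first colon, so the sender is just the name.
fixed = re.match(r"([^:\n]*):\s", text)
print(fixed.group(1))
```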

Here is the regex I am working with so far:

import re
pattern = re.compile(r'\[(?P<date>\d{2}\.\d{2}\.\d{2}),\s(?P<time>\d{2}:\d{2}:\d{2})]\s(?P<sender>(?<=\s).*(?::\s*\w+)*(?=:)):\s(?P<message>(?:.+|\n+(?!\[\d{2}\.\d{2}\.\d{2}))+)')

Any advice on how I could get around this bug would be appreciated!

Recommended answer

I would pre-process the lines to remove consecutive colons before applying the regex. For example, for each line:

 line = "[06.12.16, 16:47:22] Person Two: ::"
 line = line.replace("::", "")  # removes every "::" in the line

This gives:

[06.12.16, 16:47:22] Person Two:

You can then apply your regex to the pre-processed data.
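Note that the pre-processing step also deletes the "::" message text itself. If that text should be kept, an alternative (not part of the original answer) is to stop the sender at the first colon and to join multi-line messages in Python instead of inside the regex. This is a minimal sketch, assuming every message starts with a "[date, time] " header as in the sample and that sender names never contain a colon; HEADER and parse_chat are hypothetical names.

```python
import re

# Assumptions: each message starts with "[dd.dd.dd, hh:mm:ss] " and
# sender names never contain a colon.
HEADER = re.compile(
    r"\[(?P<date>\d{2}\.\d{2}\.\d{2}), (?P<time>\d{2}:\d{2}:\d{2})\] "
    r"(?P<sender>[^:]+): (?P<message>.*)"
)


def parse_chat(text):
    """Return one dict per message; continuation lines are appended."""
    messages = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # Header line: start a new message.
            messages.append(match.groupdict())
        elif messages:
            # No header: this line continues the previous message.
            messages[-1]["message"] += "\n" + line
    return messages


chat = """[06.12.16, 16:46:19] Person One: Wow thats amazing
[06.12.16, 16:47:13] Person Two: Good morning and this goes over multiple
lines as it is a very long message
[06.12.16, 16:47:22] Person Two: ::"""

for msg in parse_chat(chat):
    print(msg["sender"], "|", repr(msg["message"]))
```

Handling continuation lines in the loop keeps the regex limited to a single line, so the sender group never has a chance to reach across a line break.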
