Regex to match WhatsApp chat log


Question

I've been trying to create a regex for a WhatsApp chat log.

So far I've been able to achieve this (click here for the test link) by creating the following regex:

(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)\s-\s(?P<name>[^:]*):(?P<message>.*)

The problem with this regex is that it cannot match long messages that span multiple lines, because the message part stops at the first line break. You can see the issue in the link provided above.

Any help would be appreciated.

Thanks.
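The failure described in the question can be reproduced with a short Python sketch. The sample text below is made up for illustration (it is not taken from the asker's test link), and Python's `re` module is assumed because the pattern uses `(?P<name>...)` groups. Since `.` does not match a newline by default, only the first line of the message is captured.

```python
import re

# The question's original pattern, unchanged.
ORIGINAL = re.compile(
    r"(?P<datetime>\d{2}\/\d{2}\/\d{4},\s\d(?:\d)?:\d{2} [pa].m.)"
    r"\s-\s(?P<name>[^:]*):(?P<message>.*)"
)

# Hypothetical sample: one message that continues on a second line.
SAMPLE = "12/03/2019, 9:15 p.m. - Alice: Hello\nthis is a second line"

match = ORIGINAL.search(SAMPLE)
captured = match.group("message")

# `.*` stops at the line break, so the continuation line is lost.
print(repr(captured))
```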

Recommended answer

Here you go:

^
(?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
(?P<name>[^:]+):\s+
(?P<message>[\s\S]+?)
(?=^\d{2}|\Z)

See your modified demo on regex101.com.

Essentially, I added anchors, simplified your datetime part, and inserted `[\s\S]+?`, which lazily matches anything (including newlines) up to the condition that follows, which is a lookahead. The lookahead makes sure that what comes next is either another two digits right after a newline (this could be tightened!) or the very end of the string.
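As a minimal sketch of how the answer's pattern might be used from Python: because the pattern is written across several lines and relies on `^` matching at line starts, it is compiled here with `re.VERBOSE` and `re.MULTILINE`. The helper name `parse_chat` and the sample log are hypothetical, added only for illustration.

```python
import re

# The answer's pattern. VERBOSE ignores the layout whitespace;
# MULTILINE lets ^ match at the start of every line.
CHAT_LINE = re.compile(
    r"""
    ^
    (?P<datetime>\d{2}/\d{2}/\d{4}[^-]+)\s+-\s+
    (?P<name>[^:]+):\s+
    (?P<message>[\s\S]+?)
    (?=^\d{2}|\Z)
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample log: the first message spans two lines.
SAMPLE_LOG = (
    "12/03/2019, 9:15 p.m. - Alice: Hello\n"
    "this is a second line\n"
    "12/03/2019, 9:16 p.m. - Bob: Hi"
)


def parse_chat(text):
    """Return (datetime, name, message) tuples, one per chat message."""
    return [
        (
            m.group("datetime").strip(),
            m.group("name").strip(),
            # The lazy match keeps the trailing newline; strip it.
            m.group("message").strip(),
        )
        for m in CHAT_LINE.finditer(text)
    ]


for entry in parse_chat(SAMPLE_LOG):
    print(entry)
```

Note that with the lookahead as written, a continuation line that happens to begin with two digits would end the message early. A stricter lookahead such as `(?=^\d{2}/\d{2}/\d{4},|\Z)` is one possible way to tighten it, assuming every entry starts with a date in that format.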
