How to make dot match newline characters using regular expressions


Question

I have a string that contains normal characters, whitespace characters, and newline characters between <div> and </div>. This regular expression doesn't work: /<div>(.*)<\/div>/. That is because .* doesn't match newline characters. How can I make the dot match them as well?

Recommended answer

You need to use the s (DOTALL) modifier, which makes the dot match newline characters too:

'/<div>(.*)<\/div>/s'
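As a rough illustration, here is the same idea in Python. This is an assumption, since the question does not name a language; in Python's re module the re.DOTALL flag plays the role of the s modifier. The sample string is made up.

```python
import re

# Hypothetical sample input: the div body spans two lines.
html = "<div>first line\nsecond line</div>"

pattern = r"<div>(.*)</div>"

# Without DOTALL the dot stops at "\n", so the pattern cannot reach </div>.
no_flag = re.search(pattern, html)

# With DOTALL (the counterpart of the s modifier) the dot also matches "\n".
with_flag = re.search(pattern, html, re.DOTALL)

print(no_flag)                    # None
print(repr(with_flag.group(1)))   # 'first line\nsecond line'
```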

This might not give you exactly what you want, because .* is greedy: it matches as much as possible, so with several divs in the input it runs through to the last </div>. You might instead try a non-greedy match:

'/<div>(.*?)<\/div>/s'
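The difference is easiest to see on input with more than one div. The sketch below uses Python's re module with re.DOTALL standing in for the s modifier (an assumption, as the question does not name a language), and the input string is invented for the example.

```python
import re

# Hypothetical input with two separate divs, the first spanning two lines.
html = "<div>one\ntwo</div>\n<div>three</div>"

# Greedy: .* runs to the end and backtracks only as far as the LAST </div>.
greedy = re.findall(r"<div>(.*)</div>", html, re.DOTALL)

# Non-greedy: .*? stops at the FIRST </div> it can reach.
lazy = re.findall(r"<div>(.*?)</div>", html, re.DOTALL)

print(greedy)  # ['one\ntwo</div>\n<div>three']
print(lazy)    # ['one\ntwo', 'three']
```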

You could also solve this by matching everything except '<', provided there are no other tags inside the div. A negated character class matches newline characters, so the s modifier is not needed here:

'/<div>([^<]*)<\/div>/'
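A small Python sketch of this variant (Python is an assumption here, and both input strings are invented) shows that it handles newlines without any flag, and also where it stops working:

```python
import re

pattern = r"<div>([^<]*)</div>"

# [^<] matches any character except "<", including "\n", so no flag is needed.
plain = "<div>one\ntwo</div>\n<div>three</div>"
plain_matches = re.findall(pattern, plain)

# Any tag inside the div breaks the match, because [^<]* cannot cross "<b>".
nested = "<div>some <b>bold</b> text</div>"
nested_match = re.search(pattern, nested)

print(plain_matches)  # ['one\ntwo', 'three']
print(nested_match)   # None
```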

Another observation is that you don't need to use / as your regular expression delimiter. Using another character means that you don't have to escape the / in </div>, which improves readability. This applies to all of the regular expressions above. Here's how it would look if you used '#' instead of '/':

'#<div>([^<]*)</div>#'

However, all of these solutions can fail because of nested divs, extra whitespace, HTML comments, and various other things. HTML is too complicated to parse with regular expressions, so you should consider using an HTML parser instead.
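For comparison, here is a minimal sketch of the parser approach using Python's standard html.parser module. The language choice, the DivTextCollector class, and the sample input are all assumptions made for illustration; a real project might prefer a fuller HTML library.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text inside each top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of <div> elements currently open
        self.current = []   # text pieces of the top-level div being read
        self.results = []   # one string per finished top-level div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        # Text is kept only while inside a div; comments never arrive here.
        if self.depth > 0:
            self.current.append(data)


# Made-up input with a nested div, a newline, and an HTML comment.
html = "<div>outer <div>inner\ntext</div> <!-- note --> tail</div>"

collector = DivTextCollector()
collector.feed(html)
collector.close()

print(collector.results)
```

Tracking the nesting depth is what lets the outer div close at its own end tag rather than at the first </div>, which is the case the regular expressions above cannot handle.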

