Python regex look-behind requires fixed-width pattern

Views: 2225 | Posted: 2018/6/14 20:07:51 | Tags: python, html, regex


Question

When trying to extract the title of an HTML page I have always used the following regex:

    (?<=<title.*>)([\s\S]*)(?=</title>)

This extracts everything between the tags in a document and ignores the tags themselves. However, when I try to use this regex in Python, it raises the following exception:

    Traceback (most recent call last):
      File "test.py", line 21, in <module>
        pattern = re.compile('(?<=<title.*>)([\s\S]*)(?=</title>)')
      File "C:\Python31\lib\re.py", line 205, in compile
        return _compile(pattern, flags)
      File "C:\Python31\lib\re.py", line 273, in _compile
        p = sre_compile.compile(pattern, flags)
      File "C:\Python31\lib\sre_compile.py", line 495, in compile
        code = _code(p, flags)
      File "C:\Python31\lib\sre_compile.py", line 480, in _code
        _compile(code, p.data, flags)
      File "C:\Python31\lib\sre_compile.py", line 115, in _compile
        raise error("look-behind requires fixed-width pattern")
    sre_constants.error: look-behind requires fixed-width pattern

The code I am using is:

    pattern = re.compile('(?<=<title.*>)([\s\S]*)(?=</title>)')
    m = pattern.search(f)

If I make a minimal adjustment, it works:

    pattern = re.compile('(?<=<title>)([\s\S]*)(?=</title>)')
    m = pattern.search(f)

This will, however, not take into account potential HTML titles that for some reason have attributes or similar.
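To make the behaviour described in the question concrete, here is a minimal Python 3 sketch that compiles both patterns and runs the fixed-width one against two made-up documents. The sample strings and the helper `try_compile` are assumptions introduced for illustration, not part of the original post.

```python
import re

# Sample inputs are made up for illustration.
PLAIN = "<html><head><title>Plain title</title></head></html>"
WITH_ATTR = '<html><head><title lang="en">Title with attribute</title></head></html>'


def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, error_message) on failure."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)


# Variable-width look-behind: `.*` can match any number of characters.
variable, variable_error = try_compile(r"(?<=<title.*>)([\s\S]*)(?=</title>)")

# Fixed-width look-behind: `<title>` is always exactly 7 characters long.
fixed, fixed_error = try_compile(r"(?<=<title>)([\s\S]*)(?=</title>)")

plain_match = fixed.search(PLAIN)
attr_match = fixed.search(WITH_ATTR)

print("variable-width compile error:", variable_error)
print("fixed-width on plain title:", plain_match.group(1) if plain_match else None)
print("fixed-width on title with attribute:", attr_match.group(1) if attr_match else None)
```

The first pattern is rejected at compile time, while the second compiles and finds the plain title but returns no match once the opening tag carries an attribute.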

Anyone know a good workaround for this issue? Any tips are appreciated.
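Solution

If you just want to get the title tag:

    import urllib2  # Python 2 module, as used in the original answer

    html = urllib2.urlopen("http://somewhere").read()
    # Each chunk that ends before a "</title>" may contain an opening "<title>".
    for item in html.split("</title>"):
        if "<title>" in item:
            # Skip past "<title>" (7 characters) and print the rest.
            print item[item.find("<title>") + 7:]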
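The answer's code targets Python 2 and looks only for a bare `<title>`, so titles with attributes are still missed. One possible alternative, which is not from the original answer, is to drop the look-behind entirely and let a capture group pick out the text. The sketch below assumes Python 3; the name `extract_title` and the sample strings are hypothetical. Regexes remain fragile on real-world HTML, so treat this as a workaround for simple pages rather than a general parser.

```python
import re

# Match the opening tag with optional attributes, capture the text lazily,
# then match the closing tag. No look-behind is needed, so the variable-width
# part (`[^>]*`) is allowed.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


# Made-up sample documents for illustration.
plain = "<html><head><title>Plain title</title></head></html>"
with_attr = '<html><head><TITLE lang="en">\n  Title with attribute\n</TITLE></head></html>'

print(extract_title(plain))
print(extract_title(with_attr))
print(extract_title("<html><body>no title here</body></html>"))
```

The wanted text is read from `match.group(1)` instead of `match.group(0)`, which is the main difference from the look-around version in the question.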
