Python string operation, extract text between html tags

Views: 1008 | Published: 2018/6/14 19:43:31 | python html string parsing


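Problem description

I have a string:

    <font face="ARIAL,HELVETICA" size="-2">
    JUL 28</font>

(It outputs over two lines, so there must be a \n in there.)

I wish to extract the string that's in between the <font></font> tags. In this case, it's JUL 28, but it might be another date or some other number.

1) What is the best way to extract the value from between the font tags? I was thinking I could extract everything in between > and </.

Edit: second question removed.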
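
For illustration only, the idea in the question (take everything between `>` and `</`) might be sketched with a regular expression as below. The function name `text_between_font_tags` is hypothetical, and the pattern assumes a single, well-formed, non-nested `<font>` element like the one shown; it is not a general HTML parser.

```python
import re


def text_between_font_tags(html):
    """Return the stripped text of the first <font>...</font> pair, or None."""
    # [^>]* skips the attributes of the opening tag; (.*?) lazily captures
    # everything up to the first closing </font>. re.DOTALL lets "." match
    # the newline that follows the opening tag.
    match = re.search(r"<font\b[^>]*>(.*?)</font>", html,
                      re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


snippet = '<font face="ARIAL,HELVETICA" size="-2">\nJUL 28</font>'
print(text_between_font_tags(snippet))  # JUL 28
```

A pattern like this breaks as soon as the markup is nested, unclosed, or otherwise irregular.
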
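Solution
While it may be possible to parse arbitrary HTML with regular expressions, it's often a death trap. There are great tools out there for parsing HTML, including BeautifulSoup, which is a Python library that can handle broken as well as good HTML fairly well.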
>>> from BeautifulSoup import BeautifulSoup as BSHTML
>>> BS = BSHTML("""
... <font face="ARIAL,HELVETICA" size="-2">
... JUL 28</font>"""
... )
>>> BS.font.contents[0].strip()
u'JUL 28'
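
The session above uses the older `BeautifulSoup` package name (current releases are imported as `from bs4 import BeautifulSoup`). If a third-party library is not an option, a similar result for this simple case can probably be had with the standard library's `html.parser`. The following is a minimal sketch, not the answer's method; the class `FontTextExtractor` and the function `extract_font_text` are hypothetical names introduced here.

```python
from html.parser import HTMLParser


class FontTextExtractor(HTMLParser):
    """Collect the text found inside <font> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # number of <font> tags currently open
        self.chunks = []  # text pieces seen while inside a <font> element

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "font":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "font" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.chunks.append(data)


def extract_font_text(html):
    """Return the stripped text contained in all <font> elements of html."""
    parser = FontTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


snippet = '<font face="ARIAL,HELVETICA" size="-2">\nJUL 28</font>'
print(extract_font_text(snippet))  # JUL 28
```

Unlike BeautifulSoup, this sketch makes no attempt to repair broken markup; it only tracks whether a `<font>` tag is currently open.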

Then you just need to parse the date:
>>> from datetime import datetime
>>> datetime.strptime(BS.font.contents[0].strip(), '%b %d')
datetime.datetime(1900, 7, 28, 0, 0)
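
As the output above shows, `strptime` falls back to the year 1900 when the format contains no year. If the year is known from context, one option is to append it before parsing. This is a hedged sketch: the helper `parse_month_day` and the year 2018 are assumptions chosen for illustration, and `%b` matches English month abbreviations only under an English or C locale.

```python
from datetime import datetime


def parse_month_day(text, year):
    """Parse text such as 'JUL 28' into a datetime in the given year."""
    # %b is the abbreviated month name, %d the day of the month, and %Y the
    # four-digit year appended to the extracted text here.
    return datetime.strptime(f"{text.strip()} {year}", "%b %d %Y")


print(parse_month_day("JUL 28", 2018))  # 2018-07-28 00:00:00
```

Supplying the year explicitly also avoids ambiguity around dates such as February 29.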
