Python HTML Parsing Between Two Tags

Question

Today I was working on a small file uploader and got the following response from the API page:

upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html

I need to get the part between the two <br> tags. I am using BeautifulSoup with the code below, but it prints None.

fpbs = BeautifulSoup(filepup.text)
finallink = fpbs.find('br', 'br')
print(finallink)

Recommended answer

You cannot search directly for the text between two tags. You can, however, locate the first <br> tag and then take its next sibling:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html')
>>> soup.find('br')
<br/>
>>> soup.find('br').next_sibling
u'http://www.filepup.net/files/R6wVq1405781467.html'
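As a standalone script, the same idea might look like the sketch below. It assumes Python 3 with the bs4 package and the built-in "html.parser" backend; the names response_text and text_between_first_two_brs are hypothetical and stand in for the real API response. It also hints at why the original call printed None: the second positional argument of find() is attrs, and a plain string there is treated as a CSS class, so find('br', 'br') looks for a <br class="br"> tag that does not exist.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical stand-in for filepup.text from the question.
response_text = (
    "upload_success<br>"
    "http://www.filepup.net/files/R6wVq1405781467.html<br>"
    "http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html"
)


def text_between_first_two_brs(html):
    """Return the text node that directly follows the first <br>, or None."""
    # "html.parser" is an assumption; any installed parser should behave alike here.
    soup = BeautifulSoup(html, "html.parser")
    first_br = soup.find("br")
    if first_br is None:
        return None
    node = first_br.next_sibling
    # next_sibling can be a tag or None; only accept a plain text node.
    if isinstance(node, NavigableString):
        return str(node).strip()
    return None


download_link = text_between_first_two_brs(response_text)
print(download_link)
```

Checking that the sibling is a NavigableString guards against responses where two <br> tags sit next to each other with no text in between.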

You could also use a CSS selector search to find an adjacent sibling tag, then grab its preceding sibling. To CSS, only tags count as siblings, but to BeautifulSoup the text nodes count too.

The adjacent sibling combinator is a + between two CSS selectors, and it selects the second of the two; br + br selects any br tag that immediately follows another br tag, with any text in between ignored.
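The difference between the two notions of "sibling" can be seen in a small sketch. It assumes Python 3, the bs4 package with its bundled CSS selector support, and the "html.parser" backend; the sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: three text nodes separated by two <br> tags.
soup = BeautifulSoup("a<br>b<br>c", "html.parser")

first_br = soup.find("br")

# BeautifulSoup navigation: the next sibling of the first <br> is the text node "b".
text_after_first = first_br.next_sibling

# CSS view: text is skipped, so "br + br" matches the second <br>.
second_br = soup.select_one("br + br")

# Stepping back from the second <br> lands on the same text node.
text_before_second = second_br.previous_sibling

print(repr(str(text_after_first)), repr(str(text_before_second)))
```

Both routes reach the same text node; the CSS route is mainly useful when the selector can also narrow the search to a particular part of the page.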

Combined with a selector for a parent node (say, a specific id or class), the adjacent sibling combinator can be a very powerful tool:

>>> soup = BeautifulSoup('''\
... <div id="div1">
...     some text
...     <br/>
...     some target text
...     <br/>
...     foo bar
... </div>
... <div id="div2">
...     some more text
...     <br/>
...     select me, ooh, pick me!
...     <br/>
...     fooed the bar!
... </div>
... ''')
>>> soup.select('#div2 br + br')[0]
<br/>
>>> soup.select('#div2 br + br')[0].previous_sibling
u'\n    select me, ooh, pick me!\n    '

This picked a very specific text node between two <br> tags, in a specific <div> tag.
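For use outside the interactive prompt, the selector approach could be wrapped roughly as follows. This is a sketch assuming Python 3, bs4 with CSS selector support, and the "html.parser" backend; the function name text_before_second_br and its container_selector parameter are hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString

html = """\
<div id="div1">
    some text
    <br/>
    some target text
    <br/>
    foo bar
</div>
<div id="div2">
    some more text
    <br/>
    select me, ooh, pick me!
    <br/>
    fooed the bar!
</div>
"""


def text_before_second_br(markup, container_selector):
    """Return the stripped text between the first two adjacent <br> tags
    inside the element matched by container_selector, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # select_one returns the first match, or None when nothing matches.
    second_br = soup.select_one(container_selector + " br + br")
    if second_br is None:
        return None
    node = second_br.previous_sibling
    if isinstance(node, NavigableString):
        return str(node).strip()
    return None


target = text_before_second_br(html, "#div2")
print(target)
```

Stripping the result removes the newlines and indentation that surround the text node in the source markup.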
