Python HTML Parsing Between Two Tags
Question
Today I was looking into a small file uploader and got the following response from the API page:
upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html
I need to get the part between the two <br> tags. I am using BeautifulSoup with this code, but it returns None.
fpbs = BeautifulSoup(filepup.text)
finallink = fpbs.find('br', 'br')
print(finallink)
Recommended answer
You cannot search directly for the text between two tags. You can, however, locate the first <br> tag and then take its next sibling:
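>>> from bs4 import BeautifulSoup
>>> # Name the parser explicitly so the result does not depend on which parsers are installed.
>>> soup = BeautifulSoup('upload_success<br>http://www.filepup.net/files/R6wVq1405781467.html<br>http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html', 'html.parser')
>>> soup.find('br')
<br/>
>>> soup.find('br').next_sibling
'http://www.filepup.net/files/R6wVq1405781467.html'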
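The sibling lookup above can be wrapped in a small helper. This is only a minimal sketch, assuming the bs4 package is installed and the built-in html.parser is used; the function name text_after_first_br is hypothetical and not part of BeautifulSoup. It guards against the cases where there is no <br> tag or where the next sibling is another tag rather than text.

```python
from bs4 import BeautifulSoup, NavigableString


def text_after_first_br(html):
    """Return the text node directly after the first <br>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    first_br = soup.find("br")
    if first_br is None:
        return None  # no <br> tag in the document
    sibling = first_br.next_sibling
    # The sibling may be another tag (or missing) rather than text.
    if isinstance(sibling, NavigableString):
        return str(sibling).strip()
    return None


# The API response quoted in the question.
response = (
    "upload_success<br>"
    "http://www.filepup.net/files/R6wVq1405781467.html<br>"
    "http://www.filepup.net/delete/Jp3q5w1405781467/R6wVq1405781467.html"
)
print(text_after_first_br(response))
```

Returning None instead of raising keeps the helper easy to use when the uploader answers with an error page that has a different layout.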
You could also use a CSS selector search to find an adjacent sibling tag, then grab its preceding sibling. To CSS only tags are siblings, but to BeautifulSoup text nodes count too.
The adjacent sibling combinator is + between two CSS selectors, and it selects the second of the two; br + br selects any br tag that immediately follows another br tag.
Together with a parent node (say a specific id or class), that can be a very powerful combination:
>>> soup = BeautifulSoup('''\
... <div id="div1">
... some text
... <br/>
... some target text
... <br/>
... foo bar
... </div>
... <div id="div2">
... some more text
... <br/>
... select me, ooh, pick me!
... <br/>
... fooed the bar!
... </div>
... ''', 'html.parser')
>>> soup.select('#div2 br + br')[0]
<br/>
>>> soup.select('#div2 br + br')[0].previous_sibling
'\nselect me, ooh, pick me!\n'
This picked a very specific text node between two <br> tags, in a specific <div> tag.
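The selector approach can be sketched as a reusable function as well. The name text_before_second_br and its scope parameter are hypothetical, introduced here for illustration only; the sketch assumes bs4 with its bundled CSS selector support and the built-in html.parser. It strips the surrounding whitespace that the raw text node carries.

```python
from bs4 import BeautifulSoup, NavigableString


def text_before_second_br(html, scope):
    """Return the stripped text just before the second of two adjacent
    <br> tags inside the element matched by the CSS selector `scope`."""
    soup = BeautifulSoup(html, "html.parser")
    # "br + br" matches a <br> whose previous *element* sibling is a <br>;
    # text nodes in between are ignored by CSS.
    second_br = soup.select_one(f"{scope} br + br")
    if second_br is None:
        return None
    # To BeautifulSoup the text node between the tags is a sibling too.
    node = second_br.previous_sibling
    if isinstance(node, NavigableString):
        return str(node).strip()
    return None


html = """
<div id="div1">some text<br/>some target text<br/>foo bar</div>
<div id="div2">some more text<br/>select me, ooh, pick me!<br/>fooed the bar!</div>
"""
print(text_before_second_br(html, "#div2"))
```

Passing the parent selector as an argument keeps the br + br part fixed while letting the caller pick which container to search.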