Does BeautifulSoup find_all() preserve tag order?
Question
I wish to use BeautifulSoup to parse some HTML. I have a table with several rows. I'm trying to find a row that meets certain conditions (certain attribute values) and use the index of that row later on in my code.
The question is: does find_all() preserve the order of my rows in the result set that it returns?
I didn't find this in the docs, and Googling only got me to this answer:
'BeautifulSoup tags don't track their order in the page, no.'
but the answerer does not say where that information comes from.
I'd be happy with an answer, but even happier with a pointer to some documentation that explains this.
dstudeba pointed me in the direction of this 'workaround' using find_next_sibling:
from bs4 import BeautifulSoup

# Parse the table with the standard-library HTML parser.
with open('./mytable.html') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Start from the first matching row, then walk its matching siblings in order.
row = soup.find('tr', {'class': 'something', 'someattr': 'somevalue'})
myvalues = []
while row:
    cell = row.find('td', {'someattr': 'cellspecificvalue'})
    myvalues.append(cell.get_text())
    row = row.find_next_sibling('tr', {'class': 'something', 'someattr': 'somevalue'})
This gets me the cell contents I need in the order they appear in my HTML file.
However, I'd still like to know where in the BeautifulSoup docs I could find whether find_all() preserves order or not. This is why I'm not accepting dstudeba's answer. (My upvote doesn't show, not enough rep yet :P)
Recommended answer
In my experience, find_all does preserve order. However, to make sure, you can use the find_all_next method, which uses the find_next method and so preserves the order. Here is a link to the documentation.
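A minimal sketch for checking this on a small example, assuming bs4 is installed. The inline table and its attribute names are hypothetical stand-ins for the placeholders in the question, not the asker's real data. It collects the matching cells once with find_all and once with find_all_next starting from the first match, and also derives each matching row's index among all rows, which is what the question ultimately wants.

```python
from bs4 import BeautifulSoup

# Hypothetical table; class and attribute names mirror the question's placeholders.
HTML = """
<table>
  <tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">first</td></tr>
  <tr class="other"><td>skipped</td></tr>
  <tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">second</td></tr>
  <tr class="something" someattr="somevalue"><td someattr="cellspecificvalue">third</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
row_filter = {"class": "something", "someattr": "somevalue"}
cell_filter = {"someattr": "cellspecificvalue"}

# Approach 1: find_all over the whole document.
rows = soup.find_all("tr", row_filter)
values_find_all = [r.find("td", cell_filter).get_text() for r in rows]

# Approach 2: take the first match, then everything matching after it.
first = soup.find("tr", row_filter)
rows_next = [first, *first.find_all_next("tr", row_filter)]
values_next = [r.find("td", cell_filter).get_text() for r in rows_next]

# Index of each matching row among all <tr> elements in the table.
# With html.parser, "class" is stored as a list of class names.
indices = [
    i
    for i, r in enumerate(soup.find_all("tr"))
    if "something" in r.get("class", []) and r.get("someattr") == "somevalue"
]

print(values_find_all)  # ['first', 'second', 'third']
print(values_next)      # ['first', 'second', 'third']
print(indices)          # [0, 2, 3]
```

Both approaches return the rows in the order they appear in the markup here. This is an observation on one example and not a citation from the documentation, so it does not settle where the docs state the guarantee.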