Beautiful Soup and extracting a div and its contents by ID
Question
soup.find("tagName", { "id" : "articlebody" })
Why does this NOT return the <div id="articlebody"> ... </div> tags and stuff in between? It returns nothing. And I know for a fact it exists because I'm staring right at it from soup.prettify().
soup.find("div", { "id" : "articlebody" }) also does not work.
(I found that BeautifulSoup wasn't correctly parsing my page, which probably meant the page I was trying to parse isn't properly formatted in SGML or whatever.)
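When the parser is the suspect, one way to check is to parse the same markup with several parsers and see which of them can find the element. The following is only a sketch under assumptions: it uses the newer bs4 package rather than the BeautifulSoup 3 module used elsewhere in this thread, the helper name find_by_id and the sample markup are hypothetical, and lxml and html5lib are optional third-party parsers that are skipped if they are not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_by_id(html, element_id, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: return (parser_name, element) for the first
    parser whose tree contains the id, or (None, None) if none does."""
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # The requested parser is not installed; try the next one.
            continue
        element = soup.find(id=element_id)
        if element is not None:
            return parser, element
    return None, None


# Sample markup with unclosed <p> tags (an assumption, not the asker's page).
sloppy = '<html><body><div id="articlebody"><p>first<p>second</body></html>'
parser_name, element = find_by_id(sloppy, "articlebody")
print(parser_name, element is not None)
```

If every parser returns nothing, the id is probably not in the markup that was actually downloaded, which is a different problem from a parsing failure.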
Recommended answer
You should post your example document, because the code works fine:
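A minimal sketch of such a check, assuming the newer bs4 package and Python's built-in "html.parser" (the session in this answer uses the older BeautifulSoup 3 module instead). The markup string is a made-up stand-in for the real page:

```python
from bs4 import BeautifulSoup

# Stand-in document; the asker's real page was not posted.
html = '<html><body><div id="articlebody"> ... </div></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Look the element up by tag name and id attribute, as in the question.
article = soup.find("div", {"id": "articlebody"})
print(article)

# The id can also be passed as a keyword argument.
same_article = soup.find(id="articlebody")
```

Both lookups return the same tag here, and find returns None when nothing matches, which is worth testing for before using the result.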
Finding <div>s inside <div>s works as well:
>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div><div id="articlebody"> ... </div></div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>