Beautiful Soup and extracting a div and its contents by ID
Problem description
soup.find("tagName", { "id" : "articlebody" })
Why does this NOT return the <div id="articlebody"> ... </div>
tags and stuff in between? It returns nothing. And I know for a fact it exists because I'm staring right at it from
soup.prettify()
soup.find("div", {"id": "articlebody"})
does not work either.
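For reference, `find()` treats its first argument as a literal tag name and returns `None` when nothing matches, so "returns nothing" means no element in the parsed tree matched. Below is a minimal sketch, assuming the current `bs4` package (Beautiful Soup 4) with Python's built-in `"html.parser"`. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html = '<html><body><div id="articlebody"> ... </div></body></html>'
soup = BeautifulSoup(html, "html.parser")

# The first argument is matched literally as a tag name, so a placeholder
# such as "tagName" only matches <tagName> elements; there are none here.
missing = soup.find("tagName", {"id": "articlebody"})
print(missing)  # None

# Naming the real tag, or omitting the tag name and matching on id alone,
# finds the element.
by_tag = soup.find("div", {"id": "articlebody"})
by_id = soup.find(id="articlebody")
print(by_tag)  # <div id="articlebody"> ... </div>

# find() signals "no match" by returning None rather than raising,
# so check the result before using it.
if by_id is None:
    print("no element with id='articlebody' in the parsed tree")
else:
    print(by_id.get_text(strip=True))  # ...
```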
Edit: There is no answer to this post, so how do I delete it? I found that BeautifulSoup is not parsing correctly, which probably means the page I'm trying to parse isn't properly formatted in SGML or whatever.
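Different parsers repair malformed markup in different ways. If the parsed tree does not look like the page you expected, a quick check is to parse the same markup with another parser. The sketch below assumes the `bs4` package. `find_with_parsers` and the sample markup are hypothetical. `"lxml"` and `"html5lib"` are optional third-party parsers that may not be installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical malformed page: the <p> is never closed before the <div>.
MALFORMED = '<html><body><p>intro<div id="articlebody"><b>text</div></body></html>'


def find_with_parsers(markup, element_id,
                      parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: report, per parser, whether an element with
    element_id is reachable in the parsed tree.

    The entry is None when that parser is not installed.
    """
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            results[parser] = None  # parser not installed
            continue
        results[parser] = soup.find(id=element_id) is not None
    return results


results = find_with_parsers(MALFORMED, "articlebody")
for parser, found in results.items():
    print(parser, found)
```

If one parser finds the element and another does not, the problem is how the markup is repaired, not the `find()` call itself.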
Recommended answer
You should post your example document, because the code works fine:
>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div id="articlebody"> ... </div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>
Finding a <div> inside another <div> works as well:
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div><div id="articlebody"> ... </div></div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>
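The transcript above uses the old Beautiful Soup 3 module (`import BeautifulSoup`). Below is a rough equivalent with the current `bs4` package, assuming it is installed. It also shows two common ways to get at the div's contents. The markup is the same nested example.

```python
from bs4 import BeautifulSoup

html = ('<html><body><div><div id="articlebody"> ... </div></div>'
        '</body></html>')
soup = BeautifulSoup(html, "html.parser")

# Keyword-argument form of the same search; the outer <div> has no id,
# so it is skipped and the nested one is returned.
div = soup.find("div", id="articlebody")
# CSS-selector form; select_one() returns the first match or None.
same_div = soup.select_one("div#articlebody")

print(div)                          # <div id="articlebody"> ... </div>
print(repr(div.decode_contents()))  # ' ... '  (inner HTML, without the surrounding <div> tags)
print(div.get_text(strip=True))     # ...
```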