Beautiful Soup and extracting a div and its contents by ID

Question

soup.find("tagName", { "id" : "articlebody" })

Why does this NOT return the <div id="articlebody"> ... </div> tags and everything in between? It returns nothing, and I know for a fact the element exists because I'm staring right at it in the output of

soup.prettify()

soup.find("div", { "id" : "articlebody" }) also does not work.

Edit: There is no answer to this post - how do I delete it? I found that BeautifulSoup is not parsing the page correctly, which probably means the page I'm trying to parse isn't properly formatted SGML or whatever.
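
One way to check that suspicion (the parser, rather than the query, being at fault) is to compare the raw markup with the parsed tree. The sketch below is not from the thread: it assumes the modern bs4 package rather than the BeautifulSoup 3 module used in the answer, it uses Python's built-in html.parser, and the helper name diagnose_missing_id is hypothetical.

```python
from bs4 import BeautifulSoup


def diagnose_missing_id(html, element_id, parser="html.parser"):
    """Report where an id shows up: parsed tree, raw markup only, or nowhere.

    Hypothetical helper for illustration; the raw-markup check is a crude
    substring test, so it can also match the id inside comments or text.
    """
    in_raw = element_id in html
    soup = BeautifulSoup(html, parser)
    in_tree = soup.find(id=element_id) is not None

    if in_tree:
        return "found"
    if in_raw:
        # The string is in the source but no element carries it as an id:
        # the markup (or the parser's reading of it) deserves a closer look.
        return "in raw markup only"
    return "absent"


print(diagnose_missing_id('<div id="articlebody"> ... </div>', "articlebody"))
```

If this reports "in raw markup only" for a real page, trying a different parser argument is a reasonable next step, assuming that parser is installed.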

Recommended answer

You should post your example document, because the code works fine:

>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div id="articlebody"> ... </div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>

Finding <div>s inside <div>s works as well:

>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div><div id="articlebody"> ... </div></div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>
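
For readers on current Python, the same lookup can be sketched with the bs4 package (Beautiful Soup 4), which replaced the BeautifulSoup 3 module imported above. This is an assumption about the reader's environment, not part of the original answer. The parser argument "html.parser" is Python's built-in parser, and the sample markup is the nested example from this answer.

```python
from bs4 import BeautifulSoup

html = '<html><body><div><div id="articlebody"> ... </div></div></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Dictionary form, the same call the answer uses
by_attrs = soup.find("div", {"id": "articlebody"})

# Keyword form: matches any tag with this id, regardless of tag name
by_keyword = soup.find(id="articlebody")

# Taken literally, the first snippet in the question searches for a tag
# named "tagName"; no such tag exists here, so find() returns None
literal_tag_name = soup.find("tagName", {"id": "articlebody"})

print(by_attrs)             # the matching <div> element
print(by_attrs.get_text())  # only the text between the tags
print(literal_tag_name)     # None
```

Note that find() returns None rather than raising when nothing matches, so a None result can come from a wrong tag name as easily as from markup the parser could not handle.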
