Beautiful Soup and extracting a div and its contents by ID


Question

soup.find("tagName", { "id" : "articlebody" })

Why does this NOT return the <div id="articlebody"> ... </div> tags and everything in between? It returns nothing, and I know for a fact the element exists because I'm staring right at it in the output of

soup.prettify()

soup.find("div", { "id" : "articlebody" }) does not work either.

(I found that BeautifulSoup wasn't correctly parsing my page, which probably means the page I was trying to parse isn't properly formatted in SGML or whatever.)

Recommended answer

You should post your example document, because the code works fine.

Finding <div>s inside <div>s works as well:

>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup('<html><body><div><div id="articlebody"> ... </div></div></body></html>')
>>> soup.find("div", {"id": "articlebody"})
<div id="articlebody"> ... </div>
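
For readers on Beautiful Soup 4, here is a minimal sketch of the same lookup. It assumes the bs4 package and the built-in "html.parser" backend, neither of which appears in the answer above (which uses the older BeautifulSoup 3 module). The markup is the answer's own sample with the closing tag completed, and the variable names are only illustrative.

```python
# Assumption: Beautiful Soup 4 is installed and imported from the bs4 package.
from bs4 import BeautifulSoup

html = '<html><body><div><div id="articlebody"> ... </div></div></body></html>'

# Assumption: the standard-library "html.parser" backend is used.
soup = BeautifulSoup(html, "html.parser")

# Same call as in the question: tag name plus an attribute dictionary.
by_attrs = soup.find("div", {"id": "articlebody"})

# Equivalent lookups: keyword argument and CSS selector.
by_keyword = soup.find(id="articlebody")
by_css = soup.select_one("div#articlebody")

# find() returns None when nothing matches, which is what "returns nothing"
# looks like when the element is absent from the parsed tree.
missing = soup.find("div", {"id": "no-such-id"})

print(by_attrs)
print(missing)
```

If find() returns None for an id that is visible in the page source, comparing the raw markup with soup.prettify() may show where the parsed tree diverges from what was expected.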
