beautifulsoup, page 12 - IT屋 (programmer software development and technology sharing community)

Extracting the title with BeautifulSoup

I have this: from urllib import request url = "http://www.bbc.co.uk/news/election-us-2016-35791008" html = request.urlopen(url).read().decode('utf8') html[:60] from bs4 import BeautifulSoup raw = BeautifulSoup(html, 'html.parser') ..

Published: 2021-12-23 20:08:10 python-3.x beautifulsoup Other development
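
For the title question above, a minimal offline sketch of reading the `<title>` element may help. The HTML string here is a made-up stand-in for the downloaded page, not the real BBC markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML returned by request.urlopen(url)
html = "<html><head><title>Sample page title</title></head><body><h1>Headline</h1></body></html>"

raw = BeautifulSoup(html, "html.parser")

# .title is the first <title> tag; .string is its text content
title_text = raw.title.string
print(title_text)
```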

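Cloning an element with BeautifulSoup

I have to copy part of one document into another, but I don't want to modify the document I copy from. If I use .extract(), it removes the element from the tree. If I just append the selected element, as in document2.append(document1.tag), it is still removed from document1. Since I am working with real files, I could just not save document1 after modifying it, but is there a way to do this without damaging the document? ..

Published: 2021-12-23 20:07:52 python beautifulsoup Python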
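
One way to approach the cloning question above is `copy.copy()`, which Beautiful Soup 4 supports on tags (version 4.4.0 and later). The sketch below uses two small made-up documents; the names `document1` and `document2` follow the question, and the markup is an assumption.

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical source and target documents
document1 = BeautifulSoup("<div><p id='keep'>Hello</p></div>", "html.parser")
document2 = BeautifulSoup("<section></section>", "html.parser")

# copy.copy() returns a detached copy of the tag and its children,
# so appending it elsewhere does not remove anything from document1
clone = copy.copy(document1.p)
document2.section.append(clone)

print(document1)
print(document2)
```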

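Can BeautifulSoup preserve CDATA sections?

I am using BeautifulSoup to read, modify, and write an XML file. I am having trouble with CDATA sections being stripped. Here is a simplified example. The culprit XML file: ?,./;'[]\-=]]> Here is the Python script. from bs4 import BeautifulSoup xmlfile = op ..

Published: 2021-12-23 20:07:42 python xml beautifulsoup lxml cdata Python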
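
For the CDATA question above, one possible approach is to re-wrap the text in Beautiful Soup's `CData` string class before writing the file, since `CData` is serialized with the CDATA delimiters and without entity escaping. This is only a sketch: the `<root>`/`<item>` markup is invented, and `html.parser` is used here to keep the example dependency-free, whereas the question uses an XML parser.

```python
from bs4 import BeautifulSoup, CData

# Hypothetical document; the original XML was lost in extraction
soup = BeautifulSoup("<root><item>placeholder</item></root>", "html.parser")

# Assigning a CData instance keeps the text inside a CDATA section on output
soup.item.string = CData("a < b & c")

output = str(soup)
print(output)
```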

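Scraping Yahoo Finance with BeautifulSoup

I am trying to extract information for a Yahoo stock ticker from the "Key Statistics" page (since the Pandas library does not support this). Example for AAPL: from bs4 import BeautifulSoup import requests url = 'http://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL' page = requests.get(url) soup = BeautifulSoup( ..

Published: 2021-12-23 20:07:34 python beautifulsoup yahoo-finance Python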

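What should I do when a <tr> has a rowspan?

How can I make the rows line up with the table on the Wikipedia page when a row contains an element with rowspan? from bs4 import BeautifulSoup import urllib2 from lxml.html import fromstring import re import csv import pandas as pd wiki = "http://en.wikipedia.org/wiki/List_of_England_Test_cricket_records" hea ..

Published: 2021-12-23 20:07:22 python html pandas beautifulsoup Front-end development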
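
A common way to handle the rowspan problem above is to carry each spanning cell down into the rows it covers. The sketch below is one such approach under stated assumptions: the table markup is invented, `table_to_grid` is a hypothetical helper name, and `colspan` and nested tables are not handled.

```python
from bs4 import BeautifulSoup

# Hypothetical table; the real Wikipedia markup is not reproduced here
html = """
<table>
<tr><th>Team</th><th>Player</th></tr>
<tr><td rowspan="2">England</td><td>Cook</td></tr>
<tr><td>Root</td></tr>
</table>
"""

def table_to_grid(table):
    """Expand rowspan cells so every row has one value per column."""
    grid = []
    pending = {}  # column index -> (rows still to fill, cell text)
    for tr in table.find_all("tr"):
        row = []
        cells = iter(tr.find_all(["td", "th"]))
        col = 0
        while True:
            if col in pending:
                # This column is covered by a cell from an earlier row
                remaining, text = pending[col]
                row.append(text)
                if remaining > 1:
                    pending[col] = (remaining - 1, text)
                else:
                    del pending[col]
                col += 1
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text = cell.get_text(strip=True)
            span = int(cell.get("rowspan", 1))
            row.append(text)
            if span > 1:
                pending[col] = (span - 1, text)
            col += 1
        grid.append(row)
    return grid

soup = BeautifulSoup(html, "html.parser")
grid = table_to_grid(soup.table)
print(grid)
```

The resulting list of rows has a uniform width, so it can be written out with the csv module or handed to a DataFrame constructor.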

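A scraper in Python gives "Access Denied"

I am trying to write a scraper in Python to get some information from a page, such as the titles of the offers that appear on this page: https://www.justdial.com/Panipat/Saree-Retailers/nct-10420585 For now I use this code: import bs4 import requests def extract_source(url): source = requests.get(url).text return source def extract_data(source): soup = bs4.Beaut ..

Published: 2021-12-23 20:07:11 python beautifulsoup python-requests Python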

Beautiful Soup: 'ResultSet' object has no attribute 'text'

from bs4 import BeautifulSoup import urllib.request import win_unicode_console win_unicode_console.enable() link = ('https://pietroalbini.io/') req = urllib.request.Request(link, headers={'User-Agent': 'Mozilla/5.0' ..

Published: 2021-12-23 20:07:03 python beautifulsoup Python
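
The error in the title above typically appears when `.text` is called on the result of `find_all()`, which is a list-like `ResultSet` rather than a single tag. A minimal sketch with made-up markup, reading the text of each element instead:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page
html = "<ul><li>one</li><li>two</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags), so take the text
# from each element rather than from the ResultSet itself
items = soup.find_all("li")
texts = [item.get_text() for item in items]
print(texts)
```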

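bs4: select_one vs find

I would like to know the difference between bs.find('div') and bs.select_one('div'). The same goes for find_all and select. Is there any difference in performance, or is one better to use in particular cases? Solution: select() and select_one() give you a different way of navigating the HTML tree, using CSS selectors, which have a rich and convenient syntax. However, BeautifulSou ..

Published: 2021-12-23 20:06:59 python beautifulsoup html-parsing bs4 Python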
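
To illustrate the comparison above, the sketch below runs both styles against a small invented document. For a bare tag name the two calls find the same element; the CSS form becomes more compact once the condition combines ids, classes, and nesting.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = """
<div id="a"><p class="x">first</p></div>
<div id="b"><p class="x y">second</p></div>
"""
bs = BeautifulSoup(html, "html.parser")

# For a simple tag name, both calls return the first matching <div>
by_find = bs.find("div")
by_css = bs.select_one("div")

# A nested, multi-class condition written both ways
css_hits = bs.select("div#b > p.x.y")
find_hits = [
    p
    for p in bs.find("div", id="b").find_all("p", recursive=False)
    if {"x", "y"} <= set(p.get("class", []))
]

print(by_find == by_css, [p.get_text() for p in css_hits])
```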

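Beautiful Soup nested tag search

I am trying to write a Python program that counts the words on a web page. I use Beautiful Soup 4 to scrape the page, but I have difficulty accessing nested HTML tags (for example: in ). Every time I try to find such a tag with the page.findAll() method (page is the Beautiful Soup object containing the whole page), it simply finds nothing, even though the tags are there. Is there a simple way or ..

Published: 2021-12-23 20:06:49 python html beautifulsoup Front-end development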

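Complex Beautiful Soup query

Here is a snippet of an HTML file I am exploring with Beautiful Soup. Website I would like to get the for any row that has and is located in . Is it possible to query an HTML file for multiple conditions with Beautiful Soup? Solution: BeautifulSoup's search mech ..

Published: 2021-12-23 20:06:38 python beautifulsoup Python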
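
One way to express several conditions at once, as asked above, is to pass a function to `find_all()`; Beautiful Soup calls it for every tag and keeps those for which it returns true. The tags and attributes in the original question were lost, so the table, the `flag` class, and the `row_matches` helper below are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the original snippet did not survive extraction
html = """
<table>
<tr><td class="flag">A</td><td><a href="/x">x</a></td></tr>
<tr><td>B</td><td><a href="/y">y</a></td></tr>
<tr><td class="flag">C</td><td>no link</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

def row_matches(tag):
    """True for <tr> rows that contain both a flagged cell and a link."""
    return (
        tag.name == "tr"
        and tag.find("td", class_="flag") is not None
        and tag.find("a", href=True) is not None
    )

rows = soup.find_all(row_matches)
links = [row.a["href"] for row in rows]
print(links)
```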

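BeautifulSoup: extracting img alt data

I have the following image HTML, and I am trying to parse the information in alt. Currently I am able to extract the image successfully. html (what I currently parse ..

Published: 2021-12-23 20:06:28 python html beautifulsoup scrape Front-end development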
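
For the alt question above, tag attributes can be read like dictionary entries. The image markup in the question was cut off, so the sketch uses invented `<img>` tags; `.get()` is used so that an image without `alt` yields `None` instead of raising `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical image markup
html = '<a href="/p"><img src="a.png" alt="Product photo"></a><img src="b.png">'
soup = BeautifulSoup(html, "html.parser")

# tag.get("alt") returns the attribute value, or None when it is missing
alts = [img.get("alt") for img in soup.find_all("img")]
print(alts)
```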

Parsing a BeautifulSoup element into Selenium

I want to get the source code of a website with Selenium, find a particular element with BeautifulSoup, and then parse it back into Selenium as a selenium.webdriver.remote.webelement object. Like this: driver.get("www.google.com") soup = BeautifulSoup(driver.source) element = soup.find(title="Search") elem ..

Published: 2021-12-23 20:06:16 python html selenium beautifulsoup Front-end development

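Finding specific links with BeautifulSoup

Hi, I cannot for the life of me figure out how to find links that begin with certain text. findall('a') works fine, but it returns too much. I just want a list of all links that begin with http://www.nhl.com/ice/boxscore.htm?id= Can anyone help me? Thanks a lot. Solution: First set up a test document and open it with the BeautifulSoup parser: >>> from BeautifulSoup import BeautifulSoup >>> do ..

Published: 2021-12-23 20:06:10 python regex beautifulsoup Python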
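
A sketch of the prefix filter asked for above, written against bs4 rather than the older `BeautifulSoup` 3 import shown in the answer. A compiled regular expression is passed as the `href` filter; the test document and the id values in it are made up.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical test document; the id values are invented
html = """
<a href="http://www.nhl.com/ice/boxscore.htm?id=2010020001">game 1</a>
<a href="http://www.nhl.com/ice/schedule.htm">schedule</a>
<a href="http://www.nhl.com/ice/boxscore.htm?id=2010020002">game 2</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only anchors whose href starts with the boxscore prefix
pattern = re.compile(r"^http://www\.nhl\.com/ice/boxscore\.htm\?id=")
links = [a["href"] for a in soup.find_all("a", href=pattern)]
print(links)
```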

Python BeautifulSoup: getting input values

I have many table rows like this: 100 200 Iterating: table = BeautifulSoup(response).find(id="sometable") # Make soup. for row in table.find_all("tr")[1:]: # Find rows. cells ..

Published: 2021-12-23 20:05:57 python beautifulsoup Python
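
Continuing the loop from the question above, a minimal sketch of reading the `value` attribute of each `<input>` in a row. The row markup was stripped from the snippet, so the table below is an assumption; only the id `sometable` and the values 100 and 200 come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the id and the two values come from the question
response = """
<table id="sometable">
<tr><th>a</th><th>b</th></tr>
<tr><td><input type="text" value="100"></td><td><input type="text" value="200"></td></tr>
</table>
"""

table = BeautifulSoup(response, "html.parser").find(id="sometable")

values = []
for row in table.find_all("tr")[1:]:  # skip the header row
    # An input's current value lives in its "value" attribute
    values.append([field.get("value") for field in row.find_all("input")])

print(values)
```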

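Scraping Yahoo Finance income statements with Python

I am trying to scrape data from Yahoo Finance income statements with Python. Specifically, say I want the most recent net income figure for Apple. The data is structured as a bunch of nested HTML tables. I am using the requests module to access and retrieve the HTML. I am using BeautifulSoup 4 to sift through the HTML structure, but I can't figure out how to get at the figure. Here is a screenshot of the analysis in Firefox. My code so far: fr ..

Published: 2021-12-23 20:05:46 python html beautifulsoup yahoo-finance Front-end development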

BeautifulSoup: extracting JSON from JS

I am playing with BeautifulSoup, and I am looking for a way to get a specific JSON string inside a JS element. Here is the JS: window.pinball = window.pinball || []; window.pinball.push(['add', {"srp_cleanup":"inactive","book_visit":"inactive","my_visits":"inactive" ..

Published: 2021-12-23 20:05:42 python html json beautifulsoup html-parsing Front-end development
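
One possible approach to the question above is to locate the `<script>` tag with Beautiful Soup, cut the object literal out with a regular expression, and hand it to `json.loads`. The script in the question is truncated, so the sketch assumes the object ends after the three keys that are visible; the real page likely has more.

```python
import json
import re

from bs4 import BeautifulSoup

# Assumption: the object is closed right after the three visible keys
html = """<script>window.pinball = window.pinball || [];
window.pinball.push(['add', {"srp_cleanup":"inactive","book_visit":"inactive","my_visits":"inactive"}]);</script>"""

soup = BeautifulSoup(html, "html.parser")

# Find the script whose text contains the push call
script = soup.find("script", string=re.compile(r"window\.pinball\.push"))

# Capture the {...} argument that follows 'add'
match = re.search(r"push\(\['add',\s*(\{.*?\})\]\)", script.string, re.DOTALL)
data = json.loads(match.group(1))
print(data)
```

A non-greedy pattern like this only works for a flat object; nested braces would call for a real JavaScript or JSON tokenizer.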

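Parsing nested HTML lists with BeautifulSoup

I need to parse a nested HTML list and convert it into a parent-child dictionary. Given this list: Operating System Linux Debian Fedora Ubuntu Windows OS X Programming Languages Python C# Ruby I want to convert it into a dictionary like this: ..

Published: 2021-12-23 20:05:34 python dictionary html-parsing beautifulsoup Python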

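How to preserve links when scraping a table with Beautiful Soup and Pandas

I am scraping the web with Beautiful Soup and Pandas to get a table. One of the columns contains URLs. When I pass the HTML to Pandas, the href is lost. Is there a way to preserve the URL link for just that column? Sample data (edited to better fit the situation): Customer Country Area Website link ..

Published: 2021-12-23 20:05:30 python pandas beautifulsoup Python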
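
One way to keep the URLs, sketched below, is to build the rows with Beautiful Soup and substitute the `href` wherever a cell contains a link. The column names follow the sample data in the question; the row contents and the example.com address are invented, and Pandas itself is left out to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical table using the column names from the question
html = """
<table>
<tr><th>Customer</th><th>Country</th><th>Area</th><th>Website</th></tr>
<tr><td>Example Co</td><td>US</td><td>West</td><td><a href="https://example.com">site</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

records = []
for tr in rows[1:]:
    values = []
    for td in tr.find_all("td"):
        link = td.find("a", href=True)
        # Keep the URL when the cell holds a link, otherwise the cell text
        values.append(link["href"] if link else td.get_text(strip=True))
    records.append(dict(zip(headers, values)))

print(records)
```

A list of dictionaries like `records` can then be passed to `pandas.DataFrame(records)`.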

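Search and replace in HTML with BeautifulSoup

I want to use BeautifulSoup to search for and replace it with . I know how to open a page with urllib2 and then parse it to extract all the tags. What I want to do is search for the closing tag and replace it with the closing tag followed by a break. Any help is much appreciated. EDIT: I assume it would be something like soup.findAll('a'). In the documentation there is: find(text="ahh").replaceWith('Hooray') So I assume it would be something like: so ..

Published: 2021-12-23 20:05:25 python beautifulsoup Python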
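
The tag names in the question above were stripped during extraction. Assuming the goal is to place a line break after every closing `</a>` tag, which is what `soup.findAll('a')` in the edit suggests, a tree-based sketch would insert a new `<br>` tag after each link rather than editing the markup as text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the assumption is "add a <br> after each </a>"
html = '<p><a href="/1">one</a><a href="/2">two</a></p>'
soup = BeautifulSoup(html, "html.parser")

for a in soup.find_all("a"):
    # insert_after() places the new tag as the next sibling of the link
    a.insert_after(soup.new_tag("br"))

result = str(soup)
print(result)
```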

Custom indentation width for BeautifulSoup .prettify()

Is there a way to define a custom indentation width for the .prettify() function? From what I can gather from its source: def prettify(self, encoding=None, formatter="minimal"): if encoding is None: return self.decode(True, formatter=formatter) else: return self.encode(encoding, True, formatter=format ..

Published: 2021-12-23 20:05:11 python beautifulsoup indentation code-formatting pretty-print Python
