Parsing a script tag with dicts in BeautifulSoup
Question
Working on a partial answer to this question, I came across a bs4.element.Tag that is a mess of nested dicts and lists (s, below).
Is there a way to return a list of urls contained in s without using re.findall? Other comments regarding the structure of this tag are helpful too.
from bs4 import BeautifulSoup
import requests
link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')
s = soup.find('script', type='application/ld+json')
## the first bit of s:
# s
# Out[116]:
# <script type="application/ld+json">
# {"@context":"http://schema.org","@type":"ItemList","numberOfItems":50,
What I've tried:
- randomly perusing through methods with tab completion on s.
- picking through the docs.
My problem is that s only has 1 attribute (type) and doesn't seem to have any child tags.
Answer
You can use s.text to get the content of the script. It's JSON, so you can then just parse it with json.loads. From there, it's simple dictionary access:
import json
from bs4 import BeautifulSoup
import requests
link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
r = requests.get(link)
soup = BeautifulSoup(r.text, 'html.parser')
s = soup.find('script', type='application/ld+json')
# the tag's text is a JSON-LD ItemList; each element carries a 'url' key
urls = [el['url'] for el in json.loads(s.text)['itemListElement']]
print(urls)
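The live markup can change, in which case soup.find returns None and json.loads can raise. A defensive variant might look like the sketch below (the helper name jobs_urls and the empty-list fallback are my own choices, not part of the original answer):

```python
import json
from bs4 import BeautifulSoup

def jobs_urls(html):
    """Return the urls from the page's JSON-LD ItemList, or [] if absent/invalid."""
    soup = BeautifulSoup(html, 'html.parser')
    s = soup.find('script', type='application/ld+json')
    if s is None or not s.string:
        return []          # no JSON-LD script tag on this page
    try:
        data = json.loads(s.string)
    except json.JSONDecodeError:
        return []          # tag present but payload is not valid JSON
    return [item['url'] for item in data.get('itemListElement', [])]

print(jobs_urls('<p>no script here</p>'))  # []
```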