In Python, how do I scrape a page whose content changes for each link?

Problem description

In Python 3, I need to scrape a table on this page or this one.

It is the table with the columns "Descrição", "Tipo" and "Valor do Bem" (description, type and asset value).

Using "Inspect element", I found that the table is:

<table class="table table-stripped dvg-table responsive">

But when I look at the content returned by requests, this element does not appear.

It is a site with politicians' profiles, so the header stays relatively fixed, while the lower part of the page always changes.

Apparently, requests only finds the header part of the site, and the table contents are loaded some other way. Does the site fetch the table from a different link for each politician?
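A likely explanation, added here as an editorial note rather than something stated in the question: in a URL, everything after `#` is a fragment. Browsers never send it to the server, and neither does `requests`, so every candidate URL of this form downloads the same `/divulga/` page. The table is then presumably filled in by the page's JavaScript, which reads the fragment and requests the data separately. This standard-library sketch only splits the question's URL to show which part actually goes over HTTP.

```python
from urllib.parse import urldefrag

# URL from the question; the part after "#" is a client-side route
url = "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2010/14417/AC/10000000001/bens"

# Split off the fragment, which HTTP clients do not send to the server
page_url, route = urldefrag(url)

# What requests.get() actually asks the server for: the same page for every candidate
print(page_url)  # http://divulgacandcontas.tse.jus.br/divulga/
# Presumably what the page's JavaScript reads to decide which candidate's data to load
print(route)     # /candidato/2010/14417/AC/10000000001/bens
```

Because the server only ever sees `/divulga/`, parsing that HTML with BeautifulSoup cannot find a table that is only built later in the browser.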

Here is what I tried:

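from bs4 import BeautifulSoup
import requests

# Candidate page as it appears in the browser's address bar
requisicao = requests.get('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2010/14417/AC/10000000001/bens')

# Parse the HTML returned by the server
sopa = BeautifulSoup(requisicao.content, "html.parser")

# Look for the table seen with "Inspect element"
tabela = sopa.find("table", {"class": "table table-stripped dvg-table responsive"})
print(tabela)  # prints None: the table is not in the downloaded HTML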

Does anyone know how I can access this table?

Recommended answer

You can get the required data with the following request:

import requests

# JSON endpoint that serves the candidate data shown on the page
url = "http://divulgacandcontas.tse.jus.br/divulga/rest/v1/candidatura/buscar/2016/71072/2/candidato/250000004975"
response = requests.get(url)
print(response.json())
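To reuse the answer's request for other candidates, the URL can be built from its variable path segments. This is a minimal sketch: the parameter names `year`, `unit`, `election` and `candidate_id` are hypothetical labels for the segments `2016`, `71072`, `2` and `250000004975`, and their exact meaning is a guess. The answer's example is a 2016 candidate; whether the 2010 page from the question maps onto the same endpoint is not confirmed here. Opening the browser's developer tools (Network tab) on a candidate page is one way to see which URL the page really requests.

```python
import requests

# Base of the JSON API used in the answer
BASE_URL = "http://divulgacandcontas.tse.jus.br/divulga/rest/v1"


def candidate_url(year, unit, election, candidate_id):
    """Build the candidate endpoint URL.

    HYPOTHETICAL parameter names: they label the four variable path
    segments of the answer's URL; their exact meaning is a guess.
    """
    return f"{BASE_URL}/candidatura/buscar/{year}/{unit}/{election}/candidato/{candidate_id}"


def fetch_candidate(year, unit, election, candidate_id, timeout=30):
    """Download one candidate's data and return the decoded JSON."""
    response = requests.get(candidate_url(year, unit, election, candidate_id), timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.json()
```

For example, `fetch_candidate(2016, 71072, 2, 250000004975)["bens"]` should give the same result as the answer's `response.json()['bens']`. The timeout keeps a script from hanging if the server stops responding.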

You can get more specific information like this:

print(response.json()['bens'])

print(response.json()['partido'])

And so on.
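Since the original goal was the table with the columns "Descrição", "Tipo" and "Valor do Bem", here is a minimal sketch that turns the `bens` list from the JSON response into those three columns and optionally saves them as CSV. ASSUMPTION: the key names inside each `bens` item (`descricao`, `descricaoDeTipoDeBem`, `valor`) are guesses; print one item first and adjust the mapping if they differ.

```python
import csv

# ASSUMPTION: key names inside each item of response.json()['bens'] are guesses;
# print one item and adjust the right-hand side if they differ.
COLUMNS = {
    "Descrição": "descricao",
    "Tipo": "descricaoDeTipoDeBem",
    "Valor do Bem": "valor",
}


def bens_to_rows(bens, columns=COLUMNS):
    """Turn a list of asset dicts into (header, rows) mirroring the page's table."""
    header = list(columns)
    # Missing keys become empty strings instead of raising KeyError; None means no assets
    rows = [[item.get(key, "") for key in columns.values()] for item in bens or []]
    return header, rows


def write_csv(path, header, rows):
    """Save the table as a UTF-8 CSV file so accented Portuguese text stays intact."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Usage: `header, rows = bens_to_rows(response.json()['bens'])`, then `write_csv("bens.csv", header, rows)`, where `bens.csv` is just an example file name.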
