How to scrape data bypassing a radio button using requests in Python 3?

Question

I want to scrape data from this website. After opening the page, we need to select 'TIN' as the radio button criterion, enter the TIN number '27680809621V', and click the submit button. I'm stuck because there is no name or value to work with, so I don't know how to submit the form.

import requests
from bs4 import BeautifulSoup

s = requests.Session()
req = s.get('https://mahagst.gov.in/en/know-your-taxpayer')
soup = BeautifulSoup(req.text, 'lxml')

# Collect every named <input> on the page and its current value
dictinfo = {i['name']: i.get('value', '') for i in soup.select('input[name]')}

Someone please help me.

Solution

Submitting the form makes a GET request that carries the selected TIN as a filter :) The response comes back as JSON, so there is no need for BeautifulSoup.

from requests import Session

s = Session()
# Browser-like headers; 'Accept' asks the OData service for JSON
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/75.0.3770.80 Safari/537.36',
    'Accept': 'application/json',
}
s.headers.update(headers)

BASE_URL = 'https://mahagst.gov.in/sap/opu/odata/sap/ZMSTD_KYT_SRV/TinDetailSet'

# The TIN is passed as a filter expression in the query string
params = {
    "$filter": "(Tin eq '27680809621V')"
}

r = s.get(BASE_URL, params=params)
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page

data = r.json()
print(data)
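To see what the request above actually sends, it can help to build the query string by hand. The following is a minimal sketch using only the standard library. The helper names `tin_params` and `preview_url` are made up for illustration, and the sketch assumes the TIN contains no quote characters. The encoding should be close to what requests produces from `params`, though that is not verified here.

```python
from urllib.parse import urlencode

BASE_URL = 'https://mahagst.gov.in/sap/opu/odata/sap/ZMSTD_KYT_SRV/TinDetailSet'


def tin_params(tin):
    """Build the $filter parameter for a TIN (hypothetical helper).

    Assumes the TIN has no single quotes that would need escaping.
    """
    return {"$filter": f"(Tin eq '{tin}')"}


def preview_url(base_url, params):
    """Return the full GET URL with the parameters percent-encoded."""
    return f"{base_url}?{urlencode(params)}"


url = preview_url(BASE_URL, tin_params('27680809621V'))
print(url)
```

Passing `tin_params(...)` as `params=` to `s.get` keeps the TIN in one place if several numbers need to be looked up.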

This is how I found out the URL and params

And the data returned is nicely structured JSON (a dictionary once parsed) :)

The data is made up of nested dictionaries and lists, so you can index into it with ordinary Python to pull out the values you need, e.g. data['d']['results'] :) Hope this helps.
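As a rough illustration of walking that structure, the sketch below runs against a made-up sample payload rather than the live service. Only the `data['d']['results']` path comes from the answer. The field names `Tin` and `Name`, the `__metadata` entry, and the helper names are assumptions for demonstration and may not match the real response.

```python
def get_results(data):
    """Return the list under data['d']['results'], or [] if it is missing."""
    results = data.get('d', {}).get('results', [])
    return results if isinstance(results, list) else []


def strip_metadata(record):
    """Drop keys starting with '__' (assumed to be service metadata)."""
    return {k: v for k, v in record.items() if not k.startswith('__')}


# Hypothetical sample payload; the real field names may differ.
sample = {
    'd': {
        'results': [
            {
                '__metadata': {'type': 'hypothetical'},
                'Tin': '27680809621V',
                'Name': 'EXAMPLE TRADER',
            },
        ]
    }
}

for record in get_results(sample):
    print(strip_metadata(record))
```

Returning an empty list when the keys are missing lets the loop run safely even if the service answers with an unexpected shape.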
