Looping over a URL or scraping data from variations in a URL


Question

My goal is to feed every latitude and longitude in Canada's range into the code below automatically and scrape the locations returned for each one. I know Canada spans latitudes 42°N to 83°N and longitudes 53°W to 141°W. I understand how to scrape this type of data, but I have never had to loop over values inside a URL. I am worried that I will write a loop that does nothing except get me banned from the website. Any help would be great!

import requests

url = "https://www.circlek.com/stores_new.php?lat=43.6529&lng=-79.3849&services=&region=global"

payload={}
headers = {
  'Connection': 'keep-alive',
  'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
  'Accept': '*/*',
  'X-Requested-With': 'XMLHttpRequest',
  'sec-ch-ua-mobile': '?0',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36',
  'Sec-Fetch-Site': 'same-origin',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Dest': 'empty',
  'Referer': 'https://www.circlek.com/store-locator?Canada&lat=43.6529&lng=-79.3849',
  'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
  'dnt': '1'
}

response = requests.request("GET", url, headers=headers, data=payload)

print(response.text)

Recommended answer

As you commented, you can structure your code like this. I am guessing your latitudes and longitudes are stored in a list like the one below; if not, please share the range of lat_lng values and the step between them.

import requests

# Store or create the latitude/longitude pairs.
# zip pairs values one-to-one and stops at the shorter range (40 pairs here).
lat_lng = [(lat, lng) for lat, lng in zip(range(43, 83), range(-141, -53))]

# The headers are the same for every request, so define them once outside the loop.
headers = {
  'Connection': 'keep-alive',
  'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
  'Accept': '*/*',
  'X-Requested-With': 'XMLHttpRequest',
  'sec-ch-ua-mobile': '?0',
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36',
  'Sec-Fetch-Site': 'same-origin',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Dest': 'empty',
  'Referer': 'https://www.circlek.com/store-locator?Canada&lat=43.6529&lng=-79.3849',
  'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
  'dnt': '1'
}

for latitude, longitude in lat_lng:
  # Substitute the current coordinates into the query string.
  url = f"https://www.circlek.com/stores_new.php?lat={latitude}&lng={longitude}&services=&region=global"
  response = requests.get(url, headers=headers)

  print(response.json())

You can also wrap this in a function.
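One way the loop might be wrapped in a function is sketched below. The names `build_params` and `fetch_stores`, the `delay` pause between requests, and the 30-second timeout are assumptions made for illustration; none of them come from the site or the original answer. The pause is there because of the concern about being banned, and the right value is a guess.

```python
import time

import requests

BASE_URL = "https://www.circlek.com/stores_new.php"  # endpoint taken from the question


def build_params(latitude, longitude):
    """Return the query parameters that appear in the question's URL."""
    return {"lat": latitude, "lng": longitude, "services": "", "region": "global"}


def fetch_stores(points, headers=None, delay=1.0, session=None):
    """Request each (lat, lng) point in turn and collect the parsed JSON.

    `delay` is an assumed courtesy pause in seconds, not a documented site limit.
    """
    if session is None:
        session = requests.Session()  # reuses one connection across requests
    results = []
    for latitude, longitude in points:
        response = session.get(
            BASE_URL,
            params=build_params(latitude, longitude),
            headers=headers,
            timeout=30,
        )
        response.raise_for_status()  # stop on 4xx/5xx instead of continuing to send requests
        results.append(response.json())
        time.sleep(delay)  # pause before the next request
    return results
```

It could then be called as `fetch_stores(lat_lng, headers=headers)`, with `lat_lng` and `headers` defined as in the code above.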

As you commented, for negative values the range should be arranged like this, with the lower bound first; it works:

lat_lng = [(lat,long) for lat,long in zip(range(43,83),range(-141,-53))]

#[(43, -141), (44, -140), (45, -139), (46, -138), (47, -137), (48, -136),.....]

Note in the output above that zip pairs values one-to-one: each latitude is matched with exactly one longitude. If you want one-to-many pairing, where every latitude is combined with every longitude, the itertools module will help.
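As a minimal sketch of the one-to-many case, `itertools.product` pairs every latitude with every longitude. The bounds below are the whole-degree ranges stated in the question (42 to 83 and -141 to -53, inclusive), and the one-degree step is an assumption.

```python
from itertools import product

latitudes = range(42, 84)      # 42..83 inclusive
longitudes = range(-141, -52)  # -141..-53 inclusive

# product yields every (latitude, longitude) combination.
lat_lng = list(product(latitudes, longitudes))

print(len(lat_lng))  # 42 * 89 = 3738
print(lat_lng[:3])   # [(42, -141), (42, -140), (42, -139)]
```

The resulting list can be iterated in the same `for latitude, longitude in lat_lng:` loop as before.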

For finer resolution, I suggest looking at np.arange, which also accepts floating-point steps, for example:

np.arange(43,83,0.001)
#array([43.   , 43.001, 43.002, ..., 82.997, 82.998, 82.999])
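The step size determines how many requests a full grid needs, which matters given the concern about being banned. The sketch below only counts grid points and sends nothing. The helper name `count_grid_points` is hypothetical, and the bounding box is the one stated in the question.

```python
def count_grid_points(step, lat_range=(42, 83), lng_range=(-141, -53)):
    """Count grid points, including both ends, for a step given in degrees."""
    n_lat = round((lat_range[1] - lat_range[0]) / step) + 1
    n_lng = round((lng_range[1] - lng_range[0]) / step) + 1
    return n_lat * n_lng


for step in (1, 0.5, 0.1, 0.001):
    print(step, count_grid_points(step))
```

With a 1 degree step the grid has 3738 points, while a 0.001 degree step over the same box would need roughly 3.6 billion requests, so a coarse step is probably the practical starting point.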
