Creating URLs in a loop


Problem description

I am trying to create a list of URLs using a for loop. It prints all the correct URLs, but is not saving them in a list. Ultimately I want to download multiple files using urlretrieve.

for i, j in zip(range(0, 17), range(1, 18)):
    if i < 8 or j < 10:
        url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
        print(url)
    if i == 9 and j == 10:
        url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
        print(url)
    if i > 9:
        if i > 9 or j < 8:
            url = "https://Here is a URL/P20{}".format(i) + "-{}".format(j) + ".xls"
            print(url)

Output of above code is:

https://Here is a URL/P2000-01.xls
https://Here is a URL/P2001-02.xls
https://Here is a URL/P2002-03.xls
https://Here is a URL/P2003-04.xls
https://Here is a URL/P2004-05.xls
https://Here is a URL/P2005-06.xls
https://Here is a URL/P2006-07.xls
https://Here is a URL/P2007-08.xls
https://Here is a URL/P2008-09.xls
https://Here is a URL/P2009-10.xls
https://Here is a URL/P2010-11.xls
https://Here is a URL/P2011-12.xls
https://Here is a URL/P2012-13.xls
https://Here is a URL/P2013-14.xls
https://Here is a URL/P2014-15.xls
https://Here is a URL/P2015-16.xls
https://Here is a URL/P2016-17.xls

But this:

url

gives only:

'https://Here is a URL/P2016-17.xls'

How do I get all the URLs, not just the final one?

Solution

There are several things that could significantly simplify your code. First of all, this:

"https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"

could be simplified to this:

"https://Here is a URL/P200{}-0{}.xls".format(i, j)

And if you have at least Python 3.6, you could use an f-string instead:

f"https://Here is a URL/P200{i}-0{j}.xls"

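Second of all, Python strings have a built-in zfill method that pads a string with zeroes on the left up to a specified length, so you don't need separate branches for one-digit and two-digit numbers. Additionally, range starts from zero by default, so range(0, 17) can be written as range(17).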
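As a quick illustration of those two points, here is a minimal sketch. The sample numbers are arbitrary, and the format-spec variant is an alternative not mentioned above:

```python
# zfill pads a string with zeros on the left up to the given width.
padded = [str(n).zfill(2) for n in (0, 7, 10, 16)]
print(padded)  # ['00', '07', '10', '16']

# A format spec inside an f-string gives the same result for integers.
same = [f"{n:02d}" for n in (0, 7, 10, 16)]
print(same == padded)  # True

# range(17) is shorthand for range(0, 17): it yields 0 through 16.
print(list(range(17)) == list(range(0, 17)))  # True
```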

So your entire original code is equivalent to:

for num in range(17):
    first = str(num).zfill(2)
    second = str(num + 1).zfill(2)
    print(f'https://Here is a URL/P20{first}-{second}.xls')

Now, you want to actually use these URLs, not just print them out. You mentioned building a list, which can be done like so:

urls = []
for num in range(17):
    first = str(num).zfill(2)
    second = str(num + 1).zfill(2)
    urls.append(f'https://Here is a URL/P20{first}-{second}.xls')
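If you prefer, the same list can be built with a list comprehension. This is only a sketch: the function name build_urls and its parameters are made up for illustration, and the base URL is the same placeholder used in the question.

```python
def build_urls(base_url, count=17):
    """Return URLs of the form <base_url>P20YY-ZZ.xls, where ZZ = YY + 1."""
    # {num:02d} zero-pads the integer to two digits, like str(num).zfill(2).
    return [f"{base_url}P20{num:02d}-{num + 1:02d}.xls" for num in range(count)]


urls = build_urls("https://Here is a URL/")
print(len(urls))  # 17
print(urls[0])    # https://Here is a URL/P2000-01.xls
print(urls[-1])   # https://Here is a URL/P2016-17.xls
```

Unlike the url variable in the question, which is overwritten on every iteration, urls keeps every value.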


Based on your comments here and on your other question, you seem to be confused about what form you need these URLs to be in. Strings like this are already what you need. urlretrieve accepts the URL as a string, so you don't need to do any further processing. See the example in the docs:

import urllib.request

local_filename, headers = urllib.request.urlretrieve('http://python.org/')
html = open(local_filename)
html.close()

However, I would recommend not using urlretrieve, for two reasons.

  1. As the documentation mentions, urlretrieve is a legacy function that may become deprecated at some point. If you're going to use urllib, use the urlopen function instead.

  2. However, as Paul Becotte mentioned in an answer to your other question: if you're looking to fetch URLs, I would recommend installing and using Requests instead of urllib. It's more user-friendly.
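For the first option, a download with urlopen might look roughly like this. It is a sketch only: the helper name download and its parameters are hypothetical, and it does no error handling.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Stream the response body for `url` into the file at `dest_path`."""
    # urlopen takes the URL as a plain string; the response is a file-like
    # object, so it can be copied to disk without loading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

You would call it once per URL string, for example download(url, "P2000-01.xls").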

Regardless of which method you choose, again, strings are fine. Here's code that uses Requests to download each of the specified spreadsheets to your current directory:

import requests

base_url = 'https://Here is a URL/'

for num in range(17):
    first = str(num).zfill(2)
    second = str(num + 1).zfill(2)
    filename = f'P20{first}-{second}.xls'
    xls = requests.get(base_url + filename)
    with open(filename, 'wb') as f:
        f.write(xls.content)
