Python to Save Web Pages


Question

This is probably a very simple task, but I cannot find any help. I have a website whose URLs take the form www.xyz.com/somestuff/ID, and I have a list of the IDs I need information from. I was hoping for a simple script that goes to the site and downloads the (complete) web page for each ID, saving it under a simple name such as ID_whatever_the_default_save_name_is in a specific folder.

Can I run a simple Python script to do this for me? I could do it by hand, since it is only 75 different pages, but I was hoping to use this to learn how to do things like this in the future.

Recommended answer

Do you want just the HTML code for the website? If so, create a url variable holding the host site and append the page number as you go. Here is an example using http://www.notalwaysright.com:

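import os
import urllib.request

url = "http://www.notalwaysright.com/page/"

# Create the output folder if it does not exist yet.
os.makedirs("Page", exist_ok=True)

for x in range(1, 71):
    # x is an int, so convert it before appending it to the URL.
    newurl = url + str(x)
    with urllib.request.urlopen(newurl) as response:
        # read() returns bytes, so the file is opened in binary mode.
        with open("Page/" + str(x), "wb") as p:
            p.write(response.read())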
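
For the URL pattern in the question, the same loop can run over a list of IDs instead of a numeric range. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names save_pages and fetch_html, the .html file suffix, and the optional fetch parameter are hypothetical and introduced only for illustration. It saves just the HTML of each page, not the images or stylesheets the page refers to.

```python
import os
import urllib.request


def fetch_html(url):
    """Download one URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save_pages(ids, base_url, out_dir, fetch=fetch_html):
    """Save base_url + ID for each ID as out_dir/ID.html (hypothetical helper).

    The fetch argument is an assumption of this sketch; it lets the
    download step be replaced, for example when trying the code offline.
    """
    # Create the target folder if it does not exist yet.
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for page_id in ids:
        # IDs may be ints or strings, so convert before concatenating.
        data = fetch(base_url + str(page_id))
        path = os.path.join(out_dir, str(page_id) + ".html")
        # The body is bytes, so write in binary mode.
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved
```

With the placeholder address from the question, a call might look like save_pages(["101", "102"], "http://www.xyz.com/somestuff/", "pages"), where the IDs and the folder name "pages" are made up for the example.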
