Download a working local copy of a webpage

Question

I would like to download a local copy of a web page and get all of the css, images, javascript, etc.

In previous discussions (e.g. here and here, both of which are more than two years old), two suggestions are generally put forward: wget -p and httrack. However, these suggestions both fail. I would very much appreciate help with using either of these tools to accomplish the task; alternatives are also lovely.


Option 1: wget -p

wget -p successfully downloads all of the web page's prerequisites (css, images, js). However, when I load the local copy in a web browser, the page is unable to load the prerequisites because the paths to those prerequisites haven't been modified from the version on the web.

For example:

  • In the page's html, <link rel="stylesheet" href="/stylesheets/foo.css" /> will need to be corrected to point to the new relative path of foo.css
  • In the css file, background-image: url(/images/bar.png) will similarly need to be adjusted (see the sketch below).
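
To make that concrete, here is a minimal sketch of the kind of rewrite required, assuming the downloaded CSS ends up in a stylesheets/ directory and the images in an images/ directory next to the saved HTML file (the paths are illustrative only):

<!-- before: absolute path, resolves against the original server -->
<link rel="stylesheet" href="/stylesheets/foo.css" />
<!-- after: relative path, resolves against the local copy -->
<link rel="stylesheet" href="stylesheets/foo.css" />

/* before */ background-image: url(/images/bar.png);
/* after, as seen from stylesheets/foo.css */ background-image: url(../images/bar.png);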

Is there a way to modify wget -p so that the paths are correct?


Option 2: httrack

httrack seems like a great tool for mirroring entire websites, but it's unclear to me how to use it to create a local copy of a single page. There is a great deal of discussion in the httrack forums about this topic (e.g. here) but no one seems to have a bullet-proof solution.
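
For what it's worth, one pattern that recurs in those forum threads is to limit the crawl depth to the single page while still fetching the non-HTML files it references. A rough, unverified sketch (the URL and option values are illustrative):

httrack "http://www.example.com/page.html" -O ./local-copy -r1 -n

Here -O sets the output directory, -r1 limits the mirror depth to the starting page, and -n (--near) tells httrack to also grab non-HTML files (for example images) referenced from that page.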


Option 3: another tool?

Some people have suggested paid tools, but I just can't believe there isn't a free solution out there.

Answer

wget is capable of doing what you are asking. Just try the following:

wget -p -k http://www.example.com/

The -p will get you all the required elements to view the site correctly (css, images, etc). The -k will change all links (to include those for CSS & images) to allow you to view the page offline as it appeared online.
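
If some of the page's requisites (CSS, JavaScript, fonts) are served from other hosts, a commonly suggested variant adds a few more flags; this is a sketch, not something verified against any particular page:

wget -E -H -k -K -p http://www.example.com/

-E (--adjust-extension) appends .html to pages that would otherwise be saved without a suffix, -H (--span-hosts) allows requisites hosted on other domains to be fetched, and -K (--backup-converted) keeps an unconverted .orig copy of each rewritten file.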

From the Wget docs:

‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

    The links to files that have been downloaded by Wget will be changed to refer
    to the file they point to as a relative link.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
    downloaded, then the link in doc.html will be modified to point to
    ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
    combinations of directories.

    The links to files that have not been downloaded by Wget will be changed to
    include host name and absolute path of the location they point to.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
    ../bar/img.gif), then the link in doc.html will be modified to point to
    http://hostname/bar/img.gif. 

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads. 
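
With the command above, wget typically saves everything under a directory named after the host, so once the download and the final link conversion have finished, the local copy can be opened straight from disk; a hypothetical example assuming the URL used earlier:

xdg-open www.example.com/index.html

(or simply open that file in any browser).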
