Download a working local copy of a webpage
Question
I would like to download a local copy of a web page and get all of the css, images, javascript, etc.
In previous discussions (e.g. here and here, both of which are more than two years old), two suggestions are generally put forward: wget -p and httrack. However, these suggestions both fail. I would very much appreciate help with using either of these tools to accomplish the task; alternatives are also lovely.
Option 1: wget -p
wget -p successfully downloads all of the web page's prerequisites (css, images, js). However, when I load the local copy in a web browser, the page is unable to load the prerequisites because the paths to those prerequisites haven't been modified from the version on the web.
For example:
- In the page's html,
<link rel="stylesheet" href="/stylesheets/foo.css" />
will need to be corrected to point to the new relative path of foo.css
- In the css file,
background-image: url(/images/bar.png)
will similarly need to be adjusted.
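For illustration, this is the kind of rewriting that has to happen: root-relative paths must become relative ones. A minimal sketch with sed, using a stand-in file for what wget -p leaves on disk (the filename and layout are hypothetical):

```shell
# Hypothetical saved page; stands in for a file downloaded by wget -p.
printf '<link rel="stylesheet" href="/stylesheets/foo.css" />\n' > index.html

# Rewrite root-relative paths ("/stylesheets/...") into relative ones
# ("stylesheets/...") so the browser resolves them next to index.html.
sed 's|href="/|href="|g' index.html > index.fixed.html
cat index.fixed.html
```

Doing this by hand across every HTML and CSS file (and for url(...) references inside stylesheets) is error-prone, which is why an automated option is wanted.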
Is there a way to modify wget -p so that the paths are correct?
Option 2: httrack
httrack seems like a great tool for mirroring entire websites, but it's unclear to me how to use it to create a local copy of a single page. There is a great deal of discussion in the httrack forums about this topic (e.g. here) but no one seems to have a bullet-proof solution.
Option 3: Another tool?
Some people have suggested paid tools, but I just can't believe there isn't a free solution out there.
Answer
wget is capable of doing what you are asking. Just try the following:
wget -p -k http://www.example.com/
The -p will get you all the required elements to view the site correctly (css, images, etc).
The -k will change all links (including those for CSS & images) to allow you to view the page offline as it appeared online.
From the Wget docs:
‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.
Each link will be changed in one of the two ways:
The links to files that have been downloaded by Wget will be changed to refer
to the file they point to as a relative link.
Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
downloaded, then the link in doc.html will be modified to point to
‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
combinations of directories.
The links to files that have not been downloaded by Wget will be changed to
include host name and absolute path of the location they point to.
Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
../bar/img.gif), then the link in doc.html will be modified to point to
http://hostname/bar/img.gif.
Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.
Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads.