Download a working local copy of a webpage


Question



I would like to download a local copy of a web page and get all of the css, images, javascript, etc.

In previous discussions (e.g. here and here, both of which are more than two years old), two suggestions are generally put forward: wget -p and httrack. However, these suggestions both fail. I would very much appreciate help with using either of these tools to accomplish the task; alternatives are also lovely.


Option 1: wget -p

wget -p successfully downloads all of the web page's prerequisites (css, images, js). However, when I load the local copy in a web browser, the page is unable to load the prerequisites because the paths to those prerequisites haven't been modified from the version on the web.
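For reference, the invocation in question is just the bare page-requisites flag pointed at the page (the URL is a placeholder):

    wget -p http://www.example.com/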

For example:

  • In the page's html, <link rel="stylesheet" href="/stylesheets/foo.css" /> will need to be corrected to point to the new relative path of foo.css
  • In the css file, background-image: url(/images/bar.png) will similarly need to be adjusted.

Is there a way to modify wget -p so that the paths are correct?


Option 2: httrack

httrack seems like a great tool for mirroring entire websites, but it's unclear to me how to use it to create a local copy of a single page. There is a great deal of discussion in the httrack forums about this topic (e.g. here) but no one seems to have a bullet-proof solution.
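For concreteness, a sketch of what such an invocation might look like, based on options documented in the httrack manual (-O sets the output directory, -r1 is meant to keep the mirror depth at the starting page, -n pulls in non-html files such as images that the page references); the URL is a placeholder, and whether this rewrites every path correctly for offline viewing is exactly the part that remains unclear:

    httrack "http://www.example.com/page.html" -O ./local-copy -r1 -n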


Option 3: another tool?

Some people have suggested paid tools, but I just can't believe there isn't a free solution out there.

Thanks so much!

Solution

wget is capable of doing what you are asking. Just try the following:

wget -p -k http://www.example.com/

The -p will get you all the required elements to view the site correctly (css, images, etc). The -k will change all links (to include those for CSS & images) to allow you to view the page offline as it appeared online.
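One caveat worth knowing: -p by itself only fetches requisites from the starting host. If the page loads stylesheets or images from another host (a CDN, say), a commonly suggested variant, offered as a sketch rather than a guarantee, adds -E (save pages with an .html extension), -H (allow requisites on other hosts) and -K (keep a .orig backup of each file that -k rewrites):

    wget -E -H -k -K -p http://www.example.com/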

From the Wget docs:

‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

    The links to files that have been downloaded by Wget will be changed to refer
    to the file they point to as a relative link.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
    downloaded, then the link in doc.html will be modified to point to
    ‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
    combinations of directories.

    The links to files that have not been downloaded by Wget will be changed to
    include host name and absolute path of the location they point to.

    Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
    ../bar/img.gif), then the link in doc.html will be modified to point to
    http://hostname/bar/img.gif. 

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads. 
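A small practical follow-up: with -p, wget will normally place the copy in a directory named after the host (www.example.com/). If you would rather collect everything under a folder of your choosing, the -nH (--no-host-directories) and -P (--directory-prefix) options can be combined with the command above:

    wget -p -k -nH -P ./local-copy http://www.example.com/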
