How do I completely mirror a web page?

Question

I have several web pages on several different sites that I want to mirror completely. This means that I will need images, CSS, etc., and the links need to be converted. This functionality would be similar to using Firefox to "Save Page As" and selecting "Web Page, complete". I'd like to name the files and corresponding directories as something sensible (e.g. myfavpage1.html, myfavpage1.dir).

I do not have access to the servers, and they are not my pages. Here is one sample link: Click Me!

A little more clarification... I have about 100 pages that I want to mirror (many from slow servers), I will be cron'ing the job on Solaris 10 and dumping the results every hour to a samba mount for people to view. And, yes, I have obviously tried wget with several different flags but I haven't gotten the results for which I am looking. So, pointing to the GNU wget page is not really helpful. Let me start with where I am with a simple example.

 wget --mirror -w 2 -p --html-extension --tries=3 -k -P stackperl.html "https://stackoverflow.com/tags/perl"

From this, I should see the https://stackoverflow.com/tags/perl page in the stackperl.html file, if I had the flags correct.
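One likely source of the unexpected result: in GNU wget, `-P` is a *directory* prefix, not an output filename. A minimal sketch closer to the stated goal, assuming GNU wget (the derived directory name is an illustration, not from the original post):

```shell
# Assumes GNU wget. -P names a directory prefix, not an output file.
url="https://stackoverflow.com/tags/perl"
dir="${url##*/}.dir"        # sensible per-page directory, e.g. perl.dir
# -p  fetch page requisites (images, CSS)
# -k  convert links so the saved copy works locally
# -E  append .html to pages served without an extension
wget -p -k -E -w 2 -T 10 --tries=3 -P "$dir" "$url" || true  # tolerate network failures in this sketch
```

With `-p -k -E` the page and its requisites land under the given directory with links rewritten for local viewing, which is close to Firefox's "Web Page, complete".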

Solution

If you're just looking to run a command and get a copy of a web site, use the tools that others have suggested, such as wget, curl, or some of the GUI tools. I use my own personal tool that I call webreaper (that's not the Windows WebReaper, though). There are a few Perl programs I know about, including webmirror and a few others you can find on CPAN.
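The command-line route can be scripted for the hourly batch job described in the question. A rough sketch, where the URL list file, the samba-shared destination path, and the helper name are all assumptions, not from the original post:

```shell
#!/bin/sh
# Hypothetical batch job: one URL per line in urls.txt, results dumped
# under a samba-shared directory for people to browse.
list="urls.txt"                  # assumed input file
dest="/export/samba/mirrors"     # assumed samba mount point

# Turn a URL's last path segment into a safe directory name,
# e.g. https://stackoverflow.com/tags/perl -> perl.dir
dirname_for() {
    printf '%s.dir' "$(printf '%s' "${1##*/}" | tr -c 'A-Za-z0-9._-' '_')"
}

if [ -f "$list" ]; then
    mkdir -p "$dest"
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        # -p page requisites, -k convert links, -E add .html,
        # -w 2 waits between requests to go easy on slow servers
        wget -p -k -E -w 2 --tries=3 -P "$dest/$(dirname_for "$url")" "$url" \
            || echo "failed: $url" >&2   # keep going; the next cron run retries
    done < "$list"
fi
```

Run hourly from cron (e.g. `0 * * * * /path/to/mirror.sh`); failures are logged rather than aborting the batch, since some of the 100-odd servers are slow.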

If you're looking to do this inside a Perl program you are writing (since you have the "perl" tag on your question), there are many tools in CPAN that can help you at each step.

Good luck :)
