Capture a download link redirected by a page (WGET)

Views: 162 | Published: 2020/10/26 1:15:03 | Tags: windows, url, redirect, download, wget


Question

Here is my problem.

I am currently working on a script that automates the download of some software that I use to "clean" my computer.

I have been able to download from direct download URLs like https://www.driverscloud.com/plugins/DriversCloud_Win.exe, but not from URLs that redirect to the download URL after a short wait, like https://www.ccleaner.com/fr-fr/ccleaner/download/standard.

I can see that the problem is that I am not giving Wget a direct download address. However, I would like to keep using the address https://www.ccleaner.com/fr-fr/ccleaner/download/standard, because Piriform (the developer of CCleaner) updates the software quite regularly and the download address changes with the version number (for example, https://download.ccleaner.com/ccsetup547.exe -> https://download.ccleaner.com/ccsetup548.exe).

So how can I ask Wget to follow the download link contained in the page instead of downloading the page itself? At the moment I get a file called "standard", matching the last segment of the URL https://www.ccleaner.com/fr-fr/ccleaner/download/standard.

I would be delighted if you have a solution for me with Wget or another tool such as Curl :)

Thank you.

Recommended answer

You don't need PHP for that. wget alone is powerful enough to do this simple job :)

Here's the command you need (I'll give a breakdown below):

$ wget -r -l 1 --span-hosts --accept-regex='.*download.ccleaner.com/.*.exe' -erobots=off -nH https://www.ccleaner.com/fr-fr/ccleaner/download/standard

Now, for a breakdown of what this does:

-r: Enables recursion, since we want to follow a link on the provided page.
-l 1: Recurses only one level deep, since the required URL is on the page we provide.
--span-hosts: The required file is on a different host than the original URL we provide, so we ask wget to go across hosts when recursing.
--accept-regex=...: Specifies a regular expression that the links followed during recursion must match. Since we only want one file and know the pattern, we make a fairly specific regex.
-erobots=off: The download.ccleaner.com host has a robots.txt that forbids all user-agents. We're not crawling the domain, so we disable honoring the robots file.
-nH: Don't create host-specific directories. This means the exe is downloaded directly into your current folder.
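To see what the --accept-regex filter is selecting, the same kind of pattern can be applied by hand to the page's HTML. The following is a minimal sketch, not part of the original answer: extract_exe_url is a hypothetical helper name, the sample markup is made up for illustration, and it assumes the installer URL appears as plain text in the page. Unlike the pattern in the command above, the dots are escaped here so they match literal dots only.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the first URL that looks
# like a CCleaner installer on download.ccleaner.com.
extract_exe_url() {
    grep -oE 'https://download\.ccleaner\.com/[A-Za-z0-9_.-]+\.exe' | head -n 1
}

# Made-up sample markup standing in for the real download page.
sample_html='<a href="https://download.ccleaner.com/ccsetup548.exe">start the download</a>'

# Print the URL that the filter picks out of the sample markup.
printf '%s\n' "$sample_html" | extract_exe_url
```

Against the live page, the helper could be fed from curl -s or wget -qO- and its output handed to wget. Whether the live page still exposes the link this way is an assumption.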

If you want a little more automation, you can also append && rm -r fr-fr/ to the above command to remove the base page that was downloaded in order to find the right link.
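The cleanup step can be tried without touching the network. This is only a sketch under stated assumptions: remove_base_page is a hypothetical helper, and the directory layout is simulated from the answer's description (the saved page under fr-fr/, the installer in the current folder), with empty stand-in files.

```shell
#!/bin/sh
# Hypothetical helper: delete the saved page tree left behind by the recursive
# fetch, leaving everything else (including the downloaded .exe) in place.
remove_base_page() {
    page_dir=${1:-fr-fr}
    if [ -d "$page_dir" ]; then
        rm -r -- "$page_dir"
    fi
}

# Simulate, in a scratch directory, what the wget command is described as leaving.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p fr-fr/ccleaner/download
: > fr-fr/ccleaner/download/standard   # stand-in for the saved HTML page
: > ccsetup548.exe                     # stand-in for the installer

remove_base_page fr-fr
ls
```

Checking that the directory exists before removing it means the helper does not fail when no page was saved, which a bare rm -r fr-fr/ would.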

Enjoy!

Edit: Since OP is on Windows, here is an updated command for running on Windows. It doesn't single-quote the regex string, because the Windows shell does not strip single quotes and would pass them to wget as part of the regex.

$ wget -r -l 1 --span-hosts --accept-regex=.*download.ccleaner.com/.*.exe -erobots=off -nH https://www.ccleaner.com/fr-fr/ccleaner/download/standard
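The quoting difference can be checked on a Unix-like system with a small sketch. show_args is a hypothetical helper that prints each argument exactly as the program receives it; it only illustrates the POSIX side of the comparison.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line, as received.
show_args() {
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

# In a POSIX shell the single quotes are consumed by the shell itself,
# so the program sees the bare pattern with no quote characters.
show_args --accept-regex='.*download.ccleaner.com/.*.exe'
```

In cmd.exe, single quotes are not quoting characters, so the same argument would reach wget with the quotes still attached, which is why the Windows command drops them.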
