Capture a download link redirected by a page (Wget)


Question

Here is my problem.

I am currently writing a script for my own use that automates the download of some software I use to "clean" my computer.

I have been able to download from direct download URLs like this one: "https://www.driverscloud.com/plugins/DriversCloud_Win.exe", but not from URLs that redirect to a download URL after a short wait, like this one: "https://www.ccleaner.com/fr-fr/ccleaner/download/standard".

I can see that the problem is that I am not giving Wget a direct download address. However, I would like to keep using the address "https://www.ccleaner.com/fr-fr/ccleaner/download/standard", because Piriform (the developer of CCleaner) updates the software quite regularly and the download address changes with the version number (example: https://download.ccleaner.com/ccsetup547.exe -> https://download.ccleaner.com/ccsetup548.exe).

So how can I ask Wget to follow the download link contained in the page instead of downloading the page itself? At the moment I get a file called "standard", named after the last part of the URL "https://www.ccleaner.com/fr-fr/ccleaner/download/standard".

I would be delighted if you have a solution for me using Wget or another tool such as curl :).

Thank you.

Recommended answer

You don't need PHP for that. wget alone is powerful enough to do this simple job :)

Here's the command you need (I'll give a breakdown below):

$ wget -r -l 1 --span-hosts --accept-regex='.*download.ccleaner.com/.*.exe' -erobots=off -nH https://www.ccleaner.com/fr-fr/ccleaner/download/standard

Now, for a breakdown of what this does:


  • -r: Enables recursion, since we want to follow a link on the provided page.
  • -l 1: Limits recursion to one level, since the required URL is linked directly from the page we provide.
  • --span-hosts: The required file is on a different host than the original URL we provide, so we ask wget to go across hosts when recursing.
  • --accept-regex=...: Specifies a regular expression that the links followed during recursion must match. Since we only want one file and know its pattern, we make the regex fairly specific.
  • -erobots=off: The download.ccleaner.com host has a robots.txt that forbids all user agents. We are not crawling the domain, so we tell wget not to honor the robots file.
  • -nH: Don't create host-specific directories. This means the exe is downloaded directly into your current folder.
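
To check which links a pattern like the one in --accept-regex would pick up, the matching can be tried on HTML text without running the recursive download. This is only a minimal sketch: extract_installer_url is a hypothetical helper name, the sample HTML is made up, and it assumes the installer URL appears as plain text in the page source, which the answer does not confirm. The dots are escaped here, so the pattern is slightly stricter than the one in the command above.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the first URL that
# looks like an installer hosted on download.ccleaner.com.
extract_installer_url() {
    # -o prints only the matched text; -E enables extended regular expressions.
    grep -oE 'https://download\.ccleaner\.com/[A-Za-z0-9_.-]+\.exe' | head -n 1
}

# Example with an inline HTML snippet (made-up page content):
sample='<a href="https://download.ccleaner.com/ccsetup548.exe">start the download</a>'
printf '%s\n' "$sample" | extract_installer_url
```

With network access, the page itself could be piped in, for example with curl -sL https://www.ccleaner.com/fr-fr/ccleaner/download/standard | extract_installer_url, and the printed URL passed to wget. If the page builds the link with JavaScript, nothing will match and the helper prints nothing.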

If you want a little more automation, you can also append && rm -r fr-fr/ to the above command to remove the base page that was downloaded in order to find the right link.
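
The download and the cleanup can also be wrapped in a small function for use inside a larger script. The following is a sketch rather than a tested recipe: download_and_clean and the WGET override variable are hypothetical names introduced here, and -e robots=off is just the spaced form of -erobots=off. As with &&, the cleanup only runs when wget exits successfully.

```shell
#!/bin/sh
# Sketch: run the recursive download, then delete the leftover page directory.
# WGET can be overridden (for example with a stub) to try the logic offline.
WGET=${WGET:-wget}

download_and_clean() {
    page_url=$1   # page that links to the installer
    page_dir=$2   # top-level directory wget creates for that page, e.g. fr-fr

    "$WGET" -r -l 1 --span-hosts \
        --accept-regex='.*download.ccleaner.com/.*.exe' \
        -e robots=off -nH "$page_url" || return 1

    # Remove the saved page only if the directory actually exists.
    if [ -d "$page_dir" ]; then
        rm -r -- "$page_dir"
    fi
}

# Usage (needs network access):
# download_and_clean https://www.ccleaner.com/fr-fr/ccleaner/download/standard fr-fr
```

Passing the directory name as an argument keeps the function usable for other language paths of the same site, since fr-fr is specific to the URL in the question.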

Enjoy!

Edit: Since OP is on Windows, here is an updated command specifically for running on Windows. It doesn't single-quote the regex string, because the Windows shell would pass the single quotes through to wget as part of the regex.

$ wget -r -l 1 --span-hosts --accept-regex=.*download.ccleaner.com/.*.exe -erobots=off -nH https://www.ccleaner.com/fr-fr/ccleaner/download/standard
