Capture a download link redirected by a page (WGET)
Problem description
Here is my problem.
I am currently working on a script that automates the download of some software that I use to "clean" my computer.
I have been able to make downloads with direct download URLs like this one: "https://www.driverscloud.com/plugins/DriversCloud_Win.exe", but not with URLs that redirect to a download URL after a short wait, like this one: "https://www.ccleaner.com/fr-fr/ccleaner/download/standard".
I can see that the problem is that I am not giving Wget a direct download address, but I would like to be able to use the address "https://www.ccleaner.com/fr-fr/ccleaner/download/standard", because Piriform (the developer of CCleaner) updates the software quite regularly and the download address changes with the version number (example: https://download.ccleaner.com/ccsetup547.exe -> https://download.ccleaner.com/ccsetup548.exe).
So how can I ask Wget to grab the download link contained in the page instead of downloading the page itself? (Right now I just get a file called "standard", matching the end of the URL "https://www.ccleaner.com/fr-fr/ccleaner/download/standard".)
I would be delighted if you have a solution for me, with Wget or another tool such as Curl :).

Thank you.
Recommended answer
You don't need PHP for that. wget alone is powerful enough to do this simple job :)
Here's the command you need (I'll give a breakdown below):
$ wget -r -l 1 --span-hosts --accept-regex='.*download\.ccleaner\.com/.*\.exe' -erobots=off -nH https://www.ccleaner.com/fr-fr/ccleaner/download/standard
Now, for a breakdown of what this does:
- `-r`: Enables recursion, since we want to follow a link on the provided page.
- `-l 1`: We want to recurse only one level deep, since the required URL is on the same page.
- `--span-hosts`: The required file is on a different host than the original URL we provide, so we ask wget to go across hosts when using recursion.
- `--accept-regex=...`: This specifies a regular expression for the links that will be followed through recursion. Since we only want one file and know its pattern, we use a very specific regex.
- `-erobots=off`: The download.ccleaner.com host has a robots.txt that forbids all user agents. But we're not crawling the domain, so we disable honoring the robots file.
- `-nH`: Don't create host-specific directories. This means the exe will be downloaded directly into your current folder.
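As a quick sanity check (not part of the original answer), you can confirm that the accept pattern really selects only the versioned installer URLs using any POSIX regex tool such as `grep -E`. This sketch uses a slightly tightened version of the pattern with the literal dots escaped:

```shell
# Hypothetical check: the same pattern handed to --accept-regex should match
# any versioned installer URL on download.ccleaner.com and nothing else.
pattern='.*download\.ccleaner\.com/.*\.exe'

# grep -E prints only the lines that match the extended regex.
printf '%s\n' \
  'https://download.ccleaner.com/ccsetup547.exe' \
  'https://download.ccleaner.com/ccsetup548.exe' \
  'https://www.ccleaner.com/fr-fr/ccleaner/download/standard' \
  | grep -E "$pattern"
```

Only the two `ccsetup*.exe` lines survive the filter, which is exactly the behavior wget's recursion filter relies on.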
If you want a little more automation, you can also append && rm -r fr-fr/ to the above command, to remove the base page that was downloaded in order to find the right link.
Enjoy!
Edit: Since OP is on Windows, here is an updated command specifically for running on Windows. It does not single-quote the regex string, since that would cause the Windows shell to pass the regex along with the single quotes.
$ wget -r -l 1 --span-hosts --accept-regex=.*download\.ccleaner\.com/.*\.exe -erobots=off -nH https://www.ccleaner.com/fr-fr/ccleaner/download/standard
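Since OP also mentioned Curl: curl has no recursion, but a two-step approach works just as well: fetch the page, extract the installer link with a regex, then download it. This is a sketch rather than the answer's method; the helper name is made up here, and the link pattern is assumed from the URLs above:

```shell
# extract_installer_url: read HTML on stdin and print the first link matching
# the assumed download.ccleaner.com installer pattern.
extract_installer_url() {
  grep -oE 'https://download\.ccleaner\.com/[^"]*\.exe' | head -n 1
}

# Usage (requires network access): fetch the page, pull out the link,
# then download the installer under its original file name.
#   url=$(curl -s https://www.ccleaner.com/fr-fr/ccleaner/download/standard | extract_installer_url)
#   curl -LO "$url"
```

Unlike the wget recipe, this downloads nothing but the page itself, so there is no `fr-fr/` directory to clean up afterwards.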