Picking file names out of a website to download in PowerShell


Problem description


Problem: I'm working on a PowerShell script that will download the site's source code, find all the file targets, and then download said targets. Authentication isn't a concern for the moment: on my test website I enabled anonymous authentication, enabled directory browsing, and disabled all other default pages, so all I get is a list of files on my site. What I have so far is this:

$source = "http://testsite/testfolder/"
$webclient = New-Object System.Net.WebClient
$destination = "c:/users/administrator/desktop/test/"
$webclient.DownloadString($source)


The $webclient.DownloadString call returns the source code of my site, and I can see the files I want wrapped in the rest of the markup. My question to you guys is: what is the best and/or easiest way of isolating the links I want so I can run a foreach command to download all of them?


Also, for extra credit, how would I go about adding code to download folders, and the files within those folders, from my site? I can at least write separate scripts to pull the files from each subfolder, but obviously it would be much nicer to get it all in one script.

Recommended answer


If you are on PowerShell v3, the Invoke-WebRequest cmdlet may be of help.


To get an object representing the website:

Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell"


Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell" | select -ExpandProperty Links


And to just get a list of the href elements:

Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell" | select -ExpandProperty Links | select href
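To answer the foreach part of the question, here is a minimal sketch that ties the pieces together: it pulls the links from the directory listing and downloads each one with WebClient. The $source and $destination values are the same hypothetical paths used in the question, and the script assumes every link in the listing points at a file.

```powershell
$source      = "http://testsite/testfolder/"
$destination = "c:/users/administrator/desktop/test/"
$webclient   = New-Object System.Net.WebClient

# Grab every link from the directory listing
$links = Invoke-WebRequest $source | Select-Object -ExpandProperty Links

foreach ($link in $links) {
    # Keep only the file name portion of the href
    $fileName = Split-Path $link.href -Leaf
    # Resolve the href against the source URL in case it is a relative
    # or root-relative path (IIS directory listings often emit the latter)
    $url = New-Object System.Uri((New-Object System.Uri($source)), $link.href)
    $webclient.DownloadFile($url, (Join-Path $destination $fileName))
}
```

Depending on how your IIS listing is rendered, you may want to filter $links first (for example, to skip the "parent directory" link).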


If you are on PowerShell v2 or earlier you'll have to create an InternetExplorer.Application COM object and use that to navigate the page:

$ie = new-object -com "InternetExplorer.Application"
# sleep for a second while IE launches
Start-Sleep -Seconds 1
$ie.Navigate("http://stackoverflow.com/search?tab=newest&q=powershell")
# sleep for a second while IE opens the page
Start-Sleep -Seconds 1
$ie.Document.Links | select IHTMLAnchorElement_href
# quit IE
$ie.Application.Quit()


Thanks to this blog post where I learnt about Invoke-WebRequest.


Update: One could also download the website source like you posted and then extract the links from the source. Something like this:

$webclient.downloadstring($source) -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }


The -split part splits the source at each occurrence of <a followed by one or more whitespace characters. The output is placed in an array which I then pipe through a foreach-object block. There I match each piece against a regexp that extracts the link part and outputs it.


If you want to do more with the output you can pipe it further through another block which does something with it.
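For the extra-credit part of the question, the same split-and-match extraction can be wrapped in a function that recurses into subfolders. This is a rough sketch under two assumptions that you should check against your own listing: the hrefs are relative to the folder being listed, and subfolder hrefs end with a trailing "/" (IIS listings may instead emit root-relative paths, in which case the URL concatenation needs adjusting).

```powershell
function Get-SiteFiles {
    param([string]$Url, [string]$Destination)

    # Mirror the remote folder structure locally
    if (-not (Test-Path $Destination)) {
        New-Item -ItemType Directory -Path $Destination | Out-Null
    }

    $webclient = New-Object System.Net.WebClient
    # Same extraction technique as above: split on <a tags, pull out the href
    $links = $webclient.DownloadString($Url) -split "<a\s+" |
        ForEach-Object { if ($_ -match "^href=[`'`"]([^`'`">\s]*)") { $matches[1] } }

    foreach ($href in $links) {
        if ($href -like "*/") {
            # Assumed convention: trailing "/" marks a subfolder - recurse
            Get-SiteFiles ($Url + $href) (Join-Path $Destination ($href.TrimEnd('/')))
        } else {
            $webclient.DownloadFile($Url + $href, (Join-Path $Destination $href))
        }
    }
}

Get-SiteFiles "http://testsite/testfolder/" "c:/users/administrator/desktop/test/"
```

You would also want to skip the "parent directory" link that IIS adds to each listing, or the recursion will loop back upward.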

