Automate picture downloads from a website with authentication
Problem description
My intention is to automate downloading all pictures from a website that requires a login (a web-form based login, I think).
The website: http://www.cgwallpapers.com
Login URL: http://www.cgwallpapers.com/login.php
Registered members' URL: http://www.cgwallpapers.com/members
A random wallpaper URL that is only accessible and downloadable for registered members: http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080
Knowing that viewwallpaper.php takes two parameters, the wallpaper id (from x to y) and the wallpaper res, I would like to write a For loop that generates all the combinations to automate the wallpaper downloads.
The first thing I tried was simply using a WebClient in this way:
Dim client As New WebClient()
client.Credentials = New System.Net.NetworkCredential("user", "pass")
client.DownloadFile("http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080", "C:\file.jpg")
But that didn't work: it returns the HTML text contents instead of an image. I think that is because, as I've read, I need to pass the login cookie.
So I've looked at and researched many examples on StackOverflow and other sites about how to log in and download a file through HttpWebRequest, because that seems to be the proper way to do it.
This is how I log in to the website and get the proper login cookie (or so I think):
Dim logincookie As CookieContainer
Dim url As String = "http://www.cgwallpapers.com/login.php"
Dim postData As String = "action=go&email=MyUsername&wachtwoord=MyPassword"
Dim tempCookies As New CookieContainer
Dim encoding As New UTF8Encoding
Dim byteData As Byte() = encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
.Method = "POST"
.Host = "www.cgwallpapers.com"
.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
.Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
.Headers.Add("Accept-Encoding: gzip, deflate")
.ContentType = "application/x-www-form-urlencoded"
.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
.Referer = "http://www.cgwallpapers.com/login.php"
.KeepAlive = True
postReq.CookieContainer = tempCookies
postReq.ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
.Write(byteData, 0, byteData.Length)
.Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
tempCookies.Add(postresponse.Cookies)
logincookie = tempCookies
postresponse.Close()
postreqstream.Close()
At this point I'm stuck, because I'm not sure how to use the obtained login cookie to download the pictures.
I suppose that after getting the login cookie I should just perform another request to the desired wallpaper URL using the saved login cookie, no? But I think I'm doing it wrong: the following code does not work. postresponse.ContentLength is always -1, so I can't write to the file.
Dim url As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?"
Dim postData As String = "id=1764&res=1920x1080"
Dim byteData As Byte() = Encoding.GetBytes(postData)
Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
With postReq
.Method = "POST"
.Host = "www.cgwallpapers.com"
.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
.Headers.Add("Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3")
.Headers.Add("Accept-Encoding: gzip, deflate")
.ContentType = "application/x-www-form-urlencoded"
.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"
.KeepAlive = True
' .Referer = ""
.CookieContainer = logincookie
.ContentLength = byteData.Length
End With
Dim postreqstream As Stream = postReq.GetRequestStream()
With postreqstream
.Write(byteData, 0, byteData.Length)
.Close()
End With
Dim postresponse As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
Dim memStream As MemoryStream
Using rdr As Stream = postresponse.GetResponseStream
Dim count As Integer = Convert.ToInt32(postresponse.ContentLength)
Dim buffer As Byte() = New Byte(count) {}
Dim bytesRead As Integer
Do
bytesRead += rdr.Read(buffer, bytesRead, count - bytesRead)
Loop Until bytesRead = count
rdr.Close()
memStream = New MemoryStream(buffer)
End Using
File.WriteAllBytes("c:\wallpaper.jpg", memStream.ToArray)
How can I fix these issues and download the wallpaper(s) in the proper way?
Recommended answer
Here is a complete solution to your question that exclusively uses HttpWebRequest and HttpWebResponse to simulate browser requests. I have commented much of the code to hopefully give you an idea of how it all works.
You must change the sUsername and sPassword variables to your own username/password to successfully log into the site.
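To make the role of the two credentials concrete, here is a minimal sketch of the form body that a login POST of this kind carries. The field names (action, email, wachtwoord) are taken from the code in this answer. The module name, the BuildLoginBody helper and the sample credentials are hypothetical, and Uri.EscapeDataString is swapped in here for the hand-written UrlEncode function purely for illustration.

```vbnet
Imports System

Module LoginBodySketch
    ' Hypothetical helper: builds an application/x-www-form-urlencoded body.
    ' The field names come from the answer's code; the values are percent-encoded
    ' so that characters such as "&" or "=" in a password cannot break the body.
    Function BuildLoginBody(username As String, password As String) As String
        Return "action=go" &
               "&email=" & Uri.EscapeDataString(username) &
               "&wachtwoord=" & Uri.EscapeDataString(password)
    End Function

    Sub Main()
        ' Sample credentials are placeholders, not real values.
        Console.WriteLine(BuildLoginBody("me@example.com", "p&ss word"))
    End Sub
End Module
```

Note that Uri.EscapeDataString encodes a space as %20 rather than "+". Servers that parse form data normally accept both, but that is an assumption about this particular site.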
Optional variables that you may want to change:
- sDownloadPath: Currently set to the same folder as the application exe. Change this to the path where you want to download your images.
- sImageResolution: Defaults to 1920x1080, which is what you specified in your original question. Change this value to any of the accepted resolution values on the website. Just a warning: I am not 100% sure that all images are available in the same resolutions, so changing this value may cause some images to be skipped if they are not available in the desired resolution.
- nMaxErrorsInSuccession: Set to 10 by default. Once logged in, the app will continually increment the image id and attempt to download a new image. Some ids do not contain an image, and this is normal, as the image may have been deleted on the server (or maybe the image is not available in the desired resolution). If the app fails to download an image nMaxErrorsInSuccession times in a row, the application stops, because we assume we have reached the last of the images. You may have to increase this number if more than 10 images in a row are deleted or not available in the selected resolution.
- nCurrentID: Set to 1 by default. This is the image id used by the website to determine which image to serve to the client. The nCurrentID variable is incremented by one on each image download attempt. Depending on time and circumstances, you may not be able to download all images in one session. If that is the case, you can note which id you left off on and update this variable accordingly to start from a different id next time. This is also useful when you have successfully downloaded all images and want to run the app later to download newer images.
- sUserAgent: Can be any user agent that you want. Currently set to Firefox 35.0 for Windows 7. Note that some websites behave differently depending on the user agent you specify, so only change this if you really need to emulate another browser.
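As an illustration of how the image id and the resolution end up in the request, the following sketch builds download URLs from the two URL fragments used in this answer's code. The helper names BuildDownloadUrl and BuildAllUrls are hypothetical, and the second resolution (1280x720) is only an assumed example value; check the site for the resolutions it actually accepts.

```vbnet
Imports System
Imports System.Collections.Generic

Module DownloadUrlSketch
    ' URL fragments as they appear in the answer's code.
    Const DownloadUrlLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
    Const DownloadUrlRight As String = "&res="

    ' Hypothetical helper: one URL for one id/resolution pair.
    Function BuildDownloadUrl(id As Integer, resolution As String) As String
        Return DownloadUrlLeft & id.ToString() & DownloadUrlRight & resolution
    End Function

    ' Hypothetical helper: every combination of id (firstId..lastId) and resolution.
    Function BuildAllUrls(firstId As Integer, lastId As Integer,
                          resolutions As IEnumerable(Of String)) As List(Of String)
        Dim urls As New List(Of String)
        For id As Integer = firstId To lastId
            For Each res As String In resolutions
                urls.Add(BuildDownloadUrl(id, res))
            Next
        Next
        Return urls
    End Function

    Sub Main()
        ' "1280x720" is an assumed value used only to show the nested loop.
        For Each url As String In BuildAllUrls(1764, 1765, {"1920x1080", "1280x720"})
            Console.WriteLine(url)
        Next
    End Sub
End Module
```

Keeping the URL construction in a small pure function makes it easy to check the generated addresses before any request is sent.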
NOTE: A 3 second pause is deliberately inserted in the download loop. Some websites have anti-hammering scripts that block or even ban users who browse a site too quickly. Although removing this line would reduce the time it takes to download all images, I would not recommend doing so.
Imports System.Net
Imports System.IO
Public Class Form2
Const sUsername As String = "USERNAMEHERE"
Const sPassword As String = "PASSWORDHERE"
Const sImageResolution As String = "1920x1080"
Const sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
Const sMainURL As String = "http://www.cgwallpapers.com/"
Const sCheckLoginURL As String = "http://www.cgwallpapers.com/login.php"
Const sDownloadURLLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
Const sDownloadURLRight As String = "&res="
Private oCookieCollection As CookieCollection = Nothing
Private nMaxErrorsInSuccession As Int32 = 10
Private nCurrentID As Int32 = 1
Private sDownloadPath As String = Application.StartupPath
Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
StartScrape()
End Sub
Private Sub StartScrape()
Try
Dim bContinue As Boolean = True
Dim sPostData(5) As String
sPostData(0) = UrlEncode("action")
sPostData(1) = UrlEncode("go")
sPostData(2) = UrlEncode("email")
sPostData(3) = UrlEncode(sUsername)
sPostData(4) = UrlEncode("wachtwoord")
sPostData(5) = UrlEncode(sPassword)
If GetMethod(sMainURL) = True Then
If SetMethod(sCheckLoginURL, sPostData, sMainURL) = True Then
' Login successful
Dim nErrorsInSuccession As Int32 = 0
Do Until nErrorsInSuccession >= nMaxErrorsInSuccession ' Stop after nMaxErrorsInSuccession failures in a row
If DownloadImage(sDownloadURLLeft, sDownloadURLRight, sMainURL, nCurrentID) = True Then
' Always reset error count when we successfully download
nErrorsInSuccession = 0
Else
' Add one to error count because there was no image at the current id
nErrorsInSuccession += 1
End If
nCurrentID += 1
Threading.Thread.Sleep(3000) ' Wait 3 seconds to prevent loading pages too quickly
Loop
MessageBox.Show("Finished downloading images")
End If
Else
MessageBox.Show("Error connecting to main site. Are you connected to the internet?")
End If
Catch ex As Exception
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
End Sub
Private Function GetMethod(ByVal sPage As String) As Boolean
Dim req As HttpWebRequest
Dim resp As HttpWebResponse
Dim stw As StreamReader
Dim bReturn As Boolean = True
Try
req = HttpWebRequest.Create(sPage)
req.Method = "GET"
req.AllowAutoRedirect = False
req.UserAgent = sUserAgent
req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
req.Headers.Add("Keep-Alive", "300")
req.KeepAlive = True
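resp = req.GetResponse ' Get the response from the server
If req.HaveResponse Then
' Save the cookie info (the header is absent when the server sets no cookies)
Dim sSetCookie As String = resp.Headers("Set-Cookie")
If sSetCookie IsNot Nothing Then
SaveCookies(sSetCookie)
Else
oCookieCollection = New CookieCollection ' Start with an empty collection
End If
stw = New StreamReader(resp.GetResponseStream)
stw.ReadToEnd() ' Read the response from the server, but we do not save it
stw.Close() ' Also closes the underlying response stream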
Else
MessageBox.Show("No response received from host " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End If
Catch exc As WebException
MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function SetMethod(ByVal sPage As String, ByVal sPostData() As String, sReferer As String) As Boolean
Dim bReturn As Boolean = False
Dim req As HttpWebRequest
Dim resp As HttpWebResponse
Dim str As StreamWriter
Dim sPostDataValue As String = ""
Dim nInitialCookieCount As Int32 = 0
Try
req = HttpWebRequest.Create(sPage)
req.Method = "POST"
req.UserAgent = sUserAgent
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
req.Referer = sReferer
req.ContentType = "application/x-www-form-urlencoded"
req.Headers.Add("Keep-Alive", "300")
If oCookieCollection IsNot Nothing Then
' Pass cookie info from the login page
req.CookieContainer = SetCookieContainer(sPage)
End If
str = New StreamWriter(req.GetRequestStream)
If sPostData.Count Mod 2 = 0 Then
' There is an even number of post names and values
For i As Int32 = 0 To sPostData.Count - 1 Step 2
' Put the post data together into one string
sPostDataValue &= sPostData(i) & "=" & sPostData(i + 1) & "&"
Next i
sPostDataValue = sPostDataValue.Substring(0, sPostDataValue.Length - 1) ' This will remove the extra "&" at the end that was added from the for loop above
' Post the data to the server
str.Write(sPostDataValue)
str.Close()
' Get the response
nInitialCookieCount = req.CookieContainer.Count
resp = req.GetResponse
If req.CookieContainer.Count > nInitialCookieCount Then
' Login successful
' Save new login cookies
SaveCookies(req.CookieContainer)
bReturn = True
Else
MessageBox.Show("The email or password you entered are incorrect." & vbCrLf & vbCrLf & "Please try again.", "Unable to log in", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
bReturn = False
End If
Else
' Did not specify the correct amount of parameters so we cannot continue
MessageBox.Show("POST error. Did not supply the correct amount of post data for " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End If
Catch ex As Exception
MessageBox.Show("POST error. " & ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function DownloadImage(ByVal sPageLeft As String, sPageRight As String, sReferer As String, nCurrentID As Int32) As Boolean
Dim req As HttpWebRequest
Dim bReturn As Boolean = False
Dim sPage As String = sPageLeft & nCurrentID.ToString & sPageRight & sImageResolution
Try
req = HttpWebRequest.Create(sPage)
req.Method = "GET"
req.AllowAutoRedirect = False
req.UserAgent = sUserAgent
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
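req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate ' Sends Accept-Encoding and decompresses the response, so a compressed body is never written to disk as-is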
req.Headers.Add("Keep-Alive", "300")
req.KeepAlive = True
If oCookieCollection IsNot Nothing Then
' Pass cookie info so that we remain logged in
req.CookieContainer = SetCookieContainer(sPage)
End If
' Save file to disk
Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")
If sContentDisposition IsNot Nothing Then
' There is an image to download
Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim
Using responseStream As IO.Stream = oResponse.GetResponseStream
Using fs As New IO.FileStream(System.IO.Path.Combine(sDownloadPath, sFilename), FileMode.Create, FileAccess.Write)
Dim buffer(2047) As Byte
Dim read As Integer
Do
read = responseStream.Read(buffer, 0, buffer.Length)
fs.Write(buffer, 0, read)
Loop Until read = 0
responseStream.Close()
fs.Flush()
fs.Close()
End Using
responseStream.Close()
End Using
bReturn = True
End If
oResponse.Close()
End Using
Catch exc As WebException
MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
bReturn = False
End Try
Return bReturn
End Function
Private Function SetCookieContainer(sPage As String) As System.Net.CookieContainer
Dim oCookieContainerObject As New System.Net.CookieContainer
Dim oCookie As System.Net.Cookie
For c As Int32 = 0 To oCookieCollection.Count - 1
If IsDate(oCookieCollection(c).Value) = False Then
oCookie = New System.Net.Cookie
oCookie.Name = oCookieCollection(c).Name
oCookie.Value = oCookieCollection(c).Value
oCookie.Domain = New Uri(sPage).Host
oCookie.Secure = False
oCookieContainerObject.Add(oCookie)
End If
Next
Return oCookieContainerObject
End Function
Private Sub SaveCookies(sCookieString As String)
' Convert cookie string to global cookie collection object
Dim sCookieStrings() As String = sCookieString.Trim.Replace("path=/,", "").Replace("path=/", "").Split(";".ToCharArray())
oCookieCollection = New CookieCollection
For Each sCookie As String In sCookieStrings
If sCookie.Trim <> "" Then
' Split on the first "=" only; skip attributes without a value (e.g. HttpOnly)
Dim sParts() As String = sCookie.Trim().Split("=".ToCharArray(), 2)
If sParts.Length = 2 Then
oCookieCollection.Add(New Cookie(sParts(0), sParts(1)))
End If
End If
Next
End Sub
Private Sub SaveCookies(oCookieContainer As CookieContainer)
' Convert cookie container object to global cookie collection object
oCookieCollection = New CookieCollection
For Each oCookie As System.Net.Cookie In oCookieContainer.GetCookies(New Uri(sMainURL))
oCookieCollection.Add(oCookie)
Next
End Sub
Private Function UrlEncode(ByVal URLText As String) As String
Dim AscCode As Integer
Dim EncText As String = ""
Dim bStr() As Byte = System.Text.Encoding.ASCII.GetBytes(URLText)
Try
For i As Long = 0 To UBound(bStr)
AscCode = bStr(i)
Select Case AscCode
Case 48 To 57, 65 To 90, 97 To 122, 46, 95
EncText = EncText & Chr(AscCode)
Case 32
EncText = EncText & "+"
Case Else
If AscCode < 16 Then
EncText = EncText & "%0" & Hex(AscCode)
Else
EncText = EncText & "%" & Hex(AscCode)
End If
End Select
Next i
Erase bStr
Catch ex As WebException
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
Return EncText
End Function
End Class
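A note on the ContentLength of -1 mentioned in the question: HttpWebResponse.ContentLength is -1 when the server does not send a Content-Length header, so it cannot be used to size a buffer. The sketch below shows the general technique of copying a stream of unknown length in chunks until Read returns 0, which is the same approach DownloadImage takes above. The module name and the CopyAll helper are hypothetical, and a MemoryStream stands in for the real response stream so that the example runs without a network connection.

```vbnet
Imports System
Imports System.IO

Module UnknownLengthSketch
    ' Hypothetical helper: copies everything from source to destination without
    ' knowing the length in advance, and returns the number of bytes copied.
    Function CopyAll(source As Stream, destination As Stream) As Long
        Dim buffer(8191) As Byte
        Dim total As Long = 0
        Dim read As Integer = source.Read(buffer, 0, buffer.Length)
        Do While read > 0
            destination.Write(buffer, 0, read)
            total += read
            read = source.Read(buffer, 0, buffer.Length)
        Loop
        Return total
    End Function

    Sub Main()
        ' A 20000-byte in-memory payload stands in for a response stream.
        Dim payload(19999) As Byte
        Using source As New MemoryStream(payload)
            Using target As New MemoryStream()
                Console.WriteLine(CopyAll(source, target))
            End Using
        End Using
    End Sub
End Module
```

On .NET Framework 4 and later, Stream.CopyTo does the same job in a single call; the explicit loop is shown only to make the stop condition visible.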