Download all images of a website
Question
So I just started learning C# last night. The first project I started was a simple image downloader, which downloads all images of a website using HtmlElementCollection.
Here is what I've got so far:
private void dl_Click(object sender, EventArgs e)
{
    System.Net.WebClient wClient = new System.Net.WebClient();
    HtmlElementCollection hecImages = Browser.Document.GetElementsByTagName("img");
    for (int i = 0; i < hecImages.Count - 1; i++)
    {
        char[] ftype = new char[4];
        string gtype;
        try
        {
            // filetype
            hecImages[i].GetAttribute("src").CopyTo(hecImages[i].GetAttribute("src").Length - 4, ftype, 0, 4);
            gtype = new string(ftype);
            // copy image to local path
            wClient.DownloadFile(hecImages[i].GetAttribute("src"), absPath + i.ToString() + gtype);
        }
        catch (System.Net.WebException)
        {
            expand_Exception_Log();
            System.Threading.Thread.Sleep(50);
        }
    }
}
Basically it's rendering the page in advance and looking for the images. This works pretty well, but for some reason it only downloads the thumbnails, not the full (high-res) images.
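One common cause of this (an assumption here, since the target site isn't shown) is that such pages embed a small thumbnail in the img src attribute while the full-resolution file is only linked from the surrounding a href. A minimal sketch of preferring the parent anchor, using the same WinForms WebBrowser setup as the code above:

```csharp
// Sketch: prefer the parent <a href> (often the full-size image) over the
// thumbnail src. HtmlElement is System.Windows.Forms.HtmlElement.
private string ResolveImageUrl(HtmlElement img)
{
    HtmlElement parent = img.Parent;
    if (parent != null && parent.TagName.Equals("A", StringComparison.OrdinalIgnoreCase))
    {
        string href = parent.GetAttribute("href");
        // Only use the link if it actually points at an image file.
        if (!string.IsNullOrEmpty(href) &&
            (href.EndsWith(".jpg") || href.EndsWith(".png") || href.EndsWith(".gif")))
        {
            return href;
        }
    }
    return img.GetAttribute("src"); // fall back to the thumbnail
}
```

Whether this helps depends entirely on how the particular site links its full-size images.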
Other sources:
Documentation on WebClient.DownloadFile: http://msdn.microsoft.com/en-us/library/ez801hhe(v=vs.110).aspx
The DownloadFile method downloads the data from the URI specified in the address parameter to a local file.
Recommended Answer
Have a look at How can I use HTML Agility Pack to retrieve all the images from a website?

This uses a library called HTML Agility Pack to download all <img src="" /> lines on a website.
If that topic somehow disappears, I'm putting this up for those who need it but can't reach that topic.
// Requires: using System.Collections.Generic; using System.Linq;
// using System.Net; and the HTML Agility Pack NuGet package.
public List<string> ImageList = new List<string>();

public void GetAllImages()
{
    // Download the raw HTML of the page.
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");

    // Parse the downloaded HTML with HTML Agility Pack.
    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.LoadHtml(source);

    // For every <img> tag in the document, take its src attribute
    // (skipping images that have none).
    foreach (var src in document.DocumentNode.Descendants("img")
        .Select(i => i.Attributes["src"])
        .Where(attr => attr != null))
    {
        // Store every image URL found.
        ImageList.Add(src.Value);
    }
}
Since you are rather new, as you stated, you can add HTML Agility Pack easily with NuGet. To add it, right-click on your project, click Manage NuGet Packages, search the Online tab on the left-hand side for HTML Agility Pack, and click Install. You then need to reference it with using HtmlAgilityPack;.
After all that, you should be able to create and use a method to download all items contained in the ImageList list created above.
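As a sketch of that last step (the target folder and the numeric file names are illustrative assumptions, not part of the original answer), the collected URLs could be downloaded like this:

```csharp
// Sketch: download every URL collected in ImageList to a local folder.
// @"C:\Images" is an assumed, pre-existing target directory.
public void DownloadAllImages()
{
    GetAllImages(); // fills ImageList (see above)
    using (var client = new System.Net.WebClient())
    {
        for (int i = 0; i < ImageList.Count; i++)
        {
            string url = ImageList[i];
            // Keep the original extension; fall back to ".jpg" if none.
            string ext = System.IO.Path.GetExtension(url);
            if (string.IsNullOrEmpty(ext)) ext = ".jpg";
            try
            {
                client.DownloadFile(url, System.IO.Path.Combine(@"C:\Images", i + ext));
            }
            catch (System.Net.WebException)
            {
                // Skip images that fail to download (e.g. relative URLs).
            }
        }
    }
}
```

Note that src attributes can be relative URLs; in a real run you would resolve them against the page address (e.g. with new Uri(baseUri, url)) before downloading.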
Good luck!
Edit: Added comments explaining what each section does.

Edit: Updated snippet to reflect user comment.