How can I use HTML Agility Pack to retrieve all the images from a website?

Question

I just downloaded the HTML Agility Pack, and the documentation doesn't have any examples.

I'm looking for a way to get all the images from a website: the address strings, not the physical image files.

<img src="blabalbalbal.jpeg" />

I need to pull the src of each img tag. I just want to get a feel for the library and what it can offer. Everyone says this is the best tool for the job.

Edit

public void GetAllImages()
{
    WebClient x = new WebClient();
    string source = x.DownloadString(@"http://www.google.com");

    HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
    document.Load(source);

    // I can't use the Descendants method. It doesn't appear.
    var ImageURLS = document.desc
                            .Select(e => e.GetAttributeValue("src", null))
                            .Where(s => !String.IsNullOrEmpty(s));
}

Recommended answer

You can do this using LINQ, like this:

var document = new HtmlWeb().Load(url);
var urls = document.DocumentNode.Descendants("img")
                                .Select(e => e.GetAttributeValue("src", null))
                                .Where(s => !String.IsNullOrEmpty(s));
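
For the string-based approach attempted in the question, a minimal self-contained sketch might look like the following. It assumes the HtmlAgilityPack package is referenced; the class name ImageExtractor and the method name GetImageSources are hypothetical, chosen only for illustration. One detail worth noting: HtmlDocument.Load expects a file path or stream, whereas LoadHtml parses markup that is already held in a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class ImageExtractor
{
    // Returns the non-empty src attribute of every <img> element in the markup.
    public static List<string> GetImageSources(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parses a string; Load would treat it as a path

        return document.DocumentNode
                       .Descendants("img")
                       .Select(e => e.GetAttributeValue("src", null))
                       .Where(s => !String.IsNullOrEmpty(s))
                       .ToList();
    }
}
```

Passing the string returned by WebClient.DownloadString to GetImageSources should give one entry per img tag that carries a src attribute; tags without one are filtered out by the Where clause.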

EDIT: This code now actually works; I had forgotten to write document.DocumentNode.
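
The src values collected by the LINQ query above are returned exactly as written in the page, so they may be relative (for example blabalbalbal.jpeg). If absolute address strings are wanted, one possible sketch resolves each value against the page URL with System.Uri. The class name ImageUrlResolver and the method name ToAbsolute are hypothetical, and the sketch assumes the page does not override its base URL with a base element.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; uses only System.Uri from .NET.
public static class ImageUrlResolver
{
    // Resolves each src value against the page URI; values that cannot
    // be combined into a valid URI are skipped.
    public static List<string> ToAbsolute(Uri pageUri, IEnumerable<string> sources)
    {
        var result = new List<string>();
        foreach (var src in sources)
        {
            Uri absolute;
            if (Uri.TryCreate(pageUri, src, out absolute))
            {
                result.Add(absolute.AbsoluteUri);
            }
        }
        return result;
    }
}
```

Values that are already absolute pass through unchanged, because combining a base URI with an absolute URI yields the absolute one.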
