Image scraper with C#


Question

I'm trying to go through a web page's source code and add each <img src="http://www.dot.com/image.jpg"> element to an HtmlElementCollection. Then I'm attempting to loop over the collection with a foreach loop and download each image through its URL.

Here's what I have so far. My problem right now is that nothing is downloading, and I don't think my elements are being added properly by tag name. If they are, I can't seem to reference them for the download.

public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    public void button1_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        string sourceCode = WorkerClass.ScreenScrape(url);
        StreamWriter sw = new StreamWriter("sourceScraped.html");
        sw.Write(sourceCode);
    }

    private void button2_Click(object sender, EventArgs e)
    {
        string url = urlTextBox.Text;
        WebBrowser browser = new WebBrowser();
        browser.Navigate(url);
        HtmlElementCollection collection;
        List<HtmlElement> imgListString = new List<HtmlElement>();
        if (browser != null)
        {
            if (browser.Document != null)
            {
                collection = browser.Document.GetElementsByTagName("img");
                if (collection != null)
                {
                    foreach (HtmlElement element in collection)
                    {
                        WebClient wClient = new WebClient();
                        string urlDownload = element.FirstChild.GetAttribute("src");
                        wClient.DownloadFile(urlDownload, urlDownload.Substring(urlDownload.LastIndexOf('/')));
                    }
                }
            }
        }
    }
}

}

Recommended answer

Once you call Navigate, you assume the document is ready to traverse and check for images, but in practice it takes some time to load. Navigate returns before the page has loaded, so browser.Document is typically still null or incomplete when your loop runs. You need to wait until the document has finished loading.

Add a DocumentCompleted event handler to your browser object, before calling Navigate:

 browser.DocumentCompleted += browser_DocumentCompleted;

Implement it as:

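// Requires: using System.Net; using System.Windows.Forms;
static void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = (WebBrowser)sender;
    if (browser.Document == null)
    {
        return;
    }

    // The document has finished loading, so its <img> elements can be queried now.
    HtmlElementCollection collection = browser.Document.GetElementsByTagName("img");

    // One WebClient is reused for all downloads and disposed afterwards.
    using (WebClient wClient = new WebClient())
    {
        foreach (HtmlElement element in collection)
        {
            // Read src from the <img> element itself, not from element.FirstChild.
            string urlDownload = element.GetAttribute("src");
            if (string.IsNullOrEmpty(urlDownload))
            {
                continue;
            }

            // Use the text after the last '/' as the local file name; the + 1 drops the slash.
            string fileName = urlDownload.Substring(urlDownload.LastIndexOf('/') + 1);
            if (fileName.Length == 0)
            {
                continue;
            }

            wClient.DownloadFile(urlDownload, fileName);
        }
    }
}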
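
The handler above derives the local file name from the text after the last '/' in the URL. That can go wrong when the src carries a query string (for example image.jpg?size=large) or ends in a slash. As a minimal sketch of one way to make that step more robust, the helper below uses System.Uri and Path.GetFileName instead of string slicing. The class name ImageFileNames and the method FromUrl are hypothetical and not part of the original answer.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the original answer.
public static class ImageFileNames
{
    // Returns a local file name for an absolute image URL,
    // or null when no usable name can be derived.
    public static string FromUrl(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return null; // null, relative, or malformed URL
        }

        // AbsolutePath excludes the query string and fragment,
        // so ".../image.jpg?size=large" yields "/image.jpg".
        string name = Path.GetFileName(uri.AbsolutePath);

        // A URL that ends in '/' has no file name component.
        return string.IsNullOrEmpty(name) ? null : name;
    }
}
```

Inside the foreach loop, the file name could then be obtained with ImageFileNames.FromUrl(urlDownload), skipping the element when the result is null. Note that AbsolutePath stays percent-encoded, so a name such as my%20image.jpg is kept in that form unless you decode it yourself.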
