Web scraping with progress bar


Problem description

Originally I planned to build this program as a desktop application, but after thinking it over and discussing it with friends, we all agreed it would serve the end users better as a website. I am familiar with C#, but I am completely new to web development.

My current console application, which I originally wrote to build the database, uses this method for web scraping (with HtmlAgilityPack):

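// Requires: System, System.Collections.Generic, System.Net, HtmlAgilityPack
public static List<Set> GetSets()
{
    var sets = new List<Set>();
    var setsDoc = Load("URL");
    var count = SetCount();
    var on = 0;

    // HtmlAgilityPack's SelectNodes returns null (not an empty list)
    // when no node matches, so guard before iterating.
    var links = setsDoc.DocumentNode.SelectNodes("//a[@href]");
    if (links == null)
    {
        return sets;
    }

    foreach (var link in links)
    {
        var att = link.Attributes["href"];
        if (att.Value.Contains("URL"))
        {
            var setName = WebUtility.HtmlDecode(link.FirstChild.InnerText);
            var setLink = "URL" + att.Value;
            on++;

            // Redraw the console status and progress bar for the current set.
            Console.Clear();
            Console.WriteLine("Gathering Set data.");
            Console.WriteLine("Set: " + setName);
            Console.WriteLine(on + " out of " + count);
            DrawProgressBar(on, count, 20, '+');

            // Only scrape sets that are not already stored.
            if (!Program.SetExists(setName, setLink))
            {
                sets.Add(GetSet(setLink, setName));
            }
        }
    }
    return sets;
}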



While trying to turn this into a web page in MVC 5, I am at a complete loss as to how to do it and keep the progress bar. Normally this action would be fast enough not to need a progress bar; however, the site I am scraping requires that I hit it only about once every 10 seconds, which turns this into something that can take a while.

I am open to any type of solution, but I want to understand why I would use it and how it works (if that is not self-evident). Even some pseudocode or specifics to research would be great. This would be an incredible help and very much appreciated, as I do not know where to even get started.
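To make the shape of the problem concrete, here is a minimal sketch of a throttled loop that reports its progress through `IProgress<T>` instead of writing to the console. It is not taken from any posted answer. The names `ScrapeProgress`, `CallbackProgress`, `ThrottledScraper`, and the `fetch` delegate (standing in for `GetSet`) are all hypothetical, and the demo uses a 10 ms delay where the real job would wait about 10 seconds.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical progress snapshot; not part of the original code.
public sealed class ScrapeProgress
{
    public int Done { get; set; }
    public int Total { get; set; }
    public string Current { get; set; }
}

// Invokes the callback synchronously on the reporting thread,
// so updates arrive in the order they were reported.
public sealed class CallbackProgress : IProgress<ScrapeProgress>
{
    private readonly Action<ScrapeProgress> _callback;
    public CallbackProgress(Action<ScrapeProgress> callback) { _callback = callback; }
    public void Report(ScrapeProgress value) { _callback(value); }
}

public static class ThrottledScraper
{
    // Processes each link, waiting `delay` between requests, and reports
    // progress after every item. `fetch` is an assumed stand-in for
    // GetSet(setLink, setName).
    public static async Task<List<string>> RunAsync(
        IReadOnlyList<string> links,
        Func<string, Task<string>> fetch,
        TimeSpan delay,
        IProgress<ScrapeProgress> progress,
        CancellationToken token = default(CancellationToken))
    {
        var results = new List<string>();
        for (var i = 0; i < links.Count; i++)
        {
            // Wait between requests (not before the first one)
            // to respect the target site's rate limit.
            if (i > 0) await Task.Delay(delay, token);

            results.Add(await fetch(links[i]));
            progress.Report(new ScrapeProgress
            {
                Done = i + 1,
                Total = links.Count,
                Current = links[i]
            });
        }
        return results;
    }
}

public static class Demo
{
    public static void Main()
    {
        var links = new[] { "set-a", "set-b", "set-c" };
        var progress = new CallbackProgress(p =>
            Console.WriteLine(p.Done + " out of " + p.Total + " (" + p.Current + ")"));

        // 10 ms keeps the demo fast; the real delay would be about 10 seconds.
        var results = ThrottledScraper.RunAsync(
            links,
            link => Task.FromResult("data for " + link),
            TimeSpan.FromMilliseconds(10),
            progress).GetAwaiter().GetResult();

        Console.WriteLine("Fetched " + results.Count + " sets.");
    }
}
```

Because the loop only talks to an `IProgress<ScrapeProgress>`, the same code could serve a console and a web front end. In an MVC 5 site, one option to research is running the job in the background and having the callback store the latest snapshot where the page can poll it with AJAX, or pushing updates with SignalR. Both are suggestions rather than anything the question confirms.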

Recommended answer
