Retrieve data from a website via Visual Basic


Problem description


There is a website we purchase widgets from that provides details for each of its parts on its own webpage. Example: http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND. I have to find all of their parts that are in our database and add Manufacturer and Manufacturer Part Number values to their fields.


I was told that there is a way for Visual Basic to access a webpage and extract information. If someone could point me in the right direction on where to start, I'm sure I can figure this out.

Thanks.

Recommended answer


How to scrape a website using HTMLAgilityPack (VB.Net)

I agree that HtmlAgilityPack (http://htmlagilitypack.codeplex.com/) is the easiest way to accomplish this. It is less error prone than just using Regex. The following is how I approach scraping.


After downloading the HtmlAgilityPack DLL, create a new application and add a reference to it. If you can use Chrome, it lets you inspect the page to find where your information is located. Right-click on a value you wish to capture and look for the table it is found in (follow the HTML up a bit).


The following example will extract all the values from that page within the "pricing" table. We need to know the XPath (http://www.w3schools.com/xpath/default.asp) value for the table (this value instructs HtmlAgilityPack on what to look for) so that the document we create finds our specific values. This can be achieved by finding whatever structure your values are in, right-clicking it, and choosing Copy XPath. From this we get...

//*[@id="pricing"]


Please note that sometimes the XPath you get from Chrome may be rather large. You can often simplify it by finding something unique about the table your values are in. In this example it is the "id", but in other situations it could easily be headings, a class, or whatever.
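For illustration, the longer path below is a hypothetical example of what Chrome's Copy XPath might give you (not taken from the actual page), next to the simplified version that relies on the unique id:

```
/html/body/div[2]/div[1]/table[2]/tbody/tr[3]/td[1]    <- what Chrome might copy
//*[@id="pricing"]                                     <- simplified using the unique id
```

The simplified form keeps working even if the page layout around the table changes, since it doesn't depend on the exact nesting of divs.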


This XPath value looks for something with the id equal to pricing; that is our table. When we look further in, we see that our values are within tbody, tr, and td tags. HtmlAgilityPack doesn't work well with tbody, so ignore it. Our new XPath is...

//*[@id='pricing']/tr/td


This XPath says: look for the pricing id within the page, then look for the text within its tr and td tags. Now we add the code...

Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")
For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='pricing']/tr/td")
    ' Each iteration gives one td cell from the pricing table
Next


To extract the values we simply reference the node variable created in our loop and its InnerText member.

    Dim Web As New HtmlAgilityPack.HtmlWeb
    Dim Doc As New HtmlAgilityPack.HtmlDocument
    Doc = Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='pricing']/tr/td")
        MsgBox(table.InnerText)
    Next


Now we have message boxes that pop up the values... you can swap the message box for an ArrayList to fill, or whatever way you wish to store the values. Now simply do the same for whatever other tables you wish to get.
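As one sketch of storing the values instead of displaying them, the same loop can fill a List(Of String) (used here in place of the ArrayList mentioned above, since a typed list is more idiomatic in modern VB.NET; same page and XPath as before):

```vbnet
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As HtmlAgilityPack.HtmlDocument =
    Web.Load("http://www.digikey.ca/product-search/en?lang=en&site=ca&KeyWords=AE9912-ND")

' Collect each cell's text instead of popping up message boxes
Dim Cells As New List(Of String)
For Each cell As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='pricing']/tr/td")
    Cells.Add(cell.InnerText.Trim())
Next
```

From here you can write the collected values into your database, matching each part number to the Manufacturer and Manufacturer Part Number fields as described in the question.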


Please note that the Doc variable that was created is reusable, so if you want to cycle through a different table on the same page, you do not have to reload the page. This is a good idea, especially if you are making many requests: you don't want to slam the website, and if you are automating a large number of scrapes, it helps to put some time between requests.
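A brief sketch of reusing the already-loaded Doc for a second table (the "specs" id is a hypothetical example, not an id from the actual page):

```vbnet
' Reuse the Doc loaded above; no second Web.Load is needed.
' "specs" is a hypothetical id for another table on the same page.
Dim SpecCells = Doc.DocumentNode.SelectNodes("//*[@id='specs']/tr/td")
If SpecCells IsNot Nothing Then           ' SelectNodes returns Nothing when no nodes match
    For Each cell As HtmlAgilityPack.HtmlNode In SpecCells
        MsgBox(cell.InnerText)
    Next
End If
```

The Nothing check matters: HtmlAgilityPack's SelectNodes returns Nothing rather than an empty collection when the XPath matches no nodes, so looping without the check would throw a NullReferenceException.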


Scraping is really that easy. That's the basic idea. Have fun!

