Web Data Scraper / Web Data Extraction


Question


Hi,

Can somebody please help me with web data scraping?
I want to scrape a website (e.g. www.dell.com) for information such as item description, manufacturer, supplier part number, price, etc. for some of the products on the site.

Are there any external tools that do this, or can we write a scraping program ourselves?

Can somebody please help me out with this?


Thanks

Recommended answer

I don't know of a tool to do exactly this, but there are similar tools and you could build one yourself.

I once did this to download all links from a webpage. You might want to look at one of the web crawlers on Code Project and borrow some of their ideas.

Basically, you use a WebRequest to get the HTML for a page. Then, you can use various means to extract the data you want. I chose to use regular expressions, because they were quick and dirty and I was never going to use the application again. You could also parse the HTML.
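
To make the fetch-then-extract idea concrete, here is a minimal sketch. It uses Python's standard library (urllib.request) in place of .NET's WebRequest, purely for illustration; the function names fetch_html and extract_links, the User-Agent string, and the regex are my own assumptions, not something from the original application. It pulls every link out of a page, the same quick-and-dirty way described above.

```python
import re
import urllib.request

# Quick-and-dirty pattern for href values inside anchor tags. It will miss
# unusual markup (unquoted attributes, for example), which is the usual
# trade-off with regex-based scraping.
LINK_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    # The User-Agent value is a placeholder chosen for this example.
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Return every href value found in anchor tags, in document order."""
    return LINK_RE.findall(html)
```

Usage would look something like extract_links(fetch_html("https://www.example.com/")). Check the site's terms of use and robots.txt before pointing this at a real site.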

You'd have to look for patterns, such as any TD element that contains the string "Item Description", then look at the next TD after it (where your target data will be held). You could then apply that algorithm to every page that uses the same pattern. You could also have the web crawler visit every page on www.dell.com for you, so that both finding the pages and extracting their contents happen automatically.
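
The "find the label TD, then take the next TD" pattern could be sketched roughly like this. This version parses the HTML with Python's built-in html.parser rather than using regular expressions; the class name CellCollector, the helper value_after_label, and the sample markup are all hypothetical, and the real page layout on any given site will almost certainly differ. Nested tables are not handled.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the visible text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # Join the text fragments and collapse runs of whitespace.
            self.cells.append(" ".join("".join(self._buffer).split()))
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


def value_after_label(html, label):
    """Return the text of the cell following the first cell containing label."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    for index, text in enumerate(parser.cells[:-1]):
        if label.lower() in text.lower():
            return parser.cells[index + 1]
    return None  # label not found, or it was in the last cell
```

For a row such as `<tr><td>Price</td><td>$999</td></tr>`, calling value_after_label(html, "Price") returns "$999". The same function can then be run over every page that follows the same layout.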

