Collect data/information from any website

Question


Hi,
I'm looking for a way to collect data/information from any website.
For example: from the Amazon website, I need the list of books and authors for a specific subject; from a blog website, a list of all titles and dates; or from a pizza website, all types of pizza and their prices.
I then want to extract it into an XML file or even a DB table.

Does anyone know of an API that does this?

Thanks

Recommended answers

You'd have to write a parser for each and every site you want to gather data from. Amazon has its own API for retrieving data. Blog sites have their own APIs for gathering data, but you could easily just use their RSS feeds as a standard interface for retrieving content. Pizza sites would require you to write a specialized parser to pull the information from the HTML of each page. You'd need to supply custom information for each and every site you want to gather data from, to tell the parser how to find the data you're looking for.
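
As a rough illustration of that per-site approach, here is a minimal Python sketch using only the standard library. It assumes a hypothetical pizza page that marks each item with class="pizza" and its fields with class="pizza-name" and class="pizza-price"; those class names (the "custom information" for this one site) and the `pizza` table name are invented for illustration and would be different for any real site. It also reads titles and dates from an RSS 2.0 feed (`item`/`title`/`pubDate` elements), and writes records to XML or to an SQLite table, as the question asked.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical rules for ONE site: the class names that mark an item and its fields.
# Every site you scrape would need its own rules like these.
PIZZA_SITE_RULES = {
    "item": "pizza",
    "fields": {"name": "pizza-name", "price": "pizza-price"},
}


class RuleParser(HTMLParser):
    """Collects the text of elements whose class matches the site rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.records = []          # one dict per item found
        self.current_field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.rules["item"] in classes:
            self.records.append({})  # a new item starts here
        for field, cls in self.rules["fields"].items():
            if cls in classes and self.records:
                self.current_field = field

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the field, so nested markup
        # inside a field (e.g. <b>) is not handled.
        self.current_field = None

    def handle_data(self, data):
        if self.current_field and data.strip():
            record = self.records[-1]
            record[self.current_field] = record.get(self.current_field, "") + data.strip()


def parse_rss(xml_text):
    """Return (title, pubDate) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("pubDate")) for item in root.iter("item")]


def to_xml(records, root_tag="items", item_tag="item"):
    """Serialize a list of dicts as a simple XML document string."""
    root = ET.Element(root_tag)
    for record in records:
        element = ET.SubElement(root, item_tag)
        for key, value in record.items():
            ET.SubElement(element, key).text = value
    return ET.tostring(root, encoding="unicode")


def to_sqlite(records, conn):
    """Store pizza records in a hypothetical 'pizza' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS pizza (name TEXT, price TEXT)")
    conn.executemany(
        "INSERT INTO pizza (name, price) VALUES (?, ?)",
        [(r.get("name"), r.get("price")) for r in records],
    )
    conn.commit()


if __name__ == "__main__":
    html = ('<div class="pizza"><span class="pizza-name">Margherita</span>'
            '<span class="pizza-price">8.50</span></div>')
    parser = RuleParser(PIZZA_SITE_RULES)
    parser.feed(html)
    parser.close()
    print(to_xml(parser.records))
    # prints <items><item><name>Margherita</name><price>8.50</price></item></items>
```

Only `PIZZA_SITE_RULES` (and the table layout) would change from site to site; the parsing and export code stays the same. Real pages are often messier than this, so a dedicated HTML parsing library may be more robust than `html.parser`.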


Or could you use a web crawler?

Many good web crawlers let you pull all the webpages (and thus the information) from all or selected parts of a website. There are some good ones out there; it just takes a bit of searching. They can be very useful for getting information, and excellent for keeping copies of useful sites that are about to be pulled down ;P
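
If you would rather write a small crawler yourself, the core loop is short. Below is a minimal breadth-first crawler sketch using only the Python standard library: it stays on the start URL's host, follows `<a href>` links, and returns the raw HTML of each visited page so a site-specific parser can extract data afterwards. It assumes pages are UTF-8 and ignores robots.txt, rate limiting and JavaScript-rendered content. The `fetch_page` parameter is a hypothetical hook so the download step can be swapped out, for example in tests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page as text (assumes UTF-8; real pages may use other encodings)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl limited to start_url's host. Returns {url: html}."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For example, `crawl("https://example.com/", max_pages=20)` would return up to 20 pages from that host. A crawler used on real sites should at least check robots.txt (the standard library's `urllib.robotparser` can do this) and pause between requests.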


The answers above give you enough information. However, it is also possible to write your own webbots. Google for how to write a webbot.

Here are a few links...

Here is a book that gives examples:
http://www.heatonresearch.com/articles/series/20

Here is a free library:

http://www.download3k.com/Software-Development/Editors-Tools/Download-Web-Bot-Programming-Library.html

This snippet is from the book Introduction to Neural Networks for C#, Second Edition, published by Heaton Research (www.heatonresearch.com):

Chapter 13: Bot Programming and Neural Networks
• Creating a Simple Bot
• Analyzing Text
• Training a Neural Bot
• Using a Neural Bot
Bots are computer programs that are designed to use the Internet in much the same way as humans use it. Neural networks can be useful in developing bots. In this chapter you will see how a neural network can be used to assist a bot in finding desired information on the Internet.

