Site-Mining tools


Question

Many of the questions asked here are relevant to research I'm doing. These questions and answers are widely dispersed and not always easy to find by manual browsing, and sometimes an insightful answer or comment turns up in an unrelated topic as well.

I want to automate finding these relevant Q&As based on sets of keywords, and then use the information as pointers towards further in-depth research.

What tools, preferably open-source, can I use for this type of site-mining? I am not a web guru, and developing them myself would take a long time and cut into time I could otherwise spend on my R&D.

Recommended answer

It is not clear from your question whether you are a programmer, so I'm not sure whether you are after tools in the sense of apps or services that do what you want, or a library that makes site-mining easier.

If the latter is the case and you use Ruby, I can thoroughly recommend WWW::Mechanize. It provides a nice API for writing scripts to search web pages (by DOM or by text), follow links, and fill out forms. I've used it several times to organise information that's spread over several web pages within a site.
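As a rough illustration of that kind of script, here is a minimal sketch, not a definitive implementation. It assumes a current release of the mechanize gem, where the class is called Mechanize rather than WWW::Mechanize. The helper names (matching_keywords, find_relevant_pages) and the parameters start_url, link_pattern and limit are made up for this example and are not part of the library.

```ruby
# Pure helper (hypothetical name): returns the keywords that occur in the
# given text, ignoring case.
def matching_keywords(text, keywords)
  haystack = text.downcase
  keywords.select { |kw| haystack.include?(kw.downcase) }
end

# Hypothetical helper: fetch a listing page, follow the links whose href
# matches link_pattern, and report the pages that mention any keyword.
def find_relevant_pages(start_url, link_pattern, keywords, limit: 20)
  # Loaded lazily so matching_keywords works without the gem installed
  # (gem install mechanize).
  require 'mechanize'

  agent = Mechanize.new
  listing = agent.get(start_url)

  listing.links_with(href: link_pattern).first(limit).filter_map do |link|
    page = link.click
    next unless page.is_a?(Mechanize::Page) # skip images, PDFs, etc.

    # Search by DOM first (the body element), then match on its text.
    hits = matching_keywords(page.search('body').text, keywords)
    sleep 1 # be polite to the site between requests

    { title: page.title, url: page.uri.to_s, keywords: hits } unless hits.empty?
  end
end
```

A call might look like find_relevant_pages('https://example.com/questions', %r{/questions/\d+}, ['keyword one', 'keyword two']), where the URL and the link pattern are placeholders to adapt to the site being mined. Keeping the keyword matching separate from the fetching makes it easy to try the matching on saved text before pointing the script at a live site.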

I believe the Ruby version was based on an earlier library for Perl, but I can't vouch for the Perl version as I've not used it.
