Automate data retrieval from a web site using a Ruby web bot


Question

Say I have a website which displays your marks when you input your roll number. You can also see other people's marks the same way, by incrementing your own roll number.

I want to create an Excel sheet to find the standard deviation of the marks (for a college project).

It is physically impossible for me to enter all the data manually, so I am looking for an automated method that can do this work for me and save all the fields in a text file, which I can easily convert to a table.

Background details:

Link to the site here.

The input goes in a text box. When submit is clicked, the table is generated on the server side and displayed in the web page.

The page's code looks simple enough for a web bot to send a request and collect the data from the generated page.

Problem:

I have no idea how to write a web bot, or where to write one. I am ready to learn a programming language from the ground up.

I have started studying and coding Ruby and should reach a level sufficient for this in a week or so. But I still need help finding my way around how to do it.

If you need to see the web link and the generated page, please feel free to use my roll number: 5675351

Recommended answer

First of all, you will need a Ruby library that can issue a POST request, such as Faraday. Then you will issue a POST request with a hash of parameters (this is what filling in the form amounts to). In your case the name of the parameter is "regno" (look at the HTML source of the page to figure this out yourself) and the value is the roll number for which you want to extract data.
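The request step could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Ruby's standard-library Net::HTTP in place of Faraday so that it runs without installing a gem, the address in RESULTS_URL is a placeholder because the real form URL is not given here, and the helper names build_request and fetch_results are made up for this example. Only the "regno" parameter name comes from the answer.

```ruby
require "net/http"
require "uri"

# Placeholder address (assumption): the real form URL is not given in the question.
RESULTS_URL = "http://example.com/results"

# Build a POST request that fills the form field "regno" with one roll number.
def build_request(url, roll_number)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri)
  # set_form_data URL-encodes the hash, just as a browser form submission would.
  request.set_form_data("regno" => roll_number.to_s)
  request
end

# Send the request and return the HTML source of the results page as a string.
def fetch_results(url, roll_number)
  uri = URI.parse(url)
  request = build_request(url, roll_number)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.body
end
```

With the real URL filled in, calling fetch_results(RESULTS_URL, 5675351) would return the page source for that roll number, and looping over a range of roll numbers would collect one page per student.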

What you will have at this stage is the HTML source of the results page.

Results are all in roughly the same form:

<tr bgColor="#ffffff">
    <td align="middle"><font face="Arial" size=2> 301</font></td>
    <td align="left" ><font face="Arial" size=2>ENGLISH CORE</font></td>
    <td align="left" ><font face="Arial" size=2>084&nbsp;&nbsp;&nbsp;&nbsp;</font></td>
    <td align="middle"><font face="Arial" size=2>A2</font></td>
  </tr>

Only the bgColor of the tr varies, and of course the data. You need to extract all these blocks, for example with a regular expression. You can do one better and use the XPath feature of Nokogiri, another Ruby library. You will need to look these two up yourself.
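As a rough sketch of the regular-expression route, the snippet below pulls the four cells out of every row shaped like the sample above. It assumes each row has exactly four td cells with the text wrapped in a font tag, and that the four columns are subject code, subject name, marks and grade; those column meanings and the names ROW_PATTERN, clean and extract_rows are guesses made for illustration. Rows whose third cell is not a plain number are skipped, which would also drop a header row if the real table has one.

```ruby
# One result row: four <td> cells, each wrapping its text in a <font> tag.
# Flags: x = ignore whitespace in the pattern, m = let . match newlines, i = ignore case.
ROW_PATTERN = %r{
  <tr[^>]*>\s*
  <td[^>]*>\s*<font[^>]*>(.*?)</font>\s*</td>\s*
  <td[^>]*>\s*<font[^>]*>(.*?)</font>\s*</td>\s*
  <td[^>]*>\s*<font[^>]*>(.*?)</font>\s*</td>\s*
  <td[^>]*>\s*<font[^>]*>(.*?)</font>\s*</td>\s*
  </tr>
}xmi

# Replace non-breaking-space entities and trim surrounding whitespace.
def clean(text)
  text.gsub("&nbsp;", " ").strip
end

# Return one hash per row whose marks cell holds a plain number.
def extract_rows(html)
  rows = html.scan(ROW_PATTERN).map { |cells| cells.map { |cell| clean(cell) } }
  numeric = rows.select { |_code, _subject, marks, _grade| marks.match?(/\A\d+\z/) }
  numeric.map do |code, subject, marks, grade|
    { code: code, subject: subject, marks: marks.to_i, grade: grade }
  end
end
```

A regular expression like this is brittle if the page layout changes, which is one reason an HTML parser such as Nokogiri is the sturdier choice once the basics work.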

When you have all the data, you don't need to create an Excel sheet: Ruby is capable of doing such simple math by itself.
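For instance, a standard deviation needs only a few lines of plain Ruby. The sketch below is one possible way to write it; the method names and the sample keyword are invented for this example, and whether the population or the sample formula is the right one depends on what the project asks for.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Standard deviation of a list of numbers.
# sample: false divides by n (population), sample: true divides by n - 1.
def standard_deviation(values, sample: false)
  minimum = sample ? 2 : 1
  raise ArgumentError, "need at least #{minimum} value(s)" if values.size < minimum

  average = mean(values)
  sum_of_squares = values.sum { |value| (value - average)**2 }
  divisor = sample ? values.size - 1 : values.size
  Math.sqrt(sum_of_squares / divisor)
end
```

Passing the array of marks collected from the pages, for example standard_deviation(marks), gives the figure directly, with no spreadsheet in between.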

I recommend going through all the examples for the two libraries mentioned and applying the relevant ones to your specific task. Ruby is actually a great choice for such a task, as the libraries are mostly good and getting started is painless. Having no programming experience will complicate things along the way, though.
