I need to scrape data from a facebook game - using ruby


Question

Edit (clarifying the question)

I've already spent a few days trying to figure out how to scrape specific information from a facebook game; however, I've run into brick wall after brick wall. As best I can tell, the main problem is as follows. I can use Chrome's inspect element tool to manually find the HTML that I need; it appears nestled inside an iframe. However, when I try to scrape that iframe, it is empty (except for its attributes):

<iframe id="game_frame" name="game_frame" src="" scrolling="no" ...></iframe>

This is the same output that I see if I use a browser's "View page source" tool. I don't understand why I can't see the data in the iframe. The answer is NOT that it's being added afterwards by AJAX. (I know that both because "View page source" can read data that's been added by AJAX and because I've waited until after I can see the data on the page before scraping it, and it's still not there.)

Is this happening because of facebook's anti-screen scraping, and if so, is there a way around it? Or am I just missing something? I program in ruby and I've tried nokogiri, then mechanize, then capybara, without success.

I don't know if it makes any difference, but it seems to me that the iframe is getting its data through the "game_frame" name, which is apparently referenced by this piece of HTML that appears earlier in the document:

<form id="hidden_login_form_1331840407" action="" method="POST" target="game_frame">
  <input type="hidden" name="signed_request" autocomplete="off" value="v6kIAsKTZa...">
  ...
</form>
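
A note on the form above: because its target is "game_frame", submitting it would load the response into the iframe, which would explain why the iframe's own markup stays empty. If that reading is right, the same request could in principle be rebuilt outside the browser. The following is only a sketch using Ruby's standard library: it builds the POST without sending it. The action URL (elided in the question), the signed_request value, and the helper name build_game_frame_request are all placeholders, not values taken from the page.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) the POST that the hidden
# form would submit into the iframe.
# action_url - the form's action attribute (a placeholder here)
# fields     - hash of the form's hidden input names and values
def build_game_frame_request(action_url, fields)
  uri = URI.parse(action_url)
  request = Net::HTTP::Post.new(uri.request_uri)
  # Encodes the fields as application/x-www-form-urlencoded.
  request.set_form_data(fields)
  request
end

# Placeholder values only; the real ones come from the logged-in page.
request = build_game_frame_request(
  'https://game.example.com/canvas',
  { 'signed_request' => 'PLACEHOLDER_VALUE' }
)
puts request.method
puts request.body
```

Actually sending the request would also need the session cookies from a logged-in browser, which this sketch does not cover.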

Original question

I wrote a ruby program that uses nokogiri to scrape data from a facebook game's HTML. Currently, I get the HTML by using Chrome's "inspect element" tool, save it to a file, and parse it from there. However, I would really like to be able to access the information from within ruby. For example, I would pass the program the page name "www.gamename.com/...?id=12345" and it would log in to facebook, go to that page and scrape the data. Currently, if I try that, it doesn't work because I get redirected to facebook's login page. How can I get past the login screen to access the page(s) I need?

I would like to do this using the nokogiri code that I have already written; however, if I have to, I could rewrite it using something else. Currently, the program is a standalone program, not a rails program, but I could change that. I've seen some information that might point me in the direction of Omniauth, but I'm not sure that's what I'm looking for, and it also looks very complicated. I'm hoping there's a simpler solution.

Thanks

Recommended answer

I can recommend capybara-webkit for this kind of task. It uses QtWebkit under the hood and understands JavaScript:

require 'capybara-webkit'
require 'capybara/dsl'
require 'nokogiri'

include Capybara::DSL
Capybara.current_driver = :webkit

# login
visit("https://www.facebook.com")
find("#email").set("user")
find("#pass").set("password")
find("#loginbutton input").click  # CSS descendant selector; "//" is XPath syntax and is not valid here

# navigate to the JS-generated page (visit needs an absolute URL; the http scheme is assumed here)
visit("http://www.gamename.com/...?id=12345")

# parse HTML
doc = Nokogiri::HTML.parse(body)
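
Since the question says the data sits inside the game_frame iframe, the page body alone may still not contain it. Capybara sessions offer within_frame for switching into a frame. The helper below (frame_html is a made-up name) is a sketch of how that could look, assuming the driver in use supports switching to a frame by name and that html called inside the block returns the frame's document. It has not been tested against Facebook.

```ruby
# Hypothetical helper: return the HTML of a named iframe.
# `session` is anything that responds to within_frame and html,
# for example Capybara's `page` object from the snippet above.
def frame_html(session, frame_name)
  html = nil
  session.within_frame(frame_name) do
    # Inside the block, the session is scoped to the frame's document.
    html = session.html
  end
  html
end
```

With the DSL included as above, this might be called as doc = Nokogiri::HTML.parse(frame_html(page, "game_frame")), so the existing nokogiri parsing code can stay unchanged.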
