How to parse JavaScript using Nokogiri and Ruby

Question

I need to parse an array out of a website. The part of the JavaScript I want to parse looks like this:

_arPic[0] = "http://example.org/image1.jpg";
_arPic[1] = "http://example.org/image2.jpg";
_arPic[2] = "http://example.org/image3.jpg";
_arPic[3] = "http://example.org/image4.jpg";
_arPic[4] = "http://example.org/image5.jpg";
_arPic[5] = "http://example.org/image6.jpg";

I get the whole JavaScript using something like this:

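require "nokogiri"
require "open-uri"  # Ruby 3+: Kernel#open no longer fetches URLs, use URI.open
product_page = Nokogiri::HTML(URI.open(full_url))
product_page.css("div#main_column script")[0].text  # source text of the first <script>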

Is there an easy way to parse all the variables?

Answer

If I read you correctly, you're trying to parse the JavaScript and get a Ruby array of your image URLs, yes?

Nokogiri only parses HTML/XML, so you're going to need a different library; a cursory search turns up the RKelly library, which has a parse function that takes a JavaScript string and returns a parse tree.

Once you have a parse tree, you'll need to traverse it, find the nodes of interest by name (e.g. _arPic), and then take the string content on the right-hand side of each assignment.

Alternatively, if this doesn't need to be very robust (and a regex approach wouldn't be), you can simply search the JavaScript with a regular expression:

/^\s*_arPic\[\d+\] = "(.+)";$/

might be a good starting point.
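
Putting the regex to work, a minimal sketch in plain Ruby (no extra gems) might look like the following. It assumes each assignment sits on its own line, as in the question. SAMPLE_JS and ar_pic_urls are hypothetical names, and the pattern is a slightly tightened variant of the one above: \d+ for multi-digit indices, [^"]* so the capture stops at the closing quote, and optional spaces around =. In practice you would pass the text of the <script> node from Nokogiri instead of the sample string.

```ruby
# Hypothetical stand-in for the text of the <script> element
# (e.g. product_page.css("div#main_column script")[0].text).
SAMPLE_JS = <<~JS
  _arPic[0] = "http://example.org/image1.jpg";
  _arPic[1] = "http://example.org/image2.jpg";
  _arPic[2] = "http://example.org/image3.jpg";
  _arPic[3] = "http://example.org/image4.jpg";
  _arPic[4] = "http://example.org/image5.jpg";
  _arPic[5] = "http://example.org/image6.jpg";
JS

# One assignment per line: _arPic[<index>] = "<url>";
# Group 1 captures the index, group 2 the quoted URL.
AR_PIC_ASSIGNMENT = /^\s*_arPic\[(\d+)\]\s*=\s*"([^"]*)";\s*$/

# Hypothetical helper: String#scan returns [[index, url], ...];
# sort numerically by index, then keep only the URLs.
def ar_pic_urls(js)
  pairs = js.scan(AR_PIC_ASSIGNMENT)
  pairs.sort_by { |index, _url| index.to_i }.map { |_index, url| url }
end

p ar_pic_urls(SAMPLE_JS)
```

Sorting on index.to_i rather than on the raw string keeps _arPic[10] after _arPic[2]. As with any regex over JavaScript, this will miss assignments split across lines, written with single quotes, or built from concatenated strings; for those cases a real parser such as RKelly is the safer route.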
