How to extract text from a <script> tag using nokogiri and mechanize?


Question

This is part of the source code of a booking website:

<script>
booking.ensureNamespaceExists('env');
booking.env.b_map_center_latitude = 53.36480155016638;
booking.env.b_map_center_longitude = -2.2752803564071655;
booking.env.b_hotel_id = '35523';
booking.env.b_query_params_no_ext = '?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=e1c9e4c7a000518d8a3725b9bb6e5306;dcid=1';
</script>

I want to extract booking.env.b_hotel_id, so that I get the value '35523'. How do I achieve this with Nokogiri and Mechanize?

Hope somebody can help! Thanks! :)

Recommended answer

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.booking.com/hotel/us/solera-by-stay-alfred.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmcgV1c19ueYgBAZgBMbgBBMgBBNgBAegBAfgBAg;sid=695d6598485cb1a8fd9e39c5de3878ba;dcid=4;checkin=2015-10-20;checkout=2015-10-21;dist=0;group_adults=2;room1=A%2CA;sb_price_type=total;srfid=cf5d76283b73d34a1d7e0d61cad6974e38a94351X1;type=total;ucfs=1&')

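# Join the text of every <script> element, then scan for the hotel id assignment.
# The dots are escaped so they match literal dots rather than any character.
match = page.search("script").text.scan(/^booking\.env\.b_hotel_id = '[^']*'/)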
puts match
puts match[0].split("'")[1]

Output:

booking.env.b_hotel_id = '1202411'
1202411
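
The regex step can also be tried offline against the snippet from the question, without fetching the page. The following is only a sketch: `extract_hotel_id` is a hypothetical helper name (not part of Nokogiri or Mechanize), and the literal string stands in for what `page.search("script").text` would return. It uses a capture group so the value comes back directly, with no need to split on quotes.

```ruby
# Script text copied from the question; in the answer's code this would come
# from page.search("script").text instead of a literal string.
script_text = <<~JS
  booking.ensureNamespaceExists('env');
  booking.env.b_map_center_latitude = 53.36480155016638;
  booking.env.b_map_center_longitude = -2.2752803564071655;
  booking.env.b_hotel_id = '35523';
JS

# Hypothetical helper: returns the quoted value assigned to
# booking.env.b_hotel_id, or nil when the assignment is not present.
def extract_hotel_id(text)
  # String#[] with a regex and a capture index returns group 1, or nil on no match.
  text[/^\s*booking\.env\.b_hotel_id\s*=\s*'([^']*)'/, 1]
end

hotel_id = extract_hotel_id(script_text)
puts hotel_id
```

Because the helper returns nil when the assignment is missing, a page that lacks the variable gives nil to check for, where `match[0].split("'")` would raise an error on a nil `match[0]`.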

Pages that helped me figure this out:

http://robdodson.me/crawling-pages-with-mechanize-and-nokogiri/

Parsing javascript function elements with nokogiri

Regular expressions: starting and ending with a string

http://www.rubular.com
