How to get the raw HTML source code for a page by using Ruby or Nokogiri?

Views: 173 | Published: 2020/7/5 5:42:15 | Tags: ruby, nokogiri, raw-data

Question

I'm using Nokogiri (a Ruby HTML/XML parser with XPath support) to extract content from web pages. On some pages, such as Ajax-driven ones, the content I'm after (a <table>, for example) does not appear when I view the page source.

How can I get the HTML code for the actual content?

Recommended answer

If you want the raw source of a web page, don't use Nokogiri at all. Fetch the page directly as a string and don't feed it to Nokogiri. For example:

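require 'open-uri'
# Use URI.open: calling Kernel#open with a URL was deprecated in Ruby 2.7
# and no longer works in Ruby 3.0 and later.
html = URI.open('http://phrogz.net').read
puts html.length #=> 8461 when the answer was written; the size will vary
puts html        #=> ...raw source of the page...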
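If you would rather not depend on open-uri, the same idea can be sketched with Net::HTTP from the standard library. The method name `fetch_raw_html` and its `limit` parameter are made up for this illustration and are not part of the original answer; the point is only that the response body comes back as a plain string, with no parsing and no JavaScript execution.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch a URL and return the response body as a string,
# following at most `limit` redirects. Nothing is parsed and no JavaScript runs.
def fetch_raw_html(url, limit: 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(url, response['location']).to_s
    fetch_raw_html(next_url, limit: limit - 1)
  else
    response.value # raises an exception for non-2xx responses
  end
end
```

Calling `fetch_raw_html('http://phrogz.net')` should give the same kind of string as the open-uri version above. On an Ajax page, that string still will not contain the content that scripts add later.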

If, on the other hand, you want the contents of a page after JavaScript has modified it (for example, a page where an Ajax library runs JavaScript code to fetch new content and change the page), then you can't use Nokogiri, because it does not execute JavaScript. You need to use Ruby to control a web browser (e.g. read up on Selenium or Watir).
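As a rough sketch of the browser-driven approach, the following uses the selenium-webdriver gem. It assumes that the gem and a local Chrome are installed and that the installed Chrome accepts the `--headless=new` flag (older versions use plain `--headless`). The helper names `rendered_html` and `with_headless_chrome`, the URL, and the `table` selector are all made up for illustration and do not come from the original answer.

```ruby
# Hypothetical helper: load `url` in a browser and return the DOM serialized
# as HTML after scripts have run. `driver` must respond to #get and
# #page_source, as Selenium::WebDriver drivers do.
def rendered_html(driver, url)
  driver.get(url)
  yield driver if block_given? # optional hook: wait here for Ajax content
  driver.page_source
end

# Hypothetical helper: start headless Chrome, yield the driver, always quit.
# Assumes the selenium-webdriver gem and Chrome are installed.
def with_headless_chrome
  require 'selenium-webdriver' # loaded lazily so this file loads without the gem
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument('--headless=new')
  driver = Selenium::WebDriver.for(:chrome, options: options)
  yield driver
ensure
  driver&.quit
end

# Usage (hypothetical URL and selector):
#   html = with_headless_chrome do |driver|
#     rendered_html(driver, 'http://example.com/ajax-page') do |d|
#       wait = Selenium::WebDriver::Wait.new(timeout: 10)
#       wait.until { d.find_element(css: 'table') }
#     end
#   end
```

The string returned here is the browser's current DOM rather than the bytes the server sent, so it can be handed to Nokogiri afterwards if you still want XPath queries over the rendered content.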
