Mechanize - How to follow or "click" Meta refreshes in Rails


Problem description

I have a bit of trouble with Mechanize.

When I submit a form with Mechanize, I arrive at a page that contains only a meta refresh and no links.

My question is: how do I follow the meta refresh?

I have tried to allow meta refresh, but then I get a socket error. Sample code:

require 'mechanize'
agent = WWW::Mechanize.new
agent.get("http://euroads.dk")
form = agent.page.forms.first
form.username = "username"
form.password = "password"
form.submit
page = agent.get("http://www.euroads.dk/system/index.php?showpage=login")
agent.page.body

The response:

<html>
 <head>
   <META HTTP-EQUIV=\"Refresh\" CONTENT=\"0;URL=index.php?showpage=m_frontpage\">
 </head>
</html>

Then I try:

redirect_url = page.parser.at('META[HTTP-EQUIV=\"Refresh\"]')[
  "0;URL=index.php?showpage=m_frontpage\"][/url=(.+)/, 1]

But I get:

NoMethodError: Undefined method '[]' for nil:NilClass

Solution

Internally, Mechanize uses Nokogiri to handle parsing of the HTML into a DOM. You can get at the Nokogiri document so you can use either XPath or CSS accessors to dig around in a returned page.

This is how to get the redirect URL with Nokogiri only:

require 'nokogiri'

html = <<EOT
<html>
  <head>
    <meta http-equiv="refresh" content="2;url=http://www.example.com/">
    </meta>
  </head>
  <body>
    foo
  </body>
</html>
EOT

doc = Nokogiri::HTML(html)
redirect_url = doc.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
redirect_url # => "http://www.example.com/"

doc.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1] breaks down to: Find the first occurrence (at) of the CSS accessor for the <meta> tag with an http-equiv attribute of refresh. Take the content attribute of that tag and return the string following url=.
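
The last step of that breakdown, String#[] with a regex and a capture-group index, can be tried in plain Ruby without Nokogiri. This is a minimal sketch using the content values from the answer's sample HTML and from the response shown in the question. Note that the pattern is case-sensitive, so it does not match the uppercase URL= in the question's page; the /i flag below is an adjustment that is not part of the original answer.

```ruby
# Content attribute values: the first is from the answer's sample HTML,
# the second is from the response shown in the question.
answer_content   = '2;url=http://www.example.com/'
question_content = '0;URL=index.php?showpage=m_frontpage'

# String#[] with a regex and a group index returns that capture group,
# or nil when the pattern does not match.
answer_url = answer_content[/url=(.+)/, 1]

# The pattern is case-sensitive, so the uppercase "URL=" does not match.
strict_url = question_content[/url=(.+)/, 1]

# Adding the /i flag (an adjustment, not part of the original answer)
# makes the match case-insensitive.
relaxed_url = question_content[/url=(.+)/i, 1]

puts answer_url.inspect   # "http://www.example.com/"
puts strict_url.inspect   # nil
puts relaxed_url.inspect  # "index.php?showpage=m_frontpage"
```

A nil result from the regex step raises the same kind of NoMethodError if anything is chained onto it, so it is worth checking for nil before calling agent.get.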

This is some Mechanize code for a typical use. Because you gave no sample code to base mine on you'll have to work from this:

agent = Mechanize.new
page = agent.get('http://www.examples.com/')
redirect_url = page.parser.at('meta[http-equiv="refresh"]')['content'][/url=(.+)/, 1]
page = agent.get(redirect_url)


EDIT: at('META[HTTP-EQUIV=\"Refresh\"]')

Your code has the above at(). Notice that you are escaping the double-quotes inside a single-quoted string. That results in a backslash followed by a double-quote in the string which is NOT what my sample uses, and is my first guess for why you're getting the error you are. Nokogiri can't find the tag because there is no <meta http-equiv=\"Refresh\"...>.
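
The quoting point can be checked in plain Ruby. In a single-quoted string only \' and \\ are treated as escapes, so \" stays as two separate characters. A small sketch (the variable names are only for illustration):

```ruby
# In a single-quoted Ruby string, \" is NOT an escape sequence:
# the backslash is kept as a literal character.
escaped   = 'META[HTTP-EQUIV=\"Refresh\"]'
unescaped = 'META[HTTP-EQUIV="Refresh"]'

puts escaped    # META[HTTP-EQUIV=\"Refresh\"]
puts unescaped  # META[HTTP-EQUIV="Refresh"]

# '\\' is a one-character string holding a single backslash.
puts escaped.include?('\\')              # true
puts unescaped.include?('\\')            # false
puts escaped.length - unescaped.length   # 2
```

The selector handed to at() therefore contains two literal backslashes that appear nowhere in the parsed document.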

EDIT: Mechanize has a built-in way to handle meta-refresh, by setting:

 agent.follow_meta_refresh = true

It also has a method to parse the meta tag and return the content. From the docs:

parse(content, uri)

Parses the delay and url from the content attribute of a meta tag. Parse requires the uri of the current page to infer a url when no url is specified. If a block is given, the parsed delay and url will be passed to it for further processing. Returns nil if the delay and url cannot be parsed.

# <meta http-equiv="refresh" content="5;url=http://example.com/" />
uri = URI.parse('http://current.com/')

Meta.parse("5;url=http://example.com/", uri)  # => ['5', 'http://example.com/']
Meta.parse("5;url=", uri)                     # => ['5', 'http://current.com/']
Meta.parse("5", uri)                          # => ['5', 'http://current.com/']
Meta.parse("invalid content", uri)            # => nil
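
To make the documented behaviour concrete, here is a stand-alone sketch in plain Ruby that mirrors those examples. parse_meta_refresh is a hypothetical helper written for illustration, not Mechanize's implementation. It also resolves a relative URL against the current page with URI.join, which the index.php?showpage=m_frontpage target from the question would need; whether Mechanize's own parser does this is not stated in the quoted docs.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): splits a meta refresh
# content value into [delay, absolute_url], or returns nil if it cannot
# be parsed. A missing URL falls back to the current page's URI.
def parse_meta_refresh(content, current_uri)
  pattern = /\A\s*(\d+(?:\.\d+)?)\s*(?:;\s*url=\s*['"]?(.*?)['"]?\s*)?\z/i
  match = content.match(pattern)
  return nil unless match

  delay  = match[1]
  target = match[2].to_s
  # URI.join resolves a relative reference against the current page.
  url = target.empty? ? current_uri.to_s : URI.join(current_uri.to_s, target).to_s
  [delay, url]
end

# The login URL from the question serves as the current page.
uri = URI.parse('http://www.euroads.dk/system/index.php?showpage=login')

p parse_meta_refresh('0;URL=index.php?showpage=m_frontpage', uri)
# => ["0", "http://www.euroads.dk/system/index.php?showpage=m_frontpage"]
p parse_meta_refresh('5;url=http://example.com/', uri)
# => ["5", "http://example.com/"]
p parse_meta_refresh('5', uri)
# => ["5", "http://www.euroads.dk/system/index.php?showpage=login"]
p parse_meta_refresh('invalid content', uri)
# => nil
```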
