Scraping a site that requires login username and password on two separate pages

Question

I'm trying to scrape information from my company's intranet so that I can display it on our office wall board via a Dashing dashboard. I'm trying to work with the information provided on This Site. The problem I'm having, other than being a noob, is that to reach the information I want to scrape, I need to log in to our intranet by providing my username on one page and then submitting to a second page where I provide my password. Once I'm logged in, I can follow links and scrape my data.

Here is some source code from my login username page:

<form action='loginauthpwd.asp?PassedURL=' method='post' style='margin: 0px;'><table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'>&nbsp;</td><td valign='center' width='100'><table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'><span style='font-weight: bold;'>Username:</span><br><input type='text' class='normal' autocomplete='off' id='LoginUser' name='LoginUser' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'><input class='normal_button' type='button' value='Go' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var username=document.getElementById('LoginUser').value; if (username.length > 2) { submit(); } else { alert('Enter your Username.'); }"></form>

Here is some source from my login password page:

<form action='loginauthprocess.asp?UserName=******&Page=&PassedURL=' target='_top' method='post' onsubmit='checkMyBrowser();' style='margin: 0px;'><table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'>&nbsp;</td><td valign='center' width='100'><table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'><span style='font-weight: bold;'>Password:</span><br><input class='normal' type='password' autocomplete='off' id='LoginPassword' name='LoginPassword' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'><input class='normal_button' type='submit' value='Log In' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var password=document.getElementById('LoginPassword').value; if (password.length > 2) { submit(); } else { alert('Enter your Password.'); }"></form>

Using that resource's example, this is what I think should work, but it doesn't:

require 'mechanize'
@agent = Mechanize.new
@agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

##Login Page:
page = @agent.get 'http://www.website_here.com/intranet/login.asp'

##Username Page:
form = page.forms[0]
form['USER NAME HERE'] = LoginUser
##Submit User:
page = form.submit

##Password Page:
form = page.forms[0]
form['USER PASSWORD HERE'] = LoginPassword
##Submit Password:
page = form.submit

When I test my code I get the following output:

test.rb:10:in `<main>': uninitialized constant LoginUser (NameError)

Can anyone point out what I'm doing wrong?

Thanks

EDIT 3/27/15:

Using @seoyoochan's resource, I tried to write my code like this:

require 'rubygems'
require 'mechanize'
login_page  = agent.get "http://www.website_here.com/intranet/loginauthusr.asp?Page="
login_form = login_page.form_with(action: '/sessions') 
user_field = login_form.field_with(name: "session[user]") 
user.value = 'My User Name'

login_form.submit

When I try to run my code I'm now getting this output:

test.rb:4:in `<main>': undefined local variable or method `agent' for main:Object (NameError)

I need an example of how to assign the right names/classes so that the code works with the form I provided.

EDIT 4/4/15:

Okay, now using @tylermauthe's example, I'm trying to test the following code:

require 'mechanize'
require 'io/console'

agent = Mechanize.new
page = agent.get('http://www.website_here.com/intranet/loginauthusr.asp?Page=')

form = page.forms.find{|form| form.action.include?("loginauthpwd.asp?PassedURL=")}

puts "Login:"
form.login = gets.chomp
page = agent.submit(form)
pp page

My thinking is that this code should let me enter and submit my username, bringing me to the next page, which asks for my password. But when I run it and enter my username, I get the following output:

/var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:217:in `method_missing': undefined method `loginUser=' for # (NoMethodError) from scraper.rb:10:in `<main>'

What am I missing, or what have I entered wrong? Please refer to the form source at the top of my question to see how my forms are coded. To be clear, I did not write the forms this way. I'm only trying to learn how to code and scrape the data I need to display on my Dashing dashboard project.

Recommended answer

I was able to log in with the following example. Thanks to everyone who helped me with resources and examples to learn from!

require 'nokogiri'
require 'mechanize'
require 'pp'

agent = Mechanize.new

# Open the URL that asks for the username, find the field in the first form, fill it in, and submit.
# The field name is a placeholder; use the name attribute of your form's input.

login = agent.get('http://www.website_here.com')
login_form = login.forms.first
username_field = login_form.field_with(:name => "user_session[username]")
# Set the field's value; plain assignment would only rebind the local variable.
username_field.value = "YOUR USERNAME HERE"
page = agent.submit login_form

# Open the URL that asks for the password, find the field in the first form, fill it in, and submit.

login = agent.get('http://www.website_here.com')
login_form = login.forms.first
password_field = login_form.field_with(:name => "user_session[password]")
password_field.value = "YOUR PASSWORD HERE"
page = agent.submit login_form

# Print the resulting page to confirm that you have logged in.

pp page

I found this example from user Senthess HERE. I'm still not 100% sure what each piece of the code is doing, so if anyone would like to take the time to break it down, please do. It would help me and others understand it better.
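As a partial breakdown, here is a minimal sketch of the same two-step flow written against the forms shown in the question. It assumes the input names `LoginUser` and `LoginPassword` and the form actions `loginauthpwd.asp` and `loginauthprocess.asp` from that HTML. The helper name `two_step_login` and the `login_url` argument are hypothetical, and `agent` is assumed to behave like a `Mechanize` instance. Mechanize does not run JavaScript, so if `checkMyBrowser()` sets anything the server depends on, this may need adjusting.

```ruby
# Hypothetical helper (not from the original answer): logs in through the
# two-page form shown in the question. `agent` is expected to behave like a
# Mechanize instance; `login_url` is a placeholder for the username page.
def two_step_login(agent, login_url, username, password)
  # Step 1: fetch the username page and locate its form by its action.
  user_page = agent.get(login_url)
  user_form = user_page.form_with(action: /loginauthpwd\.asp/)
  raise 'username form not found' unless user_form

  # The key is the input's name attribute; the value is a plain String.
  user_form['LoginUser'] = username

  # Step 2: submitting returns the next page, which holds the password form.
  password_page = user_form.submit
  password_form = password_page.form_with(action: /loginauthprocess\.asp/)
  raise 'password form not found' unless password_form

  password_form['LoginPassword'] = password

  # The page returned here is whatever the site shows after logging in.
  password_form.submit
end

# Usage (needs the mechanize gem and access to the intranet):
#   require 'mechanize'
#   agent = Mechanize.new
#   home = two_step_login(agent,
#                         'http://www.website_here.com/intranet/login.asp',
#                         'my_username', 'my_password')
#   puts home.title
```

The second form is taken from the page returned by the first submit rather than from a fresh `agent.get`, so the session cookies and the `UserName` value the server embeds in the form action carry over. After the call, the same `agent` keeps the logged-in session and can fetch the pages to scrape.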

Thanks!
