Scraping a site that requires login username and password on two separate pages


Question

I'm trying to scrape information from my company's intranet so that I can display it on our office wall board via a Dashing dashboard. I'm working from the information provided on this site. Apart from being a noob, the problem I'm having is that to reach the information I want to scrape, I need to log in to our intranet by entering my username on one page, which then submits to a second page where I enter my password. Once I'm logged in, I can follow links and scrape my data.

Here is some source code from my login username page:

<form action='loginauthpwd.asp?PassedURL=' method='post' style='margin: 0px;'><table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'>&nbsp;</td><td valign='center' width='100'><table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'><span style='font-weight: bold;'>Username:</span><br><input type='text' class='normal' autocomplete='off' id='LoginUser' name='LoginUser' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'><input class='normal_button' type='button' value='Go' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var username=document.getElementById('LoginUser').value; if (username.length > 2) { submit(); } else { alert('Enter your Username.'); }"></form>

Here is some source code from my login password page:

<form action='loginauthprocess.asp?UserName=******&Page=&PassedURL=' target='_top' method='post' onsubmit='checkMyBrowser();' style='margin: 0px;'><table border='0' cellspacing='1' width='999' height='350'><tr><td width='100'>&nbsp;</td><td valign='center' width='100'><table style='width: 350px; background-color: #EEEEEE; border: 1px solid gray;'><tr><td class='fontBlack' style='padding: 10px; vertical-align: top;'><span style='font-weight: bold;'>Password:</span><br><input class='normal' type='password' autocomplete='off' id='LoginPassword' name='LoginPassword' style='border: 1px solid gray; height: 16px; font-family: arial; font-size: 11; width: 180px;' maxlength='30'><input class='normal_button' type='submit' value='Log In' style='border: 1px solid gray; font-weight: bold; width: 80px; margin-left: 10px;' onclick="var password=document.getElementById('LoginPassword').value; if (password.length > 2) { submit(); } else { alert('Enter your Password.'); }"></form>

Based on that resource's example, this is what I think should work, but it doesn't:

require 'mechanize'
@agent = Mechanize.new
@agent.verify_mode = OpenSSL::SSL::VERIFY_NONE

##Login Page:
page = @agent.get 'http://www.website_here.com/intranet/login.asp'

##Username Page:
form = page.forms[0]
form['USER NAME HERE'] = LoginUser
##Submit User:
page = form.submit

##Password Page:
form = page.forms[0]
form['USER PASSWORD HERE'] = LoginPassword
##Submit Password:
page = form.submit

When I test my code I get the following output:

test.rb:10:in `<main>': uninitialized constant LoginUser (NameError)

Can anyone point out what I'm doing wrong?

Thanks

Edit (March 27, 2015):

Using @seoyoochan's resource, I tried to write my code like this:

require 'rubygems'
require 'mechanize'
login_page  = agent.get "http://www.website_here.com/intranet/loginauthusr.asp?Page="
login_form = login_page.form_with(action: '/sessions') 
user_field = login_form.field_with(name: "session[user]") 
user.value = 'My User Name'

login_form.submit

When I try to run my code I now get this output:

test.rb:4:in `<main>': undefined local variable or method `agent' for main:Object (NameError)

I need an example of how to use the right names/classes so that the code works with the forms I posted above.

Edit (April 4, 2015):

Okay, now using @tylermauthe's example, I'm trying to test the following code:

require 'mechanize'
require 'io/console'

agent = Mechanize.new
page = agent.get('http://www.website_here.com/intranet/loginauthusr.asp?Page=')

form = page.forms.find{|form| form.action.include?("loginauthpwd.asp?PassedURL=")}

puts "Login:"
form.login = gets.chomp
page = agent.submit(form)
pp page

My thinking is that this code should let me enter and submit my username, taking me to the next page, which asks for my password. But when I run it and enter my username, I get the following output:

/var/lib/gems/1.9.1/gems/mechanize-2.7.3/lib/mechanize/form.rb:217:in `method_missing': undefined method `loginUser=' for # (NoMethodError) from scraper.rb:10:in `<main>'

What am I missing, or what have I entered wrong? Please refer to the form source above to see how my forms are coded. To be clear, I did not write the forms this way. I'm only trying to learn how to code and scrape the data I need to display on my Dashing dashboard project.

Recommended answer

I was able to log in with the following example. Thanks to everyone who helped me with resources and examples to learn from!

require 'nokogiri'
require 'mechanize'
require 'pp'

agent = Mechanize.new

# Open the URL that asks for the username, find the field in the first form, fill it in, and submit.
# The field name "LoginUser" matches the name attribute of the username form shown in the question.

login = agent.get('http://www.website_here.com')
login_form = login.forms.first
username_field = login_form.field_with(:name => "LoginUser")
username_field.value = "YOUR USERNAME HERE"
page = agent.submit login_form

# The page returned by that submit asks for the password: find the field, fill it in, and submit.
# The field name "LoginPassword" matches the name attribute of the password form shown in the question.

login_form = page.forms.first
password_field = login_form.field_with(:name => "LoginPassword")
password_field.value = "YOUR PASSWORD HERE"
page = agent.submit login_form

# Print the resulting page to confirm that you have logged in.

pp page

I adapted this from an example by user Senthess. I'm still not 100% sure what each individual piece of the code is doing, so if anyone would like to take the time to break it down, please do. It would help me and others understand it better.
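As a possible breakdown, the two-page login can be sketched as one method with a comment per step. This is only a sketch under assumptions: `two_step_login` and its parameter names are hypothetical, the field names `LoginUser` and `LoginPassword` are taken from the `name` attributes of the forms in the question, and `agent` is expected to behave like a Mechanize agent (it must respond to `get` and `submit`, and the forms must accept `form['name'] = value`).

```ruby
# Two-step login sketch. `agent` is expected to behave like a Mechanize agent:
# it must respond to get(url) and submit(form).
def two_step_login(agent, login_url, username, password)
  # Step 1: fetch the page that asks for the username.
  username_page = agent.get(login_url)

  # Step 2: take the first form on that page and set the field by its name.
  username_form = username_page.forms.first
  username_form['LoginUser'] = username

  # Step 3: submitting returns the next page, which holds the password form.
  password_page = agent.submit(username_form)

  # Step 4: fill in the password form on the returned page, not on a fresh GET.
  password_form = password_page.forms.first
  password_form['LoginPassword'] = password

  # Step 5: the page returned here is the response to the password submit.
  agent.submit(password_form)
end
```

With the mechanize gem installed and `require 'mechanize'` at the top of the script, this could be called as `page = two_step_login(Mechanize.new, 'http://www.website_here.com/intranet/login.asp', 'YOUR USERNAME HERE', 'YOUR PASSWORD HERE')`. Mechanize does not run JavaScript, so the `onclick` checks on the buttons are skipped and the form is posted directly. Assuming the site tracks the login with cookies, reusing the same agent object for later `get` calls keeps the session.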

Thanks!
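For anyone comparing this with the attempts in the question, the first two errors can be reproduced without Mechanize. The sketch below uses a plain Hash named `form` as a stand-in for a form object and a hypothetical helper, `error_message_for`; both are assumptions for illustration, not part of the library. It shows that a bare capitalized word such as `LoginUser` is looked up as a constant, that `agent` must be assigned before it is used, and that the working pattern puts the field's name in the brackets and the value on the right-hand side.

```ruby
# Hypothetical helper: runs the block and returns the error text if a
# NameError (or its subclass NoMethodError) is raised, otherwise nil.
def error_message_for
  yield
  nil
rescue NameError => e
  "#{e.class}: #{e.message}"
end

# Stand-in for a form: field names map to values.
form = {}

# First attempt: the bare word LoginUser is treated as a constant.
first_error = error_message_for { form['USER NAME HERE'] = LoginUser }

# Second attempt: `agent` is used before anything was assigned to it.
second_error = error_message_for { agent.get('http://www.website_here.com') }

# Working pattern: field name as the key, a string as the value.
form['LoginUser'] = 'YOUR USERNAME HERE'

puts first_error
puts second_error
puts form.inspect
```

The third error in the question looks like a name mismatch of a similar kind: the trace reports `loginUser=`, while the field in the form source is named `LoginUser`.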
