How can I scrape sites that require authentication using Node.js?
Question
I've come across many tutorials (http://net.tutsplus.com/tutorials/javascript-ajax/how-to-scrape-web-pages-with-node-js-and-jquery/) explaining how to scrape public websites that don't require authentication/login, using Node.js.
Can somebody explain how to scrape sites that require login using Node.js?
Recommended answer
Use Mikeal's Request library with cookie support enabled, like this:
var request = require('request').defaults({jar: true});
First, create an account on that site manually. Then send a POST request to the site's login endpoint, passing the username and password as form parameters. The server responds with a session cookie, which Request stores in its cookie jar and sends with later requests, so you can then access the pages that require you to be logged in.
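To make the cookie hand-off concrete, here is a minimal, dependency-free sketch of the same login-then-fetch flow. It deliberately uses Node's built-in http module rather than Request, so the cookie handling that `jar: true` does for you is written out by hand. A throwaway local server stands in for the real site; the paths `/login` and `/private`, the form field names, the credentials, and the helper names (`createDemoSite`, `toCookieHeader`, `send`, `loginAndFetch`, `demo`) are all assumptions made up for illustration, and a real site's login form will differ.

```javascript
const http = require('http');

// Hypothetical stand-in site: POST /login sets a session cookie,
// GET /private only answers when that cookie is sent back.
function createDemoSite() {
  return http.createServer((req, res) => {
    if (req.method === 'POST' && req.url === '/login') {
      let body = '';
      req.on('data', (chunk) => { body += chunk; });
      req.on('end', () => {
        const form = new URLSearchParams(body);
        const ok = form.get('username') === 'alice' && form.get('password') === 'secret';
        if (ok) res.setHeader('Set-Cookie', 'session=abc123; HttpOnly; Path=/');
        res.statusCode = ok ? 200 : 401;
        res.end(ok ? 'logged in' : 'bad credentials');
      });
    } else if (req.url === '/private') {
      const cookie = req.headers.cookie || '';
      const authed = cookie.split('; ').includes('session=abc123');
      res.statusCode = authed ? 200 : 403;
      res.end(authed ? 'secret page' : 'login required');
    } else {
      res.statusCode = 404;
      res.end('not found');
    }
  });
}

// Reduce Set-Cookie response headers to a Cookie request header value
// (keeps only the name=value part of each cookie).
function toCookieHeader(setCookieHeaders) {
  return (setCookieHeaders || []).map((c) => c.split(';')[0]).join('; ');
}

// Small promise wrapper around http.request.
function send(options, body) {
  return new Promise((resolve, reject) => {
    // agent: false avoids keep-alive sockets so the demo server can shut down cleanly.
    const req = http.request({ agent: false, ...options }, (res) => {
      let text = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { text += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, headers: res.headers, text }));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Step 1: POST the credentials. Step 2: replay the returned cookie.
async function loginAndFetch(port, username, password) {
  const form = new URLSearchParams({ username, password }).toString();
  const login = await send({
    host: '127.0.0.1', port, path: '/login', method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(form),
    },
  }, form);
  const cookie = toCookieHeader(login.headers['set-cookie']);
  return send({
    host: '127.0.0.1', port, path: '/private', method: 'GET',
    headers: cookie ? { Cookie: cookie } : {},
  });
}

// Start the stand-in site on a free port, run the flow, then shut down.
async function demo() {
  const server = createDemoSite();
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  try {
    return await loginAndFetch(server.address().port, 'alice', 'secret');
  } finally {
    server.close();
  }
}

demo().then((page) => console.log(page.status, page.text));
```

With Request and `jar: true`, the `toCookieHeader` step and the explicit `Cookie` header are unnecessary: the library stores the cookie from the login response and attaches it to subsequent requests made through the same instance.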
Note: this approach doesn't work if the login page uses something like reCAPTCHA.