How to preserve login tokens in automation for websites like Owler?
Question
I am trying to develop a scraper for various sites like angel.co. I'm stuck designing a crawler for www.owler.com, because it requires logging in by email whenever we try to access information about a company.
Each time we log in, we get a new login token by email that expires after some time. So, is there a proper way to preserve the login session in the browser using Selenium with the Python bindings?
I'm just looking for guidelines on handling this type of situation. I've already tried automating this task with Selenium, but it wasn't a fruitful approach.
Recommended answer
I got you man! YES, this can be done via Selenium, but it will take some advanced knowledge of Selenium & a basic understanding of how users are authenticated on websites & of cookies.
Off the top of my head you have the following options:
- 1. Storing the email-received authentication link & injecting the token inside it into your browser session in the form of a cookie;
- 2. Storing your session in the form of a Selenium Profile specific to the browser you're running your tests on, and loading it afterwards on the instance spawned by your script.
1. (Note: This worked like a charm from the first go so follow closely.)
- Open www.owler.com in an incognito window (I am using Chrome) and open the cookies section;
- Spot the cookies you are working with (see this print-screen);
- Click Sign In in order to receive your email. Inspect the Sign-In link (see this print-screen);
- Copy & load the link into another browser (not your incognito session);
- Once you are logged in, open the browser console (F12, or CTRL+Shift+J on Chrome) > go to the Application tab > click on the Cookies section (for the Owler domain) and copy the value of the OWLER_PC cookie (see this print-screen for more details);
- In your anonymous session (not logged in), go to the browser console and add the auth_token in the form of a cookie, via the document.cookie property, like this: document.cookie = "OWLER_PC=<yourTokenHere>";
- Refresh the page 2 times, and VOILA, you are logged in.
Note: I knew that you have to add that cookie as OWLER_PC, because I inspected the logged-in session and that was the only new cookie. The cookie's value is (usually) the same as the authentication token you receive via email.
Now all that is left to do is simulate this via code. You have to store one of these email authentication tokens in your script (notice they expire in 1 year, so you should be good).
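As a rough sketch of the "store the token" step, the token could live in a small JSON file next to the script rather than being hard-coded. The helper names, the file layout, and the 365-day cutoff below are assumptions made for illustration (the one-year lifetime is only what was observed above, not a documented guarantee):

```python
import json
import time
from pathlib import Path

# Lifetime reported in the answer; treat it as an assumption, not a guarantee.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def save_token(path, token, now=None):
    """Write the token and the time it was stored to a JSON file."""
    record = {"token": token, "saved_at": time.time() if now is None else now}
    Path(path).write_text(json.dumps(record))


def load_token(path, now=None, max_age=ONE_YEAR_SECONDS):
    """Return the stored token, or None if it is missing or older than max_age."""
    token_file = Path(path)
    if not token_file.exists():
        return None
    record = json.loads(token_file.read_text())
    current = time.time() if now is None else now
    if current - record["saved_at"] > max_age:
        return None  # too old: request a fresh sign-in email instead
    return record["token"]
```

When load_token returns None, the script would need a fresh token from a new sign-in email before continuing.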
Then once you've opened your session, use the Selenium bindings for the framework/language you are using to add said cookie, then refresh the page. For WebdriverIO/JavaScript (my weapons of choice) it goes something like this:
browser.setCookie({name: 'OWLER_PC', value: 'SPF-yNNJSXeXJ...'});
browser.refresh();
browser.refresh();
// Assert you are logged in
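Since the question uses the Python bindings, here is a rough Python counterpart of the snippet above. It is a sketch rather than a tested recipe for Owler: the function name log_in_with_cookie, the https URL, and the use of Chrome are assumptions, while get, add_cookie, and refresh are standard Selenium WebDriver methods.

```python
OWLER_URL = "https://www.owler.com"  # assumed scheme; the answer only names the domain
AUTH_COOKIE_NAME = "OWLER_PC"  # cookie name observed in the answer


def log_in_with_cookie(driver, token, url=OWLER_URL):
    """Inject the auth cookie into an open Selenium session, then reload."""
    # Selenium only accepts a cookie for the domain of the page that is
    # currently loaded, so open the site before adding the cookie.
    driver.get(url)
    driver.add_cookie({"name": AUTH_COOKIE_NAME, "value": token})
    # The answer reports that two refreshes were needed.
    driver.refresh()
    driver.refresh()


def main():
    # Imported here so the helper above can be exercised without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        log_in_with_cookie(driver, "<yourTokenHere>")
        # Assert you are logged in
    finally:
        driver.quit()
```

Calling main() requires Selenium and a Chrome driver to be installed; the placeholder token has to be replaced with a real one.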
2. Sometimes, you don't want to add cookies, or write boiler-plate code to just be logged into a website, or have a specific set of browser-extensions loaded on your Selenium driver instance. So you use Browser Profiles.
You will have to read up on them yourself, as it is a lengthy topic. This question might also help you, as you are using the Python Selenium bindings.
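For Chrome, one common way to get a persistent profile is to point the browser at a fixed user-data directory, so cookies from a manual login survive between runs. The sketch below assumes Chrome and a profile path of your choosing; the helper names are made up for illustration, and other browsers (e.g. Firefox) use a different mechanism.

```python
def profile_arguments(profile_dir):
    """Chrome switch that makes the browser use a persistent profile directory."""
    return [f"--user-data-dir={profile_dir}"]


def make_chrome_with_profile(profile_dir):
    """Start Chrome through Selenium with the given profile directory."""
    # Imported here so profile_arguments can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in profile_arguments(profile_dir):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

The idea would be to log in once by hand in a browser started with that directory, then have the script reuse it. Whether the Owler session actually persists this way has not been verified here.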
Hope this helps!