Puppeteer: how to store a session (including cookies, page state, local storage, etc.) and continue later?


Question

Is it possible to have a Puppeteer script that opens and interacts with a page, then saves that browser session as-is, and have another script load it and continue from there?

By "browser session" I mean the currently loaded page, including the page state (DOM, JavaScript variables, etc.), cookies, local storage, the whole shebang. Basically, everything needed to continue exactly where the previous script left off.

If not, is it possible to at least export and import cookies and local storage? That way I could reload a particular page and continue processing while keeping any login or session data intact.

Recommended answer

I can't say for sure, but since Puppeteer is "just" a wrapper for the Chrome DevTools Protocol (CDP), and CDP doesn't have a native command that does what you are asking for, it isn't possible to do this for the whole shebang.

But you have options. One good option is to reuse the same browser profile for the next script. You just need to pass the "userDataDir" option to the puppeteer.launch command. Example: puppeteer.launch({ userDataDir: '/tmp/myChromeSession' });. Every Puppeteer script that uses this directory will use the same browser profile, so they will share the "permanent" cookies. Session cookies (and cookies whose expiration time has passed) will still get deleted, but that is how cookies are supposed to work.
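The userDataDir approach above can be sketched as a small helper. This is only a minimal sketch: the helper name withSharedProfile and the constant PROFILE_DIR are hypothetical, and the puppeteer module is passed in as an argument rather than imported. It assumes the first script has closed its browser before the second one launches, since two Chrome instances generally cannot share one profile directory at the same time.

```javascript
// Hypothetical profile path, taken from the example in the answer.
const PROFILE_DIR = '/tmp/myChromeSession';

// Launches a browser bound to the shared profile, runs `task` on a new page,
// and always closes the browser afterwards.
async function withSharedProfile(puppeteer, task) {
  // Every launch that points at the same directory reuses the same profile.
  const browser = await puppeteer.launch({ userDataDir: PROFILE_DIR });
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    // Closing the browser releases the profile directory for the next script.
    await browser.close();
  }
}

// Usage sketch (script 1 logs in, script 2 continues later):
//   const puppeteer = require('puppeteer');
//   await withSharedProfile(puppeteer, (page) => page.goto('https://example.com/'));
```

Passing the module in keeps the helper independent of how Puppeteer is installed or loaded.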

An excerpt about the user data directory:

The user data directory contains profile data such as history, bookmarks, and cookies, as well as other per-installation local state.

Although this reference says nothing about Web Storage, it is preserved in the user data directory too. So with this option you are good to go. I think it is the best option for your case.

You have other options too, such as copying just the cookies and the Storage (localStorage and sessionStorage).

Copying cookies using Puppeteer

With Puppeteer, this process is painful: you have to specify every origin you want to copy cookies from. For example, if your site embeds third-party content, such as Google sign-in or tracking, you have to copy cookies from "google.com", ".google.com", "www.google.com", etc. It's tedious. Anyway, to copy the cookies for the origin https://a.b.c, issue: const abcCookies = await page.cookies('https://a.b.c'); To restore them: await page.setCookie(...abcCookies);. Since they are plain JSON-serializable objects, you can serialize them and save them to disk to restore later.
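A minimal sketch of the save-and-restore round trip described above, assuming a Puppeteer version that provides page.cookies and page.setCookie. The function names saveCookies and loadCookies, and the file path passed to them, are hypothetical.

```javascript
const fs = require('fs');

// Reads the cookies visible to the given URLs and writes them to `file` as JSON.
async function saveCookies(page, urls, file) {
  const cookies = await page.cookies(...urls);
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

// Reads cookies from `file` and sets them on the page.
async function loadCookies(page, file) {
  const cookies = JSON.parse(fs.readFileSync(file, 'utf8'));
  // Skip the call when there is nothing to restore.
  if (cookies.length > 0) {
    await page.setCookie(...cookies);
  }
  return cookies.length;
}

// Usage sketch:
//   await saveCookies(page, ['https://a.b.c'], './cookies.json');
//   ...later, in another script, before navigating:
//   await loadCookies(page, './cookies.json');
```

Every origin whose cookies matter has to be listed in urls, which is the painful part the answer mentions.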

Copying cookies using CDP

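const client = await page.target().createCDPSession(); // public API; page._client is private and may change between versions
const { cookies } = await client.send('Network.getAllCookies');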

Reference: Network.getAllCookies

To restore them, use the Network.setCookies CDP method. Again, you can serialize those cookies and save them to disk to restore later.
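The CDP variant can be sketched in the same way. This is a sketch under assumptions: the function names saveAllCookies and restoreAllCookies are hypothetical, and the cookie objects returned by Network.getAllCookies are passed back to Network.setCookies unchanged, which may need filtering if a given Chrome version rejects some fields.

```javascript
const fs = require('fs');

// Dumps every cookie in the browser to `file`, regardless of origin.
async function saveAllCookies(page, file) {
  const client = await page.target().createCDPSession();
  const { cookies } = await client.send('Network.getAllCookies');
  fs.writeFileSync(file, JSON.stringify(cookies));
  await client.detach();
  return cookies.length;
}

// Restores cookies previously written by saveAllCookies.
async function restoreAllCookies(page, file) {
  const cookies = JSON.parse(fs.readFileSync(file, 'utf8'));
  const client = await page.target().createCDPSession();
  // Assumption: the saved objects are accepted as-is by Network.setCookies.
  await client.send('Network.setCookies', { cookies });
  await client.detach();
  return cookies.length;
}
```

Unlike the page.cookies approach, this does not require listing each origin.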

Copying Storage (localStorage and sessionStorage)

You can transfer your own origin's Storage via const ls = await page.evaluate(() => JSON.stringify(localStorage)); and const ss = await page.evaluate(() => JSON.stringify(sessionStorage));. However, you can't access other origins' Storage, for security reasons. I don't know of a CDP equivalent, and I think one doesn't exist yet.
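A minimal sketch of the Storage round trip, building on the page.evaluate calls above. The function names dumpStorage and restoreStorage are hypothetical, and the sketch assumes the page has already navigated to the same origin before restoreStorage is called, since Storage is scoped per origin.

```javascript
// Serializes localStorage and sessionStorage of the page's current origin.
async function dumpStorage(page) {
  return page.evaluate(() => ({
    local: JSON.stringify(localStorage),
    session: JSON.stringify(sessionStorage),
  }));
}

// Writes a dump produced by dumpStorage back into the current origin.
async function restoreStorage(page, dump) {
  await page.evaluate((d) => {
    for (const [key, value] of Object.entries(JSON.parse(d.local))) {
      localStorage.setItem(key, value);
    }
    for (const [key, value] of Object.entries(JSON.parse(d.session))) {
      sessionStorage.setItem(key, value);
    }
  }, dump);
}

// Usage sketch:
//   const dump = await dumpStorage(page);            // script 1
//   fs.writeFileSync('./storage.json', JSON.stringify(dump));
//   ...later, after page.goto() to the same origin:  // script 2
//   await restoreStorage(page, JSON.parse(fs.readFileSync('./storage.json', 'utf8')));
```

The dump is a plain object of two JSON strings, so it can be saved to disk next to the cookies.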

Web Cache

If your site has a service worker, chances are that it saves things via the Web Cache API. I don't know whether it makes sense to save this cached data, but if it is important to you, you can transfer these caches too, though not with Puppeteer APIs or CDP. You have to use the Cache API yourself and transfer the cache using page.evaluate.

IndexedDB

If you want to copy IndexedDB contents, you can use the CDP IndexedDB domain methods (like "IndexedDB.requestData") to get the data for any origin, but you can't set or restore this data through CDP. :) You can, however, restore the data programmatically in your own origin using page.evaluate.
