Is it possible to scrape all text messages from WhatsApp Web with Scrapy?


Question

I've been experimenting with web scraping using Scrapy, and I'm interested in retrieving all text messages from all chats on WhatsApp to use as training data for a machine learning project. I know some websites block web crawlers and scrapers, so I would like to know whether it is possible to use Scrapy to obtain these messages and, if it isn't, what alternatives I can use. I understand that I can click the "Email chat" option for each chat, but that might not be feasible if I want to obtain a large amount of data, not just from my own chats but also from other people who are willing to let me use their chats for the project.

Recommended answer

I don't think WhatsApp blocks crawlers and scrapers. You only have access to your own web.whatsapp.com session, and what you do with your own messages is up to you. When I wrote code to read and write WhatsApp messages, I used Selenium WebDriver, which can automate any browser action. It worked quite reliably for WhatsApp. It was not fully automated, though, because of the QR code you have to scan to log in. If you press F12 and go to the "Network" tab in your web browser, you will notice XHR requests with the messages inside. You can see them when new messages load as you scroll or when you open a contact's chat. The payload looks like binary data rather than readable HTML or JSON, so I do not think you can write Scrapy code for that.
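
The Selenium approach described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code the answer refers to: `MESSAGE_SELECTOR` is a hypothetical placeholder (WhatsApp Web's real markup is not given here and changes over time), and the helper names `wait_for_login` and `collect_message_texts` are made up for illustration. The QR code scan and opening a chat stay manual, as the answer notes.

```python
import time

WHATSAPP_WEB_URL = "https://web.whatsapp.com"

# ASSUMPTION: placeholder selector. Inspect the page with F12 and replace it
# with whatever currently matches message bubbles in your browser.
MESSAGE_SELECTOR = "div.message-text"

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR.
BY_CSS = "css selector"


def wait_for_login(driver, selector, timeout_s=120.0, poll_s=2.0):
    """Poll until at least one element matches `selector`, or time out.

    Gives the user time to scan the QR code and open a chat by hand.
    Returns True if an element appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if driver.find_elements(BY_CSS, selector):
            return True
        time.sleep(poll_s)
    return False


def collect_message_texts(driver, selector=MESSAGE_SELECTOR):
    """Return the non-empty visible text of every element matching `selector`."""
    texts = []
    for element in driver.find_elements(BY_CSS, selector):
        text = element.text.strip()
        if text:
            texts.append(text)
    return texts


def main():
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(WHATSAPP_WEB_URL)
        # Manual step: scan the QR code with your phone, then open a chat.
        if wait_for_login(driver, MESSAGE_SELECTOR):
            for text in collect_message_texts(driver):
                print(text)
        else:
            print("Timed out waiting for messages to appear.")
    finally:
        driver.quit()
```

Calling `main()` opens a Chrome window and prints whatever text is currently rendered in the open chat. Older messages only appear in the page after scrolling up, so a fuller script would have to scroll and collect repeatedly.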
