Realtime scrape a chat using Node.js

Problem description

What I want to do is build a scraping application in Node.js that monitors a chat in real time and stores certain messages in a database.

Specifically, I want to capture data from the chat of a streaming platform and extract useful information that helps the people who are streaming.

But I don't know where to start with Node.js.

So far I have been able to capture the message data, but I cannot monitor new messages in real time. Any help with this?

What I did so far:

server.js

// server.js: scrape the nicknames currently shown in a Nimo TV chat
const express = require('express');
const puppeteer = require('puppeteer');
const app = express();

app.get('/', function (req, res) {
    const url = 'https://www.nimo.tv/live/6035521326';

    // Scrape in the background; the response below is sent immediately
    (async () => {
        const browser = await puppeteer.launch();
        try {
            const page = await browser.newPage();
            await page.goto(url);
            await page.waitForSelector('.msg-nickname');

            // Collect the text of every nickname element currently on the page
            const messages = await page.evaluate(() => {
                return Array.from(document.querySelectorAll('.msg-nickname'))
                    .map(item => item.innerText);
            });

            console.log(messages);
        } finally {
            // Close the browser, otherwise every request leaks a Chromium process
            await browser.close();
        }
    })().catch(console.error);

    res.send('Check your console!');
});

app.listen(8081);
console.log('Magic happens on port 8081');
module.exports = app;

With this I get the nicknames of the users who posted messages and put them in an array. I want my application to keep running and receive new nicknames automatically whenever someone posts in the chat. Any help with this challenge?

Maybe I'm going to need to use WebSockets?

Recommended answer

If possible, you should use the API that the chat itself uses. Open the Network tab in the Chrome developer tools and figure out which network requests the chat makes, including any WebSocket connections.
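If the requests you find turn out to be plain HTTP calls that return the latest messages, a small poller is enough. The sketch below is not part of the original answer and rests on assumptions: each response is an array of messages shaped like `{ id, nickname, text }`, `id` grows over time, and `fetchMessages` / `createChatPoller` are hypothetical names for functions you would write yourself. It only forwards messages it has not seen before.

```javascript
// Hypothetical polling client for a chat API discovered in the Network tab.
// ASSUMPTIONS (not from the original answer): each poll returns an array of
// messages shaped like { id, nickname, text }, and `id` grows over time.
// `fetchMessages` is a function you supply, e.g. a wrapper around fetch().
function createChatPoller(fetchMessages, onMessage) {
  let lastSeenId = -Infinity;

  // Fetch one batch and hand only unseen messages to onMessage, oldest first.
  async function pollOnce() {
    const batch = await fetchMessages();
    const fresh = batch
      .filter((message) => message.id > lastSeenId)
      .sort((a, b) => a.id - b.id);
    for (const message of fresh) {
      await onMessage(message);
      lastSeenId = message.id;
    }
    return fresh.length;
  }

  // Poll repeatedly until the returned stop() function is called.
  function start(intervalMs = 2000) {
    let stopped = false;
    (async () => {
      while (!stopped) {
        try {
          await pollOnce();
        } catch (err) {
          console.error('Polling failed:', err);
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
      }
    })();
    return () => {
      stopped = true;
    };
  }

  return { pollOnce, start };
}
```

On Node.js 18 or later, `fetchMessages` could be something like `async () => (await fetch(CHAT_URL)).json()`, where `CHAT_URL` stands for whatever endpoint you found. If the chat pushes messages over a WebSocket instead, polling does not apply and you would subscribe to that socket.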

If that is not possible, you can use a MutationObserver to monitor DOM changes. Expose a Node.js function to the page via page.exposeFunction, call it from the observer callback whenever relevant nodes are added, and insert the obtained data into a database from there.

Here is some example code to get you started:

const puppeteer = require('puppeteer');
const { Client } = require('pg');

(async () => {
    const client = new Client(/* ... */);
    await client.connect(); // connect to database

    const browser = await puppeteer.launch({ headless: false });
    const [page] = await browser.pages();

    // Runs in Node.js whenever the observer inside the page reports new text
    async function mutationListener(addedText) {
        console.log(`Added text: ${addedText}`);

        // insert data into database
        await client.query('INSERT INTO users(text) VALUES($1)', [addedText]);
    }
    // exposeFunction returns a promise; await it so the binding exists before navigation
    await page.exposeFunction('mutationListener', mutationListener);

    await page.goto('http://...');
    await page.waitForSelector('.msg-nickname');

    await page.evaluate(() => {
        // wait for any mutations inside a specific element (e.g. the chatbox)
        const observerTarget = document.querySelector('ELEMENT-TO-MONITOR');
        const mutationObserver = new MutationObserver((mutationsList) => {
            // handle change by checking which elements were added
            for (const mutation of mutationsList) {
                // addedNodes may be empty (removals only) or contain text nodes
                for (const node of mutation.addedNodes) {
                    if (node.nodeType === Node.ELEMENT_NODE) {
                        // pass innerText of each added element to our mutationListener
                        mutationListener(node.innerText);
                    }
                }
            }
        });
        mutationObserver.observe( // start observer
            observerTarget,
            // watch direct children; add subtree: true to also watch deeper descendants
            { childList: true },
        );
    });
})().catch(console.error);
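In a busy chat, running one `INSERT` per message can become a bottleneck. The sketch below is an optional refinement, not part of the original answer: it buffers messages and writes them as one parameterized multi-row `INSERT`. It assumes the same `users(text)` table as the example query above, and `createBatcher` / `buildMultiRowInsert` are hypothetical helper names. The `{ text, values }` object is the query-config shape that node-postgres' `client.query()` accepts.

```javascript
// Sketch: buffer scraped chat lines and write them in batches instead of
// one INSERT per message. The users(text) table mirrors the example query
// above; adjust it to your own schema.

// Build a parameterized multi-row INSERT in the { text, values } shape that
// node-postgres' client.query() accepts. Expects a non-empty array.
function buildMultiRowInsert(texts) {
  const placeholders = texts.map((_, i) => `($${i + 1})`).join(', ');
  return {
    text: `INSERT INTO users(text) VALUES ${placeholders}`,
    values: texts,
  };
}

// Collect items and pass them to writeBatch once maxSize items are buffered
// or maxWaitMs has passed since the first item of the current batch.
function createBatcher(writeBatch, { maxSize = 50, maxWaitMs = 1000 } = {}) {
  let buffer = [];
  let timer = null;

  async function flush() {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = []; // swap first so items added during the write go to the next batch
    await writeBatch(batch);
  }

  function add(item) {
    buffer.push(item);
    if (buffer.length >= maxSize) return flush();
    if (!timer) {
      timer = setTimeout(() => {
        flush().catch((err) => console.error('Batch write failed:', err));
      }, maxWaitMs);
    }
    return Promise.resolve();
  }

  return { add, flush };
}
```

With the code above, you could create `const batcher = createBatcher((batch) => client.query(buildMultiRowInsert(batch)));` once after connecting, call `batcher.add(addedText)` inside `mutationListener` instead of querying directly, and call `batcher.flush()` before shutting down so the last partial batch is written.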
