How to apply JavaScript to HTML, simulating a browser

Question

I've already searched the Internet for how to "create" a simple headless browser, because I wanted to know how a browser works internally. I'd like to implement a simple headless browser.

What I mean is: suppose you have an HTML string and a JavaScript string, both obtained from HTTP requests to a server. How can I apply the JavaScript to the HTML string?

For example, I requested an HTML source file from a server X and received this response:

<html>
    <head>
        <script type="text/javascript" src="javascript.js"></script>
    </head>
    <body>
        <p id="content"></p>
    </body>
</html>

Then I request the javascript.js file and get this:

document.getElementById("content").textContent = "Hello";

How can I apply the contents of javascript.js to the HTML file? Are the steps I should follow something like this?

  1. Parse the HTML source into JavaScript DOM elements
  2. Apply the JavaScript to the DOM

I'd like to do this in Java, Scala or Node.js. I don't know if the main idea is clear... I'm Latin American and my English isn't very good, sorry for that. If something isn't clear, please let me know in the comments and I'll edit my post.

In other words, what I'd like is a method/function like this (in pseudocode):

function applu(html, js) {
    // Run js against the DOM parsed from html and return the resulting html
}

Answer

If you're looking into headless browsers, I'm sure you're aware of PhantomJS. PhantomJS is a headless browser based on Apple's WebKit browser engine.

You're asking for a lot here. You need:

  1. A JavaScript runtime to run the JavaScript (e.g. V8).
  2. A web engine that brings the HTML, and the Document Object Model it defines, to life.

Each of those takes millions of lines of code to implement.
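To make the split between those two pieces concrete, here is a toy sketch in Node.js. Node's built-in `vm` module runs the script on V8 (the runtime part), while a hand-built object with a single `getElementById` method stands in for the DOM (the part a real engine builds by parsing the HTML). The helper names `makeToyDocument` and `runAgainstToyDom` are hypothetical, and `vm` is not a security sandbox.

```javascript
const vm = require('vm');

// Toy stand-in for the DOM: only getElementById over a hand-built list of elements.
// A real engine (WebKit, jsdom) builds this tree by parsing the HTML instead.
function makeToyDocument(elements) {
  const byId = new Map(elements.map((el) => [el.id, el]));
  return {
    getElementById(id) {
      return byId.get(id) || null;
    },
  };
}

// Runs `js` on V8 via Node's vm module (the "runtime" part), with `document`
// and `window` as the script's globals (the "DOM" part). Not a security sandbox.
function runAgainstToyDom(js, elements) {
  const sandbox = { document: makeToyDocument(elements) };
  sandbox.window = sandbox; // lets scripts write window.document as well
  vm.createContext(sandbox);
  vm.runInContext(js, sandbox);
  return elements;
}

const elements = [{ id: 'content', textContent: '' }];
runAgainstToyDom('document.getElementById("content").textContent = "Hello";', elements);
console.log(elements[0].textContent); // Hello
```

Everything a real page needs beyond this (HTML parsing, the full DOM API, events, layout) is what makes engines so large.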

My recommendation is to integrate your program with PhantomJS. PhantomJS is a headless web browser and a JavaScript environment. If you're using Scala, start phantomjs as a child process and send messages to it via standard I/O. The "JS" in PhantomJS means that you drive it through its JavaScript API, so you'd also have to write a JS script that handles the messages coming in on standard I/O. It's undocumented, but PhantomJS's system module has system.stdin and system.stdout APIs for handling those messages.

That's a lot of work, and it takes a lot of extra resources outside the JVM to get it working. Since you're using Scala, a simpler option is jsoup, which parses the HTML document and lets you modify it. However, jsoup does not run JavaScript, so you would have to write the transformations yourself in Scala (or Java).

Actually, now that I think about it, you should use jsdom with Node.js. jsdom implements the DOM API without actually rendering anything, which might be exactly what you need, and it is made for Node.js, which is headless. If you want to use both Scala and Node, you can also use Node's standard I/O to send messages to and from the JVM.
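If you go the Scala + Node route, the Node side can be a small loop over standard I/O. A minimal sketch, assuming a made-up line protocol (one JSON object per line: `{"id", "html", "js"}` in, `{"id", "result"}` or `{"id", "error"}` out). `serveLines` is a hypothetical name, and `handle` is whatever actually applies the JavaScript, for example the answer's jsdom-based module.

```javascript
const readline = require('readline');

// Reads one JSON request per line from `input`, calls `handle(html, js)`, and
// writes one JSON response per line to `output`. Responses carry the request id
// because async handlers may finish out of order.
function serveLines(input, output, handle) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  rl.on('line', async (line) => {
    if (!line.trim()) return; // skip blank lines
    let msg;
    try {
      msg = JSON.parse(line);
    } catch (e) {
      msg = null;
    }
    if (msg === null || typeof msg !== 'object') {
      output.write(JSON.stringify({ error: 'invalid request' }) + '\n');
      return;
    }
    try {
      const result = await handle(msg.html, msg.js);
      output.write(JSON.stringify({ id: msg.id, result }) + '\n');
    } catch (e) {
      output.write(JSON.stringify({ id: msg.id, error: String(e) }) + '\n');
    }
  });
  return rl;
}

module.exports = serveLines;
```

In a real script you would call `serveLines(process.stdin, process.stdout, require('./index'))`, and on the JVM side start `node` (e.g. with `java.lang.ProcessBuilder`) and exchange lines over the child's streams.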

Here is a proof of concept that uses jsdom to evaluate the JavaScript and modify the HTML. It's a really simple solution, and it is the most resource-efficient option for this task (which is a hard one).

I made a gist for you with this very simple proof of concept. To run it:

git clone https://gist.github.com/c8aef41ee27e5304e94f6a255b048f87.git apply-js-to-html
cd apply-js-to-html
npm install
node example.js

Here is the module (index.js, which example.js loads with require('./index')):

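const jsdom = require('jsdom'); // jsdom.env is the legacy (pre-10) jsdom API

// applu(html, js): parse html into a DOM, run js against it,
// and resolve with the resulting markup inside <html>.
module.exports = function (html, js) {
    return new Promise((resolve, reject) => {
        // Build a headless window/document from the HTML string.
        jsdom.env(html, (error, window) => {
            if (error) {
                return reject(error); // stop here: there is no window to use
            }
            try {
                (function evalInContext () {
                    'use strict';
                    // Bind the names the script expects; direct eval sees these locals.
                    const document = this.document;
                    const window = this.window;
                    eval(js);
                    resolve(window.document.documentElement.innerHTML);
                }).call(window); // `this` inside evalInContext is the jsdom window
            } catch (e) {
                reject(e); // errors thrown by the evaluated script
            }
        });
    });
};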

Here is the module in use (example.js):

const applu = require('./index');

const html = `
    <html>
        <head></head>
        <body>
            <p id="content"></p>
        <body>
    </html>
`;

// Script that fills in the empty paragraph
const js = `document.getElementById("content").innerHTML = "Hello";`;

applu(html, js).then(result => {
    console.log('input html: ', html);
    console.log('output html: ', result);
}).catch(err => console.error(err));

Here is the output of the code:

input html:  
    <html>
        <head></head>
        <body>
            <p id="content"></p>
        <body>
    </html>

output html:  <head></head>
        <body>
            <p id="content">Hello</p>


</body>

jsdom creates a headless window and document environment that doesn't render anything. The script is run with eval inside a function that is called with window as its this value. I also re-declare document and window inside that function, so the evaluated JavaScript has those variables in scope.

This is just a basic proof of concept; you'll have to iron out the details yourself.
