How to parse HTML/XML documents with Node.js?

Question

I have an editor.html that contains a generatePNG function:

<!DOCTYPE html>
<html> 
<head> 
    <meta charset="UTF-8"> 
    <title>Diagram</title> 

    <script type="text/javascript" src="lib/jquery-1.8.1.js"></script> 
    <!-- I use many resources -->
<script></script> 

    <script> 

        function generatePNG (oViewer) { 
            var oImageOptions = { 
                includeDecoratorLayers: false, 
                replaceImageURL: true 
            }; 

            var d = new Date(); 
            var h = d.getHours(); 
            var m = d.getMinutes(); 
            var s = d.getSeconds(); 

            var sFileName = "diagram" + h.toString() + m.toString() + s.toString() + ".png"; 

            var sResultBlob = oViewer.generateImageBlob(function(sBlob) { 
                b = 64; 
                var reader = new window.FileReader(); 
                reader.readAsDataURL(sBlob); 
                reader.onloadend = function() { 
                    base64data = reader.result; 
                    var image = document.createElement('img'); 
                    image.setAttribute("id", "GraphImage"); 
                    image.src = base64data; 
                    document.body.appendChild(image); 
                } 

            }, "image/png", oImageOptions); 
            return sResultBlob; // the original returned the undefined name sResult
        } 

    </script> 


</head> 

<body > 
    <div id="diagramContainer"></div> 
</body> 
</html>

I want to access the DOM and get image.src using Node.js. I found that I can use cheerio or jsdom.

I started with this:

var cheerio = require('cheerio'),
    $ = cheerio.load('editor.html');

But I can't figure out how to access and get image.src.

Recommended answer

The problem is that loading an HTML file into cheerio (or any other Node module) does not process the HTML as a browser does. Assets (such as stylesheets, images, and JavaScript) are not loaded or processed as they would be in a browser.

While Node.js and modern web browsers have the same (or similar) JavaScript engines, a browser adds a lot of additional stuff, such as window, the DOM (document), etc. Node.js does not have these concepts, so there is no window.FileReader and no document.createElement.
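One way to see this for yourself is to check which browser globals exist when a script runs under plain Node.js. This is a minimal sketch, assuming the script is started with node directly and that no DOM library such as jsdom has been loaded; hasGlobal is a hypothetical helper name, not part of Node.js.

```javascript
// Report whether a browser-style global exists in the current runtime.
// `hasGlobal` is a hypothetical helper introduced for this illustration.
function hasGlobal(name) {
  return typeof globalThis[name] !== 'undefined';
}

// Under plain Node.js these browser-only globals are expected to be absent.
for (const name of ['window', 'document', 'FileReader']) {
  console.log(name, hasGlobal(name) ? 'is defined' : 'is not defined');
}
```

Run in a browser console instead, the same loop should report all three names as defined, which is why the generatePNG function works there but not on the server.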

If the image is created entirely without user interaction (your code sample 'magically' receives the sBlob argument, which appears to be a string like data:<type>;<encoding>,<data>), you could use a so-called headless browser on the server; PhantomJS seems most popular these days. Then again, if no user interaction is required to create the sBlob, you are probably better off with a pure Node.js solution, e.g. How do I parse a data URL in Node?.
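As a rough illustration of the pure Node.js route, the sketch below splits a data URL into its MIME type and raw bytes using only the built-in Buffer. It assumes the base64 form data:<type>;base64,<data>, which is what FileReader.readAsDataURL produces; parseDataUrl is a hypothetical helper name, and other data URL variants (extra parameters, percent-encoded payloads) are not handled.

```javascript
// Split a base64 data URL into its MIME type and decoded bytes.
// `parseDataUrl` is a hypothetical helper; it only accepts the
// "data:<type>;base64,<data>" form and rejects everything else.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 data URL');
  }
  return {
    mimeType: match[1],
    data: Buffer.from(match[2], 'base64'),
  };
}

// Example input: the text "hello" encoded as a data URL.
const parsed = parseDataUrl('data:text/plain;base64,aGVsbG8=');
console.log(parsed.mimeType, parsed.data.toString('utf8'));
```

For an image, the returned data buffer could be passed straight to fs.writeFile to store the PNG on disk.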

If some kind of user interaction is required to create the sBlob, and you need to store it on a server, you can use pretty much the same solution: send the sBlob to the server using Ajax or a websocket, process the sBlob into an image there, and (optionally) return the URL where the image can be found.
