Applying DOM Manipulations to HTML and saving the result?

Question

I have about 100 static HTML pages, all with the same HTML structure. I want to apply some DOM manipulations to each of these files and then save the resulting HTML.

These are the manipulations I want to apply:

// [start]
$("h1.title, h2.description", this).wrap("<hgroup>");
if ( $("h1.title").height() < 200 ) {
  $("div.content").addClass('tall');
}
// [end]
// SAVE NEW HTML

The first line (.wrap()) I could easily handle with a find-and-replace, but it gets tricky when I have to determine the computed height of an element, which can't easily be determined without JavaScript. Does anyone know how I could achieve this?

Thanks!

Answer

While the first part could indeed be solved in "text mode" using regular expressions or a more complete DOM implementation in JavaScript, for the second part (the height calculation), you'll need a real, full browser or a headless engine like PhantomJS.
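For the text-mode half, here is a minimal Node.js sketch of the `.wrap()` step as a regular-expression replacement. The function name `wrapHeadings` is hypothetical, and it assumes the pages are as uniform as described: the attributes are written exactly as `class="title"` and `class="description"`, and headings are not nested inside each other. A regex is not an HTML parser, so spot-check the output on a few pages.

```javascript
// Hypothetical helper: text-mode version of
//   $("h1.title, h2.description").wrap("<hgroup>")
// Assumes class="title" / class="description" are written exactly like that.
function wrapHeadings(html) {
  // $& inserts the whole match, so each heading gets its own <hgroup>,
  // which is what jQuery's .wrap() does for every element in the selection.
  return html.replace(
    /<h1 class="title"[^>]*>[\s\S]*?<\/h1>|<h2 class="description"[^>]*>[\s\S]*?<\/h2>/g,
    '<hgroup>$&</hgroup>'
  );
}
```

To run it over the pages, read each file with `fs.readFileSync(file, 'utf8')` and write `wrapHeadings(html)` back out with `fs.writeFileSync`. If the intent was a single `<hgroup>` around both headings, that is jQuery's `.wrapAll()`, and the pattern would need to match the `<h1>`/`<h2>` pair together instead.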

From the PhantomJS homepage:

PhantomJS is a command-line tool that packs and embeds WebKit. Literally it acts like any other WebKit-based web browser, except that nothing gets displayed to the screen (thus, the term headless). In addition to that, PhantomJS can be controlled or scripted using its JavaScript API.

Here is a schematic outline (which, I admit, is untested).

In your modification script (say, modify-html-file.js), open an HTML page, modify its DOM tree, and console.log the HTML of the root element:

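var system = require('system');
var fs = require('fs');
var page = require('webpage').create();

// The file to process is passed as the first argument:
//   phantomjs modify-html-file.js page.html
var path = fs.absolute(system.args[1]);

page.open(encodeURI('file://' + path), function (status) {
    if (status === 'success') {
        var html = page.evaluate(function () {
            // your DOM manipulation here (this callback runs inside the page)
            return document.documentElement.outerHTML;
        });
        console.log(html);
    } else {
        system.stderr.write('Could not open ' + path + '\n');
    }
    phantom.exit(status === 'success' ? 0 : 1);
});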
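The placeholder comment is where the question's jQuery would go, but `$` only exists there if the page itself loads jQuery (PhantomJS's `page.injectJs()` can inject a local copy). A minimal sketch that uses plain DOM calls instead; the function name `applyManipulations` is hypothetical. Because `page.evaluate()` runs its callback in the page's context, the function has to be defined inside that callback and called as `applyManipulations(document)`.

```javascript
// Hypothetical plain-DOM equivalent of the question's jQuery.
// Written in ES5 because PhantomJS's bundled WebKit predates let/const
// and arrow functions.
function applyManipulations(doc) {
  // $("h1.title, h2.description").wrap("<hgroup>"): .wrap() gives each
  // matched element its own <hgroup>. querySelectorAll returns a static
  // list, so moving elements while looping is safe.
  var headings = doc.querySelectorAll('h1.title, h2.description');
  for (var i = 0; i < headings.length; i++) {
    var el = headings[i];
    var group = doc.createElement('hgroup');
    el.parentNode.insertBefore(group, el); // put the wrapper where el was
    group.appendChild(el);                 // then move el into it
  }

  // The part that needs a layout engine: offsetHeight is computed from the
  // rendered page. Unlike jQuery's .height(), it includes padding and border,
  // so the 200px threshold may need adjusting.
  var title = doc.querySelector('h1.title');
  var content = doc.querySelector('div.content');
  if (title && content && title.offsetHeight < 200) {
    content.className += (content.className ? ' ' : '') + 'tall';
  }
}
```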

Next, save the new HTML by redirecting your script's output to a file:

#!/bin/bash

mkdir -p modified
for i in *.html; do
    phantomjs modify-html-file.js "$i" > modified/"$i"
done
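One caveat when saving `document.documentElement.outerHTML`: it covers only the `<html>` element, so the doctype line at the top of each source file is dropped. A hedged sketch of a helper (the name `doctypeToString` is hypothetical) that rebuilds it from `document.doctype`. Like any code used in `page.evaluate()`, it must be defined inside that callback, e.g. ending with `return doctypeToString(document.doctype) + '\n' + document.documentElement.outerHTML;`.

```javascript
// Hypothetical helper: rebuilds the "<!DOCTYPE ...>" line that outerHTML omits.
// `dt` is a DocumentType node (document.doctype), or null if there is none.
function doctypeToString(dt) {
  if (!dt) return '';
  var s = '<!DOCTYPE ' + dt.name;
  if (dt.publicId) s += ' PUBLIC "' + dt.publicId + '"';
  // A system identifier without a public one needs the SYSTEM keyword.
  if (dt.systemId) s += (dt.publicId ? '' : ' SYSTEM') + ' "' + dt.systemId + '"';
  return s + '>';
}
```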
