How do I parse a page with html5ever, modify the DOM, and serialize it?

Question

I would like to parse a web page, insert anchors at certain positions, and render the modified DOM out again in order to generate docsets for Dash. Is this possible?

From the examples included in html5ever, I can see how to read an HTML file and produce a poor man's HTML output, but I don't understand how to modify the RcDom object I get back.

I would like to see a snippet that inserts an anchor element (<a name="foo"></a>) into an RcDom.

Note: this is a question about Rust and html5ever specifically. I know how to do it in other languages or with simpler HTML parsers.

Recommended answer

Here is some code that parses a document, appends an anchor (#anchor) to the href attribute of the link, and prints the new document:

// Targets an older html5ever release that still exposes `html5ever::rcdom` and `NodeEnum`.
extern crate html5ever;

use html5ever::{ParseOpts, parse_document};
use html5ever::tree_builder::TreeBuilderOpts;
use html5ever::rcdom::RcDom;
use html5ever::rcdom::NodeEnum::Element;
use html5ever::serialize::{SerializeOpts, serialize};
use html5ever::tendril::TendrilSink;

fn main() {
    let opts = ParseOpts {
        tree_builder: TreeBuilderOpts {
            drop_doctype: true,
            ..Default::default()
        },
        ..Default::default()
    };
    let data = "<!DOCTYPE html><html><body><a href=\"foo\"></a></body></html>".to_string();
    let dom = parse_document(RcDom::default(), opts)
        .from_utf8()
        .read_from(&mut data.as_bytes())
        .unwrap();

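    // Each node sits behind a RefCell, so borrow() is needed to read it.
    let document = dom.document.borrow();
    let html = document.children[0].borrow(); // The doctype was dropped, so <html> comes first.
    let body = html.children[1].borrow(); // Implicit head element at children[0].

    {
        // Scope the mutable borrow so it is released before serialization.
        let mut a = body.children[0].borrow_mut();
        if let Element(_, _, ref mut attributes) = a.node {
            // attributes[0] is the href attribute; append the fragment to its value.
            attributes[0].value.push_tendril(&From::from("#anchor"));
        }
    }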

    let mut bytes = vec![];
    serialize(&mut bytes, &dom.document, SerializeOpts::default()).unwrap();
    let result = String::from_utf8(bytes).unwrap();
    println!("{}", result);
}

This prints the following:

<html><head></head><body><a href="foo#anchor"></a></body></html>

As you can see, we can navigate through the child nodes via the children field of each node.
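Fixed indices such as children[1] stop working as soon as the markup changes, so a recursive walk that looks for a tag name may be more robust. The sketch below shows that idea on a simplified, std-only stand-in for the tree. The `Node` struct, the `Handle` alias, and the `new_node` and `find_first` helpers are hypothetical names, not html5ever types; they only mirror the Rc<RefCell<...>> shape with a children vector that the code above relies on.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a DOM node; not an html5ever type.
struct Node {
    name: String,
    children: Vec<Handle>,
}

type Handle = Rc<RefCell<Node>>;

// Hypothetical helper that builds a node with the given children.
fn new_node(name: &str, children: Vec<Handle>) -> Handle {
    Rc::new(RefCell::new(Node {
        name: name.to_string(),
        children,
    }))
}

// Depth-first search for the first node with the given name.
fn find_first(handle: &Handle, name: &str) -> Option<Handle> {
    let node = handle.borrow();
    if node.name == name {
        return Some(handle.clone());
    }
    let found = node
        .children
        .iter()
        .find_map(|child| find_first(child, name));
    found
}

fn main() {
    let a = new_node("a", vec![]);
    let body = new_node("body", vec![a]);
    let head = new_node("head", vec![]);
    let html = new_node("html", vec![head, body]);

    let found = find_first(&html, "a").expect("the tree contains an <a> node");
    println!("found <{}>", found.borrow().name);
}
```

With the real RcDom, the same walk would match on the element variant of the node and compare its qualified name; the exact type names depend on the html5ever version in use.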

We can also change an attribute in the vector of attributes that an Element carries.
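The question also asks about inserting a new element such as <a name="foo"></a>, which the code above does not show. The same pattern should carry over: mutably borrow the parent and add a new handle to its children vector. The sketch below demonstrates this on a simplified, std-only model. `Node`, `Handle`, `element`, and `to_html` are hypothetical names that stand in for RcDom's node type and html5ever's serializer; with the real crate, the new node would have to be built from the node type of whichever html5ever version is in use.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for an element node; not an html5ever type.
struct Node {
    name: String,
    attributes: Vec<(String, String)>,
    children: Vec<Handle>,
}

type Handle = Rc<RefCell<Node>>;

// Hypothetical helper that builds a childless element.
fn element(name: &str, attributes: &[(&str, &str)]) -> Handle {
    Rc::new(RefCell::new(Node {
        name: name.to_string(),
        attributes: attributes
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
        children: Vec::new(),
    }))
}

// Naive serializer for the toy model (no escaping, no void elements).
fn to_html(handle: &Handle) -> String {
    let node = handle.borrow();
    let mut out = format!("<{}", node.name);
    for (key, value) in &node.attributes {
        out.push_str(&format!(" {}=\"{}\"", key, value));
    }
    out.push('>');
    for child in &node.children {
        out.push_str(&to_html(child));
    }
    out.push_str(&format!("</{}>", node.name));
    out
}

fn main() {
    let body = element("body", &[]);
    let link = element("a", &[("href", "foo")]);
    body.borrow_mut().children.push(link.clone());

    // Change an existing attribute in place.
    link.borrow_mut().attributes[0].1.push_str("#anchor");

    // Insert a new anchor element as the first child of <body>.
    let anchor = element("a", &[("name", "foo")]);
    body.borrow_mut().children.insert(0, anchor);

    println!("{}", to_html(&body));
}
```

Run as written, the toy model prints <body><a name="foo"></a><a href="foo#anchor"></a></body>. Each borrow_mut() here is a temporary that ends with its statement, which keeps it from overlapping with the borrow() calls made during serialization.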
