Code to automatically edit HTML


Question

I have a list of HTML files. What I regularly need to do is open them in R, find the end of the header section (</head>), and manually paste in a given set of code, for example:

<p>This report helps you find the key issues in your data</p>

Can someone help me write code that automatically finds the end of the header tag and pastes in the given set of lines?

The same task may be doable in some other tool, but I am looking for help with R specifically.

Recommended answer

I think you want to use the XML package and learn about XPath queries, which help you search through HTML files. Let's say you downloaded all your files to some_dir and wanted to parse them and find only the <p> elements inside <div class="some_div">.

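library(XML)

# List only the .html/.htm files in some_dir, with their full paths
files <- list.files("some_dir", pattern = "\\.html?$", full.names = TRUE)
docs <- lapply(files, htmlParse)

# For each document, select every <p> nested inside <div class="some_div">
text.nodes <- 
  lapply(docs, function(doc) 
    getNodeSet(doc, '//div[@class="some_div"]//p'))

# Extract the text content of each matched node
text.value <- 
  lapply(text.nodes, function(nodes)
    sapply(nodes, xmlValue))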
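
The code above reads and queries the files but does not write anything back. For the insertion step the question actually asks about, a minimal base R sketch might look like the following. The helper name insert_before_head_end, its arguments, and the choice to overwrite each file in place by default are assumptions made for illustration; they are not part of the original answer. The sketch treats each file as plain text, which assumes the first </head> found is the real closing tag.

```r
# Hypothetical helper (not from the original answer): insert `snippet`
# immediately before the first </head> in one HTML file.
# By default the file is overwritten; pass `out_path` to write elsewhere.
insert_before_head_end <- function(path, snippet, out_path = path) {
  lines <- readLines(path, warn = FALSE)

  # Find the lines containing the closing head tag (case-insensitive)
  hit <- grep("</head>", lines, ignore.case = TRUE)
  if (length(hit) == 0) {
    warning("No </head> tag found in ", path, "; file left unchanged")
    return(invisible(FALSE))
  }

  # Split the first matching line around the tag, so that any markup
  # sharing a line with </head> is preserved
  i <- hit[1]
  pos <- regexpr("</head>", lines[i], ignore.case = TRUE)[1]
  before <- substr(lines[i], 1, pos - 1)
  after  <- substr(lines[i], pos, nchar(lines[i]))

  # Rebuild the file: text before the tag (if any), the snippet, then the tag
  new_lines <- c(if (nzchar(before)) before, snippet, after)
  out <- append(lines[-i], new_lines, after = i - 1)

  writeLines(out, out_path)
  invisible(TRUE)
}

# Apply the helper to every HTML file in some_dir (overwrites in place)
files <- list.files("some_dir", pattern = "\\.html?$", full.names = TRUE)
snippet <- "<p>This report helps you find the key issues in your data</p>"
invisible(lapply(files, insert_before_head_end, snippet = snippet))
```

Because the files are overwritten by default, it is worth trying this on a copy of the directory first, or passing a different out_path for each file.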
