Code to automatically edit HTML
Problem description
I have a list of HTML files. What I regularly need to do is open them in R, find the end of the head section (the closing </head> tag), and manually paste in a given set of code, for example:
<p>This report helps you find the key issues in your data</p>
Can someone help me write code that automatically finds the end of the head tag and pastes in the given lines?
The same task may be doable in some other tool, but please help me with R specifically.
Recommended answer
I think you want to use the XML package and learn about XPath queries, which let you search through HTML files. Let's say you downloaded all your files to some_dir, and you wanted to parse the text and find only the <p> elements inside <div class="some_class">.
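library(XML)
# Collect the HTML files in some_dir as full paths (.html/.htm only, so folders are skipped)
files <- list.files("some_dir", pattern = "\\.html?$", full.names = TRUE, ignore.case = TRUE)
# Parse each file into an HTML document tree
docs <- lapply(files, htmlParse)
# For each document, select every <p> nested anywhere inside <div class="some_class">
text.nodes <-
  lapply(docs, function(doc)
    getNodeSet(doc, '//div[@class="some_class"]//p'))
# Extract the text content of each selected <p> node
text.value <-
  lapply(text.nodes, function(nodes)
    sapply(nodes, xmlValue))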
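The answer above only reads HTML; it does not insert anything. For the task the question actually asks about, a minimal base-R sketch (not part of the original answer) is to read each file as a single string, locate the first </head> with regexpr(), and splice the snippet in right after it. It assumes every file contains a literal </head> tag (any letter case), that the files are ASCII or in your session's native encoding, and that overwriting them in place is acceptable. insert_after_head and add_snippet_to_files are hypothetical helper names chosen for this example. Because this is plain string splicing rather than DOM editing, it will not match unusual markup such as "</head >".

```r
# Minimal base-R sketch: splice an HTML snippet in right after the first </head>.
# Assumes a literal "</head>" tag (any letter case) is present in the text.
insert_after_head <- function(html, snippet) {
  pos <- regexpr("</head>", html, ignore.case = TRUE)  # position of the first match only
  if (pos == -1) stop("no </head> tag found")
  end <- pos + attr(pos, "match.length") - 1           # index of the tag's closing '>'
  paste0(substr(html, 1, end), "\n", snippet,
         substr(html, end + 1, nchar(html)))
}

# Apply the edit to every .html/.htm file in `dir`, overwriting each file in place.
add_snippet_to_files <- function(dir, snippet) {
  files <- list.files(dir, pattern = "\\.html?$", full.names = TRUE, ignore.case = TRUE)
  for (f in files) {
    html <- paste(readLines(f, warn = FALSE), collapse = "\n")  # whole file as one string
    writeLines(insert_after_head(html, snippet), f)
  }
  invisible(files)
}
```

For example, add_snippet_to_files("some_dir", "<p>This report helps you find the key issues in your data</p>"). To paste several lines at once, join them first with paste(lines, collapse = "\n"). Keep a copy of the originals, since the files are overwritten.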