Assigning variable values programmatically during HTML parsing
Problem description
I am expanding a previous question about HTML parsing to cover blank values. Suppose some of the variables I pull from the HTML are empty. Several variables could be empty, so I want a systematic way to handle them (a loop or a function).
This question is really about assigning variables programmatically. Most of the information I have found suggests avoiding eval(parse(text = ...)), but I'm not sure how to replace it in this case. I have the following HTML:
html <-
'<!DOCTYPE html>
<html>
<body>
<div class="foo">
<div class="fooname">Name of 1st foo</div>
<div class="abc">ABC value only present here</div>
<span>1st span in 1st foo</span>
<span>2nd span in 1st foo</span>
</div>
<div class="foo">
<div class="fooname">Name of 2nd foo</div>
<span>Only 1 span in 2nd foo</span>
</div>
</body>
</html>'
Here is the parsing:
library(XML)
html.parse <- htmlParse(html)
myFunc <- function(x){
  fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
  abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
  span <- xpathSApply(x, "./span", fun = xmlValue)
  df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
  return(df)
}
result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myFunc)
# Error in data.frame(fooname, abc, Span1 = span[1], Span2 = span[2]) :
# arguments imply differing number of rows: 1, 0
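The error can be reproduced without the XML package. The sketch below assumes, as the class check in the attempted fix suggests, that xpathSApply returns an empty list() when a query matches nothing; the values are hard-coded to mimic the second foo, so nothing here is taken from a real parse.

```r
# Values as extracted from the second <div class="foo">, hard-coded for illustration
fooname <- "Name of 2nd foo"
abc <- list()                      # no <div class="abc"> in this foo
span <- "Only 1 span in 2nd foo"   # only one <span>, so span[2] is NA

# The zero-length abc makes data.frame() fail
msg <- tryCatch(
  data.frame(fooname, abc, Span1 = span[1], Span2 = span[2]),
  error = function(e) conditionMessage(e)
)
print(msg)

# Substituting NA for the empty value gives every column exactly one row
abc <- NA_character_
df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
print(df)
```

Note that span[2] needs no special handling: indexing past the end of a vector already yields NA. Only the zero-length value breaks the row count.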
Here is my attempted fix.
myFunc <- function(x){
  fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
  abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
  span <- xpathSApply(x, "./span", fun = xmlValue)
  dfvars <- c("fooname", "abc", "span")
  #I think I have the same issue about assigning a variable in `apply`
  #functions, right?
  for(var in dfvars) {
    if(length(eval(parse(text = var))) == 0) {
      cat("No ", var, " value found for this group.\n")
      #Note the "list" class:
      cat("Class of ", var, " is: ", class(eval(parse(text = var))), "\n")
      cat("Placing an NA.\n")
      #This line gives an error:
      assign(eval(parse(text = var)), as.character(NA))
      cat("new value of ", var, " : ", eval(parse(text = var)), "\n")
      cat("New length of ", var, " : ", length(eval(parse(text = var))), "\n")
      cat("New class of ", var, " : ", class(eval(parse(text = var))), "\n")
    }
  }
  df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
  return(df)
}
result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myFunc)
# Error in assign(eval(parse(text = var)), as.character(NA)) :
# invalid first argument
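The "invalid first argument" error comes from what assign() expects: its first argument must be the variable's name as a character string, whereas eval(parse(text = var)) evaluates to the variable's value, here the empty list(). A minimal base-R sketch, with the empty value hard-coded for illustration, of the name-based pair get()/assign() standing in for eval(parse(text = ...)):

```r
abc <- list()   # stands in for an empty xpathSApply() result
var <- "abc"

# eval(parse(text = var)) returns the value of abc, so assign() receives list()
msg <- tryCatch(assign(eval(parse(text = var)), NA_character_),
                error = function(e) conditionMessage(e))
print(msg)

# get() looks a variable up by its name, with no parsing of code
stopifnot(identical(get(var), list()))

# assign() takes the name itself; by default it assigns in the current
# environment, which inside myFunc would be the function's own frame
if (length(get(var)) == 0) assign(var, NA_character_)
print(abc)
```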
Note that here the for loop (or apply function, if I go that way) sits in the second nesting level. In my real project it is in the third, because the outer level opens each page in a series of pages. I'd like to avoid adding another level of nesting if possible, but I also want to keep things straightforward.
Answer
You could define your own wrapper around xpathSApply that tests for an empty list() result and returns NA instead:
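myXpathSApply <- function(x, ...){
  y <- xpathSApply(x, ...)
  # xpathSApply() returns an empty list() when nothing matches;
  # return NA instead so data.frame() receives a value of length 1
  if(length(y) > 0){y}else{NA}
}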
and use this function wherever you use xpathSApply:
myFunc <- function(x){
  fooname <- myXpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
  abc <- myXpathSApply(x, "./div[@class='abc']", fun = xmlValue)
  span <- myXpathSApply(x, "./span", fun = xmlValue)
  df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
  return(df)
}
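If many values need the same treatment, another option is to skip separate variables entirely: keep the extracted values in a named list, replace empty entries with NA in a single lapply() pass, and build the row from the list. This is a sketch, not part of the answer above; fillEmpty is a hypothetical helper name, and the list is hard-coded to mimic what xpathSApply returned for the second foo.

```r
# Hypothetical helper: replace zero-length entries of a named list with NA
fillEmpty <- function(vals) {
  lapply(vals, function(v) if (length(v) == 0) NA_character_ else v)
}

# Stand-in for the three xpathSApply() results from the second foo
vals <- list(
  fooname = "Name of 2nd foo",
  abc     = list(),
  span    = "Only 1 span in 2nd foo"
)
vals <- fillEmpty(vals)

# Build the one-row data frame from list elements instead of named variables
df <- data.frame(fooname = vals$fooname, abc = vals$abc,
                 Span1 = vals$span[1], Span2 = vals$span[2])
print(df)
```

Inside myFunc, vals would be built directly from the xpathSApply calls, so no variable ever has to be created or looked up by name.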