Assigning variable values programmatically during HTML parsing

Question

I am expanding a previous question about HTML parsing to cover blank values. Suppose some of the variables I am pulling from the HTML come back empty. Several variables could be empty, so I want a systematic way of handling them (a loop or a function).

This question is really about assigning variables programmatically. Most of the information I have found suggests avoiding eval(parse(text = ...)), but I'm not sure how to replace it in this case. I have the following HTML:

html <- 
'<!DOCTYPE html>
<html>
    <body>
        <div class="foo">
            <div class="fooname">Name of 1st foo</div>
            <div class="abc">ABC value only present here</div>
            <span>1st span in 1st foo</span>
            <span>2nd span in 1st foo</span>
        </div>

        <div class="foo">
            <div class="fooname">Name of 2nd foo</div>
            <span>Only 1 span in 2nd foo</span>
        </div>
    </body>
</html>'

Here is the parsing:

library(XML)

html.parse <- htmlParse(html)

myFunc <- function(x){
    fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
    abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
    span <- xpathSApply(x, "./span", fun = xmlValue)

    df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
    return(df)
}

result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myFunc)

#  Error in data.frame(fooname, abc, Span1 = span[1], Span2 = span[2]) : 
#   arguments imply differing number of rows: 1, 0 

Here is my attempted fix.

myFunc <- function(x){
    fooname <- xpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
    abc <- xpathSApply(x, "./div[@class='abc']", fun = xmlValue)
    span <- xpathSApply(x, "./span", fun = xmlValue)


    dfvars <- c("fooname", "abc", "span")

    #I think I have the same issue about assigning a variable in `apply`
        #functions, right?

    for(var in dfvars) {

        if(length(eval(parse(text = var))) == 0) {
            cat("No ", var, " value found for this group.\n")

            #Note the "list" class:
            cat("Class of ", var, " is: ", class(eval(parse(text = var))), "\n")
            cat("Placing an NA.\n")

            #This line gives an error:
            assign(eval(parse(text = var)), as.character(NA))

            cat("new value of ", var, " : ", eval(parse(text = var)), "\n")
            cat("New length of ", var, " : ", length(eval(parse(text = var))), "\n")
            cat("New class of ", var, " : ", class(eval(parse(text = var))), "\n")

        }
    }

    df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
    return(df)
}

result <- getNodeSet(html.parse, "//div[@class='foo']", fun = myFunc)

#  Error in assign(eval(parse(text = var)), as.character(NA)) : 
#   invalid first argument 

Note that here the for loop (or apply function, if I do it that way) is in the second nesting layer. In my real project it's in the third, because the outer layer opens each of a series of pages. It would be good to avoid going into a third level if possible, but I also want to keep things straightforward.

Solution

You could define your own wrapper around xpathSApply that checks for an empty result (xpathSApply returns an empty list() when no node matches) and returns NA instead:

myXpathSApply <- function(x, ...){
  y <- xpathSApply(x, ...)
  if(length(y) > 0){y}else{NA}
}

and use this function where you use xpathSApply:

myFunc <- function(x){
    fooname <- myXpathSApply(x, "./div[@class='fooname']", fun = xmlValue)
    abc <- myXpathSApply(x, "./div[@class='abc']", fun = xmlValue)
    span <- myXpathSApply(x, "./span", fun = xmlValue)

    df <- data.frame(fooname, abc, Span1 = span[1], Span2 = span[2])
    return(df)
}
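
On the narrower question of replacing eval(parse(text = var)): assign() failed because its first argument has to be the variable's name as a character string, whereas eval(parse(text = var)) hands it the variable's current value (here an empty list). The sketch below is only an illustration in base R, not part of the original answer. It uses hard-coded stand-ins for the values that xpathSApply would return for the second foo, and the helper name fill_empty is made up for the example. It shows get()/assign() by name, and a named list as an alternative that avoids name lookups altogether.

```r
# Stand-ins for what xpathSApply() would return for the second "foo";
# list() mimics the empty result reported in the question.
fooname <- "Name of 2nd foo"
abc     <- list()
span    <- "Only 1 span in 2nd foo"

# Option 1: keep separate variables, but read and set them by name.
for (var in c("fooname", "abc", "span")) {
    if (length(get(var)) == 0) {
        # The first argument of assign() is the name as a string.
        assign(var, NA_character_)
    }
}

# Option 2: hold the values in a named list, so no name lookup is needed.
# fill_empty is a hypothetical helper, not a function from the XML package.
fill_empty <- function(v) if (length(v) == 0) NA_character_ else v

vals <- list(fooname = "Name of 2nd foo",
             abc     = list(),
             span    = "Only 1 span in 2nd foo")
vals <- lapply(vals, fill_empty)

# Indexing past the end of a vector yields NA, so Span2 needs no special case.
df <- data.frame(fooname = vals$fooname, abc = vals$abc,
                 Span1 = vals$span[1], Span2 = vals$span[2])
```

Either option leaves one value per column, so data.frame() no longer complains about differing numbers of rows. With the wrapper approach above, getNodeSet should then return one single-row data frame per foo, which can probably be combined with do.call(rbind, result).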
