R: How to get parent attributes and node values at the same time?


Problem description

I have HTML and R code like the following and need to relate each node value to its parent id in a data.frame. Different information is available for each person.

example <- "<div class='person' id='1'>
<div class='phone'>555-5555</div>
<div class='email'>jhon@123.com</div>
</div>
<div class='person' id='2'>
<div class='phone'>123-4567</div>
<div class='email'>maria@gmail.com</div>
</div>
<div class='person' id='3'>
<div class='phone'>987-6543</div>
<div class='age'>32</div>
<div class='city'>New York</div>
</div>"

library(XML)  # provides htmlTreeParse, xpathSApply, xmlValue and xmlGetAttr
doc <- htmlTreeParse(example, useInternalNodes = TRUE)

values <- xpathSApply(doc, "//*[@class='person']/div", xmlValue)
variables <- xpathSApply(doc, "//*[@class='person']/div", xmlGetAttr, 'class')
id <- xpathSApply(doc, "//*[@class='person']", xmlGetAttr, 'id')

# The problem: create a data.frame(id,variables,values)

With xpathSApply(), I can get the phone, email, and age values as well as the person attribute (id). However, these pieces of information come back isolated from one another, and I need to match each value to the right data.frame variable and the right person. My real data contains many different kinds of information, so naming each variable has to be automatic.
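To make the mismatch concrete, here is a small check, assuming the XML package is installed and using the same example HTML. It parses with htmlParse(..., asText = TRUE), a shorthand swapped in here for the htmlTreeParse call above, and the `lens` name is only for illustration. The child-level vectors should have seven entries while `id` has three, so they cannot simply be bound together column-wise.

```r
library(XML)

example <- "<div class='person' id='1'>
<div class='phone'>555-5555</div>
<div class='email'>jhon@123.com</div>
</div>
<div class='person' id='2'>
<div class='phone'>123-4567</div>
<div class='email'>maria@gmail.com</div>
</div>
<div class='person' id='3'>
<div class='phone'>987-6543</div>
<div class='age'>32</div>
<div class='city'>New York</div>
</div>"

# asText = TRUE tells the parser the argument is HTML content, not a file name
doc <- htmlParse(example, asText = TRUE)

values    <- xpathSApply(doc, "//*[@class='person']/div", xmlValue)
variables <- xpathSApply(doc, "//*[@class='person']/div", xmlGetAttr, "class")
id        <- xpathSApply(doc, "//*[@class='person']", xmlGetAttr, "id")

# One entry per child div for values/variables, one entry per person for id
lens <- c(values = length(values), variables = length(variables), id = length(id))
lens
```

Because the two XPath queries select nodes at different levels of the tree, the link between a child and its person is lost unless it is looked up per node.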

My goal is to create a data.frame like this, relating each id to its proper data:

  id variables          values
1  1     phone        555-5555
2  1     email    jhon@123.com
3  2     phone        123-4567
4  2     email maria@gmail.com
5  3     phone        987-6543
6  3       age              32
7  3      city        New York

I believe I would have to create a function to use inside xpathSApply that gets the person's phone and the person's id at the same time, so that they stay related, but I haven't had any success with that so far.

Can anyone help me?

Solution

In general it's not going to be easy:

idNodes <- getNodeSet(doc, "//div[@id]")
ids <- lapply(idNodes, function(x) xmlAttrs(x)['id'])
values <- lapply(idNodes, xpathApply, path = './div[@class]', xmlValue)
attributes <- lapply(idNodes, xpathApply, path = './div[@class]', xmlAttrs)
do.call(rbind.data.frame, mapply(cbind, ids, values, attributes))
  V1              V2    V3
1  1        555-5555 phone
2  1    jhon@123.com email
3  2        123-4567 phone
4  2 maria@gmail.com email
5  3        987-6543 phone
6  3              32   age
7  3        New York  city

The above will give you attribute and value pairs, assuming they are nested in a div with an associated id.
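A possible alternative, offered as a sketch and not as part of the original answer, is to start from the child nodes and walk up to each one's parent with xmlParent(). It assumes every value sits in a div directly under a div with class 'person', as in the example. The names `children` and `res` are illustrative, and htmlParse(..., asText = TRUE) is used in place of htmlTreeParse.

```r
library(XML)

example <- "<div class='person' id='1'>
<div class='phone'>555-5555</div>
<div class='email'>jhon@123.com</div>
</div>
<div class='person' id='2'>
<div class='phone'>123-4567</div>
<div class='email'>maria@gmail.com</div>
</div>
<div class='person' id='3'>
<div class='phone'>987-6543</div>
<div class='age'>32</div>
<div class='city'>New York</div>
</div>"

doc <- htmlParse(example, asText = TRUE)

# Select the child divs once, so every column is computed from the same nodes
children <- getNodeSet(doc, "//div[@class='person']/div[@class]")

res <- data.frame(
  # id comes from the parent of each child node
  id        = sapply(children, function(n) xmlGetAttr(xmlParent(n), "id")),
  # the child's class attribute names the variable
  variables = sapply(children, xmlGetAttr, "class"),
  # the child's text content is the value
  values    = sapply(children, xmlValue),
  stringsAsFactors = FALSE
)
res
```

Since all three columns are derived from one node set, their lengths agree by construction, and the column names match the layout asked for in the question. Note that `id` comes back as character, so wrap it in as.integer() if numeric ids are needed.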

UPDATE: if you want to wrap it in an xpathApply-type call:

utilFun <- function(x){
  id <- xmlGetAttr(x, 'id')
  values <- sapply(xmlChildren(x, omitNodeTypes = "XMLInternalTextNode"), xmlValue)
  attributes <- sapply(xmlChildren(x, omitNodeTypes = "XMLInternalTextNode"), xmlAttrs)
  data.frame(id = id, attributes = attributes, values = values, stringsAsFactors = FALSE)
}
res <- xpathApply(doc, '//div[@id]', utilFun)
do.call(rbind, res)
  id attributes          values
1  1      phone        555-5555
2  1      email    jhon@123.com
3  2      phone        123-4567
4  2      email maria@gmail.com
5  3      phone        987-6543
6  3        age              32
7  3       city        New York
