How to recursively build a tree of unknown depth in R


Question

I am trying to store scraped media comment data, obtained via RSelenium, XML and JSON, in a tree using Christoph Glur's data.tree package in R. The problem is that I do not know the depth of the tree beforehand because of the way comments are posted. The primary site is Disqus, where people may comment on the article or reply to other people's comments, and others may in turn reply to those replies. Thus the depth of the comment threads is not known.

My data is held in a list of lists where each list represents a Disqus comment and contains either 6 elements if the comment has no "child" comments or 7 if child comments exist. The seventh element is another list of one or more comments. The following structure gives an idea of the data:
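As a rough illustration of that shape in plain R (the values and the helper name hasReplies are made up for this sketch, not taken from the scraped data), a comment without replies has six elements and a comment with replies carries a seventh, child, which is itself a list of comments:

```r
# A comment with no replies: six elements.
leaf <- list(
  postId = "2", date = "Thursday, July 21, 2016 11:50 AM",
  poster = "B", disqusUname = "user_b",
  message = "A reply", numChildren = 0L
)

# A comment with replies: the seventh element, child, is a list of comments.
withReply <- list(
  postId = "1", date = "Thursday, July 21, 2016 9:28 AM",
  poster = "A", disqusUname = "user_a",
  message = "A top-level comment", numChildren = 1L,
  child = list(leaf)
)

# Hypothetical helper: TRUE when a comment carries at least one reply.
hasReplies <- function(comment) {
  !is.null(comment$child) && length(comment$child) > 0
}

length(leaf)       # 6
length(withReply)  # 7
```

Because each entry of child has the same shape as its parent, the nesting can continue to any depth.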

library(data.tree)

newtree <- Node$new("article_Name")
post <- newtree$AddChild("Post ID_1")$
AddChild("Date")$
AddSibling("Poster")$
AddSibling("Disqus Name")$
AddSibling("Message")$
AddSibling("Num Children")$
parent$
AddChildNode(Node$new("Child Post ID_1"))$
  AddChild("Date")$
  AddSibling("Poster")$
  AddSibling("Disqus Name")$
  AddSibling("Message")$
  AddSibling("Num Children")$
  parent$
  AddChildNode(Node$new("Child-Child Post ID_1"))$
    AddChild("Date")$
    AddSibling("Poster")$
    AddSibling("Disqus Name")$
    AddSibling("Message")$
    AddSibling("Num Children")$
  parent$
  parent$
  parent$
AddChildNode(Node$new("Child Post ID_2"))$
root$
AddChild("Post ID 2")
print(newtree)

I have tried creating the above using loops, but I cannot get beyond the first level of children because I do not know whether a child of a child might have children as well.

I have tried looking for posts on recursion but could not locate any regarding R, although there are quite a few for other languages such as JavaScript. The following is the code I tried to create the tree with, and it sure looks ugly with all the [[]] that are required to access some of the deeper elements.

commentTree <- Node$new("article_Name")
for (i in 1:length(appNodes)) {
i <- 1
post <- commentTree$AddChild(
    postData[[i]][1])$
    AddChild(postData[[i]][2])$
    AddSibling(postData[[i]][3])$
    AddSibling(postData[[i]][4])$
    AddSibling(postData[[i]][5])$
    AddSibling(postData[[i]][6])

while (postData[[i]][[6]] > 0) {
  for (j in 1 : length(postData[[i]][[7]])) {
     post$AddChildNode(Node$new(postData[[i]][[7]][[j]][1]))$
     AddChild(postData[[i]][[7]][[j]][2])$
     AddSibling(postData[[i]][[7]][[j]][3])$
     AddSibling(postData[[i]][[7]][[j]][4])$
     AddSibling(postData[[i]][[7]][[j]][5])$
     AddSibling(postData[[i]][[7]][[j]][6])
  }
}
print(commentTree)
}

Any help writing a recursive function would be greatly appreciated. Thanks.

EDIT - Added the example data posted in the comments to provide clarity

[[1]]
[[1]]$postId
[1] "2794864846"

[[1]]$date
[1] "Thursday, July 21, 2016 9:28 AM"

[[1]]$poster
[1] "Lucienne"

[[1]]$disqusUname
[1] "disqus_AEt1ZsgK9N"

[[1]]$message
[1] "200 hundred pilots for 7 planes? Wow each of them must work very long hours. "

[[1]]$numChildren
[1] 1

[[1]]$child
[[1]]$child[[1]]
[[1]]$child[[1]]$postId
[1] "2795010796"

[[1]]$child[[1]]$date
[1] "Thursday, July 21, 2016 11:50 AM"

[[1]]$child[[1]]$poster
[1] "Jesmond Tedesco Triccas"

[[1]]$child[[1]]$disqusUname
[1] "jesmondtedescotriccas"

[[1]]$child[[1]]$message
[1] "My thoughts exactly"

[[1]]$child[[1]]$numChildren
[1] 0

When I used tmpTree <- as.Node(postData) to convert the list to a tree, I obtained the tree below. It can be accessed as follows: tmpTree$'1'$poster gives "Lucienne" and tmpTree$'1'$child$'1'$poster gives "Jesmond Tedesco..". Can the child node name somehow be set to the value in the postId field when doing the conversion?

I'm also still stuck on trying to implement a recursive way to read all of a comment's data.

     levelName
1 Root         
2  °--1        
3      °--child
4          °--1

EDIT - Added reproducible code. This is an example of a comment whose child comments have children of their own, and so on. I apologize for the length of the example.

    list(structure(list(postId = "2794968061", date = "Thursday, July 21, 2016 10:56 AM", 
    poster = "toni", disqusUname = "disqus_bujblK3zF5", message = "unbeleivable, to hear today's socialists condemning workers for trying to organise a strike, where are the likes of GWU and the Labour of old defending workers rights?", 
    numChildren = 1L, child = list(structure(list(postId = "2794971958", 
        date = "Thursday, July 21, 2016 11:01 AM", poster = "Glorfindel", 
        disqusUname = "disqus_daQLxWKMFy", message = "Workers rights yes, but these are not workers but wanna be millionaires in the making! They should do some research and see what a great life they have, then maybe drop these unrealistic demands! Shame on these pilots!", 
        numChildren = 2L, child = list(structure(list(postId = "2798727439", 
            date = "Saturday, July 23, 2016 9:14 AM", poster = "Christopher Hitch Borg", 
            disqusUname = "christopherhitchborg", message = "Pilots are workers.", 
            numChildren = 1L, child = list(structure(list(postId = "2798801249", 
                date = "Saturday, July 23, 2016 11:06 AM", poster = "Glorfindel", 
                disqusUname = "disqus_daQLxWKMFy", message = "Dream on. Sounds to me you are either very dumb or a capitalist trying to confuse issues.", 
                numChildren = 0), .Names = c("postId", "date", 
            "poster", "disqusUname", "message", "numChildren"
            )))), .Names = c("postId", "date", "poster", "disqusUname", 
        "message", "numChildren", "child")), structure(list(postId = "2794982098", 
            date = "Thursday, July 21, 2016 11:14 AM", poster = "toni", 
            disqusUname = "disqus_bujblK3zF5", message = "pilots all over the world have a good salary, to be were they are they had to make big sacrifices and pay  lots of money for the studies. The shame is on persons getting 13000 euros for absolutely nothing, or persons put in high places with no experience at all, shame is making an 18 year old a CEO, and I could go on for ever.We should thank all Air Malta pilots for doing a good job for all this time", 
            numChildren = 2L, child = list(structure(list(postId = "2795785527", 
                date = "Thursday, July 21, 2016 8:00 PM", poster = "Glorfindel", 
                disqusUname = "disqus_daQLxWKMFy", message = "Agan: when a company is fighting for survival, it is shameful, distasteful and counterproductive to demand a 30% salary increase!!! Especially that when compared to other pilots their perks are already better than most!The other stuff you mentio has nothing to do with this article. However two wrongs do not make a right. Simple as that.", 
                numChildren = 0), .Names = c("postId", "date", 
            "poster", "disqusUname", "message", "numChildren"
            )), structure(list(postId = "2795010275", date = "Thursday, July 21, 2016 11:50 AM", 
                poster = "Jesmond Tedesco Triccas", disqusUname = "jesmondtedescotriccas", 
                message = "Air Malta pilots have their training paid for by the company. And do they have definite or indefinite contracts?", 
                numChildren = 1L, child = list(structure(list(
                  postId = "2795206130", date = "Thursday, July 21, 2016 2:53 PM", 
                  poster = "toni", disqusUname = "disqus_bujblK3zF5", 
                  message = "I have my doubts about the company paying for training, because I know of persons who couldin't make it for the financial reasons, however whatever the situation one cannot deny that they have one of the most difficult and responsible jobs existing", 
                  numChildren = 0), .Names = c("postId", "date", 
                "poster", "disqusUname", "message", "numChildren"
                )))), .Names = c("postId", "date", "poster", 
            "disqusUname", "message", "numChildren", "child")))), .Names = c("postId", 
        "date", "poster", "disqusUname", "message", "numChildren", 
        "child")))), .Names = c("postId", "date", "poster", "disqusUname", 
    "message", "numChildren", "child")))), .Names = c("postId", 
"date", "poster", "disqusUname", "message", "numChildren", "child"
)))

Recommended answer

This is already implemented for you in data.tree, so you do not need to write the recursion yourself.

Taking your posted data above, and assuming it's called lol (for "list-of-list", no pun intended), we can do:

tree <- FromListExplicit(lol[[1]], nameName = "postId", childrenName = "child")
print(tree, "date", "poster", "disqusUname")

This will print as:

                   levelName                             date                  poster           disqusUname
1 2794968061                 Thursday, July 21, 2016 10:56 AM                    toni     disqus_bujblK3zF5
2  °--2794971958             Thursday, July 21, 2016 11:01 AM              Glorfindel     disqus_daQLxWKMFy
3      ¦--2798727439          Saturday, July 23, 2016 9:14 AM  Christopher Hitch Borg  christopherhitchborg
4      ¦   °--2798801249     Saturday, July 23, 2016 11:06 AM              Glorfindel     disqus_daQLxWKMFy
5      °--2794982098         Thursday, July 21, 2016 11:14 AM                    toni     disqus_bujblK3zF5
6          ¦--2795785527      Thursday, July 21, 2016 8:00 PM              Glorfindel     disqus_daQLxWKMFy
7          °--2795010275     Thursday, July 21, 2016 11:50 AM Jesmond Tedesco Triccas jesmondtedescotriccas
8              °--2795206130  Thursday, July 21, 2016 2:53 PM                    toni     disqus_bujblK3zF5
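The call above converts a single top-level comment, lol[[1]]. If the list holds several top-level comments, one possible way to collect them under a single article node is sketched below. The small lol here is a made-up stand-in for the scraped data, and the root name "article_Name" is borrowed from the question; treat this as an illustration rather than the only way to do it.

```r
library(data.tree)

# Made-up stand-in for the scraped data: two top-level comments,
# the first of which has one reply.
lol <- list(
  list(postId = "100", poster = "toni", numChildren = 1L,
       child = list(
         list(postId = "101", poster = "Glorfindel", numChildren = 0L)
       )),
  list(postId = "200", poster = "Lucienne", numChildren = 0L)
)

# One root node for the article; each top-level comment is converted
# with FromListExplicit and attached as a child of that root.
article <- Node$new("article_Name")
for (comment in lol) {
  article$AddChildNode(
    FromListExplicit(comment, nameName = "postId", childrenName = "child")
  )
}

print(article, "poster")
```

Each comment's thread is still converted in one call, however deep its replies go; the loop only runs over the top-level comments.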

For details on FromListExplicit, see

?FromListExplicit

or (easier to remember):

?as.Node.list
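For readers who would still like to see what a hand-written recursion might look like, here is a minimal sketch. It is not how data.tree implements the conversion internally; the function name addComment and the sample data are invented for this example, and the field names postId and child follow the question's data.

```r
library(data.tree)

# Hypothetical helper: adds one comment under `parent`, then calls itself
# for every reply, so the nesting depth never has to be known in advance.
addComment <- function(parent, comment) {
  node <- parent$AddChild(comment$postId)
  # Copy every element except the id and the replies onto the node.
  for (field in setdiff(names(comment), c("postId", "child"))) {
    node[[field]] <- comment[[field]]
  }
  # comment$child is NULL for a comment without replies, so the loop is skipped.
  for (reply in comment$child) {
    addComment(node, reply)
  }
  invisible(node)
}

# Made-up sample: A -> B -> C, plus a second top-level comment D.
lol <- list(
  list(postId = "A", poster = "toni", numChildren = 1L,
       child = list(
         list(postId = "B", poster = "Glorfindel", numChildren = 1L,
              child = list(
                list(postId = "C", poster = "toni", numChildren = 0L)
              ))
       )),
  list(postId = "D", poster = "Lucienne", numChildren = 0L)
)

commentTree <- Node$new("article_Name")
for (comment in lol) {
  addComment(commentTree, comment)
}

print(commentTree, "poster")
```

The recursion ends on its own at comments that have no child element, which is why no while loop on numChildren is needed.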
