Finding root vertices in largeish dataset in igraph on R

Question

Suppose you have a graph that you've made from an edge list, and there are a couple hundred vertices. What I'm looking to do is identify the initial set of vertices from which all subsequent ones descend (like a mother, or a family tree).

This is a data set that represents 'ice islands', large tabular sheets of ice that break off from glaciers and float around the sea. The initial fractures represent the root nodes. The subsequent vertices are re-observations of these pieces that are either smaller (melted islands) or pieces that have broken off (so the source vertex has two outgoing edges and goes on to form two new vertices).

Is there a piece of code or a function that can do this easily for me? If I add labels to my plot, it's impossible to read. Most of the methods of manipulating root nodes that I've been able to find involve small sample data sets where you just arbitrarily name things in the graph, or use the vertex's actual name. My data comes from a huge established CSV with super long number-character names, which makes it difficult.

I'm also super new to coding and R is a nightmare for me to use. Please be gentle and use simple examples! I can attach my code if you think it helps. All my data is being pulled from a server, and I don't know if it will be very clear from your perspective.

Thanks.

Recommended answer

For any node n, you can find the nodes that have an edge into n using neighbors(g, n, mode="in"); the length of that result is the number of incoming edges. A node is an initial vertex if it does not have any edges coming into it. So you can just test all of the nodes for how many edges enter each one and select those for which the answer is zero.

Here is a simple example graph:

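library(igraph)
set.seed(2017)   # fix the random seed so the example is reproducible
# Random directed graph with 12 vertices and 20 edges
# (newer igraph versions call this function sample_gnm)
g = erdos.renyi.game(12, 20, type="gnm", directed=TRUE)
plot(g)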

Now we can find the root nodes.

which(sapply(sapply(V(g), 
    function(x) neighbors(g,x, mode="in")), length) == 0)
[1] 1 2

This says that nodes 1 and 2 are sources.
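Since the question mentions vertices with very long names read from a CSV, here is a minimal sketch of the same idea on a named graph. It counts incoming edges directly with igraph's degree(g, mode="in") instead of listing the parents first, which is a swapped-in shortcut rather than the method shown above. The data frame, the column names from and to, and the ID strings are all invented for illustration; a real edge list read with read.csv would take their place.

```r
library(igraph)

# Hypothetical edge list standing in for the CSV: one row per parent -> child link.
# The IDs are made up for this sketch.
edges <- data.frame(
  from = c("PII-2010-0805-R1", "PII-2010-0805-R1", "PII-2010-0901-A", "PII-2011-0712-R2"),
  to   = c("PII-2010-0901-A",  "PII-2010-0901-B",  "PII-2010-0915-C", "PII-2011-0801-D"),
  stringsAsFactors = FALSE
)

# Build a directed graph; vertex names are taken from the two columns.
g2 <- graph_from_data_frame(edges, directed = TRUE)

# In-degree = number of edges arriving at each vertex; roots have none.
in_deg <- degree(g2, mode = "in")

# Report the roots by name, so nothing has to be read off a crowded plot.
roots <- V(g2)$name[in_deg == 0]
print(roots)
```

Here roots should contain the two IDs that never appear in the to column. Because the result is a plain character vector, it can be written out or joined back to the original table without labelling the plot.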

Since you say that you are a beginner, let me explain this just a little.

function(x) neighbors(g,x, mode="in") is a function that takes a node x as its argument and uses neighbors to return the nodes y that have a link from y to x (the parents of x).

sapply(V(g), function(x) neighbors(g,x, mode="in")) applies that function to all of the nodes in the graph, and so gives a list of the parents for every node. We are interested in the nodes that have no parents, so we want the nodes for which the length of this list is zero. Thus, the outer sapply applies length to each list of parents, and which reports the positions where that length is zero.
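The nested one-liner can be easier to follow when it is split into the three steps just described. The sketch below does that on a small hand-made graph (the edges 1 -> 3, 2 -> 3, 3 -> 4, 3 -> 5 are an assumption chosen for illustration, not the random graph above), and the variable names g3, parents, n_parents and root_ids are likewise made up here.

```r
library(igraph)

# Small directed graph for illustration: 1 -> 3, 2 -> 3, 3 -> 4, 3 -> 5
g3 <- make_graph(c(1, 3, 2, 3, 3, 4, 3, 5), directed = TRUE)

# Step 1: for every vertex, collect its parents
# (the vertices that have an edge pointing into it).
parents <- lapply(seq_len(vcount(g3)),
                  function(x) neighbors(g3, x, mode = "in"))

# Step 2: count how many parents each vertex has.
n_parents <- sapply(parents, length)

# Step 3: keep the vertices that have no parents at all.
root_ids <- which(n_parents == 0)
print(root_ids)
```

In this graph only vertices 1 and 2 have no incoming edges, so root_ids holds 1 and 2. Using lapply in step 1 always returns a list, which sidesteps sapply's habit of simplifying its result when every vertex happens to have the same number of parents.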
