R: Understanding Graph


Problem description

I am using the R programming language and the "igraph" library. I am trying to better understand graph structures for "two mode" graphs (graphs in which there are two types of nodes). In particular, I am trying to understand how to "project" "two mode" graphs (to my understanding, these are usually "bipartite" graphs). (https://rpubs.com/pjmurphy/317838)

For instance, I created a graph of relationships between "men" and "women". Although this graph has two modes (men and women), I don't think that this graph is bipartite (since "edges" can exist between the same types of nodes):

library(igraph)

# I don't think this is a bipartite graph
gender_data <- data.frame(
    
    "men" = c("john", "kevin", "mark", "kevin", "kevin", "mark", "henry", "mark", "susan", "john", "henry", "susan", "susan", "janet", "janet", "henry", "henry", "john"),
    "women" = c("janet", "janet", "sarah", "lucy", "lucy", "susan", "janet", "susan", "lucy", "kevin", "lucy", "janet", "kevin", "mark", "lucy", "sarah", "mark", "mark")
)

# create an undirected graph (directed = F)
graph <- graph.data.frame(gender_data, directed=F)
graph <- simplify(graph)

V(graph)["john"]$color<-"red"
V(graph)["kevin"]$color<-"red"
V(graph)["mark"]$color<-"red"
V(graph)["janet"]$color<-"blue"
V(graph)["sarah"]$color<-"blue"
V(graph)["lucy"]$color<-"blue"
V(graph)["henry"]$color<-"red"
V(graph)["susan"]$color<-"blue"

plot(graph)

I read that a better way to understand bipartite graphs is through "actors and movies". Different actors can be in the same movie and one actor can be in different movies - but as such an actor can't share an edge with itself and a movie can not share an edge with itself. Here is my interpretation of such a network:

film_data <- data.frame(
    
    "movie" = c("movie_1", "movie_1", "movie_1", "movie_2", "movie_2", "movie_2", "movie_3", "movie_3", "movie_3", "movie_4", "movie_4", "movie_4", "movie_4", "movie_5", "movie_5", "movie_5", "movie_6", "movie_6"),
    "actor" = c("actor_1", "actor_2", "actor_3", "actor_2", "actor_3", "actor_4", "actor_1", "actor_5", "actor_6", "actor_2", "actor_7", "actor_1", "actor_8", "actor_5", "actor_9", "actor_3", "actor_2", "actor_8")
)

# create an undirected graph (directed = F)
graph <- graph.data.frame(film_data, directed=F)
graph <- simplify(graph)
plot(graph)

However, according to the Stack Overflow post "valued bipartite projection using R igraph", this actor graph is still not bipartite (I don't understand why):

is.bipartite(graph)
[1] FALSE

According to the same stackoverflow post, the actor graph can still be converted into a bipartite graph (I don't understand what just happened):

V(graph)$type <- V(graph)$name %in% film_data[,1]
is.bipartite(graph)
[1] TRUE

From here, a projection can be made that "projects" two separate graphs:

proj<-bipartite.projection(graph, V(graph)$type,multiplicity = TRUE)
proj

$proj1
IGRAPH b5bc5ca UNW- 9 16 -- 
+ attr: name (v/c), weight (e/n)
+ edges from b5bc5ca (vertex names):
 [1] actor_1--actor_2 actor_1--actor_3 actor_1--actor_5 actor_1--actor_6 actor_1--actor_7 actor_1--actor_8 actor_2--actor_3 actor_2--actor_4
 [9] actor_2--actor_7 actor_2--actor_8 actor_3--actor_4 actor_3--actor_5 actor_3--actor_9 actor_5--actor_6 actor_5--actor_9 actor_7--actor_8

$proj2
IGRAPH b5bc5ca UNW- 6 11 -- 
+ attr: name (v/c), weight (e/n)
+ edges from b5bc5ca (vertex names):
 [1] movie_1--movie_3 movie_1--movie_4 movie_1--movie_2 movie_1--movie_6 movie_1--movie_5 movie_2--movie_4 movie_2--movie_6 movie_2--movie_5
 [9] movie_3--movie_4 movie_3--movie_5 movie_4--movie_6

Finally, the two projections can be plotted:

plot(proj$proj1)
plot(proj$proj2)

My questions:

  1. Why wasn't the original actor-movie graph "bipartite"? After all, it was undirected and cyclic.

  2. Why does the line V(graph)$type <- V(graph)$name %in% film_data[,1] transform the actor-movie graph into a bipartite graph?

  3. Is there any reason that

    is.bipartite(proj$proj1)
    [1] FALSE

    is.bipartite(proj$proj2)
    [1] FALSE

  4. How does this line proj<-bipartite.projection(graph, V(graph)$type,multiplicity = TRUE) "work"? In the original actor-movie graph, I specifically entered the data so that there are no direct relationships between two movies or two actors. For instance, in "proj2" there is an edge between "movie_1" and "movie_2". How and why did this happen, given that my original data contains no such direct relationship between movie_1 and movie_2?

  5. Suppose actor_1, actor_2, actor_3, actor_4 are male and actor_5, actor_6, actor_7, actor_8, actor_9 are female. Is there a way to now make 3 projections? Projection for male actors, projection for female actors and projections for movies?

Thanks

Solution

In addition to your actors-and-movies analogy, I would like to add that an actor can only be connected to 0 or more movies, never to other actors. Likewise, a movie can only be connected to 0 or more actors. Now, for the questions:

A1.

When the output of a certain function doesn't match your expectation, it is often helpful to look at the help page for that function. This command will explain the first question:

?is.bipartite

Bipartite graphs have a type vertex attribute in igraph, this is boolean and FALSE for the vertices of the first kind and TRUE for vertices of the second kind.

[...]

is_bipartite checks whether the graph is bipartite or not. It just checks whether the graph has a vertex attribute called type.

So, is_bipartite doesn't consider the original actor-movie graph to be bipartite, because the graph doesn't have a vertex attribute called type. There simply is no information in graph that tells it which set each vertex belongs to. We'll add this information in the next question.
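A minimal sketch of what this means in practice, assuming a recent igraph in which is_bipartite(), bipartite_mapping() and make_ring() are available. The triangle graph tri is a made-up counter-example and not part of your data. It suggests that is_bipartite() only looks for the attribute, while bipartite_mapping() inspects the edges:

```r
library(igraph)

# Same edge list as in the question, written compactly
film_data <- data.frame(
    movie = rep(paste0("movie_", 1:6), times = c(3, 3, 3, 4, 3, 2)),
    actor = paste0("actor_", c(1, 2, 3, 2, 3, 4, 1, 5, 6, 2, 7, 1, 8, 5, 9, 3, 2, 8))
)
graph <- simplify(graph_from_data_frame(film_data, directed = FALSE))

# No "type" attribute yet, so the attribute check fails
is_bipartite(graph)

# The edge structure itself can be split into two sets
mapping <- bipartite_mapping(graph)
mapping$res

# Hypothetical counter-example: a triangle can never be bipartite,
# yet is_bipartite() is satisfied as soon as a "type" attribute exists
tri <- make_ring(3)
V(tri)$type <- c(TRUE, FALSE, TRUE)
is_bipartite(tri)
bipartite_mapping(tri)$res
```

If you want a structural test rather than an attribute check, bipartite_mapping() is probably the function to reach for.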

A2.

Here we'll look at the example you already found, and I'll try to explain it. Let's first check the help page again:

?V

Create a vertex sequence (vs) containing all vertices of a graph. [...]

This function V() creates a sequence of vertices from a graph. V(graph) will list all vertices in graph. We want V(graph)$type to contain the essential attribute type.

As explained in the first help page, V(graph)$type needs to contain a TRUE/FALSE value for each vertex in graph, which is what this code does:

V(graph)$type <- V(graph)$name %in% film_data[,1]

V(graph)$name is a vector that contains the names of all vertices. film_data[,1] is the first column of the data frame, i.e. the movie names. Print these two R objects to study their contents and you'll see what I mean.

Finally, the %in% operator checks, for each item on the left, whether it exists in the vector on the right. If so, it returns TRUE. If not, it returns FALSE. In this case it returns TRUE for every vertex whose name appears in the movie column (the movies) and FALSE for every other vertex (the actors).

The complete construct V(graph)$name %in% film_data[,1] thus creates a vector of TRUEs and FALSEs, where TRUE marks a movie (igraph's "second kind" of vertex) and FALSE marks an actor (the "first kind"). And as the help page said, we can make our graph bipartite, as far as is_bipartite is concerned, by simply storing this information in V(graph)$type.
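The behaviour of %in% can be checked without igraph at all. The sketch below uses a small illustrative subset of names (vertex_names and movie_column are made-up stand-ins for V(graph)$name and film_data[,1]):

```r
# Stand-in for V(graph)$name: every vertex appears once
vertex_names <- c("movie_1", "movie_2", "actor_1", "actor_2")

# Stand-in for film_data[, 1]: the movie column, duplicates included
movie_column <- c("movie_1", "movie_1", "movie_2")

# TRUE where the vertex name occurs in the movie column
is_movie <- vertex_names %in% movie_column
setNames(is_movie, vertex_names)
```

Duplicates on the right-hand side do not matter: the result always has one value per element on the left.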

A3.

If we look at V(proj$proj1)$type, as we did for the original graph in A1, we see that proj$proj1 doesn't contain the type attribute. Again, igraph doesn't know whether its vertices belong to the primary or the secondary group: this information got lost when running bipartite.projection(). But this time it's not necessary; we know it's not a bipartite graph because it only contains one set.

You can optionally choose to retain this information with the option remove.type = F in bipartite.projection().
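A short sketch of the difference, assuming the current function name bipartite_projection() (the newer spelling of bipartite.projection()) and the film data from the question written compactly:

```r
library(igraph)

film_data <- data.frame(
    movie = rep(paste0("movie_", 1:6), times = c(3, 3, 3, 4, 3, 2)),
    actor = paste0("actor_", c(1, 2, 3, 2, 3, 4, 1, 5, 6, 2, 7, 1, 8, 5, 9, 3, 2, 8))
)
graph <- simplify(graph_from_data_frame(film_data, directed = FALSE))
V(graph)$type <- V(graph)$name %in% film_data[, 1]

# Default (remove.type = TRUE): the projections carry no "type" attribute
proj <- bipartite_projection(graph)
is.null(V(proj$proj1)$type)
is_bipartite(proj$proj1)

# remove.type = FALSE: the attribute is kept, so the attribute check passes
proj_keep <- bipartite_projection(graph, remove.type = FALSE)
V(proj_keep$proj1)$type
is_bipartite(proj_keep$proj1)
```

Note that a passing check on proj_keep$proj1 says nothing about its structure; it only reflects that the attribute is still there.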

A4.

The bipartite projection shows which actors share the same movies, and which movies share the same actors.

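Example: In your example data, we can see Actor 6 is connected with only Movie 3. Movie 3 is also connected with Actors 1 and 5. The bipartite projection will show Actor 6 connected with only Actors 1 and 5. The same logic answers your question about proj2: movie_1 and movie_2 have no direct edge in your data, but they share actor_2 and actor_3, so the projection links them, and with multiplicity = TRUE that edge gets weight 2.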
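One way to convince yourself of this is to compute the projection by hand with plain matrix algebra. This is only a sketch of the idea in base R, not how igraph implements it internally; the names B, movie_overlap and actor_overlap are mine:

```r
film_data <- data.frame(
    movie = rep(paste0("movie_", 1:6), times = c(3, 3, 3, 4, 3, 2)),
    actor = paste0("actor_", c(1, 2, 3, 2, 3, 4, 1, 5, 6, 2, 7, 1, 8, 5, 9, 3, 2, 8))
)

# Movie-by-actor incidence matrix: 1 if the actor appears in the movie
B <- unclass(table(film_data$movie, film_data$actor))

# Movie projection: entry [i, j] counts the actors shared by movies i and j
movie_overlap <- B %*% t(B)
movie_overlap["movie_1", "movie_2"]   # 2: actor_2 and actor_3 are shared

# Every pair of movies with at least one shared actor becomes an edge
n_movie_edges <- sum(movie_overlap[upper.tri(movie_overlap)] > 0)
n_movie_edges                         # 11, the edge count of proj2

# Actor projection: entry [i, j] counts the movies shared by actors i and j
actor_overlap <- t(B) %*% B
neighbours_of_6 <- setdiff(names(which(actor_overlap["actor_6", ] > 0)), "actor_6")
neighbours_of_6                       # "actor_1" "actor_5"
```

The off-diagonal counts correspond to the edge weights that multiplicity = TRUE stores in the projections.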

A5.

Here I will design a graph with two sets (actors and movies), of which actors have an extra attribute (male) to specify their gender.

The way you constructed a graph before did not specify the relationships between these items. I don't know this method and I don't think it is a proper way to specify a graph. There are multiple ways to create a graph mentioned in the igraph manual pages. I will demonstrate one close to your method:

items <- data.frame(
    name  = c("actorM1","actorM2","actorM3","actorF1","actorF2","actorF3","actorF4","movie1","movie2","movie3"),
    movie = c(F,F,F,F,F,F,F,T,T,T),
    male  = c(T,T,T,F,F,F,F,NA,NA,NA)
)
items

relations <- data.frame(
    a = c("actorM1","actorM1","actorM2","actorM3","actorM3","actorM3","actorF1","actorF2","actorF2","actorF3","actorF3","actorF3","actorF4"),
    b = c("movie1", "movie2", "movie3", "movie1", "movie2", "movie3", "movie2", "movie2", "movie3", "movie1", "movie2", "movie3", "movie3")
)
relations

graph <- graph_from_data_frame(relations, directed=F, vertices=items)
graph
plot(graph)

Above I have created two dataframes:

  • items contains an entry for each item (7 actors and 3 movies) along with their characteristics (are they a movie, are they male), and
  • relations lists how they are connected.

I then merged these two into a graph with graph_from_data_frame().

You'll remember the next step: I assign sets based on the value of $movie. Then I plot the movies. Don't plot the actors yet, because we still need to split men from women.

actors_movies <- bipartite.projection(graph, types = V(graph)$movie, remove.type = F)
plot(actors_movies$proj2)

I couldn't find a nicer solution to split up this group than this: remove all women from the projection to plot all men, then remove all men from the projection to plot all women. Note that the information $male is still available thanks to the option remove.type = F in bipartite.projection().

male = delete_vertices(actors_movies$proj1, V(actors_movies$proj1)$male == F)
plot(male)

female = delete_vertices(actors_movies$proj1, V(actors_movies$proj1)$male == T)
plot(female)

I hope this is helpful for you. At least I enjoyed learning about igraph.
