Visualize a social network to show how often a user is mentioned in R


Problem description

Given a data frame like the following:

v1    v2  v3     v4   v5
tom   A    pinky  A   3
ben   B    hugo   C   2
lily  A    tom    A   1
...

Each row denotes that v1 from group v2 has mentioned v3 from group v4 v5 times. For instance, tom from group A has mentioned pinky from group A 3 times. Now I'd like to plot a social network in which each user is denoted by a point whose size is proportional to the total number of times he or she has been mentioned. A line should link two points if one user has mentioned the other, whether mutually or unilaterally.

Looking through the ggplot documentation, I cannot find any function that does this.

Do you have any ideas? Thanks in advance!

EDIT:

Here is the graph I get so far:

Solution

library(igraph)
library(dplyr)

# create example dataset
dt = data.frame(v1 = c("tom", "ben", "lilly", "mark"),
                v2 = c("A","B","A","C"),
                v3 = c("pinky", "hugo", "tom", "pinky"),
                v4 = c("A","D","A","A"),
                v5 = c(20,10,15,15),
                stringsAsFactors = FALSE)

dt

#      v1 v2    v3 v4 v5
# 1   tom  A pinky  A 20
# 2   ben  B  hugo  D 10
# 3 lilly  A   tom  A 15
# 4  mark  C pinky  A 15


# select columns of names to use for the graph
dt_graph = dt %>% select(v1,v3)

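# create the graph (graph_from_data_frame() is the current name of the
# older graph.data.frame(), which recent igraph releases deprecate)
g = graph_from_data_frame(dt_graph)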

# count number of times names were mentioned
dt_times_mentioned =
  dt %>%
  group_by(v3) %>%
  summarise(times = sum(v5))

dt_times_mentioned

#      v3 times
#   (chr) (dbl)
# 1  hugo    10
# 2 pinky    35
# 3   tom    15


# join back to the vertex names to include names in the graph that were not mentioned
dt_weights =
  data.frame(names = names(V(g)), stringsAsFactors = FALSE) %>%
  left_join(dt_times_mentioned, by=c("names"="v3")) %>%
  mutate(times = ifelse(is.na(times), 0, times))

dt_weights

#   names times
# 1   tom    15
# 2   ben     0
# 3 lilly     0
# 4  mark     0
# 5 pinky    35
# 6  hugo    10


# create two datasets based on 1st and 2nd column of names
dt1 = dt %>% select(names=v1, group=v2) 
dt2 = dt %>% select(names=v3, group=v4)


# get distinct names and their group values
dt_group = 
  dt1 %>% rbind(dt2) %>% distinct() %>% 
  mutate(color = colors()[as.numeric(factor(group))+5]) # get colours from group values

# Note: the +5 above is an arbitrary offset chosen for this example so that the colors are easy to distinguish; it is not strictly needed. If you don't have many groups, you can set the colors manually.

dt_group

#   names group         color
# 1   tom     A antiquewhite3
# 2   ben     B antiquewhite4
# 3 lilly     A antiquewhite3
# 4  mark     C    aquamarine
# 5 pinky     A antiquewhite3
# 6  hugo     D   aquamarine1


# plot the graph
plot(g, vertex.size = dt_weights$times, vertex.color = dt_group$color)

# add legend
legend(1.5, 1.5,
       legend=unique(dt_group$group),
       pch=19,
       col=unique(dt_group$color),
       title = "Colors - Groups")
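
The plot above works because dt_weights and dt_group happen to be in the same row order as the vertices of g, and users who were never mentioned end up with a vertex size of 0. One possible variation, offered only as a sketch and not as the answer author's method, is to store the attributes on the vertices by name and to rescale the sizes into a visible range. The names `users`, `min_size`, `max_size` and `group_cols` are made up for this illustration, and the sketch assumes that every user belongs to exactly one group and that at least one user is mentioned. It also builds an undirected graph and merges repeated pairs, so that mutual mentions are drawn as a single line, which is how the question describes the links.

```r
library(igraph)

# Example data copied from the answer above
dt <- data.frame(v1 = c("tom", "ben", "lilly", "mark"),
                 v2 = c("A", "B", "A", "C"),
                 v3 = c("pinky", "hugo", "tom", "pinky"),
                 v4 = c("A", "D", "A", "A"),
                 v5 = c(20, 10, 15, 15),
                 stringsAsFactors = FALSE)

# One row per user (name and group), collected from both name columns.
# Assumption: a user never appears with two different groups.
users <- unique(data.frame(name  = c(dt$v1, dt$v3),
                           group = c(dt$v2, dt$v4),
                           stringsAsFactors = FALSE))

# Total number of times each user was mentioned; sum() of an empty
# selection is 0, so users who were never mentioned get 0
users$times <- vapply(users$name,
                      function(n) sum(dt$v5[dt$v3 == n]),
                      numeric(1), USE.NAMES = FALSE)

# Supplying `vertices` attaches group and times to the vertices by name,
# so nothing depends on row order; v5 becomes an edge attribute
g <- graph_from_data_frame(dt[, c("v1", "v3", "v5")],
                           directed = FALSE, vertices = users)

# Merge repeated pairs (e.g. mutual mentions) into one edge, summing v5;
# note that simplify() also drops self-loops
g <- simplify(g, edge.attr.comb = "sum")

# Rescale mention counts into a visible size range (hypothetical limits)
min_size <- 8
max_size <- 30
V(g)$size <- min_size + (max_size - min_size) * V(g)$times / max(V(g)$times)

# One color per group, looked up by group name
groups <- sort(unique(V(g)$group))
group_cols <- setNames(rainbow(length(groups)), groups)
V(g)$color <- unname(group_cols[V(g)$group])

# Thicker lines for pairs with more mentions
E(g)$width <- 1 + 4 * E(g)$v5 / max(E(g)$v5)

# plot() picks up the size, color and width attributes set above
plot(g, vertex.label.dist = 1.5)
legend("topright", legend = groups, pch = 19, col = group_cols,
       title = "Groups")
```

Because the attributes live on the graph itself, reordering or filtering dt does not silently mismatch sizes and colors. The linear rescaling is one choice among several; a square-root scale may be preferable if the point area, rather than the radius, should track the mention count.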
