Create a chart from different data

Problem description

I need help creating a chart. Let me explain.

I created 10 random graphs, each with N nodes. I have done that for N = 10^3, 10^4, 10^5. So in total 30 graphs.

For each of them I computed the percentage of multilinks and selfloops it has.

Now I would like to create a single graph that shows these percentages as a function of the number of nodes. So something like:

So I have 3 lists:
- listNets, containing the 30 graphs
- listSelf, containing the percentages of selfloops
- listMul, containing the percentages of multilinks

This is what I did:

listN <- c((10^3), (10^4), (10^5))

# list of networks
listNets <- vector(mode = "list", length = 0) 
# list of percentage of selfloops
listSelf <- vector(mode = "list", length = 0)
# list of percentage of multilinks
listMul <- vector(mode = "list", length = 0)

...

for(N in listN) {

    ...

    net <- graph_from_adjacency_matrix(adjmatrix = adjacency_matrix, mode = "undirected") # this works: if I plot it, I see a correct network
    listNets <- c(listNets, net) # I add net to the list of networks
    x11()
    plot(net, layout = layout.circle(net))

    ...

    # I find self-loops and multilinks
    netmatr <- as_adjacency_matrix(net, sparse = FALSE)
    num_selfloops <- sum(diag(netmatr))
    num_multilinks <- sum(netmatr > 1)

    # I compute the percentages
    per_self <- ((num_selfloops/num_vertices)*100)
    per_mul <- ((num_multilinks/num_edges)*100)

    listSelf <- c(listSelf, per_self) 
    listMul <- c(listMul, per_mul)
}

Now, if I print listNets, I get something strange:

> print(listNets)
[[1]]
[1] 9

[[2]]
[1] FALSE

[[3]]
[1] 7 6 3 8 8 8

[[4]]
[1] 0 1 2 4 5 7

[[5]]
[1] 2 1 0 3 4 5

[[6]]
[1] 0 1 2 3 4 5

[[7]]
 [1] 0 0 0 0 1 1 1 2 3 6

[[8]]
 [1] 0 1 2 3 3 4 5 5 6 6

[[9]]
[[9]][[1]]
[1] 1 0 1

[[9]][[2]]
named list()

[[9]][[3]]
list()

[[9]][[4]]
list()


[[10]]
<environment: 0x000000001a6284a8>

[[11]]
[1] 9

[[12]]
[1] FALSE

[[13]]
[1] 2 5 8 8 7 8

[[14]]
[1] 0 1 3 4 6 7

[[15]]
[1] 0 1 4 2 3 5

[[16]]
[1] 0 1 2 3 4 5

[[17]]
 [1] 0 0 0 1 1 1 2 2 3 6

[[18]]
 [1] 0 1 2 2 3 4 4 5 6 6

[[19]]
[[19]][[1]]
[1] 1 0 1

[[19]][[2]]
named list()

[[19]][[3]]
list()

[[19]][[4]]
list()


[[20]]
<environment: 0x000000001a859e28>

...

By contrast, if I print the other two lists (listSelf and listMul), everything is OK.
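A likely explanation for the strange output, offered as a sketch rather than a confirmed diagnosis: c() concatenates the components of list-like objects, and an igraph graph appears to be stored as a classed list, so c(listNets, net) would splice the graph's internal pieces into listNets one by one. The example below avoids igraph and uses a made-up class, fake_net, purely as a stand-in.

```r
# Stand-in for a graph: like an igraph object, a list carrying a class attribute.
# (fake_net is a hypothetical class used only for this illustration.)
net <- structure(list(n = 9, directed = FALSE, from = c(7, 6, 3)),
                 class = "fake_net")

# c() splices the three components of net into the result one by one.
flattened <- c(list(), net)

# Wrapping net in list() appends the whole object as a single element.
wrapped <- c(list(), list(net))

length(flattened)    # 3
length(wrapped)      # 1
class(wrapped[[1]])  # "fake_net"
```

Under that assumption, listNets <- c(listNets, list(net)) in the loop would keep each graph intact.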

Now, how can I plot this data?

I read about data frames, but I don't understand how to use them in my case. Can someone help me please?

As a test, I wrote a possible result table by hand in a CSV file and tried to plot it, to see whether I was going in the right direction.

This is the code and this is the result. Note: I created the table by hand and invented the percentages.

> df <- read.csv("./table.csv", sep = ",")  # read csv file 
> df
      N perSelf perMul
1  10^3       2      1
2  10^3       5      1
3  10^3      98     15
4  10^3      50     51
5  10^3      41     52
6  10^3      21    100
7  10^3      36     80
8  10^3      70     20
9  10^3      80     55
10 10^3     100     44
11 10^4       2      1
12 10^4       5     18
13 10^4     100     20
14 10^4      50     51
15 10^4      51     52
16 10^4      21    100
17 10^4      36     80
18 10^4      70     20
19 10^4      73     85
20 10^4     100     98
21 10^5     100     10
22 10^5       5      1
23 10^5      98     15
24 10^5      50     51
25 10^5      41     52
26 10^5      21     85
27 10^5      36     80
28 10^5      65     20
29 10^5      80     55
30 10^5     100     44

There is something wrong.

Thanks a lot


The code is:

# create a matrix from a list (list_all)
mat <- matrix(unlist(list_all), 
              unique(lengths(list_all)),
              dimnames = list(NULL, c("N", "% selfloops", "% multilinks")))

# convert matrix to data frame
df <- as.data.frame(x = mat, row.names = NULL) 
df

# plot
dflong <- melt(df, id.vars = 'N')

x11()
ggplot(dflong, aes(x = N, y = value, color = variable)) +
  geom_point(size = 5, alpha = 0.7, position = position_dodge(width = 0.3)) +
  scale_x_discrete(labels = parse(text = as.character(unique(dflong$N)))) +
  scale_y_continuous('', breaks = seq(0, 100, 25), labels = paste(seq(0, 100, 25), '%')) +
  scale_color_manual('', values = c('red', 'blue'),
                     labels = c('Percentage of selfloop','Percentage of multilinks')) +
  theme_minimal(base_size = 14)

df is:

   N % selfloops % multilinks
1 10   11.111111      0.00000
2 10   11.111111      0.00000
3 10    0.000000      0.00000
4 20    0.000000      0.00000
5 20    0.000000     15.38462
6 20    0.000000      0.00000
7 30    3.448276      0.00000
8 30    3.448276      0.00000
9 30    0.000000      0.00000

Solution

Taking your df dataframe as a starting point, you can get the desired result in two steps:

1) Reshape your data into long format with reshape2:

library(reshape2)
dflong <- melt(df, id.vars = 'N')

2) Plot the data with ggplot2:

library(ggplot2)
ggplot(dflong, aes(x = N, y = value, color = variable)) +
  geom_point(size = 5, alpha = 0.7, position = position_dodge(width = 0.3)) +
  scale_x_discrete(labels = parse(text = as.character(unique(dflong$N)))) +
  scale_y_continuous('', breaks = seq(0,100,25), labels = paste(seq(0,100,25),'%')) +
  scale_color_manual('', values = c('red','blue'), 
                     labels = c('Percentage of selfloop','Percentage of multilinks')) +
  theme_minimal(base_size = 14)

which gives:

I used transparency (alpha = 0.7) to make it possible to see where points overlap.
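To make the reshaping step more concrete, here is a minimal base-R sketch of what "long format" means. It is not how melt is implemented; it only builds a data frame with the same three columns (N, variable, value) from a tiny made-up table, so the names wide, measure_cols and long are illustrative assumptions.

```r
# Toy wide data frame with the same column names as df (values are made up).
wide <- data.frame(N = c("10^3", "10^4"),
                   perSelf = c(2, 5),
                   perMul = c(1, 18))

# Columns that hold measurements rather than the identifier N.
measure_cols <- setdiff(names(wide), "N")

# Stack the measurement columns: one row per (N, variable) pair.
long <- data.frame(
  N = rep(wide$N, times = length(measure_cols)),
  variable = factor(rep(measure_cols, each = nrow(wide)), levels = measure_cols),
  value = unlist(wide[measure_cols], use.names = FALSE)
)

print(long)
```

Each row of long carries a single percentage together with a label saying which measure it is, which is what lets ggplot2 map color = variable.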


In response to your comment and the second example in the question:

You have to alter the ggplot2 code a bit:

  • Change the x variable in the aes to a factor.
  • There is no need to parse the text for the labels anymore, thus that part can be removed.
  • Adjust the values and breaks in the y-scale.

The following code:

ggplot(dflong, aes(x = factor(N), y = value, color = variable)) +
  geom_point(size = 5, alpha = 0.5, position = position_dodge(width = 0.3)) +
  xlab('N') +
  scale_y_continuous('', breaks = seq(0, 20, 5), 
                     labels = paste(seq(0, 20, 5), '%'),
                     limits = c(0,20)) +
  scale_color_manual('', 
                     values = c('red', 'blue'),
                     labels = c('Percentage of selfloop','Percentage of multilinks')) +
  theme_minimal(base_size = 14)

will give you:


Used data:

df <- structure(list(N = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("10^3", "10^4", "10^5"), class = "factor"), 
                     perSelf = c(2L, 5L, 98L, 50L, 41L, 21L, 36L, 70L, 80L, 100L, 2L, 5L, 100L, 50L, 51L, 21L, 36L, 70L, 73L, 100L, 100L, 5L, 98L, 50L, 41L, 21L, 36L, 65L, 80L, 100L), 
                     perMul = c(1L, 1L, 15L, 51L, 52L, 100L, 80L, 20L, 55L, 44L, 1L, 18L, 20L, 51L, 52L, 100L, 80L, 20L, 85L, 98L, 10L, 1L, 15L, 51L, 52L, 85L, 80L, 20L, 55L, 44L)), 
                .Names = c("N", "perSelf", "perMul"), class = "data.frame", row.names = c(NA, -30L))
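For completeness, a sketch of how a data frame with the same columns might be assembled from the lists in the question instead of a hand-written CSV file. It assumes that listSelf and listMul each hold one number per graph and that the graphs were generated in order of N, ten per value; the percentages below are random placeholders, not real results, and reps is a name introduced here for illustration.

```r
listN <- c(10^3, 10^4, 10^5)
reps <- 10  # graphs per value of N, as described in the question

# Placeholder percentages (random numbers, NOT real results); in the real
# script these lists would be filled inside the loop.
set.seed(42)
listSelf <- as.list(runif(length(listN) * reps, min = 0, max = 100))
listMul  <- as.list(runif(length(listN) * reps, min = 0, max = 100))

# One row per graph: the N it was generated with and its two percentages.
df <- data.frame(
  N = factor(rep(listN, each = reps)),
  perSelf = unlist(listSelf),
  perMul = unlist(listMul)
)

str(df)
```

Because N is already a factor here, the second ggplot2 variant above (without the parse step) is probably the more natural fit for a data frame built this way.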
