R - comparing several datasets


Question

I need some help with data analysis in R.
I have two datasets (before and after) and I want to see how big the difference between them is.

Before

11330    STAT1
2721    STAT2
52438    STAT3
6124    SUZY

After

17401    STAT1
3462    STAT2
0    STAT3
72    SUZY

I tried to group them with tapply(before$V1, before$V2, FUN=mean).
But when I try to plot the result, the x axis shows numbers instead of the group names. How can I plot such tapply output (frequency on the y axis and group name on the x axis)?

I also wanted to ask what the proper command in R is for comparing such datasets, since I want to find the difference between them.


Edited

dput(before$V1)
c(11330L, 2721L, 52438L, 6124L)

dput(before$V2)
structure(1:4, .Label = c("STAT1", "STAT2", "STAT3","SUZY"),class = "factor")

Solution

Here are a couple of ideas.

This is what I think your data look like:

before <- data.frame(val=c(11330,2721,52438,6124),
                     lab=c("STAT1","STAT2","STAT3","SUZY"))
after <- data.frame(val=c(17401,3462,0,72),
                     lab=c("STAT1","STAT2","STAT3","SUZY"))

Combine them into a single data frame with a period variable:

combined <- rbind(data.frame(before,period="before"),
                  data.frame(after,period="after"))

Reformat to a matrix and plot with (base R) dotchart:

library(reshape2)
m <- acast(combined,lab~period,value.var="val")
dotchart(m)
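
If reshape2 is not available, a similar matrix can probably be built with base R alone. The sketch below rebuilds the `combined` data frame so that it runs on its own, then uses `tapply` with two grouping variables. The `barplot` call is an added suggestion rather than part of the original answer; it should label the x axis with the group names instead of numbers.

```r
before <- data.frame(val = c(11330, 2721, 52438, 6124),
                     lab = c("STAT1", "STAT2", "STAT3", "SUZY"))
after  <- data.frame(val = c(17401, 3462, 0, 72),
                     lab = c("STAT1", "STAT2", "STAT3", "SUZY"))
combined <- rbind(data.frame(before, period = "before"),
                  data.frame(after, period = "after"))

# Rows are groups (lab), columns are periods. Each cell holds a single
# value here, so sum() simply passes it through.
m <- with(combined, tapply(val, list(lab, period), sum))

# Same dot chart as with the acast() result
dotchart(m)

# Grouped bars: group names on the x axis, one bar per period
barplot(t(m), beside = TRUE, legend.text = TRUE,
        xlab = "Group", ylab = "Frequency")
```

Because `tapply` keeps the group names as dimnames, both plotting functions can pick them up as labels.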

Plot with ggplot:

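library(ggplot2)
# qplot() is deprecated in recent ggplot2 releases; this is the ggplot() equivalent
ggplot(combined, aes(lab, val, colour=period)) + geom_point()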
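
To put numbers on the difference between the two datasets, one option is to join them by group and subtract. This is only a sketch under the assumption that each group appears exactly once in each dataset; the column names `val_before`, `val_after`, `diff` and `pct_change` are made up for illustration.

```r
before <- data.frame(val = c(11330, 2721, 52438, 6124),
                     lab = c("STAT1", "STAT2", "STAT3", "SUZY"))
after  <- data.frame(val = c(17401, 3462, 0, 72),
                     lab = c("STAT1", "STAT2", "STAT3", "SUZY"))

# Join on the group label; the suffixes keep the two value columns apart
cmp <- merge(before, after, by = "lab", suffixes = c("_before", "_after"))

# Absolute and relative change per group
cmp$diff       <- cmp$val_after - cmp$val_before
cmp$pct_change <- 100 * cmp$diff / cmp$val_before

print(cmp)
```

Joining with `merge` rather than subtracting the columns directly means the rows are matched by label, so the result does not depend on both datasets being in the same order.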
