How to apply histogram on dependent data in R?


Problem description



I want to visualise the proportional data (Nij/n) for the sinus (independent) and Arr/AHB (dependent variable) cases in females and males in R. A ggplot2 approach or any other is welcome!

Pseudocode

  • histogram of the second and third columns for the groups N11.1, ..., N32.1

Code

        N11.1 N22.1 N33.1 N44.1 N21.1 N31.1 N32.1
Sinus     1.0   0.0   0.0   0.0   0.0   0.0  12.0
Arr/AHB   1.0   0.0   0.0   0.1   0.0   0.0  20.9
        N11.1 N22.1 N33.1 N44.1 N21.1 N31.1 N32.1
Sinus     1.0   0.0   0.0   0.0   0.0   0.0   4.0
Arr/AHB   1.0   0.0   0.0   0.0   0.0   0.0  24.0

The first column holds the row names.

Code with the data

library("ggplot2")

data.female <- structure(list(N11.1 = structure(c(3L, 3L), .Label = c("", "0.0", 
"1.0", "N11"), class = "factor"), N22.1 = structure(c(2L, 2L), .Label = c("", 
"0.0", "2.0", "N22"), class = "factor"), N33.1 = structure(c(2L, 
2L), .Label = c("", "0.0", "N33"), class = "factor"), N44.1 = structure(2:3, .Label = c("", 
"0.0", "0.1", "0.2", "N44"), class = "factor"), N21.1 = structure(c(2L, 
2L), .Label = c("", "0.0", "N21"), class = "factor"), N31.1 = structure(c(2L, 
2L), .Label = c("", "0.0", "N31"), class = "factor"), N32.1 = structure(c(5L, 
7L), .Label = c("", "0.0", "10.8", "11.0", "12.0", "17.0", "20.9", 
"22.8", "24.0", "3.0", "4.0", "44.0", "N32"), class = "factor")), .Names = c("N11.1", 
"N22.1", "N33.1", "N44.1", "N21.1", "N31.1", "N32.1"), row.names = c("Sinus", 
"Arr/AHB"), class = "data.frame")

data.male <- structure(list(N11.1 = structure(c(3L, 3L), .Label = c("", "0.0", 
"1.0", "N11"), class = "factor"), N22.1 = structure(c(2L, 2L), .Label = c("", 
"0.0", "2.0", "N22"), class = "factor"), N33.1 = structure(c(2L, 
2L), .Label = c("", "0.0", "N33"), class = "factor"), N44.1 = structure(c(2L, 
2L), .Label = c("", "0.0", "0.1", "0.2", "N44"), class = "factor"), 
    N21.1 = structure(c(2L, 2L), .Label = c("", "0.0", "N21"), class = "factor"), 
    N31.1 = structure(c(2L, 2L), .Label = c("", "0.0", "N31"), class = "factor"), 
    N32.1 = structure(c(11L, 9L), .Label = c("", "0.0", "10.8", 
    "11.0", "12.0", "17.0", "20.9", "22.8", "24.0", "3.0", "4.0", 
    "44.0", "N32"), class = "factor")), .Names = c("N11.1", "N22.1", 
"N33.1", "N44.1", "N21.1", "N31.1", "N32.1"), row.names = c("Sinus", 
"Arr/AHB"), class = "data.frame")

Attempt for a single data row

data.female.sinus <- data.female[1:1,1:7]
print(data.female.sinus)

g <- ggplot(data.female.sinus)
g + geom_bar()
#Warning messages:
#1: In min(x, na.rm = na.rm) :
#  no non-missing arguments to min; returning Inf
#2: In max(x, na.rm = na.rm) :
#  no non-missing arguments to max; returning -Inf
#3: In min(diff(sort(x))) : no non-missing arguments to min; returning Inf
#4: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
#5: Computation failed in `stat_count()`:
#arguments imply differing number of rows: 0, 1 
#null device 

Expected output: a histogram comparing male and female, emphasising that Arr/AHB is the dependent variable

Testing hhh's answer

I do not understand why the given data with column names cannot be used in the same way as the data without column names

Sinus <- c(1,0,0,0,0,0,12)
ArrAHB <- c(1,0,0,0.1,0,0,20.9)
# Things work with this data  

Sinus <- data.female[1, 1:7]
ArrAHB <- data.female[2, 1:7]
# Things do not work with this data which has column names

Labels <- c("N11.1","N22.2","N33.1","N44.1","N21.1","N31.1","N32.1")
ID <- c("Sinus","Arr/AHB")
data.female <- data.frame(Sinus,ArrAHB,row.names=Labels)
data.female <- t(data.female)

barchart(data.female,auto.key=list(space='right'))

R: 3.3.1
OS: Debian 8.5

Solution

Your data looks like this:

> data.female
        N11.1 N22.1 N33.1 N44.1 N21.1 N31.1 N32.1
Sinus     1.0   0.0   0.0   0.0   0.0   0.0  12.0
Arr/AHB   1.0   0.0   0.0   0.1   0.0   0.0  20.9
> data.male
        N11.1 N22.1 N33.1 N44.1 N21.1 N31.1 N32.1
Sinus     1.0   0.0   0.0   0.0   0.0   0.0   4.0
Arr/AHB   1.0   0.0   0.0   0.0   0.0   0.0  24.0

and you want to draw a histogram of each row over multiple columns, as demonstrated below.

1. Histogram for each row, with the Sinus and Arr/AHB groups separated

You want a common identifier for Sinus and Arr/AHB, so we create a new ID column for that. We use this method here with the lattice package.

require(lattice)
Sinus<-c(1,0,0,0,0,0,12)
ArrAHB<-c(1,0,0,0.1,0,0,20.9)
Labels<-c("N11.1","N22.1","N33.1","N44.1","N21.1","N31.1","N32.1")
ID<-c("Sinus","Arr/AHB")
data.female<-data.frame(Sinus,ArrAHB,row.names=Labels)
data.female<-as.data.frame(t(data.female))
data.female$ID<-ID

barchart(N11.1+N22.1+N33.1+N44.1+N21.1+N31.1+N32.1 ~ ID,
         data=data.female,
         auto.key=list(space='right')
         )

In comparison, this is the chart for the male data:
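
The code for the male chart is not shown in this answer. A minimal sketch, assuming it mirrors the female code with the values from the data.male table above, could look like this. One deliberate difference: ID is stored as a factor here (an assumption on my part), so that lattice treats it as the categorical axis.

```r
library(lattice)

# Male values copied from the data.male table above
Sinus  <- c(1, 0, 0, 0, 0, 0, 4)
ArrAHB <- c(1, 0, 0, 0, 0, 0, 24)
Labels <- c("N11.1", "N22.1", "N33.1", "N44.1", "N21.1", "N31.1", "N32.1")

# Transpose so that Sinus and ArrAHB become rows, as in the female example
data.male <- as.data.frame(t(data.frame(Sinus, ArrAHB, row.names = Labels)))

# Assumption: a factor ID, with levels in the order of the rows
data.male$ID <- factor(c("Sinus", "Arr/AHB"), levels = c("Sinus", "Arr/AHB"))

male.chart <- barchart(N11.1 + N22.1 + N33.1 + N44.1 + N21.1 + N31.1 + N32.1 ~ ID,
                       data = data.male,
                       auto.key = list(space = "right"))
print(male.chart)
```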

1.2. Your factor data must be converted to numeric vectors or, better, read your original files directly into numeric vectors, not factors!

Your input data is malformed: it is stored as factors, which is bad here. This is probably the result of misusing read.csv, such as missing the flag na.strings="." or some malformed elements. More:

"Sometimes when a data frame is read directly from a file, a column you’d thought would produce a numeric vector instead produces a factor. This is caused by a non-numeric value in the column, often a missing value encoded in a special way like . or -. To remedy the situation, coerce the vector from a factor to a character vector, and then from a character to a double vector. (Be sure to check for missing values after this process.) Of course, a much better plan is to discover what caused the problem in the first place and fix that; using the na.strings argument to read.csv() is often a good place to start."

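The effect of na.strings described in the quote can be sketched as follows. The CSV text is hypothetical (the original file is not shown in the question), and stringsAsFactors = TRUE is set explicitly because it was the default in the R version used here but is no longer the default from R 4.0.0 on.

```r
# Hypothetical file contents: "." marks a missing value
csv.text <- "group,N44.1,N32.1
Sinus,0.0,12.0
Arr/AHB,0.1,20.9
Other,.,."

# Without na.strings, the "." entries turn the whole column into a factor
bad <- read.csv(text = csv.text, stringsAsFactors = TRUE)
print(class(bad$N32.1))

# Declaring "." as a missing value keeps the column numeric
good <- read.csv(text = csv.text, na.strings = ".", stringsAsFactors = TRUE)
print(class(good$N32.1))
```

If the real file uses a different marker for missing values, that marker would go into na.strings instead.
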
In order to use this malformed data, the factor elements must be turned into numeric values. The class command reveals the mistake made when reading your original data into R:

> class(data.female$N22.1)
[1] "factor"
> as.double(as.character(data.female$N22.1))
[1] 0 0

where as.double(as.character(...)) lets us manipulate the data correctly again. So the code

require(lattice)
data.female <- structure(list(N11.1 = structure(c(3L, 3L), .Label = c("", "0.0", "1.0", "N11"), class = "factor"),
                              N22.1 = structure(c(2L, 2L), .Label = c("", "0.0", "2.0", "N22"), class = "factor"),
                              N33.1 = structure(c(2L, 2L), .Label = c("", "0.0", "N33"), class = "factor"),
                              N44.1 = structure(2:3, .Label = c("", "0.0", "0.1", "0.2", "N44"), class = "factor"),
                              N21.1 = structure(c(2L, 2L), .Label = c("", "0.0", "N21"), class = "factor"),
                              N31.1 = structure(c(2L, 2L), .Label = c("", "0.0", "N31"), class = "factor"),
                              N32.1 = structure(c(5L, 7L), .Label = c("", "0.0", "10.8", "11.0", "12.0", "17.0", "20.9", "22.8", "24.0", "3.0", "4.0", "44.0", "N32"),
                                                class = "factor")), .Names = c("N11.1", "N22.1", "N33.1", "N44.1", "N21.1", "N31.1", "N32.1"),
                         row.names = c("Sinus", "Arr/AHB"), class = "data.frame")
data.female$ID<-c("Sinus","Arr/AHB")
data.female<-as.data.frame(data.female)

f<-function(x) as.double(as.character(x))   #factors converted to vectors

barchart(f(N11.1)+f(N22.1)+f(N33.1)+f(N44.1)+f(N21.1)+f(N31.1)+f(N32.1) ~ ID,
         data=data.female,
         auto.key=list(space='right')
         )

where the function f converts the factors to numeric vectors. Strictly speaking, factors are themselves a special kind of vector: integer codes with a class attribute and a levels attribute.

You need to adjust the legend yourself.
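
Instead of wrapping every column in f inside the formula, the conversion can also be applied to the whole data frame once. The sketch below uses a small stand-in data frame (an assumption, to keep it short) with the same problem as data.female. It also shows why as.character is needed: calling as.numeric directly on a factor returns the internal level codes, not the printed values.

```r
# Small stand-in with numbers stored as factors, like data.female
df <- data.frame(N44.1 = factor(c("0.0", "0.1")),
                 N32.1 = factor(c("12.0", "20.9")),
                 row.names = c("Sinus", "Arr/AHB"))

# Pitfall: this returns the level codes, not 12.0 and 20.9
codes <- as.numeric(df$N32.1)
print(codes)

# Same helper as in the answer
f <- function(x) as.double(as.character(x))

# df[] <- ... replaces the columns but keeps the data frame and its row names
df[] <- lapply(df, f)
print(df)
```

After this step the plain column names can be used in the barchart formula, so the legend no longer needs manual adjustment for the f(...) wrappers.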

2. Barchart again showing proportions

The data input is changed to a readable format (not the output of some CSV file). Note that the values in N32.1 are far larger than any data in the other columns.

require(lattice)
Sinus<-c(1,0,0,0,0,0,12)
ArrAHB<-c(1,0,0,0.1,0,0,20.9)
Labels<-c("N11.1","N22.2","N33.1","N44.1","N21.1","N31.1","N32.1")
ID<-c("Sinus","Arr/AHB")
data.female<-data.frame(Sinus,ArrAHB,row.names=Labels)
data.female<-t(data.female)

barchart(data.female,auto.key=list(space='right'))

> data.female
       N11.1 N22.2 N33.1 N44.1 N21.1 N31.1 N32.1
Sinus      1     0     0   0.0     0     0  12.0
ArrAHB     1     0     0   0.1     0     0  20.9
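
Since the question also asks for a ggplot2 approach, here is a hedged sketch of one way to compare female and male in a single figure. It is not part of the lattice solution above. The attempt in the question most likely failed because no aesthetics were mapped and the data was in wide format with factor columns; ggplot2 generally works best with long data, one row per bar. The column names sex, rhythm, category and value are my own choices.

```r
Labels <- c("N11.1", "N22.1", "N33.1", "N44.1", "N21.1", "N31.1", "N32.1")

# Long format: one row per (sex, rhythm, category) combination,
# values copied from the two tables at the top of this answer
long <- data.frame(
  sex      = rep(c("female", "male"), each = 14),
  rhythm   = rep(rep(c("Sinus", "Arr/AHB"), each = 7), times = 2),
  category = factor(rep(Labels, times = 4), levels = Labels),
  value    = c(1, 0, 0, 0,   0, 0, 12,     # female, Sinus
               1, 0, 0, 0.1, 0, 0, 20.9,   # female, Arr/AHB
               1, 0, 0, 0,   0, 0, 4,      # male, Sinus
               1, 0, 0, 0,   0, 0, 24)     # male, Arr/AHB
)

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = category, y = value, fill = rhythm)) +
    geom_bar(stat = "identity", position = "dodge") +  # bar height = value
    facet_wrap(~ sex)                                  # one panel per sex
  print(p)
}
```

Dodged bars put Sinus and Arr/AHB side by side within each category, and the facets give the female and male panels a shared y axis, which makes the difference in N32.1 easy to see.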
