Illustrating the impacts of sample restrictions: Simplifying the way to produce a barplot
Problem description
I'm trying to illustrate the effects, by ID, on sample size of successively applying various (decreasingly restrictive) sample restrictions in a bar plot that looks something like this:
The blue bar is what remains after all 5 restrictions are placed; the gold bar shows the impact of the least restrictive condition; the spring green bar shows the impact of the second-least restrictive condition; and so forth.
Here's some sample data:
library(data.table)
set.seed(8195)
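# Each flag marks the observations that a given restriction would drop;
# flag1 is the most restrictive (~24% flagged), flag5 the least (~1%)
dt <- data.table(id = rep(1:5, each = 2e3),
                 flag1 = runif(1e4) > .76,
                 flag2 = runif(1e4) > .88,
                 flag3 = runif(1e4) > .90,
                 flag4 = runif(1e4) > .95,
                 flag5 = runif(1e4) > .99)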
The code I'm using so far leaves something to be desired: 1) it's rather verbose, and 2) it doesn't strike me as very robust or generalizable. Does anyone have experience producing something like this and can offer improvements on either of these fronts? I have a feeling this type of graph should be pretty common in data analysis, so I'm somewhat surprised there's no special function for it.
Here's what I have so far:
dt[order(-id)][,
#to find out how many observations are lost by
# applying flag 1 (we keep un-flagged obs.),
# look at the count of indices before and
# after applying flag 1
{l1<-!flag1;i1<-.I[l1];n1<-length(.I)-length(i1);
#to find the impact of flag 2, we apply flag 2
# _in addition to_ flag 1--the observations
# we keep have _neither_ flag 1 _nor_ flag 2;
# the impact is measured by the number of
# observations lost by applying this flag
# (that weren't already lost from flag 1)
l2<-l1&!flag2;i2<-.I[l2];n2<-length(i1)-length(i2);
l3<-l2&!flag3;i3<-.I[l3];n3<-length(i2)-length(i3);
l4<-l3&!flag4;i4<-.I[l4];n4<-length(i3)-length(i4);
l5<-l4&!flag5;i5<-.I[l5];n5<-length(i4)-length(i5);
#finally, the observations we keep have _none_
# of flags 1-5 applied
n6<-length(i5);
c(n6,n5,n4,n3,n2,n1)},by=id
][,{barplot(matrix(V1,ncol=uniqueN(id)),
horiz=T,col=c("blue","gold","springgreen",
"orange","orchid","red"),
names.arg=paste("ID: ",uniqueN(id):1),
las=1,main=paste0("Impact of Sample Restrictions",
"\nBy ID"),
xlab="Count",plot=T)}]
Not pretty. Thanks for your input.
Recommended answer
As @Frank pointed out, this is much simpler if all these successive flags are converted to a categorical variable taking, say, 1 for the blue bars, 2 for the gold bars, 3 for the spring green bars, and so on.
As @Frank also pointed out, max.col offers us a convenient way of creating a variable that takes exactly those values, and quickly:
# Columns passed to max.col: a constant .5, then flag5, flag4, ..., flag1
dt[, categ := max.col(cbind(.5, .SD), ties.method = "last"),
   .SDcols = paste0("flag", 5:1)]
(What's happening here? max.col takes care of the successive nature of the flags for us by returning, for each row, the position of the rightmost TRUE value (rightmost because ties.method="last"); if all flags are FALSE, the first column is the largest because it is always .5, which is greater than 0. Check out this table:)
 0   5  4  3  2  1
.5   F  F  F  F  F   # No flags apply, so column 0 wins
.5   T  F  T  F  F   # Flags 5 & 3 true; 3 is the binding condition.
                     # Once flag 3 is applied, it no longer matters
                     # which of the subsequent flags may or may not apply.
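The same idea can be tried on a toy matrix in base R. This is only a sketch: the matrix m and its three rows are made up for illustration, with the columns ordered as in the max.col call above (the constant .5, then flag5 down to flag1).

```r
# Toy rows (hypothetical data); logicals are coerced to 0/1 by cbind
m <- cbind(none  = .5,
           flag5 = c(FALSE, TRUE,  FALSE),
           flag4 = c(FALSE, FALSE, FALSE),
           flag3 = c(FALSE, TRUE,  FALSE),
           flag2 = c(FALSE, FALSE, FALSE),
           flag1 = c(FALSE, FALSE, TRUE))

# For each row, the index of the rightmost column holding the row maximum
categ_toy <- max.col(m, ties.method = "last")
print(categ_toy)
# Row 1: no flag set, so the .5 column wins                    -> 1
# Row 2: flags 5 and 3 set, flag3 is the rightmost TRUE        -> 4
# Row 3: only flag1 set                                        -> 6
```

Category 1 therefore means "survives every restriction", and higher categories correspond to restrictions applied earlier.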
With categ thus defined, plotting becomes a cinch:
dt[,barplot(table(categ,id))]
will work. To get all the bells and whistles:
# Reverse the id columns so that the bars match the reversed labels
dt[, barplot(table(categ, id)[, 5:1], horiz = TRUE,
             col = c("blue", "gold", "springgreen",
                     "orange", "orchid", "red"),
             names.arg = paste("ID:", uniqueN(id):1),
             las = 1,
             main = "Impact of Sample Restrictions\nBy ID",
             xlab = "Count")]
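Since the question also asked about generalizability, the categorical-variable trick could be wrapped in a small helper that works for any number of flags. The following is a minimal base R sketch, not part of the original answer: the function name restriction_category, the three-flag toy data, and the seed are all assumptions made for illustration.

```r
# Hypothetical helper (not part of base R or data.table).
# flags: logical matrix or data.frame, columns in the order the
#        restrictions are applied (first-applied restriction first).
# Returns 1 when no flag applies; larger values mean an earlier
# restriction is the binding one, as in the answer above.
restriction_category <- function(flags) {
  flags <- as.matrix(flags)
  k <- ncol(flags)
  # Reverse the columns so the first-applied restriction is rightmost
  max.col(cbind(0.5, flags[, k:1, drop = FALSE]), ties.method = "last")
}

# Made-up example data with three restrictions
set.seed(1)
n <- 1000
flags <- data.frame(flag1 = runif(n) > .76,
                    flag2 = runif(n) > .88,
                    flag3 = runif(n) > .90)
id <- rep(1:5, each = n / 5)

categ <- restriction_category(flags)
counts <- table(categ, id)

# Sanity checks against the sequential definition:
# category 1 holds the rows with no flag at all
stopifnot(sum(categ == 1) ==
            sum(!flags$flag1 & !flags$flag2 & !flags$flag3))
# the highest category holds every row hit by the first restriction
stopifnot(sum(categ == 4) == sum(flags$flag1))
```

The resulting counts table can be passed to barplot(counts, horiz = TRUE, las = 1) in the same way as before.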