Symmetrical, violin plot-like histogram?


Question

How can I make a histogram in which the center of each bar lies along a common axis? This would look like a violin plot with step-shaped edges.

I'd like to do this in Lattice, and don't mind customizing panel functions, etc., but would be happy to use base R graphics or even ggplot2. (I haven't yet thrown myself into ggplot2, but will take the plunge at some point.)

(Why do I want to do this? I think it might be a useful replacement for a violin plot when data is discrete and occurs at a few [5-50] evenly-spaced numeric values. Each bin then represents a point. Of course, I could just generate a normal histogram. But I think that sometimes it's useful to display both a box-and-whisker plot and a violin plot. With discrete data at regular intervals, a symmetrical histogram with the same orientation as a boxplot allows comparison of the detailed structure of the data with the boxplot, just as a violin plot does. In this case the symmetrical histogram could be more informative than a violin plot. A beanplot might be another alternative for what I just described, although in fact my data is not literally discrete; it just converges to near a series of regular values. This makes R's beanplot package less useful for me, unless I normalize the values by mapping them to the nearest regular value.)

Here is a 30-observation subset of some of the data, which is generated by an agent-based simulation:

df30 <- data.frame(crime.v=c(0.2069526, 0.2063516, 0.06919754,
0.2080366, -0.06975912, 0.206277, 0.3457634, 0.2058985, 0.3428499,
0.3428159, 0.06746109, -0.07068694, 0.4826098, -0.06910966, 0.06769761,
0.2098732, 0.3482267, 0.3483602, 0.4829777, 0.06844112, 0.2093492,
0.4845478, 0.2093505, 0.3482845, 0.3459249, 0.2106339, 0.2098397,
0.4844956, 0.2108985, 0.2107984), bias=c("beast", "beast", "beast",
"beast", "beast", "beast", "beast", "beast", "beast", "beast", "beast",
"beast", "beast", "beast", "beast", "virus", "virus", "virus", "virus",
"virus", "virus", "virus", "virus", "virus", "virus", "virus", "virus",
"virus", "virus", "virus"))

The full set of 600 observations is available as a data frame named df in an Rdata file, which can be downloaded from this link: CVexample.rdata.

The crime.v values are all near one of the following, which I'll call foci:

[1] -0.89115386 -0.75346155 -0.61576924 -0.47807693 -0.34038463 -0.20269232 -0.06500001
[8]  0.07269230  0.21038460  0.34807691  0.48576922  0.62346153  0.76115383  0.89884614

(The crime.v values are actually averages of 13 variables, whose values can range from -1 to 1, but which end up converging to values in the neighborhood of .9 or -.9. Averages of 13 values at around .9 or -.9 are somewhat near the foci. In practice I determined appropriate values for the foci by examining the data, since there's some additional variation involved.)

A violin plot can be produced with:

require(lattice)
bwplot(crime.v ~ bias, data=df30, ylim=c(-1,1), panel=panel.violin)

If you run this with the larger dataset, you'll see that one of the violin plots produced is multimodal, while the other isn't. However, this doesn't seem to reflect a difference in the data underlying the two violin plots; it's an artifact due to the locations of the foci in relation to the plot, as far as I can tell. I can smooth away the difference by tweaking the parameters of density passed to panel.violin, but it would be clearer to just represent how many points there are in each cluster.
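To make "how many points there are in each cluster" concrete, here is a minimal sketch that maps each value to its nearest focus and tabulates the result by group. The foci are the ones listed above; the ten crime.v values are a toy subset picked from df30 for illustration, and the variable names `nearest` and `counts` are hypothetical.

```r
# Foci copied from the question.
foci <- c(-0.89115386, -0.75346155, -0.61576924, -0.47807693, -0.34038463,
          -0.20269232, -0.06500001,  0.07269230,  0.21038460,  0.34807691,
           0.48576922,  0.62346153,  0.76115383,  0.89884614)

# Toy subset: five "beast" and five "virus" values taken from df30.
crime.v <- c(0.2069526, 0.06919754, -0.06975912, 0.3457634, 0.4826098,
             0.2098732, 0.3482267, 0.4829777, 0.06844112, 0.2093492)
bias <- c(rep("beast", 5), rep("virus", 5))

# Index of the nearest focus for each observation.
nearest <- vapply(crime.v, function(v) which.min(abs(foci - v)), integer(1))

# Counts per focus and group; keeping all foci as factor levels shows empty clusters too.
counts <- table(
  focus = factor(nearest, levels = seq_along(foci), labels = sprintf("%.2f", foci)),
  bias  = bias
)
print(counts)
```

These counts are what a centered histogram would draw as bar widths, so no density bandwidth is involved.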

Thanks!

Recommended answer

Here is one possibility using base graphics:

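# Compute one histogram per group without drawing it.
tmp <- tapply( iris$Petal.Length, iris$Species, function(x) hist(x, plot=FALSE) )

# Set up an empty plot: one unit-wide column per group, y range spanning all breaks.
plot.new()
tmp.r <- do.call( range, lapply(tmp, `[[`, 'breaks') )
plot.window(xlim=c(1/2,length(tmp)+1/2), ylim=tmp.r)
abline(v=seq_along(tmp))

# Draw each bin as a rectangle centered on the group's axis,
# with width equal to the bin's relative frequency.
for( i in seq_along(tmp) ) {
    h <- tmp[[i]]
    rf <- h$counts/sum(h$counts)
    rect( i-rf/2, head(h$breaks, -1), i+rf/2, tail(h$breaks, -1) )
}

axis(1, at=seq_along(tmp), labels=names(tmp))
axis(2)
box()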

You can tweak the different parts to your preferences and the whole thing could easily be wrapped into a function.
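As a rough illustration of such a wrapper, the sketch below packages the same steps into a function. The name `sym_hist` and its arguments are hypothetical, not part of any package. It also makes one change to the code above: all groups share a common set of breaks, so bars at the same height are directly comparable.

```r
# Sketch: one centered histogram per group, drawn with base graphics.
# `sym_hist`, `x`, `g` and `breaks` are illustrative names only.
sym_hist <- function(x, g, breaks = "Sturges") {
  # Shared breaks computed from all of the data (an assumption of this sketch).
  brk <- hist(x, breaks = breaks, plot = FALSE)$breaks
  hs <- tapply(x, g, function(v) hist(v, breaks = brk, plot = FALSE))

  plot.new()
  plot.window(xlim = c(0.5, length(hs) + 0.5), ylim = range(brk))
  abline(v = seq_along(hs), col = "grey")

  for (i in seq_along(hs)) {
    h <- hs[[i]]
    rf <- h$counts / sum(h$counts)  # relative frequency sets the bar width
    rect(i - rf / 2, head(h$breaks, -1), i + rf / 2, tail(h$breaks, -1))
  }

  axis(1, at = seq_along(hs), labels = names(hs))
  axis(2)
  box()
  invisible(hs)  # return the per-group histograms for inspection
}

sym_hist(iris$Petal.Length, iris$Species)
```

For data clustered around known values, such as the foci in the question, one option would be to pass a numeric `breaks` vector whose cut points lie midway between adjacent foci and which extends far enough to cover the whole data range, so that each bar corresponds to one cluster.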
