How can I create a histogram for all variables in a data set with minimal effort in R?

Question

Exploring a new data set: what is the easiest, quickest way to visualise many (or all) variables?

Ideally, the output shows the histograms next to each other with minimal clutter and maximum information. Key to this question are flexibility and stability when dealing with large and varied data sets. I'm using RStudio and usually deal with large and messy survey data.

One example that comes out of the box with Hmisc and works quite well here is:

library(ggplot2)
str(mpg)

library(Hmisc)
hist.data.frame(mpg)

Unfortunately, elsewhere I run into problems with data labels (Error in plot.new() : figure margins too large). It also crashed on a data set larger than mpg, and I haven't figured out how to control binning. Moreover, I'd prefer a flexible solution in ggplot2. Note that I have just started learning R and am used to the comfortable solutions provided by commercial software.

More questions on this topic:

R histogram - too many variables

...?

Recommended answer

There may be three broad approaches:

  1. Commands from packages, such as hist.data.frame()
  2. Looping over variables or similar macro constructs
  3. Stacking variables and using facets

Packages

Other commands that may be of use:

library(plyr)
library(psych)
multi.hist(mpg) #error, not numeric
multi.hist(mpg[,sapply(mpg, is.numeric)])

Or perhaps multhist from plotrix, which I haven't explored. Neither of them offers the flexibility I was looking for.

Loops

As an R beginner, I was advised by everyone to stay away from loops. So I did, but perhaps they are worth a try here. Any suggestions are very welcome. Perhaps you could comment on how to combine the graphs into one file.
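A minimal sketch of the loop approach in base R, not a definitive solution. It assumes that only numeric columns should be plotted and that a multi-page PDF is an acceptable "one file". The helper name plot_all_histograms, its arguments, the output file name and the use of mtcars as stand-in data are all made up for illustration. The small margins set through par() are meant to reduce the chance of the "figure margins too large" error, and breaks is passed through to hist() to control binning.

```r
# Sketch: one histogram per numeric column, all written to a single PDF.
# Assumptions: plot_all_histograms() is a hypothetical helper, and mtcars
# (from base R's datasets package) stands in for the real survey data.
plot_all_histograms <- function(df, file, rows = 2, cols = 3,
                                breaks = "Sturges") {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]

  pdf(file, width = 11, height = 8)   # open one PDF device for all plots
  on.exit(dev.off(), add = TRUE)      # close it even if a plot fails

  # rows x cols panels per page, with small margins around each panel
  par(mfrow = c(rows, cols), mar = c(4, 4, 2, 1))

  for (col in numeric_cols) {
    x <- df[[col]]
    x <- x[is.finite(x)]              # drop NA, NaN and Inf
    if (length(x) == 0) next          # nothing to plot for this column
    hist(x, breaks = breaks, main = col, xlab = "", col = "grey80")
  }
  invisible(numeric_cols)
}

out_file <- file.path(tempdir(), "all-histograms.pdf")
plotted <- plot_all_histograms(mtcars, out_file)
```

With more variables than rows * cols, the PDF device simply starts a new page, so the number of variables does not need to be known in advance.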

Stacking

My first suspicion was that stacking variables might get out of hand. However, it might be the best strategy for a reasonably sized set of variables.

One example I came up with uses the melt function.

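library(ggplot2)   # qplot(), facet_wrap(), ggsave() and the mpg data
library(plyr)      # mutate()
library(reshape2)  # melt()
# Add a row id, then stack all other columns into variable/value pairs
mpgid <- mutate(mpg, id=as.numeric(rownames(mpg)))
mpgstack <- melt(mpgid, id="id")
# One panel per variable, each with its own axis scales
pp <- qplot(value, data=mpgstack) + facet_wrap(~variable, scales="free")
# pp + stat_bin(geom="text", aes(label=..count.., vjust=-1))
ggsave("mpg-histograms.pdf", pp, scale=2)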

(As you can see, I tried to put value labels on the bars for more information density, but that didn't go so well. The labels on the x-axis are also less than ideal.)
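A variation on the stacking idea, offered as a sketch under stated assumptions rather than a finished solution: it keeps only the numeric columns, so the stacked value column stays numeric, and it sets the number of bins explicitly. Base R's stack() is swapped in for melt() here purely to avoid an extra dependency. The bin count of 20 and the output file name are arbitrary choices for illustration.

```r
library(ggplot2)

# Assumption: mpg (shipped with ggplot2) stands in for the real data.
df <- as.data.frame(mpg)

# Keep numeric columns only, so the stacked column stays numeric.
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# stack() returns two columns: values (the data) and ind (the column name).
stacked <- stack(df[numeric_cols])

pp <- ggplot(stacked, aes(x = values)) +
  geom_histogram(bins = 20, na.rm = TRUE) +  # bins sets the bars per panel
  facet_wrap(~ ind, scales = "free")         # one panel per variable

out_file <- file.path(tempdir(), "mpg-numeric-histograms.pdf")
ggsave(out_file, pp, width = 10, height = 7)
```

Categorical columns are left out on purpose; they would need bar charts (for example geom_bar()) rather than histograms, plotted from a separate stacked data frame.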

No solution here is perfect, and there won't be a one-size-fits-all command. But perhaps we can get closer to making it easy to explore a new data set.
