What does the "right" parameter do when creating a histogram in R?

Question

I am trying to figure out what the right parameter of R's hist function does. Unfortunately, the documentation is unclear to someone like me who lacks a deep understanding of statistics.

The online documentation states:

right: logical; if TRUE, the histogram cells are right-closed (left open) intervals.

What does it mean for an interval to be right-closed (or left-open)?

Answer

When creating a histogram of non-categorical (continuous) data, such as pH or temperature, you need to group the values into "bins". Each bin covers an interval of values. For example, suppose I have the data:

11  12  13  14  15  16  17  18  19

I can create 5 bins with right-open, left-closed intervals like this:

1st bin: [10, 12)
2nd bin: [12, 14)
3rd bin: [14, 16)
4th bin: [16, 18)
5th bin: [18, 20)

What this means is that the first bin will "hold" values between 10 and 12, including 10 but not including 12. The interval notation used above is shorthand for this:

1st bin: 10 ≤ x < 12
2nd bin: 12 ≤ x < 14
3rd bin: 14 ≤ x < 16
4th bin: 16 ≤ x < 18
5th bin: 18 ≤ x < 20

So the value 11 goes into the 1st bin, but the value 12 goes into the 2nd bin, and so on. R does this binning for you and then draws the histogram from the number of values in each bin. For the data above, you get a rather uninteresting (or interesting, depending on your expectations) histogram that is flat except at the first bin, which holds only 1 value while every other bin holds 2.
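To see this assignment directly, base R's `cut()` function, which takes a `right` argument just like `hist()`, labels each value with the interval it falls into. A minimal sketch using the data and bin edges above; the variable names are only for illustration:

```r
# The example data from above: the integers 11 through 19
x <- 11:19
# Bin edges 10, 12, 14, 16, 18, 20 (five bins of width 2)
breaks <- seq(10, 20, by = 2)

# right = FALSE makes cut() use left-closed, right-open intervals [a, b)
bins_left_closed <- cut(x, breaks = breaks, right = FALSE)

# Which interval each value was assigned to (11 -> [10,12), 12 -> [12,14), ...)
data.frame(value = x, bin = bins_left_closed)

# Values per bin: 1 in the first bin, 2 in each of the others
table(bins_left_closed)
```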

The following examples illustrate what the different combinations of brackets and parentheses mean when using interval notation (assume x is an element of the real number line):

(1, 4) --> 1 < x < 4    left-open, right-open
[3, 7) --> 3 ≤ x < 7    left-closed, right-open
(2, 9] --> 2 < x ≤ 9    left-open, right-closed
[5, 6] --> 5 ≤ x ≤ 6    left-closed, right-closed

Note that you can't put a square bracket next to an infinity, unless you are working on the extended real number line:

(-∞, ∞)   -->   -∞ < x < ∞ 
(-∞, 20]  -->   -∞ < x ≤ 20 
[20, ∞)   -->   20 ≤ x < ∞
(1000, ∞) --> 1000 < x < ∞
(-∞, ∞]   -->   Invalid
(41, ∞]   -->   Invalid
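The bracket combinations differ only in whether each endpoint comparison is strict (`<`) or not (`<=`). A small sketch makes that concrete; `in_interval()` is a hypothetical helper written for this illustration, not a base R function, and R's `Inf` plays the role of ∞:

```r
# Hypothetical helper (not part of base R): TRUE when x lies between `lower`
# and `upper`, with each end open (strict <) or closed (<=) as requested.
in_interval <- function(x, lower, upper, left_closed = TRUE, right_closed = FALSE) {
  above_lower <- if (left_closed) lower <= x else lower < x
  below_upper <- if (right_closed) x <= upper else x < upper
  above_lower & below_upper
}

in_interval(1, 1, 4, left_closed = FALSE, right_closed = FALSE)  # FALSE: (1, 4) excludes 1
in_interval(3, 3, 7, left_closed = TRUE,  right_closed = FALSE)  # TRUE:  [3, 7) includes 3
in_interval(9, 2, 9, left_closed = FALSE, right_closed = TRUE)   # TRUE:  (2, 9] includes 9
in_interval(6, 5, 6, left_closed = TRUE,  right_closed = TRUE)   # TRUE:  [5, 6] includes 6
in_interval(1e6, 1000, Inf, left_closed = FALSE)                 # TRUE:  1e6 is in (1000, Inf)
```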

If I want left-open, right-closed intervals, then the bins would look like this:

1st bin: (10, 12] i.e. 10 < x ≤ 12
2nd bin: (12, 14]      12 < x ≤ 14
3rd bin: (14, 16]      14 < x ≤ 16
4th bin: (16, 18]      16 < x ≤ 18
5th bin: (18, 20]      18 < x ≤ 20

See the difference? Now the values 11 and 12 both go into the first bin. Depending on how you bin the data, this can change the appearance of the histogram. This time the histogram is still almost flat, but now the 5th bin differs from the rest (it holds only 1 data point instead of 2).
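The same check with `cut()` and its default `right = TRUE` shows the right-closed labels and the shifted counts. A minimal sketch; the variable names are only for illustration:

```r
x <- 11:19
breaks <- seq(10, 20, by = 2)

# right = TRUE is cut()'s default: left-open, right-closed intervals (a, b]
bins_right_closed <- cut(x, breaks = breaks)

# Interval labels: "(10,12]" "(12,14]" "(14,16]" "(16,18]" "(18,20]"
levels(bins_right_closed)

# 11 and 12 now share the first bin, leaving only 19 in the fifth
table(bins_right_closed)
```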

Fortunately, in R you don't have to build the bins yourself, but hist() does let you choose whether they are left-closed, right-open ([a, b)) or left-open, right-closed ((a, b]). That is exactly what the "right" parameter of hist() controls: right = TRUE (the default) gives (a, b] bins, and right = FALSE gives [a, b) bins.
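Passing `plot = FALSE` to `hist()` returns the computed histogram object without drawing it, which makes it easy to compare the bin counts under both settings. A sketch using the example data and bin edges from above:

```r
x <- 11:19
breaks <- seq(10, 20, by = 2)

# plot = FALSE: compute the histogram but do not draw it
h_right <- hist(x, breaks = breaks, plot = FALSE)                # right = TRUE (default)
h_left  <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)  # left-closed bins

h_right$counts  # 2 2 2 2 1
h_left$counts   # 1 2 2 2 2
```

One related detail: hist() also has include.lowest = TRUE by default, so a value equal to the outermost break is still counted (with right = TRUE the first bin is effectively [10, 12]). That makes no difference for this data, because no value equals 10 or 20.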
