What is integer overflow in R and how can it happen?

Question

I have some calculations running and get the following warning (i.e., not an error):

Warning messages:
1: In sum(myvar, na.rm = T) :
Integer overflow - use sum(as.numeric(.))

In this thread, people state that integer overflows simply don't happen. Either R isn't overly modern or they are not right. However, what am I supposed to do here? If I use as.numeric as the warning suggests, I might not account for the fact that information was already lost earlier. myvar is read from a .csv file, so shouldn't R figure out that a bigger field is needed? Does it already cut something off?

What's the maximum value of an integer or numeric? Would you suggest any other field type / mode?

Edit: I'm running:

R version 2.13.2 (2011-09-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit), within RStudio

Recommended answer

You can answer many of your questions by reading the help page ?integer. It says:

R uses 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9.

Expanding to larger integers is under consideration by R Core but it's not going to happen in the near future.
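To make the quoted limit concrete, here is a minimal base-R sketch (the variable names are illustrative only) of the integer ceiling, what overflow in integer arithmetic looks like, and why the warning's as.numeric() suggestion loses nothing in the question's situation: each value still fits in 32 bits, only their total does not. On R versions like the one in the question, calling sum() on such an integer vector is what produced the "use sum(as.numeric(.))" warning.

```r
# Largest value an R integer can hold: 2^31 - 1 (32-bit signed)
imax <- .Machine$integer.max
print(imax)

# Integer + integer that leaves the range gives NA plus the warning
# "NAs produced by integer overflow"; the value does not silently wrap around
over <- suppressWarnings(imax + 1L)
print(over)

# Every element of x fits in an integer; only the total does not
x <- c(imax, 1L)

# Converting to double before summing is exact here, because doubles
# represent whole numbers exactly up to 2^53
total <- sum(as.numeric(x))
print(total)
```

So converting with as.numeric() before summing is the right fix when the inputs themselves were read correctly; nothing was cut off on input, the overflow only happens in the accumulated sum.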

If you want a "bignum" capacity, then install Martin Maechler's Rmpfr package [PDF]. I recommend the 'Rmpfr' package because of its author's reputation. Martin Maechler is also heavily involved in the development of the Matrix package and is a member of R Core as well. There are alternatives, including arithmetic packages such as 'gmp', 'Brobdingnag' and 'Ryacas' (the latter also offers a symbolic math interface).

Next, to respond to the critical comments in the answer you linked to, and to assess their relevance to your work, consider this: if the same statistical functionality that R offers were available in one of those "modern" languages, you would probably see users migrating in that direction. But I would say that migration, and certainly growth, is currently in R's direction. R was built by statisticians for statistics.

At one time there was a Lisp variant with a statistics package, Xlisp-Stat, but its main developer and proponent is now a member of R Core. On the other hand, one of the earliest R developers, Ross Ihaka, suggests working toward development in a Lisp-like language [PDF]. There is a compiled language called Clojure (pronounced as English speakers would say "closure") with an experimental interface, Rincanter.

Newer versions of R (3.0.0 and later) have 53-bit integers of a sort, using the mantissa of a double: whole numbers stored as "numeric" are exact up to 2^53. When an element of an "integer" vector is assigned a value in excess of .Machine$integer.max, the entire vector is coerced to "numeric", a.k.a. "double". The maximum value for integers remains as it was; however, integer vectors may be coerced to doubles to preserve accuracy in cases that would formerly have generated overflow. Matrix and array dimensions are still limited to .Machine$integer.max, although since R 3.0.0, 64-bit builds also support "long vectors" (including lists) with more than 2^31 - 1 elements.
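A small sketch of the coercion and the 53-bit limit described above, in plain base R (the vector v is just an example). Note that assigning any double value into an integer vector coerces it; here the assigned value also happens to exceed the integer range.

```r
v <- 1:3                          # an integer vector
stopifnot(is.integer(v))

# .Machine$integer.max + 1 is a double (1 is a double literal), so the
# assignment coerces the whole vector to double instead of overflowing
v[2] <- .Machine$integer.max + 1
print(typeof(v))
print(v[2] == 2^31)               # the new value is stored exactly

# Doubles have a 53-bit mantissa: whole numbers are exact only up to 2^53
print(2^53 - 1 == 2^53)           # still distinguishable
print(2^53 + 1 == 2^53)           # precision has run out, so these compare equal
```

Beyond 2^53 doubles silently round, which is where packages like 'Rmpfr' or 'gmp' become necessary.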

When reading large values from files, it is probably safer to read them as character (e.g. via colClasses) and convert afterwards. If the conversion coerces any values to NA, there will be a warning.
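A minimal sketch of that approach, using hypothetical inline CSV content (csv_text, key and amount are made-up names) instead of a real file. The column is read as character, converted explicitly, and the NA warning is surfaced as a message. The last line also addresses the question's concern: read.csv's type guessing (type.convert) reads values outside the integer range as double, not as truncated integers.

```r
# Hypothetical CSV: a value above the integer range, a 20-digit value, and a bad entry
csv_text <- "key,amount\na,3000000000\nb,12345678901234567890\nc,not_a_number"

# Read the large-number column as character so no digits are lost on input
df <- read.csv(text = csv_text, colClasses = c("character", "character"))

# Convert explicitly; unparseable entries become NA and raise a warning,
# which is reported here as a message instead of being dropped
amount <- withCallingHandlers(
  as.numeric(df$amount),
  warning = function(w) {
    message("conversion warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(amount)  # the 20-digit value is only approximate as a double; exact digits stay in df$amount

# Without colClasses, type guessing picks double for out-of-range whole numbers
print(typeof(type.convert("3000000000", as.is = TRUE)))
```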