Handling HUGE numbers in numpy or pandas

Question

I am doing a competition where I am provided with anonymized data. Quite a few of the columns have HUGE values. The largest was 40 digits long! I used pd.read_csv, but as a result those columns were read in as object dtype.

My original plan was to scale the data down, but since the values are treated as objects I can't do arithmetic on them.

Does anyone have a suggestion on how to handle huge numbers in Pandas or Numpy?

Note that I've tried converting the values to uint64 with no luck. I get the error "long too big to convert".

Recommended answer

You can use the converters argument of pd.read_csv to call int, or some other custom converter function, on each string as the data is imported:

import pandas as pd
from io import StringIO

txt='''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,1,"Tiny"
4,-9999999999999999999999999999999999999999,"Really negative"
'''

# Apply int() to each raw string in Big_Num while the CSV is parsed
df = pd.read_csv(StringIO(txt), converters={'Big_Num': int})

print(df)

Prints:

   line                                    Big_Num                           text
0     1   1234567890123456789012345678901234567890      That sure is a big number
1     2   9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                          1                           Tiny
3     4  -9999999999999999999999999999999999999999                Really negative
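For context on why the uint64 attempt in the question failed, here is a quick check of the uint64 range against a 40-digit value. This is only a sketch; the variable names are made up for illustration.

```python
import numpy as np

# A 40-digit value like the ones in the Big_Num column
big = 9999999999999999999999999999999999999999

# Largest value a uint64 can hold (2**64 - 1, only 20 digits long)
uint64_max = np.iinfo(np.uint64).max

print(uint64_max)
print(len(str(uint64_max)), len(str(big)))
print(big > uint64_max)
```

Since no fixed-width NumPy integer type is wide enough, the converter leaves the column as object dtype holding ordinary Python ints, which have arbitrary precision.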

Now test arithmetic:

n = df["Big_Num"][1]
print(n, n + 1)

Prints:

9999999999999999999999999999999999999999 10000000000000000000000000000000000000000
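The question mentioned wanting to scale the data down. Below is a minimal sketch of two ways that might be done once the column holds Python ints. The divisor 10**30 and the names scaled_int and scaled_float are arbitrary choices for illustration, not something from the original answer.

```python
from io import StringIO
import pandas as pd

txt = """line,Big_Num
1,1234567890123456789012345678901234567890
2,9999999999999999999999999999999999999999
3,1
"""
df = pd.read_csv(StringIO(txt), converters={"Big_Num": int})

# Exact scaling: the column is object dtype, so // is applied
# element by element to Python ints and no digits are lost.
scaled_int = df["Big_Num"] // 10**30

# Approximate scaling: float64 can represent magnitudes this large,
# but only keeps roughly 15 to 17 significant digits.
as_float = df["Big_Num"].astype("float64")
scaled_float = as_float / as_float.max()

print(scaled_int.tolist())
print(scaled_float.tolist())
```

The integer route keeps every digit but stays in the slow object dtype. The float route gives a regular numeric column that works with the rest of NumPy, at the cost of precision.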

If the column contains any values that might cause int to fail, you can do this:

txt='''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,0.000000000000000001,"Tiny"
4,"a string","Use 0 for strings"
'''

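def conv(s):
    # Try int first so huge whole numbers keep every digit
    try:
        return int(s)
    except ValueError:
        # Not a whole number: fall back to float
        try:
            return float(s)
        except ValueError:
            # Not numeric at all: substitute 0
            return 0

df = pd.read_csv(StringIO(txt), converters={'Big_Num': conv})
print(df)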

Prints:

   line                                   Big_Num                           text
0     1  1234567890123456789012345678901234567890      That sure is a big number
1     2  9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                     1e-18                           Tiny
3     4                                         0              Use 0 for strings

Then every value in the column will be either a Python int or a float and will support arithmetic.
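As one possible "other custom converter", the standard-library Decimal type could be used so that huge integers and tiny fractions share a single exact type. This is a sketch under assumptions: the helper name to_decimal and the precision of 60 digits are illustrative choices, not part of the original answer.

```python
from decimal import Decimal, InvalidOperation, getcontext
from io import StringIO
import pandas as pd

txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,0.000000000000000001,"Tiny"
3,"a string","Use 0 for strings"
'''

def to_decimal(s):
    # Hypothetical helper: parse exactly, substitute 0 for non-numeric text
    try:
        return Decimal(s)
    except InvalidOperation:
        return Decimal(0)

df = pd.read_csv(StringIO(txt), converters={"Big_Num": to_decimal})

# Decimal arithmetic rounds to the context precision (28 digits by default),
# so raise it before combining 40-digit values with tiny fractions.
getcontext().prec = 60
total = sum(df["Big_Num"])
print(total)
```

Parsing with Decimal is exact regardless of precision. The context precision only affects the results of arithmetic, so it needs to be large enough for the values being combined.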
