Handling HUGE numbers in numpy or pandas
Question
I am taking part in a competition where the data I am given is anonymized. Quite a few of the columns have HUGE values. The largest was 40 digits long! I loaded the data with pd.read_csv, but those columns were converted to the object dtype as a result.
My original plan was to scale the data down, but since the values are treated as objects I can't do arithmetic on them.
Does anyone have a suggestion on how to handle huge numbers in Pandas or Numpy?
Note that I've tried converting the values to uint64 with no luck. I get the error "long too big to convert".
Recommended answer
You can use the converters argument of pd.read_csv to call int, or some other custom converter function, on the strings as they are being imported:
import pandas as pd
from io import StringIO  # in Python 2 this was: from StringIO import StringIO

txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,1,"Tiny"
4,-9999999999999999999999999999999999999999,"Really negative"
'''

# int is called on the raw string of every cell in the Big_Num column
df = pd.read_csv(StringIO(txt), converters={'Big_Num': int})
print(df)
Prints:
   line                                    Big_Num                           text
0     1   1234567890123456789012345678901234567890      That sure is a big number
1     2   9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                          1                           Tiny
3     4  -9999999999999999999999999999999999999999                Really negative
Now test the arithmetic:
n = df["Big_Num"][1]
print(n, n + 1)
Prints:
9999999999999999999999999999999999999999 10000000000000000000000000000000000000000
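The arithmetic works because each cell now holds a Python int, which has arbitrary precision, rather than a fixed-width NumPy integer. The following is a minimal sketch of that difference, not part of the original answer; the names big, uint64_max and arr are illustrative only.

```python
import numpy as np

# The 40-digit value from the example (a plain Python int).
big = 9999999999999999999999999999999999999999

# Largest value a uint64 can hold: 2**64 - 1, which has only 20 digits.
uint64_max = int(np.iinfo(np.uint64).max)
print(big > uint64_max)

# Forcing the value into a fixed-width integer fails with an overflow error.
try:
    np.uint64(big)
except OverflowError as exc:
    print("OverflowError:", exc)

# Left to infer the type, NumPy falls back to dtype=object ...
arr = np.array([big, 1])
print(arr.dtype)

# ... and arithmetic is delegated, element by element, to Python's int.
print(arr + 1)
```

The trade-off is that object arrays and columns are processed one Python object at a time, so they are noticeably slower than native numeric dtypes.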
If the column contains any values that might cause int to fail, you can do this:
txt = '''\
line,Big_Num,text
1,1234567890123456789012345678901234567890,"That sure is a big number"
2,9999999999999999999999999999999999999999,"That is an even BIGGER number"
3,0.000000000000000001,"Tiny"
4,"a string","Use 0 for strings"
'''

def conv(s):
    # Try int first, then float; fall back to 0 for anything non-numeric
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return 0

df = pd.read_csv(StringIO(txt), converters={'Big_Num': conv})
print(df)
Prints:
   line                                   Big_Num                           text
0     1  1234567890123456789012345678901234567890      That sure is a big number
1     2  9999999999999999999999999999999999999999  That is an even BIGGER number
2     3                                     1e-18                           Tiny
3     4                                         0              Use 0 for strings
Then every value in the column will be either a Python int or a float and will support arithmetic.
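Once the column holds Python numbers, the scaling the question originally asked about becomes possible. Here is one possible sketch, assuming that float64 precision (roughly 15 to 17 significant digits) is acceptable for the scaled values; the column names Big_Float and Big_Scaled are made up for illustration and are not from the original answer.

```python
import pandas as pd
from io import StringIO

txt = """\
line,Big_Num
1,1234567890123456789012345678901234567890
2,9999999999999999999999999999999999999999
3,1
"""

# Parse the huge values as Python ints (the column ends up with dtype object).
df = pd.read_csv(StringIO(txt), converters={"Big_Num": int})

# float() accepts arbitrarily large ints as long as they fit the float64
# range (about 1.8e308), so a 40-digit value converts without overflow.
df["Big_Float"] = df["Big_Num"].map(float)

# Scale by the largest magnitude so the values fall within [-1, 1].
df["Big_Scaled"] = df["Big_Float"] / df["Big_Float"].abs().max()

print(df.dtypes)
print(df[["line", "Big_Scaled"]])
```

The conversion discards the low-order digits of each value, so this only suits cases where relative magnitude matters more than exact digits. If exact values are needed, keep the original object column alongside the scaled one.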