Python Pandas: Increase Maximum Number of Rows
Question
I am processing a large text file (500k lines), formatted as below:
S1_A16
0.141,0.009340221649748676
0.141,4.192618196894668E-5
0.11,0.014122135626540204
S1_A17
0.188,2.3292323316081486E-6
0.469,0.007928706856794138
0.172,3.726771730573038E-5
I'm using the code below to return the correlation coefficient of each series, e.g. S1_A16:
import numpy as np
import pandas as pd
import csv
pd.options.display.max_rows = None
fileName = 'wordUnigramPauseTEST.data'
df = pd.read_csv(fileName, names=['pause', 'probability'])
mask = df['pause'].str.match(r'^S\d+_A\d+')
df['S/A'] = (df['pause']
             .where(mask, np.nan)
             .ffill())
df = df.loc[~mask]
result = df.groupby(['S/A']).apply(lambda grp: grp['pause'].corr(grp['probability']))
print(result)
However, on some large files, this returns the error:
Traceback (most recent call last):
File "/Users/adamg/PycharmProjects/Subj_AnswerCorrCoef/GetCorrCoef.py", line 15, in <module>
print(result)
File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 35, in __str__
return self.__bytes__()
File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 47, in __bytes__
return self.__unicode__().encode(encoding, 'replace')
File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 857, in __unicode__
result = self._tidy_repr(min(30, max_rows - 4))
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
I understand that this is related to the print statement, but how do I fix it?
EDIT: This is related to the maximum number of rows. Does anyone know how to accommodate a greater number of rows?
Recommended answer
The error message
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
is saying that None minus an int is a TypeError. If you look at the next-to-last line in the traceback, you see that the only subtraction going on there is
max_rows - 4
So max_rows must be None. If you dive into /Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py near line 857 and ask yourself how max_rows could end up being equal to None, you'll see that somehow
get_option("display.max_rows")
must be returning None.
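As a minimal sketch of that reasoning, the snippet below sets the option the same way the question's script does and then evaluates the expression from the traceback. It assumes a pandas version that accepts None for display.max_rows; the variable names previous and error_message are only illustrative.

```python
import pandas as pd

# Remember the current setting so it can be restored afterwards.
previous = pd.get_option("display.max_rows")

# Same setting as in the question: None means "no row limit".
pd.set_option("display.max_rows", None)
max_rows = pd.get_option("display.max_rows")

try:
    # The expression from the traceback: None - 4 cannot be evaluated.
    min(30, max_rows - 4)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Put the option back so the rest of the session is unaffected.
pd.set_option("display.max_rows", previous)

print(max_rows)
print(error_message)
```

The subtraction fails in plain Python, independently of pandas, as soon as max_rows is None.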
This part of the code is calling _tidy_repr, which is used to summarize the Series. None is the correct value to set when you want pandas to display all lines of the Series.
So this part of the code should not have been reached when max_rows is None.
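To illustrate the intended control flow, here is a hypothetical helper, not the actual pandas source. The name summarize and the head/tail split are assumptions made for this sketch; the point is only that the None case is handled before any arithmetic on max_rows.

```python
def summarize(values, max_rows):
    """Hypothetical helper: return the lines to display for `values`."""
    lines = [str(v) for v in values]

    # None means "no limit": show everything and never reach the subtraction.
    if max_rows is None or len(lines) <= max_rows:
        return lines

    # Only an integer max_rows gets here, so the subtraction is safe.
    keep = max(2, min(30, max_rows - 4))
    head = keep // 2
    tail = keep - head
    return lines[:head] + ["..."] + lines[len(lines) - tail:]


full = summarize(range(100), None)
short = summarize(range(100), 60)
print(len(full), len(short))
```

With max_rows set to None the helper returns every line; with an integer limit it returns a shortened head-and-tail view.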
I've made a pull request to correct this.
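Until the installed pandas includes such a fix, one possible workaround is to avoid setting the option to None. The sketch below is an assumption rather than part of the original answer, and the sample Series only stands in for the question's result. It shows two approaches: calling Series.to_string(), which renders every row, or temporarily setting display.max_rows to an integer larger than the Series.

```python
import pandas as pd

# Stand-in for the question's `result`: one value per S/A label.
labels = ["S1_A{}".format(i) for i in range(100)]
result = pd.Series([i / 100.0 for i in range(100)], index=labels)

# Option 1: render all rows explicitly, independent of display options.
full_text = result.to_string()
print(full_text)

# Option 2: raise the row limit to an integer for this block only.
with pd.option_context("display.max_rows", len(result) + 1):
    shown = repr(result)
print(shown)
```

Both approaches keep display.max_rows an integer, so the max_rows - 4 expression from the traceback stays valid.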