Python Pandas: Increase Maximum Number of Rows


Question

I am processing a large text file (500k lines), formatted as below:

S1_A16
0.141,0.009340221649748676
0.141,4.192618196894668E-5
0.11,0.014122135626540204
S1_A17
0.188,2.3292323316081486E-6
0.469,0.007928706856794138
0.172,3.726771730573038E-5

I'm using the code below to return the correlation coefficient of each series, e.g. S1_A16:

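import numpy as np
import pandas as pd

# Ask pandas to display every row when printing
pd.options.display.max_rows = None
fileName = 'wordUnigramPauseTEST.data'

df = pd.read_csv(fileName, names=['pause', 'probability'])
# Header lines such as S1_A16 mark the start of each series
mask = df['pause'].str.match(r'^S\d+_A\d+')
# Label every row with the most recent header above it
df['S/A'] = df['pause'].where(mask, np.nan).ffill()
# Drop the header rows and make the pause column numeric
df = df.loc[~mask].astype({'pause': float})

# Correlation between pause and probability within each series
result = df.groupby(['S/A']).apply(lambda grp: grp['pause'].corr(grp['probability']))
print(result)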

However, on some large files, this returns the error:

Traceback (most recent call last):
  File "/Users/adamg/PycharmProjects/Subj_AnswerCorrCoef/GetCorrCoef.py", line 15, in <module>
    print(result)
  File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 35, in __str__
    return self.__bytes__()
  File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/base.py", line 47, in __bytes__
    return self.__unicode__().encode(encoding, 'replace')
  File "/Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 857, in __unicode__
    result = self._tidy_repr(min(30, max_rows - 4))
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

I understand that this is related to the print statement, but how do I fix it?

EDIT: This is related to the maximum number of rows. Does anyone know how to accommodate a greater number of rows?

Recommended answer

The error message

TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

says that subtracting an int from None is a TypeError. If you look at the next-to-last line of the traceback, you can see that the only subtraction going on there is

max_rows - 4

So max_rows must be None. If you dive into /Users/adamg/anaconda/lib/python2.7/site-packages/pandas/core/series.py, near line 857 and ask yourself how max_rows could end up being equal to None, you'll see that somehow

get_option("display.max_rows")

must return None.

This part of the code is calling _tidy_repr which is used to summarize the Series. None is the correct value to set when you want pandas to display all lines of the Series. So this part of the code should not have been reached when max_rows is None.
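
To make the failure concrete, here is a minimal pure-Python sketch. The function summary_rows is a hypothetical stand-in, not pandas' actual code; it only mirrors the expression from the traceback and shows the kind of guard that would keep None away from the subtraction.

```python
def summary_rows(max_rows, n_rows):
    """Hypothetical helper (not pandas' real code): how many rows to print."""
    if max_rows is None:
        # None means "no limit", so show everything and skip the arithmetic
        return n_rows
    # Same expression as in the traceback, now only reached with an int
    return min(30, max_rows - 4)


# The unguarded expression fails as soon as max_rows is None
try:
    min(30, None - 4)
except TypeError as exc:
    error_text = str(exc)

print(error_text)
print(summary_rows(None, 500))
print(summary_rows(10, 500))
```

The guard simply treats None as a separate case before any arithmetic happens, which is the behaviour the answer argues the summarizing code path should have had.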

I've made a pull request to correct this.
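
Until a fixed pandas version is installed, one possible workaround is to avoid setting display.max_rows to None at all. The sketch below is an assumption about what would suit this use case, not part of the original answer: it either renders the whole Series with to_string(), or temporarily sets the limit to an integer as large as the Series. The Series here is made-up stand-in data for the groupby result.

```python
import pandas as pd

# Stand-in for the groupby result; the values are made up for illustration
result = pd.Series(range(200), index=["S1_A%d" % i for i in range(200)])

# Option 1: to_string() renders every row regardless of display.max_rows
full_text = result.to_string()
print(full_text)

# Option 2: temporarily set the limit to an integer that covers the Series
with pd.option_context("display.max_rows", len(result)):
    limited_text = repr(result)
    print(limited_text)
```

Both options keep max_rows an integer (or bypass it), so the subtraction in the summarizing code never sees None.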
