ipython pandas TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'
Problem description
While trying the ipython.org notebook "INTRODUCTION TO PYTHON FOR DATA MINING", the following code:
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original",
delim_whitespace = True, header=None,
names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model', 'origin', 'car_name'])
yields the following error:
TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'
Unfortunately, the dataset file itself is not really CSV, and I don't know why they used read_csv() to read it.
Each line of the data looks like this:
14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"
The environment is Python 2.7 on Debian stable with ipython 0.13. After searching here, I realized it is most likely a version problem: the argument 'delim-whitespace' may exist only in a later version of the pandas library than the one available through the APT package manager.
I tried several workarounds, without success.
First, I tried to upgrade pandas by building from the latest source, but I found I would end up with a cascade of dependency builds whose versions also need upgrading, which could end up breaking the environment. E.g., I had to install Cython, and the build then reported that the version available through APT was also too old, so I would have to rebuild Cython, plus other libs/modules, and so on.
Then, after looking at the API a bit, I tried using other arguments. Using delimiter=' ' in the call to read_csv() caused it to break up the strings inside quotes into several columns:
ValueError: Expecting 9 columns, got 13 in row 0
I tried using the read_csv() argument quotechar='"', as documented in the API, but again it was not recognized (unexpected keyword argument).
Finally, I tried using a different way to load the file,
data = DataFrame()
data.from_csv(url)
I got,
Out[18]:
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
Empty DataFrame
In [19]: print(data.shape)
(0, 9)
Alternatively, with the sep argument to from_csv(),
In [20]: data.from_csv(url,sep=' ')
yields the error,
ValueError: Expecting 31 columns, got 35 in row 1
In [21]: print(data.shape)
(0, 9)
Another alternative, with the same negative result:
In [32]: data = DataFrame(columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])
In [33]: data.from_csv(url,sep=', \t')
Out[33]:
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
Empty DataFrame
In [34]: data.head()
Out[34]:
Empty DataFrame
I tried using ipython3 instead, but it cannot find/load matplotlib, as there is no matplotlib for python3 on my system.
Any help with this problem would be greatly appreciated.
Your code uses delim_whitespace, but the error message says delim-whitespace. The former exists, the latter does not.
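To illustrate how such a message can arise, here is a minimal sketch using a hypothetical stand-in function (read_table_stub is not part of pandas; it only mimics a signature that declares delim_whitespace). It assumes the hyphenated name reached the call through dict unpacking, since a hyphen cannot be typed directly in a keyword argument:

```python
def read_table_stub(path, delim_whitespace=False, header="infer", names=None):
    """Hypothetical stand-in for a reader that declares delim_whitespace."""
    return {"path": path, "delim_whitespace": delim_whitespace}


# The underscore spelling matches a declared parameter, so the call succeeds.
ok = read_table_stub("data", delim_whitespace=True)

# A hyphenated name can only be passed via dict unpacking; Python rejects it
# because the function declares no parameter with that name.
try:
    read_table_stub("data", **{"delim-whitespace": True})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message has the same "got an unexpected keyword argument 'delim-whitespace'" form as the one in the question, which suggests the name that was actually passed contained a hyphen.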
If the data file contains
14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"
and you define data with
data = pd.read_csv('data', delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])
then the DataFrame does get parsed successfully:
mpg cylinders displacement horsepower weight acceleration model \
0 14 8 454 220 4354 9 70
origin car_name
0 1 chevrolet impala
So you just have to change the hyphen to an underscore.
Note that when you specify delim_whitespace=True, the pure Python parser is used. In this case I don't think that is necessary. Using delimiter=r'\s+', as Steve Howard suggests, would probably perform better. (The source code says, "The C engine is faster while the python engine is currently more feature-complete", but I think the only feature that the python engine has that the C engine does not is skipfooter.)
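As a rough sketch of the regex-separator variant, the snippet below parses the single sample line from the question out of an in-memory buffer instead of the real file. The io.StringIO buffer and the variable names are assumptions made for illustration, and sep is used as the equivalent spelling of delimiter. As far as I know, recent pandas releases also deprecate delim_whitespace in favor of this form:

```python
import io

import pandas as pd

# One sample line from the question, held in memory (assumption: the real
# file has the same whitespace-separated layout with a quoted car name).
sample = '14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"\n'

names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
         'acceleration', 'model', 'origin', 'car_name']

# sep=r'\s+' splits on runs of whitespace; the default quotechar keeps the
# quoted car name together as a single field.
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None, names=names)

print(df.shape)
print(df.loc[0, 'car_name'])
```

Replacing the buffer with the path or URL of the data file should work the same way, provided every line follows this layout.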