Julia: DataFrames package has trouble converting a column containing both Int and Float

Question

I'm an R user with a great interest in Julia. I don't have a computer science background. I just tried to read a CSV file in Juno with the following commands:

using CSV
using DataFrames

df = CSV.read(joinpath(Pkg.dir("DataFrames"), 
"path/to/database.csv"));

and received the following error message:

CSV.CSVError("error parsing a 'Int64' value on column 26, row 289; encountered '.'")
in read at CSV/src/Source.jl:294
in #read#29 at CSV/src/Source.jl:299
in stream! at DataStreams/src/DataStreams.jl:145
in stream!#5 at DataStreams/src/DataStreams.jl:151
in stream! at DataStreams/src/DataStreams.jl:187
in streamto! at DataStreams/src/DataStreams.jl:173
in streamfrom at CSV/src/Source.jl:195
in parsefield at CSV/src/parsefield.jl:107
in parsefield at CSV/src/parsefield.jl:127
in checknullend at CSV/src/parsefield.jl:56

I looked at the entries indicated in the data: rows 287 and 288 contain 30 and 33 respectively (apparently integers), while row 289 contains 30.445 (a float).

Is the problem that DataFrames fills the column as Int and stops when it encounters a Float?

Thanks a lot.

Recommended answer

The problem is that the first float appears too late in the data set. By default, CSV.jl uses a rows_for_type_detect value of 100, which means that only the first 100 rows are used to determine the type of each column in the output. Column 26 contains only integers in those rows, so it is typed as Int64 and parsing fails at the '.' in row 289. Set the rows_for_type_detect keyword argument in CSV.read to, e.g., 300 and everything should work correctly.
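As a minimal sketch of this suggestion, the call from the question could be changed as follows. It assumes the same CSV.jl version as in the question, where CSV.read accepts rows_for_type_detect; the file path is the placeholder from the question, and newer CSV.jl releases may not accept this keyword.

```julia
using CSV
using DataFrames

# Placeholder path taken from the question; replace with the real file.
path = "path/to/database.csv"

# Inspect the first 300 rows (instead of the default 100) when detecting
# column types, so the float in row 289 of column 26 is seen in time.
df = CSV.read(path; rows_for_type_detect = 300)
```

Any value above 289 would cover the row named in the error message; a larger value costs more detection work but guards against floats that appear even later.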

Alternatively, you can pass the types keyword argument to set column types manually (in this case, Float64 would be appropriate for this column).
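A sketch of the manual approach is shown below. It assumes that types accepts a Dict mapping a column number to a type, so that only the affected column needs to be listed; column 26 comes from the error message, and the path is again the placeholder from the question.

```julia
using CSV
using DataFrames

# Placeholder path taken from the question; replace with the real file.
path = "path/to/database.csv"

# Force column 26 to be parsed as Float64; all other columns are still
# detected automatically. The Dict form is an assumption about the
# installed CSV.jl version.
df = CSV.read(path; types = Dict(26 => Float64))
```

Unlike raising rows_for_type_detect, this does not depend on where in the file the first float appears.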
