Julia: DataFrames package has trouble converting a column containing both Int and Float
Problem description
I'm an R user with a great interest in Julia. I don't have a computer science background. I just tried to read a CSV file in Juno with the following command:
using CSV
using DataFrames
df = CSV.read(joinpath(Pkg.dir("DataFrames"),
"path/to/database.csv"));
and received the following error message:
CSV.CSVError("error parsing a 'Int64' value on column 26, row 289; encountered '.'")
in read at CSV/src/Source.jl:294
in #read#29 at CSV/src/Source.jl:299
in stream! at DataStreams/src/DataStreams.jl:145
in stream!#5 at DataStreams/src/DataStreams.jl:151
in stream! at DataStreams/src/DataStreams.jl:187
in streamto! at DataStreams/src/DataStreams.jl:173
in streamfrom at CSV/src/Source.jl:195
in parsefield at CSV/src/parsefield.jl:107
in parsefield at CSV/src/parsefield.jl:127
in checknullend at CSV/src/parsefield.jl:56
I looked at the entries indicated in the data frame: rows 287 and 288 contain 30 and 33 respectively (apparently of type Integer), while row 289 contains 30.445 (which is a float).
Is the problem that DataFrames filled the column with Int values and stopped when it encountered a Float?
Thanks a lot.
Recommended answer
The problem is that the first float appears too late in the data set. By default, CSV.jl uses a rows_for_type_detect value of 100, which means that only the first 100 rows are used to determine the type of each column in the output. Column 26 looked like Int64 in those rows, so parsing failed at row 289 when it reached 30.445. Set the rows_for_type_detect keyword argument in CSV.read to, e.g., 300 and everything should work correctly.
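To see why the position of the first float matters, here is a toy sketch of type detection from a limited sample. It is not CSV.jl's actual implementation: the function detect_type and its arguments are hypothetical names made up for illustration, and the code assumes Julia 1.x, where tryparse returns nothing on failure.

```julia
# Toy model of sampling-based type detection (NOT CSV.jl's real code).
# `detect_type` and `rows_for_detect` are hypothetical names for illustration.
function detect_type(values::Vector{String}, rows_for_detect::Int)
    # Only the first `rows_for_detect` entries are inspected.
    sample = values[1:min(rows_for_detect, length(values))]
    # Try the narrowest type first, then widen.
    all(v -> tryparse(Int, v) !== nothing, sample) && return Int
    all(v -> tryparse(Float64, v) !== nothing, sample) && return Float64
    return String
end

# 288 integer-looking entries followed by one float at position 289,
# mirroring the layout described in the question.
column = vcat(fill("30", 288), ["30.445"])

println(detect_type(column, 100))  # Int: the sample never reaches the float
println(detect_type(column, 300))  # Float64: the sample now includes row 289
```

With a sample of 100 rows the column is classified as an integer column, so a later "30.445" cannot be parsed; widening the sample past row 289 lets the detector pick Float64 instead.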
Alternatively, you can pass the types keyword argument to set column types manually (in this case Float64 would be appropriate for this column).
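A minimal sketch of the types approach follows. It assumes a recent CSV.jl release, where CSV.read takes the sink type (here DataFrame) as its second argument; the version used in the question called CSV.read with the path alone. The inline data and the column names id and value are invented for the example, so for the file in the question the key would be column 26 instead.

```julia
using CSV
using DataFrames

# Small in-memory CSV standing in for the real file (made-up data).
data = """
id,value
1,30
2,33
3,30.445
"""

# Force the second column to Float64 regardless of what the first rows look like.
# The Dict maps a column index to the desired element type.
df = CSV.read(IOBuffer(data), DataFrame; types = Dict(2 => Float64))

println(eltype(df.value))
```

Setting the type explicitly avoids depending on how many rows the parser samples, at the cost of having to know the problematic column in advance.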