Read CSV files faster in Julia

Question

I have noticed that loading a CSV file with CSV.read is quite slow. For reference, here is an example timing:

using CSV, DataFrames
file = download("https://github.com/foursquare/twofishes")
@time CSV.read(file, DataFrame)

Output: 
9.450861 seconds (22.77 M allocations: 960.541 MiB, 5.48% gc time)
297 rows × 2 columns

This is a random dataset, and the Python equivalent of this operation finishes in a fraction of the time. Since Julia is supposed to be faster than Python, why does this operation take so long? Moreover, is there a faster alternative that reduces the compilation time?

Recommended answer

You are measuring compilation time together with run time.

One correct way to measure the time is to run the same call twice:

@time CSV.read(file, DataFrame)
@time CSV.read(file, DataFrame)

The first run compiles the function; the second run reuses the compiled code, so its timing reflects only the run time.
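
The same effect can be seen without any packages. The following is a minimal sketch, assuming a hypothetical helper `sum_of_squares` as a stand-in for `CSV.read`; the exact timings depend on the machine and the Julia version, so no expected output is shown.

```julia
# Hypothetical stand-in for an expensive call such as CSV.read(file, DataFrame).
sum_of_squares(xs) = sum(x -> x^2, xs)

data = rand(1_000)

# First call: Julia compiles sum_of_squares for Vector{Float64}, then runs it.
first_call = @elapsed sum_of_squares(data)

# Second call: the compiled method is reused, so only run time is measured.
second_call = @elapsed sum_of_squares(data)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

On a typical setup the first number should be noticeably larger than the second, which is the compilation overhead that the original `@time` measurement included.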

Another option is to use BenchmarkTools:

using BenchmarkTools
# Interpolate the global `file` with `$` so that global-variable access is not benchmarked
@btime CSV.read($file, DataFrame)

Normally, one uses Julia to work with huge datasets, so a one-time initial compilation cost is not important. However, it is possible to compile CSV and DataFrames into Julia's system image and get fast execution from the first run; for instructions, see: Why julia takes long time to import a package? (This is more advanced, however, and usually not needed.)

Yet another option is to reduce the compiler's optimization level. This suits scenarios where the workload is small and restarted frequently, and you do not want the complexity that comes with building a system image. In this case you would run Julia as:

julia --optimize=0 my_code.jl
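
To confirm which optimization level a session is actually running with, one possible check is sketched below. It assumes the internal `Base.JLOptions()` accessor, which is not part of Julia's documented public API and may change between versions.

```julia
# Base.JLOptions() is an internal accessor (an assumption here, not a documented API).
# Its opt_level field holds the value passed via --optimize / -O.
opt_level = Base.JLOptions().opt_level
println("optimization level: ", opt_level)
```

Putting this at the top of `my_code.jl` shows whether the `--optimize=0` flag took effect for that run.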

Finally, as @Oscar Smith mentioned, compile times will be slightly shorter in the forthcoming Julia 1.6.
