Read CSV files faster in Julia

Question

I have noticed that loading a CSV file using CSV.read is quite slow. For reference, here is one timing benchmark:

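using CSV, DataFrames
# Note: this URL is a GitHub repository page (HTML), not a raw CSV file
file = download("https://github.com/foursquare/twofishes")
@time CSV.read(file, DataFrame)  # first call in a fresh session: includes compilation time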

Output: 
9.450861 seconds (22.77 M allocations: 960.541 MiB, 5.48% gc time)
297 rows × 2 columns

This is a random dataset, and a Python equivalent of this operation completes in a fraction of the time Julia needs. Since Julia is faster than Python, why does this operation take so much time? Moreover, is there a faster alternative that reduces the compile time?

Answer

You are measuring compilation time together with runtime.

One correct way to measure the time would be:

@time CSV.read(file, DataFrame)  # 1st call: compilation + execution
@time CSV.read(file, DataFrame)  # 2nd call: execution only

On the first run the function is compiled; the second run measures only the execution time.
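
The same effect can be reproduced without CSV.jl. The following is a minimal Base-only sketch, assuming a fresh Julia session; `sum_of_squares` is a hypothetical stand-in for any function that has not been compiled yet. The first `@elapsed` includes JIT compilation of the method, the second measures only execution.

```julia
# Hypothetical function used only to illustrate first-call compilation cost.
function sum_of_squares(xs::Vector{Float64})
    s = 0.0
    for x in xs
        s += x * x
    end
    return s
end

data = collect(1.0:1000.0)

t_first  = @elapsed sum_of_squares(data)  # includes compiling sum_of_squares for Vector{Float64}
t_second = @elapsed sum_of_squares(data)  # method already compiled: execution only

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

In a typical session the first number is much larger than the second; the difference is the compilation cost that the question's single `@time` call was also measuring.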

Another option is to use BenchmarkTools:

using BenchmarkTools
# $file interpolates the global variable so that accessing it is not part of the measurement
@btime CSV.read($file, DataFrame)
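
BenchmarkTools does much more than this (sample tuning, statistics, variable interpolation), but the core idea behind the number `@btime` reports can be approximated by hand: call the function once to trigger compilation, then take the minimum over several timed runs. The helper `min_time` below is hypothetical and is not part of BenchmarkTools.

```julia
# Hypothetical helper: a crude approximation of the minimum time @btime reports.
# It is NOT BenchmarkTools; it only illustrates "warm up, then take the minimum".
function min_time(f; samples::Int = 5)
    f()  # warm-up call: triggers compilation so it is excluded from the samples
    best = Inf
    for _ in 1:samples
        t = @elapsed f()
        best = min(best, t)
    end
    return best
end

# Base-only workload; with CSV.jl installed one could pass () -> CSV.read(file, DataFrame)
t = min_time(() -> sort(rand(10_000)))
println("minimum time: ", t, " s")
```

Taking the minimum rather than the mean filters out noise such as garbage collection pauses or other processes, which is also the figure `@btime` prints.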

Normally, Julia is used to work with large datasets, where the one-time initial compilation cost is negligible. However, it is possible to compile CSV.jl and DataFrames.jl into Julia's system image and get fast execution from the very first run; for instructions, see: Why julia takes long time to import a package? (This is, however, more advanced; usually one does not need it.)
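
As a rough sketch of what building such an image looks like, assuming PackageCompiler.jl (2.x) is installed: the file names `sys_csv.so` and `precompile_csv.jl` are hypothetical, the library extension depends on the platform (`.so`, `.dylib`, `.dll`), and the build itself can take several minutes.

```julia
# build_sysimage.jl: assumes PackageCompiler.jl is installed (] add PackageCompiler).
# File names are hypothetical placeholders.
using PackageCompiler

create_sysimage(
    ["CSV", "DataFrames"];
    sysimage_path = "sys_csv.so",
    # Optional script that is executed during the build so the methods it calls
    # (e.g. `using CSV, DataFrames; CSV.read("some.csv", DataFrame)`) get compiled into the image.
    precompile_execution_file = "precompile_csv.jl",
)
```

Julia is then started with the custom image, e.g. `julia --sysimage sys_csv.so my_code.jl`, so CSV.read no longer pays the compilation cost on its first call.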

You also have yet another option: reducing the compiler's optimization level. This suits scenarios where your workload is small, Julia is restarted frequently, and you do not want all the complexity that comes with building a system image. In this case you would run Julia as:

julia --optimize=0 my_code.jl
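
To confirm from inside a script which optimization level the process was started with, one can read `Base.JLOptions().opt_level`. Note that `Base.JLOptions()` is an internal, undocumented API and may change between Julia versions; the sketch below only prints the value.

```julia
# check_optlevel.jl: prints the optimization level of the running Julia process.
# Base.JLOptions() is internal and undocumented; treat this as a debugging aid only.
opt = Base.JLOptions().opt_level
println("optimization level: ", opt)  # 0 under `julia --optimize=0`, 2 by default
if opt == 0
    println("optimizations disabled: less time spent compiling, slower generated code")
end
```

Running it as `julia --optimize=0 check_optlevel.jl` versus plain `julia check_optlevel.jl` shows whether the flag took effect.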

Finally, as @Oscar Smith mentioned, compile times will be slightly shorter in the forthcoming Julia 1.6.
