Performance issue with CSV typeprovider from FSharp.Data


Question

I am trying to learn more about the FSharp.Data project by using it to read a CSV file. The CSV file is a simplified version of the data from the digit recognizer competition on Kaggle.

When I read the CSV file, which contains 785 columns and 113 rows (including the header row), the following two lines of code execute really slowly:

open FSharp.Data

type trainingSet = CsvProvider<"Data/trainSmall.csv", ",", CacheRows=false>
let data = trainingSet.Load("Data/trainSmall.csv")

When I send the first line to F# Interactive, it returns in about 10 seconds, whereas when I send the second line, it takes more than 5 minutes before the interactive prompt comes back.

I am running the code on my 2013 MacBook Pro with a 2.6 GHz i5 processor and 16 GB of RAM, using F# 3.0 and Xamarin Studio. I have tried the same experiment with Windows 7 / VS2013 running in a VM on the same hardware, and the results are comparable. When I do the exact same thing with R on the same machine, it is so fast that I cannot time it with an ordinary watch.

Please advise me on the proper usage of the CSV typeprovider from FSharp.Data!

Recommended answer

I recommend that you don't use CsvProvider for this. You're loading a matrix, so you won't get any benefit from having the type of each column inferred, as they are all the same. You can still use the CSV parser of F# Data by using CsvFile. CsvProvider is optimized for files with few columns but potentially many rows. The way the code is generated, it will try to produce a tuple with 785 elements for your example, which just won't work.
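The CsvFile suggestion could look roughly like the sketch below. It is an illustration, not code from the answer: the helper names parseRows and loadMatrix are made up for this example, the script assumes the FSharp.Data package is referenced via NuGet in F# Interactive, and it assumes the Kaggle layout in which the first column is the digit label and the remaining columns are integer pixel values.

```fsharp
#r "nuget: FSharp.Data"   // assumption: run as a script in dotnet fsi

open FSharp.Data

// Hypothetical helper: turn every CSV row into (label, pixels).
// Assumes the first column is the label and all cells are integers.
let parseRows (csv: CsvFile) : (int * int[])[] =
    csv.Rows
    |> Seq.map (fun row ->
        let values = row.Columns |> Array.map int
        values.[0], values.[1..])
    |> Seq.toArray

// Hypothetical helper: load a file with the untyped parser and parse its rows.
// No per-column type inference and no 785-element tuple is involved.
let loadMatrix (path: string) =
    CsvFile.Load(path) |> parseRows
```

With these helpers, let data = loadMatrix "Data/trainSmall.csv" would give an array of (label, pixels) pairs. Because every cell is read as a string and converted explicitly, the cost should scale with the number of cells rather than with the width of a generated type.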
