How do I read a large CSV file with the Scala Stream class?

Question

How do I read a large CSV file (> 1 GB) with a Scala Stream? Do you have a code example? Or would you read a large CSV file some other way that avoids loading it into memory first?

Recommended answer

Just use Source.fromFile(...).getLines, as you already stated.
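As a minimal sketch of that approach, the helper below opens a file, hands the lazy line iterator to a caller-supplied function, and closes the file afterwards. The names CsvLines, withLines and countRows are hypothetical, and the UTF-8 encoding and the single header row are assumptions made for illustration.

```scala
import scala.io.Source

object CsvLines {
  // Applies `f` to the lazy line iterator and always closes the file.
  // `f` must finish consuming the lines before it returns, because the
  // iterator cannot be read once the underlying source is closed.
  def withLines[A](path: String)(f: Iterator[String] => A): A = {
    val source = Source.fromFile(path, "UTF-8") // encoding is an assumption
    try f(source.getLines())
    finally source.close()
  }

  // Counts data rows (skipping one assumed header row) one line at a
  // time, so the whole file is never held in memory.
  def countRows(path: String): Long =
    withLines(path)(lines => lines.drop(1).foldLeft(0L)((n, _) => n + 1))
}
```

Calling CsvLines.countRows("data.csv") would then walk the file once; the file name here is only a placeholder.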

That returns an Iterator, which is already lazy. (You would use a Stream as a lazy collection when you want previously retrieved values to be memoized, so that you can read them again.)
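The difference can be illustrated with a small sketch that uses in-memory strings in place of file lines. The object and method names are hypothetical. Note that Stream is deprecated since Scala 2.13 in favour of LazyList, so this code compiles with a deprecation warning on newer versions.

```scala
object IteratorVsStream {
  // An Iterator is single-pass: a second traversal finds nothing left.
  def iteratorPasses(): (List[String], List[String]) = {
    val it = Iterator("a,1", "b,2", "c,3")
    val first = it.toList  // consumes every element
    val second = it.toList // the iterator is now exhausted
    (first, second)
  }

  // A Stream memoizes each element it has evaluated, so it can be
  // traversed again. The cost is that every evaluated element stays in
  // memory for as long as the head of the Stream is referenced.
  def streamPasses(): (List[String], List[String]) = {
    val s = Iterator("a,1", "b,2", "c,3").toStream
    (s.toList, s.toList)
  }
}
```

That memoization is the reason a Stream tends to be a poor fit for a file that does not fit in memory, unless earlier elements are allowed to be garbage collected.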

If you're running into memory problems, the cause lies in what you do after getLines. Any operation that forces a strict collection, such as toList, will load the entire file into memory.
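One way to stay lazy is to chain only Iterator operations (drop, filter, map) and finish with a single aggregating step. The sketch below sums one numeric column that way. The names ColumnSum and sumColumn are hypothetical, and it assumes a header row, UTF-8 text, and simple comma-separated fields with no quoted commas; a real CSV parser would be needed otherwise.

```scala
import scala.io.Source

object ColumnSum {
  // Sums the values in the zero-based `column` of a CSV file.
  // Every step before `sum` is lazy, so only one line is held at a time.
  def sumColumn(path: String, column: Int): Double = {
    val source = Source.fromFile(path, "UTF-8") // encoding is an assumption
    try {
      source.getLines()
        .drop(1)                    // skip the assumed header row
        .filter(_.trim.nonEmpty)    // ignore blank lines
        .map(_.split(",", -1))      // naive split: no quoted fields
        .filter(_.length > column)  // skip rows that are too short
        .map(fields => fields(column).trim.toDouble)
        .sum                        // the only step that consumes the lines
    } finally source.close()
  }
}
```

Replacing the final sum with toList would materialize every row at once, which is exactly the kind of strict operation described above.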
