Error while reading very large files with spark csv package

Problem description

We are trying to read a 3 GB file using spark-csv with the univocity 1.5.0 parser. One of the file's columns contains multiple newline characters. In some rows the record is being split at those newline characters, so its columns end up spread across several rows. This only happens with large files.
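This minimal plain-Scala sketch (no Spark involved) shows how embedded newlines break a reader that treats every physical line as one record. The sample data and the names `NaiveSplitDemo` and `naiveRows` are hypothetical and exist only for illustration.

```scala
object NaiveSplitDemo {
  // Hypothetical CSV text: a header plus ONE data record whose second
  // field is quoted and contains an embedded newline.
  val csv: String =
    "id,comment\n" +
    "1,\"first line\nsecond line\"\n"

  // Treat each physical line as a record and ignore quoting entirely,
  // the way a purely line-based reader would (hypothetical helper).
  def naiveRows(text: String): Seq[Seq[String]] =
    text.split("\n").toSeq.map(_.split(",", -1).toSeq)

  def main(args: Array[String]): Unit = {
    val rows = naiveRows(csv).drop(1) // skip the header line
    // The single logical record comes back as two "rows" with different field counts
    rows.foreach(r => println(s"${r.size} field(s): ${r.mkString(" | ")}"))
  }
}
```

Running `main` prints one row with 2 fields and one row with 1 field. The single quoted value has been cut in two at its embedded newline.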

We are using Spark 1.6.1 and Scala 2.10.

Here is the code I'm using to read the file:

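// Spark 1.6.1 + spark-csv (com.databricks.spark.csv) with the univocity parser
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("mode", "FAILFAST")       // abort on the first malformed line
    .option("escape", "\"")           // '"' is both escape and quote, so "" inside a quoted field is a literal quote
    .option("quote", "\"")
    .option("parserLib", "univocity")
    .load("abc.csv")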

The read fails with:

java.lang.Exception: FAILFAST at 01/20/2015

Sample file:

"A AAAAAAAA","AA999","AA999","AA999","9999-99-99-99.99.99.999999","AAAAAA99","Aaaaa Aaaaaaaa

99/99/9999 - AAA Aaaaaaa Aa: aaaaaaaaa aa A aaaaa, aaaaaaaa aaa aaaaaaa aaaaaaaaaa

Aaa aaaaa aa AAA aaa aaaaaaaaaaa

99/99/9999 Aaaaa aaaaaa - aa aaaaaaaa aaaaaaaaa aaaaaaaa aaaaa aaa aaaaaa aa aaaaaaaaaa aaaaaa aa aaaaaaa aaaaaaaaa.

99/99/9999 Aaa'a aaaaaa a/ aaa aaaaaaa - AAA aaaaaaaaa aaa'a aaaaaaa

99/99/9999 AAA aaaaaa - aaaaaaa aaaaaaaaa

99/99/9999 AAA aaaaaa. Aaa aaaa Aa. Aaaaaa Aa: aaaaaaaaa aaaaaaaa aaaaaa, A aaaaaaa aaaa aaaaaaaaaa, aaaaa aaaaaaa aaaa aaaaaaaaaa (aaaa aaaaaaaaaaaa aaaaaaa). A&Aa aaaaaa aa aaaaaaaaaa aaa aaaa aaaaaa aaaa aaaaa aa aaaaaaaaa, A aaaaaaaa aaaaa aaa aaaaa aaaaaaaa aaaaa aaaa aaaaa aa aaaaaaaaa. Aaa aaaaaa aaaaaa aaaaaa aaaa aaaaaa.

99/99/9999 - aaaaa aaaaaaaa.

99/99/9999 - AAA

99/99/9999 AAA aaaaaa aaaaa aa Aaa 9999 aaaa aaaaaaaaa aaaaaaaaaa - aa A&Aa. Aaaaaaaaaa aaaaa aaaaaa.

99/99/9999 AAA aaaaa aaaaaa - aa aaaaaaa aa aaaaa aaaaaa aa AAA aa AAA aaa aa aaaaaa aaaaaa aaaa-aaaaaaaaaaa. Aa aaaaaaaa aa aaaaaa A&Aa aaaaa aa aaaaa aaaaaaa.

99/99/9999 - Aaaaaa aaaaaa aaaa. Aaaaaaaa aaaa aaaa 99/99/9999 - 99/99/9999

99/99/9999 - aaaaaa aaaaaaa aa AAAA aa: AAAA aaaaa aaaa aaaaaa aaaa aaaa aaaaa aa aaa aaaaaaaaa.

99/99/9999 Aaaaaa a/ aaa aaaaaaa. Aaaa aaaaaaaa aa aaaaaaaaaaaa aa AA.

99/99/9999 Aaaaaa aaaaaa aaaaaa aaaa.

99/99/9999 Aaaaaaaa aaaaaa aa aaaaaa aaaa

99/99/9999 Aaaaaa a/ aaa aaaaaaa aaa'a aaaaaaaaa aaaaaaaaaaa aaaaaaa

99/99/9999 AAA aaaaaa A&Aa aaaaaa aaa aaaaaaaaaaaaaa aaa aaaaa aaaaaa

99/99/9999 AAA aaaaa aaaaaa - aaaaa aaaaaaaaaaaaaaa aaa aaaaaaaaaaaa aa aaaaaaaaaaa. Aaa aaaaaa aaaaaaaaa aaaaaaaa aaaaaaaaa aaaaaaaa aaa aaaa aaaaaa aa aaaaaa aaaaaa aaaaa aaaa aa aaaaaa aaa aaaaaaaa aaaaaaaaa A&Aa aaa aaaaaaaaa, aaaaaaaaa aaaaa aaaaaaaaa.

99/99/9999 AAA aaaaaa aaaaaaa aaaa aaaaaa aa Aaa 9. A&Aa aaaaaa aa aaaaa aaaaa aaaa aaaaaaaa, aaaaaaaaaa aaaa aaaaaaaa aaa aaaa aaaaa aaaaaaaa aaaaaa.

99/99/9999 AAA - aaaaaaaaaaa aaaaaaaaaa.

AAA aaaaaaaaa aaaaaaaaaa aaaaaaa aaaa aaaaaaaaaaaa aaaaa aa aaaa aaaaaa aa aaaaaaa aa aaaaa aaaaaaaaa aaaaa aa aaaaaaaaaaa aa aaaa.

99/99/9999 AAA aaaaa aaaaaa - Aaaaaaaaaaaa aaaaaa aa 99/99/9999 aaaaaa aaaa aaaa aaaaa aaa aaaaaaaaaa a/ aaaaaaaaa aaaaaaaaa aaaaaaaa. Aaa aaaaaaaaaaaa aaaa aa 99/99/9999 aaa aaa aaaaaaaaaaa aaaaaaaaaaaaa aaaaa 99/99/9999 aaaa aaa aaaaaaa aa aaaaaaaaa aaaaaaaa, aaaaaa AAA aa aaaaa aaaaaaaaa aa aa 99. Aa aaa aaaaaaa aa aaaaaaaaa aaaaaaaa, aaa aaaaaaaaaa aaaaaaaa aaaaa aaaa aaaaaaaaaaa aaaa aaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa.

99/99/9999 AAA aaaa aaaaa - AAA aaaaaaa aaaa A&Aa aaaaaaaaaa aa aaa aaaaaaaaaaaa aaaaa aaaa aaaa aaaaaaa aa Aaaaa 9999.

Aaaaaaaaa aaaaaa aa aaaaa aa aa Aaa 9, 9999 aaa aaaaaaa aaaaaaaa aaaaa aaa aaaaaaaa aaaa Aa. Aaaaaaaa aaa aaaa aaaaaa aa aaaaaaa aaaaaa aa A&Aa aaa aaaaaaaa aaaaaa aaaa aaaa. Aaaa aa aaaaaaa aaa aaaaa aa aaaaaaaaaa aaaa aaa aaaaaaaaaa aa aaaaa aa aaaaaaaaaa aaaaa aa aaaaaaaaaaaa.

99/99/9999 Aaaaaaa aaa'a aaaa AA

99/99/9999 - a/a aaaa aa aaaaaaaaaaaa

99/99/9999 Aaaaaaa aaa'a aaaa aaaaaaaaaaaa

99/99/9999 - aaaa aaaaaa aa aaaaaaaaaaaa aaaaaaaa aaa aaa aaaaaaaaaa 99/99/9999 - aaa aaaa aa aaaaaaaaaaaa aaaaaa aaa aaaaaaaaaaaa aaaa aaaa aaaaa aaaa aaaa aaa Aaa 99, 9999 aaaaa aaa aaa aaaaaaaaaa

99/99/9999 - aaaa aaa'a aaaa aaaaaaaaaaaa aaaaaaaa aa aaaa aaaa aaaaaaa aaaaaaaaaaa 99/99/9999 - aaaa aaaaaa aa aaaaaaaaaaaa aa: a/a aaaa aa aa aaaa. Aaaaaaaaa aaaaaaa aaa aaaaaa aaaa aaa aaaaaaaaaaa aaa aaa aaaaaaa aaa aa aaa aaaaaa aa aaaaa. aaa aaaa aaa aaaa aaaaa aaaaa aaaaaaaa aaa aaaa Aaaa aaa aaaa aa Aaaaaaaaa. Aaaa aaa aa aaaaa a/a aaaaa aaaaa. Aaa aaaaaa aa aaaa aaaaa aaaaa.

99/99/9999 - Aaaaa AAA aaaaaa aaaaaaaa. Aaaaaaaaa aaaa aaaa aaaaa aaaaaaa aaaaaaaa aaa Aaaaa Aaaaaaaaaa Aaaaaaaa, aaaaaaa, aaaaa aa a aaa aa aaaa aaaa aaaaaaa aa aaaaaaaa aa aaaaaaa, aaaa aaaaa, aaa aaaaaa, aaaa aa aaaaaaaa, aaaa aa aaaaaaaaaa, aaaaaaa aaaaa aaaaaa. Aaaaa aaa aaaaa aaaa aaaaaaa aaaaaaaa aa aaa aaaaaaaaaa aaaaaaaaaaa aa aaaaaaaa aaaaaaaaa aaaaaaa (aaaaa aa aaaaaaaaaa aa Aaaa 9999). Aaaa aa aaaaa aa aaaa aa aaaaaa aa aaaa. Aaa aaaaaaaa aaa aaaaaaaaaa aa a aaaaaaaaaa aa aaaaaaaa aaaaaaaa, aaaaaa aaaaa aa aaa aaaaaa aaaaaaaaaaa aaaaa aaa aaaaaaaa aa aaa aaaaaaaa aaaaa aa Aaa 9999 aa aaa aaaaaaa aa aaaaaaa aa aaaaaaa aaaaaaaa. Aa Aaa 9999, Aa. Aaaaaaaa aaaaa aaaaaaaaaa aaa aaaaaaaa aaaaaaaa, aaa aa aaaa aaa aaaaaaa aa aaaa aaa aa aaa aaaaaaaa. Aa aa aaaaa aa Aa. A aaaa aaaaaaaaaa aaaaaaaa aaaaaaaaa aaaa aaaa. Aaa A/Aa aaa aaaaa aaaaa Aaa 9999 aaaaa aaaa aaaaaaaa aaaa aa aaaaaaaaaa, aaaa aa aaaaaaaaaaaaa aaa aaaaaaaaa, aaaaaaa, aaaaaaaaa, aaaaaaaaa aaaa, aaaaaaaaaaaaa. Aaaaaaaaa: Aaaaa aaa aaaaaaaa aa aaaaaaa aa aaaa aaaaa, aaaaaaa aaaa aaa aa-aaaaaa aaaaaaa aaaaaaaaa aaa aaaa aa aaaa aaaaaaaa aa aaaaa aaaaaaaaa aaaaaaa aa aaaa-aaaaaaaaaa aaaaaaaaaa, aaa aaaaaaaaa aaaaaaa aaaa. "

Solution

Spark's CSV relation is based on its TextBasedFileFormat and only looks at the input line by line, so it does not support multi-line records. If you need to support multi-line records, consider using wholeTextFiles instead and parsing the input manually. Ideally, though, this should be done as a pre-processing data cleanup job.
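The manual parsing suggested above could look like the following minimal plain-Scala sketch. It assumes comma delimiters and `"` as both the quote and the escape character, matching the question's options. `MultiLineCsv` and `parseRecords` are hypothetical names and are not part of spark-csv. Unlike a line-based reader, it ends a record only at a newline that falls outside quotes.

```scala
import scala.collection.mutable.ArrayBuffer

object MultiLineCsv {
  /** Splits CSV text into records, keeping newlines that appear inside
    * double-quoted fields. A doubled quote ("") inside a quoted field is
    * read as one literal quote. Minimal sketch: no trimming, no custom
    * delimiters; an unterminated quote keeps the rest of the text in the
    * last field. */
  def parseRecords(text: String): Seq[Seq[String]] = {
    val records = ArrayBuffer[Seq[String]]()
    val fields  = ArrayBuffer[String]()
    val field   = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < text.length) {
      val c = text.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < text.length && text.charAt(i + 1) == '"') {
            field.append('"'); i += 1       // "" -> literal quote, skip the second one
          } else inQuotes = false            // closing quote
        } else field.append(c)               // newlines inside quotes are kept
      } else c match {
        case '"'  => inQuotes = true
        case ','  => fields += field.toString; field.clear()
        case '\r' => ()                      // tolerate CRLF line endings
        case '\n' =>                         // unquoted newline ends the record
          fields += field.toString; field.clear()
          records += fields.toList; fields.clear()
        case other => field.append(other)
      }
      i += 1
    }
    // Flush a final record that is not terminated by a newline.
    if (field.length > 0 || fields.nonEmpty) {
      fields += field.toString
      records += fields.toList
    }
    records.toList
  }
}
```

In Spark 1.6 this could be wired up roughly as `sc.wholeTextFiles("abc.csv").flatMap { case (_, content) => MultiLineCsv.parseRecords(content) }`. Note that `wholeTextFiles` returns each file as a single string. A single 3 GB file is therefore likely too large for this approach, because JVM strings and arrays are limited to roughly 2^31 elements. This is one more reason to prefer a pre-processing step, or to split the input at record boundaries first.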
