Spark-shell: The number of columns doesn't match

Views: 53  Published: 2021/4/8 20:06:32  Tags: scala apache-spark apache-spark-sql


Problem description

I have a CSV-format file whose fields are separated by the pipe delimiter "|". The dataset has two columns, as shown below.

Column1|Column2
1|Name_a
2|Name_b

But sometimes we receive only the first column's value and the other one is missing, as shown below.

Column1|Column2
1|Name_a
2|Name_b
3
4
5|Name_c
6
7|Name_f

Any row with a mismatched number of columns is garbage for us. In the example above, those are the rows whose column value is 3, 4, and 6, and we want to discard them. Is there a direct way to discard those rows without getting an exception when reading the data from spark-shell, as below?

val readFile = spark.read.option("delimiter", "|").csv("File.csv").toDF(Seq("Column1", "Column2"): _*)

When we try to read the file, we get the following exception.

java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (1): _c0
New column names (2): Column1, Column2
  at scala.Predef$.require(Predef.scala:224)
  at org.apache.spark.sql.Dataset.toDF(Dataset.scala:435)
  ... 49 elided

Recommended answer

You can specify the schema of your data file and allow some columns to be nullable. In Scala it may look like this:

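import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Both columns are nullable, so a short row yields null for the missing field.
val schm = StructType(
  StructField("Column1", StringType, nullable = true) ::
  StructField("Column2", StringType, nullable = true) :: Nil)

// Trailing dots keep the chained call pasteable line by line in spark-shell.
val readFile = spark.read.
  option("delimiter", "|").
  schema(schm).
  csv("File.csv")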

Then you can filter the dataset so that only the rows whose second column is not null remain.
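The filtering step could look roughly like the sketch below. It is only an illustration under a few assumptions that are not part of the original answer: the sample data from the question is written to a temporary file, the first line is treated as a header via the `header` option, and the names `samplePath`, `raw`, and `cleaned` are made up for the example. It relies on Spark's default PERMISSIVE read mode, in which a row with fewer fields than the schema gets null for the missing ones.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// In spark-shell a session named `spark` already exists; getOrCreate() returns it.
val spark = SparkSession.builder().
  master("local[*]").
  appName("drop-short-rows").
  getOrCreate()

// Hypothetical sample file reproducing the data from the question.
val samplePath = Files.createTempFile("sample", ".csv")
Files.write(
  samplePath,
  "Column1|Column2\n1|Name_a\n2|Name_b\n3\n4\n5|Name_c\n6\n7|Name_f\n"
    .getBytes(StandardCharsets.UTF_8))

val schema = StructType(
  StructField("Column1", StringType, nullable = true) ::
  StructField("Column2", StringType, nullable = true) :: Nil)

// Assumption: the first line is a header and should not be read as data.
val raw = spark.read.
  option("delimiter", "|").
  option("header", "true").
  schema(schema).
  csv(samplePath.toString)

// Keep only the rows where the second column was actually present.
val cleaned = raw.filter(col("Column2").isNotNull)

cleaned.show()
```

Without the `header` option, the line `Column1|Column2` would be kept as an ordinary data row, because its second field is not null.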
