Iterate rows and columns in Spark dataframe

Question

I have the following Spark dataframe, which is created dynamically:

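// Assumes a spark-shell session, where `spark` (a SparkSession) is already defined.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sf1 = StructField("name", StringType, nullable = true)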
val sf2 = StructField("sector", StringType, nullable = true)
val sf3 = StructField("age", IntegerType, nullable = true)

val fields = List(sf1,sf2,sf3)
val schema = StructType(fields)

val row1 = Row("Andy","aaa",20)
val row2 = Row("Berta","bbb",30)
val row3 = Row("Joe","ccc",40)

val data = Seq(row1,row2,row3)

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")

Now I need to iterate over each row and column in sqlDF to print each column value. This is my attempt:

sqlDF.foreach { row =>
  row.foreach { col => println(col) }
}

row is of type Row, but Row is not iterable, which is why this code throws a compilation error at row.foreach. How can I iterate over each column in a Row?

Recommended answer

You can convert a Row to a Seq with toSeq. Once it is a Seq, you can iterate over it as usual with foreach, map, or whatever else you need:

sqlDF.foreach { row =>
  row.toSeq.foreach { col => println(col) }
}

Output:

Berta
bbb
30
Joe
Andy
aaa
20
ccc
40
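
Note that the values above are interleaved across rows: foreach runs in parallel over the partitions, so the print order is not deterministic, and on a cluster the println output goes to the executors' stdout rather than the driver console. If the result is small enough to fit on the driver, one option is to collect the rows first and pair each value with its column name. The following is only a minimal sketch under those assumptions; the object name `IterateRows`, the helper `describe`, and the `local[*]` master are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object IterateRows {
  // Hypothetical helper: pair each column name with the value at the same
  // position in the Row and render it as "name=value".
  def describe(columns: Seq[String], row: Row): Seq[String] =
    columns.zip(row.toSeq).map { case (name, value) => s"$name=$value" }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("iterate-rows")
      .getOrCreate()

    try {
      val schema = StructType(List(
        StructField("name", StringType, nullable = true),
        StructField("sector", StringType, nullable = true),
        StructField("age", IntegerType, nullable = true)
      ))
      val data = Seq(Row("Andy", "aaa", 20), Row("Berta", "bbb", 30), Row("Joe", "ccc", 40))
      val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

      val columns = df.columns.toSeq

      // collect() copies every row to the driver, so use it only for small results.
      df.collect().foreach { row =>
        println(describe(columns, row).mkString(", "))
      }
    } finally {
      spark.stop()
    }
  }
}
```

Because the loop runs on the driver after collect(), each row is printed on its own line and the values of one row are no longer mixed with those of another. For large dataframes, the original foreach approach (or writing the result out) avoids pulling all the data onto the driver.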
