How to create a Row from a List or Array in Spark using Scala
Problem description
I'm trying to create a Row (org.apache.spark.sql.catalyst.expressions.Row) based on user input, but I'm not able to create a Row from an arbitrary set of values.
Is there any functionality to create a Row from a List or an Array?
For example, suppose I have a .csv file with the following format:
"91xxxxxxxxxx,21.31,15,0,0"
If the user input is [1, 2], then I need to take only the 2nd and 3rd columns, along with the customer_id, which is the first column.
I tried to parse it with this code:
val l3 = sc.textFile("/SparkTest/abc.csv").map(_.split(" ")).map(r => (foo(input,r(0))))
where foo is defined as
def foo(n: List[Int], s: String) : Row = {
  val n = input.length
  var out = new Array[Any](n+1)
  var r = s.split(",")
  out(0) = r(0)
  for (i <- 1 to n)
    out(i) = r(input(i-1)).toDouble
  Row(out)
}
and input is a List, for example:
val input = List(1,2)
Executing this code I get l3 as:
Array[org.apache.spark.sql.Row] = Array([[Ljava.lang.Object;@234d2916])
But what I want is:
Array[org.apache.spark.sql.catalyst.expressions.Row] = Array([9xxxxxxxxxx,21.31,15])
The result has to be passed on to create a schema in Spark SQL.
Recommended answer
Something like the following should work:
import org.apache.spark.sql._
def f(n: List[Int], s: String) : Row =
  Row.fromSeq(s.split(",").zipWithIndex.collect{case (a,b) if n.contains(b) => a}.toSeq)
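The question's output, [[Ljava.lang.Object;@234d2916], is most likely explained by Row.apply taking varargs (values: Any*): Row(out) builds a one-field Row whose single field is the whole array. Passing out: _* or calling Row.fromSeq spreads the elements into separate fields. The sketch below is a variation on the answer, not the answerer's code. It assumes you also want to keep the first column (customer_id) as a String and convert the selected columns to Double, as the question's code attempts. The name toRow is hypothetical, and the snippet is written for spark-shell or a Scala script with spark-sql on the classpath.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper (not from the original answer): keeps column 0
// (customer_id) as a String and converts the columns in `cols` to Double.
def toRow(cols: List[Int], line: String): Row = {
  val fields = line.split(",")
  val selected: Seq[Any] = cols.map(i => fields(i).toDouble)
  // Row.fromSeq makes each element of the Seq a separate field of the Row
  Row.fromSeq(fields(0) +: selected)
}

val row = toRow(List(1, 2), "91xxxxxxxxxx,21.31,15,0,0")

// Varargs form: `: _*` expands the Seq into individual arguments,
// unlike Row(someArray), which stores the array itself as one field.
val same = Row(row.toSeq: _*)
```

With the question's RDD, this could be applied as sc.textFile("/SparkTest/abc.csv").map(line => toRow(input, line)), assuming each line holds one comma-separated record.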