Create new column with function in Spark DataFrame
Question
I'm trying to figure out the new DataFrame API in Spark. It seems like a good step forward, but I'm having trouble doing something that should be pretty simple. I have a DataFrame with 2 columns, "ID" and "Amt". As a generic example, say I want to add a new column called "Code" that holds a code based on the value of "Amt". I can write a function something like this:
def coder(myAmt: Integer): String = {
  if (myAmt < 100) "Little"
  else "Big"
}
When I try to use it like this:
val myDF = sqlContext.parquetFile("hdfs:/to/my/file.parquet")
myDF.withColumn("Code", coder(myDF("Amt")))
I get type mismatch errors:
found : org.apache.spark.sql.Column
required: Integer
I've tried changing the input type of my function to org.apache.spark.sql.Column, but then I start getting errors when the function compiles, because it wants a Boolean in the if statement.
Am I doing this wrong? Is there a better or different way to do this than using withColumn?
Thanks for your help.
Recommended answer
Let's say you have an "Amt" column in your schema:
import org.apache.spark.sql.functions._
val myDF = sqlContext.parquetFile("hdfs:/to/my/file.parquet")
val coder: (Int => String) = (arg: Int) => {if (arg < 100) "little" else "big"}
val sqlfunc = udf(coder)
myDF.withColumn("Code", sqlfunc(col("Amt")))
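To try the same approach without a Parquet file, here is a minimal self-contained sketch. It assumes Spark 2.x or later (where SparkSession takes the place of sqlContext), a local master, and made-up sample rows; the object name CoderUdfExample is only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object CoderUdfExample {
  // Plain Scala function that works on ordinary Int values
  val coder: Int => String = amt => if (amt < 100) "little" else "big"

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, Spark 2.x+ API
    val spark = SparkSession.builder()
      .appName("coder-udf-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the Parquet file
    val myDF = Seq((1, 50), (2, 150)).toDF("ID", "Amt")

    // udf(...) wraps the Scala function so it can be applied to a Column
    val sqlfunc = udf(coder)

    // withColumn returns a new DataFrame; myDF itself is unchanged
    val withCode = myDF.withColumn("Code", sqlfunc(col("Amt")))
    withCode.show()

    spark.stop()
  }
}
```

The key point is that coder itself never sees a Column: udf wraps it so Spark calls it once per row with the unwrapped Int value.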
I think withColumn is the right way to add a column.
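As for the compile error when the function takes a Column: a comparison such as myAmt > 100 on a Column produces another Column, not a Boolean, so a Scala if cannot use it. For a simple condition like this one, the built-in when / otherwise functions in org.apache.spark.sql.functions (available since Spark 1.4, as far as I know) can express the same logic without a UDF. A sketch, again assuming Spark 2.x+ and made-up sample data, with CoderWhenExample as a hypothetical name:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object CoderWhenExample {
  // Column-to-Column version of coder: the comparison yields a Column,
  // which when(...) accepts as its condition
  def coder(amt: Column): Column =
    when(amt < 100, "little").otherwise("big")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, Spark 2.x+ API
    val spark = SparkSession.builder()
      .appName("coder-when-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val myDF = Seq((1, 50), (2, 150)).toDF("ID", "Amt")

    myDF.withColumn("Code", coder(col("Amt"))).show()

    spark.stop()
  }
}
```

Because when / otherwise stays within Spark's own column expressions, it is usually preferable to a UDF when the logic is this simple; a UDF remains the fallback for logic that the built-in functions cannot express.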