Create new column with function in Spark Dataframe


Question

I'm trying to figure out the new DataFrame API in Spark. It seems like a good step forward, but I'm having trouble doing something that should be pretty simple. I have a dataframe with 2 columns, "ID" and "Amount". As a generic example, say I want to add a new column called "code" that holds a code based on the value of "Amt". I can write a function something like this:

def coder(myAmt: Integer): String = {
  if (myAmt > 100) "Little"
  else "Big"
}

When I try to use it like this:

val myDF = sqlContext.parquetFile("hdfs:/to/my/file.parquet")

myDF.withColumn("Code", coder(myDF("Amt")))

I get type mismatch errors:

found   : org.apache.spark.sql.Column
required: Integer

I've tried changing the input type of my function to org.apache.spark.sql.Column, but then I start getting errors when compiling the function, because it wants a Boolean in the if statement.

Am I doing this wrong? Is there a better or different way to do this than using withColumn?

Thanks for your help.

Recommended answer

Let's say you have an "Amt" column in your schema:

import org.apache.spark.sql.functions._
val myDF = sqlContext.parquetFile("hdfs:/to/my/file.parquet")
// A plain Scala function from Int to String
val coder: (Int => String) = (arg: Int) => {if (arg < 100) "little" else "big"}
// Wrap it with udf so it accepts a Column instead of an Int
val sqlfunc = udf(coder)
myDF.withColumn("Code", sqlfunc(col("Amt")))

I think withColumn is the right way to add a column.
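
For comparison, here is a minimal self-contained sketch of the same idea that can be tried without the parquet file. It is not part of the original answer: it assumes Spark 2.0 or later (for SparkSession), the sample rows are made up, and the names `viaUdf` and `viaWhen` are hypothetical. Besides the UDF approach, it shows the built-in `when`/`otherwise` column functions as an alternative for a simple condition like this one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf, when}

// Local session for illustration only (assumption: Spark 2.0+)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("code-column-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample rows standing in for the parquet file
val myDF = Seq((1, 50), (2, 150)).toDF("ID", "Amt")

// UDF approach: wrap an Int => String function so it can take a Column
val coder = udf((amt: Int) => if (amt < 100) "little" else "big")
val viaUdf = myDF.withColumn("Code", coder(col("Amt")))

// Alternative: express the same condition with built-in column functions
val viaWhen = myDF.withColumn(
  "Code",
  when(col("Amt") < 100, "little").otherwise("big")
)

viaUdf.show()
viaWhen.show()
```

Both versions pass a Column expression to withColumn, which is what the type mismatch in the question was asking for. The `when`/`otherwise` form keeps the logic inside Spark's own expressions, while the UDF form is more flexible when the logic does not map onto built-in functions.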
