Scala Spark. Create object with default value by DataType


Question

I have a list of org.apache.spark.sql.types.DataType objects, say,
val tps = [FloatType, LongType, FloatType, DoubleType], which I obtain from a DataFrame like this:

val tps = dataFrame.schema
      .filter(f => f.dataType.isInstanceOf[NumericType])
      .map(f => f.dataType)

and for every type in this list I need to create an object of the corresponding type with a default value:
[0.0, 0L, 0.0, 0.0]. How can I do that?

I tried doing

tps.map(t => t.getClass.newInstance())

but it didn't work, because of private members (can not access a member of class org.apache.spark.sql.types.LongType$ with modifiers "private"), and because this statement tries to create DataType objects, whereas I need objects of the corresponding value types.

I'm fairly new to Scala, can someone help?

Answer

For testing purposes I have something like this:

import java.sql.{Date, Timestamp}

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types._

object RowSampleMaker {

  var makerRunNumber = 1

  // Builds a Row matching the given schema, filling each field with an
  // arbitrary (but type-correct) value derived from the run counter.
  def apply(schema: StructType): Row = new GenericRowWithSchema(schema.map(field => {
      makerRunNumber += 1
      field.dataType match {
        case ShortType => makerRunNumber.toShort
        case IntegerType => makerRunNumber
        case LongType => makerRunNumber.toLong
        case FloatType => makerRunNumber.toFloat
        case DoubleType => makerRunNumber.toDouble // added: the question's type list includes DoubleType
        case DecimalType() => BigDecimal(makerRunNumber) // the original used an undefined helper d(...); BigDecimal is a reasonable stand-in
        case DateType => new Date(System.currentTimeMillis)
        case TimestampType => new Timestamp(System.currentTimeMillis)
        case StringType => s"arbitrary-$makerRunNumber"
        case BooleanType => false
        case StructType(fields) => apply(StructType(fields))
        case t => throw new Exception(s"Maker doesn't support generating $t")
      }
    }).toArray, schema)

  implicit class RowManipulation(row: Row) {

    // Returns a copy of the row with the named field replaced by `value`.
    def update(fieldName: String, value: Any): Row = new GenericRowWithSchema(
      row.toSeq.updated(row.fieldIndex(fieldName), value).toArray,
      row.schema
    )
  }
}

You can add types, and replace the randomness with 0, or have another method, say zero, which returns all neutral values. The update method on the implicit class is there because I generally update a couple of the values for testing purposes.
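The "neutral values" variant could be sketched as a small pattern match over DataType, which is also a direct answer to the question's list of types. The name zeroFor and the choice of defaults here are assumptions for illustration, not part of the original answer:

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: map each numeric DataType to its zero value.
def zeroFor(dt: DataType): Any = dt match {
  case ShortType     => 0.toShort
  case IntegerType   => 0
  case LongType      => 0L
  case FloatType     => 0.0f
  case DoubleType    => 0.0
  case DecimalType() => BigDecimal(0)
  case t => throw new IllegalArgumentException(s"No zero value for $t")
}

// Applied to the question's list of types:
val tps = Seq(FloatType, LongType, FloatType, DoubleType)
val defaults = tps.map(zeroFor) // Seq(0.0f, 0L, 0.0f, 0.0)
```

Because each value is produced through the match on the DataType, the result has the runtime type the question asks for (a Long for LongType, a Float for FloatType, and so on).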

You'd call RowSampleMaker(schema).update("field1", value1).update("field2", value2) for each row you want to generate, and then create a DataFrame from those rows.
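The last step, turning generated rows into a DataFrame, might look like the sketch below. To keep it self-contained, literal Rows stand in for RowSampleMaker output, and the schema and field names are assumptions for illustration:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Row, SparkSession}

// Local session just for this sketch.
val spark = SparkSession.builder().master("local[1]").appName("row-maker-demo").getOrCreate()

// Hypothetical two-field schema.
val schema = StructType(Seq(
  StructField("field1", LongType),
  StructField("field2", StringType)
))

// Literal rows stand in here for RowSampleMaker(schema).update(...) output.
val rows = Seq(Row(1L, "a"), Row(2L, "b"))

// Build the DataFrame from the rows plus the explicit schema.
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.show()
```

Passing the schema explicitly to createDataFrame is what lets Spark interpret the untyped Row values with the intended column types.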
