Scala Spark. Create object with default value by DataType
Question
I have a list of org.apache.spark.sql.types.DataType objects, say,
val tps = [FloatType, LongType, FloatType, DoubleType]
which I receive from a dataframe like this:
val tps = dataFrame.schema
  .filter(f => f.dataType.isInstanceOf[NumericType])
  .map(f => f.dataType)
and for every type in this list I need to create an object of the corresponding type with its default value:
[0.0, 0L, 0.0, 0.0]
How can I do that?
I tried
tps.map(t => t.getClass.newInstance())
but it didn't work out, because of private members (can not access a member of class org.apache.spark.sql.types.LongType$ with modifiers "private"), and because this statement tries to create DataType objects, while I need objects of the corresponding value types.
I'm fairly new to Scala, can someone help?
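For reference, values like FloatType and LongType are singleton objects, which is why reflective instantiation fails; the usual approach is to pattern match on the type and return the zero value directly. Below is a minimal, dependency-free sketch of that idea: the sealed trait and case objects are stand-ins for Spark's real org.apache.spark.sql.types singletons, not the Spark API itself.

```scala
// Stand-ins for org.apache.spark.sql.types singletons, so this sketch
// compiles and runs without a Spark dependency.
sealed trait DataType
case object FloatType   extends DataType
case object DoubleType  extends DataType
case object LongType    extends DataType
case object IntegerType extends DataType

// Map each type to its default (zero) value; the result element type is
// Any because the concrete value types differ per case.
def defaultFor(t: DataType): Any = t match {
  case FloatType   => 0.0f
  case DoubleType  => 0.0
  case LongType    => 0L
  case IntegerType => 0
}

val tps = Seq(FloatType, LongType, FloatType, DoubleType)
val defaults = tps.map(defaultFor)
// defaults == Seq(0.0f, 0L, 0.0f, 0.0)
```

With the real Spark types the match looks the same; only the imports change.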
Answer
For testing purposes I have something like this:
import java.sql.{Date, Timestamp}

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types._

object RowSampleMaker {
  var makerRunNumber = 1

  // Builds a Row for the given schema, filling each field with an
  // arbitrary value of the matching type.
  def apply(schema: StructType): Row = new GenericRowWithSchema(schema.map(field => {
    makerRunNumber += 1
    field.dataType match {
      case ShortType => makerRunNumber.toShort
      case IntegerType => makerRunNumber
      case LongType => makerRunNumber.toLong
      case FloatType => makerRunNumber.toFloat
      case DoubleType => makerRunNumber.toDouble
      case DecimalType() => BigDecimal(makerRunNumber) // the original used an undefined helper d(...)
      case DateType => new Date(System.currentTimeMillis)
      case TimestampType => new Timestamp(System.currentTimeMillis)
      case StringType => s"arbitrary-$makerRunNumber"
      case BooleanType => false
      case StructType(fields) => apply(StructType(fields)) // nested structs recurse
      case t => throw new Exception(s"Maker doesn't support generating $t")
    }
  }).toArray, schema)

  implicit class RowManipulation(row: Row) {
    def update(fieldName: String, value: Any): Row = new GenericRowWithSchema(
      row.toSeq.updated(row.fieldIndex(fieldName), value).toArray,
      row.schema
    )
  }
}
You can add types, and replace the randomness with 0. Or add another method, .zero, that returns all neutral values. The update method on the implicit class is there because I generally update a couple of the values for testing.
You'd call RowSampleMaker(schema).update("field1", value1).update("field2", value2) for each row you want to generate, and then create a dataframe from those rows.