Spark DataFrame column nullable property change


Question

I want to change the nullable property of a particular column in a Spark DataFrame.

If I print the schema of the DataFrame, it currently looks like this:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = false)
col4: float (nullable = true)

I only want the nullable property of col3 to be updated:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = true)
col4: float (nullable = true)

I checked some links online, but they seem to change the property for all columns rather than for one specific column; see "Change nullable property of column in spark dataframe". Can anyone help me with this?

Recommended answer

There is no "clean" way to do this. You can use a trick like the one in this related answer.

The relevant code from that answer:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
  // get the current schema
  val schema = df.schema
  // replace the [[StructField]] named `cn`, keeping its type and metadata
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if c == cn => StructField(c, t, nullable = nullable, m)
    case y: StructField => y
  })
  // rebuild the DataFrame from the same rows with the new schema
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

This builds a new DataFrame from the same rows and a copy of the schema, with the nullable flag of the chosen column set programmatically.
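To see the effect on the schema from the question, here is a minimal end-to-end sketch. It assumes Spark 2.0 or later (it uses `SparkSession` and `df.sparkSession` where the answer uses `df.sqlContext`), a local master, and a single made-up sample row; the object name `NullableExample` is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{FloatType, StringType, StructField, StructType}

object NullableExample {
  // Same idea as the helper in the answer, repeated so the sketch is self-contained.
  def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
    val newSchema = StructType(df.schema.map {
      case StructField(c, t, _, m) if c == cn => StructField(c, t, nullable = nullable, m)
      case other => other
    })
    df.sparkSession.createDataFrame(df.rdd, newSchema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough for this demo.
    val spark = SparkSession.builder().master("local[1]").appName("nullable-example").getOrCreate()

    // The schema from the question: col1 and col3 start out non-nullable.
    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = false),
      StructField("col2", StringType, nullable = true),
      StructField("col3", StringType, nullable = false),
      StructField("col4", FloatType, nullable = true)
    ))
    // Made-up sample row, only there so the DataFrame is not empty.
    val rows = spark.sparkContext.parallelize(Seq(Row("a", "b", "c", 1.0f)))
    val df = spark.createDataFrame(rows, schema)

    // Only col3 is changed; the other three fields are passed through untouched.
    val updated = setNullableStateOfColumn(df, "col3", nullable = true)
    updated.printSchema()

    spark.stop()
  }
}
```

After the call, `printSchema()` should report col3 as nullable while col1 stays non-nullable.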

A version for multiple columns:

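import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

def setNullableStateOfColumn(df: DataFrame, nullValues: Map[String, Boolean]): DataFrame = {
  // get the current schema
  val schema = df.schema
  // replace every [[StructField]] whose name is a key of `nullValues`
  val newSchema = StructType(schema.map {
    // nullValues(c) yields the Boolean itself; nullValues.get(c) would be an Option and not compile
    case StructField(c, t, _, m) if nullValues.contains(c) => StructField(c, t, nullable = nullValues(c), m)
    case y: StructField => y
  })
  // rebuild the DataFrame from the same rows with the new schema
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}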

Usage: setNullableStateOfColumn(df1, Map("col1" -> true, "col2" -> true, "col7" -> false))
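As a variant of the map-based version, the schema rewrite can also be expressed with `StructField.copy` and `Map.getOrElse`, which avoids the pattern match. This is only a sketch, not the answer's code: the names `NullableByMap` and `withNullability` are hypothetical, and `df.sparkSession` assumes Spark 2.0 or later.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

object NullableByMap {
  // Hypothetical helper: columns that are not keys of `nullValues` keep their current flag.
  def withNullability(df: DataFrame, nullValues: Map[String, Boolean]): DataFrame = {
    val newSchema = StructType(df.schema.map { f =>
      // copy() preserves name, data type and metadata; only `nullable` may change
      f.copy(nullable = nullValues.getOrElse(f.name, f.nullable))
    })
    // rebuild the DataFrame from the same rows with the adjusted schema
    df.sparkSession.createDataFrame(df.rdd, newSchema)
  }
}
```

For the schema in the question, `NullableByMap.withNullability(df, Map("col3" -> true))` would change only col3 and leave col1, col2 and col4 as they are.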
