Spark Dataframe column nullable property change

Question

I want to change the nullable property of a particular column in a Spark Dataframe.

If I print the schema of the dataframe, it currently looks like this:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = false)
col4: float (nullable = true)

I only want the nullable property of col3 to change, so that the schema becomes:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = true)
col4: float (nullable = true)

I checked some links online, but they seem to change the property for all columns rather than for a specific column; see "Change nullable property of column in spark dataframe". Can anyone help me with this?

Recommended answer

There is no "clear" way to do this. You can use a trick like the one in another answer.

Relevant code from that answer:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {

  // get schema
  val schema = df.schema
  // modify the [[StructField]] with name `cn`, leave all other fields unchanged
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
    case y: StructField => y
  })
  // apply new schema by rebuilding the DataFrame from its RDD
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

This copies the DataFrame and its schema, but sets the nullable flag programmatically.
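The single-column helper can be tried against the schema from the question. The following is a minimal sketch, not a definitive implementation: it assumes Spark 2.0 or later (it uses `SparkSession` and `df.sparkSession` instead of `df.sqlContext`), a local master, and a made-up sample row; the object name `NullableExample` is hypothetical. The helper is repeated so the block is self-contained.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{FloatType, StringType, StructField, StructType}

object NullableExample {

  // Same idea as the helper in the answer, repeated here for self-containment.
  def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
    val newSchema = StructType(df.schema.map {
      case StructField(c, t, _, m) if c == cn => StructField(c, t, nullable, m)
      case other => other
    })
    // Rebuild the DataFrame from its RDD with the modified schema.
    df.sparkSession.createDataFrame(df.rdd, newSchema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-example")
      .getOrCreate()

    // Schema from the question: col3 starts out as non-nullable.
    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = false),
      StructField("col2", StringType, nullable = true),
      StructField("col3", StringType, nullable = false),
      StructField("col4", FloatType, nullable = true)
    ))

    // Made-up sample data, not from the question.
    val rows = Seq(Row("a", "b", "c", 1.0f))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    val updated = setNullableStateOfColumn(df, "col3", nullable = true)
    updated.printSchema()

    // Only col3 should have changed; col1 keeps its original flag.
    assert(updated.schema("col3").nullable)
    assert(!updated.schema("col1").nullable)

    spark.stop()
  }
}
```

Because the helper goes through `df.rdd`, the result is a new DataFrame; the original `df` keeps its old schema.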

Version for multiple columns:

def setNullableStateOfColumn(df: DataFrame, nullValues: Map[String, Boolean]): DataFrame = {

  // get schema
  val schema = df.schema
  // modify every [[StructField]] whose name is a key in `nullValues`
  val newSchema = StructType(schema.map {
    // nullValues(c) returns the Boolean itself; nullValues.get(c) would return an Option and not compile
    case StructField(c, t, _, m) if nullValues.contains(c) => StructField(c, t, nullable = nullValues(c), m)
    case y: StructField => y
  })
  // apply new schema
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

Usage: setNullableStateOfColumn(df1, Map("col1" -> true, "col2" -> true, "col7" -> false))
