Replacing regex pattern with another string works, but replacing with None replaces all values


Question

I am trying to replace all strings in a column that start with 'DEL_' with a NULL value.

I have tried this:

customer_details = customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", ""))

This works as expected, and the new column now looks like this:

+--------------+
|  phone_number|
+--------------+
|00971585059437|
|00971559274811|
|00971559274811|
|              |
|00918472847271|
|              |
+--------------+

However, if I change the code to:

customer_details = customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", None))

every value in the column is replaced:

+------------+
|phone_number|
+------------+
|        null|
|        null|
|        null|
|        null|
|        null|
|        null|
+------------+

Recommended answer

Try this:

Scala

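import org.apache.spark.sql.functions.{col, when}

// Set values starting with DEL_ to null; keep every other value unchanged
df.withColumn("phone_number", when(col("phone_number").rlike("^DEL_.*"), null)
          .otherwise(col("phone_number"))
      )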

Python

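from pyspark.sql.functions import col, when

# Set values starting with DEL_ to null; keep every other value unchanged
df.withColumn("phone_number", when(col("phone_number").rlike("^DEL_.*"), None)
          .otherwise(col("phone_number"))
      )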

Update

Question:

Can you explain why my original solution doesn't work? customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", None))

Answer: Ternary expressions (functions taking 3 arguments) are null-safe in Spark. That means if Spark finds that any of the arguments is null, it returns null without doing any actual processing (for regexp_replace, without any pattern matching). Because None is passed as the replacement for every row, every row comes back as null, whether or not it matches the pattern. You may want to look at this piece of the Spark repo:

  override def eval(input: InternalRow): Any = {
    val exprs = children
    val value1 = exprs(0).eval(input)
    if (value1 != null) {
      val value2 = exprs(1).eval(input)
      if (value2 != null) {
        val value3 = exprs(2).eval(input)
        if (value3 != null) {
          return nullSafeEval(value1, value2, value3)
        }
      }
    }
    null
  }
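
To make the null propagation concrete, here is a minimal plain-Python sketch of the behaviour described above. It is a model, not Spark code: regexp_replace_model and null_if_matches are hypothetical helper names, Python's re module stands in for Spark's regex engine, and the 'DEL_' sample value is made up for illustration.

```python
import re
from typing import Optional


def regexp_replace_model(value: Optional[str],
                         pattern: Optional[str],
                         replacement: Optional[str]) -> Optional[str]:
    """Hypothetical model of a null-propagating three-argument expression."""
    # Same idea as the nested null checks in the Scala eval above:
    # if any argument is None, return None without touching the regex.
    if value is None or pattern is None or replacement is None:
        return None
    return re.sub(pattern, replacement, value)


def null_if_matches(value: Optional[str], pattern: str) -> Optional[str]:
    """Hypothetical model of when(col.rlike(pattern), None).otherwise(col)."""
    if value is not None and re.search(pattern, value):
        return None
    return value


# Sample values; the DEL_ entry is invented for this sketch.
phones = ["00971585059437", "DEL_00971559274811", "00918472847271"]

replaced_with_empty = [regexp_replace_model(p, "DEL_.*", "") for p in phones]
replaced_with_none = [regexp_replace_model(p, "DEL_.*", None) for p in phones]
conditional = [null_if_matches(p, "^DEL_.*") for p in phones]

print(replaced_with_empty)
print(replaced_with_none)
print(conditional)
```

In this model, the None replacement short-circuits every row, while the conditional version only nulls the rows that match, which is why the when/otherwise form is the one suggested in the answer.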
