Replacing regex pattern with another string works, but replacing with None replaces all values
Question
I am trying to replace all strings in a column that start with 'DEL_' with a NULL value.
I have tried this:
customer_details = customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", ""))
This works as expected, and the new column now looks like this:
+--------------+
|  phone_number|
+--------------+
|00971585059437|
|00971559274811|
|00971559274811|
|              |
|00918472847271|
|              |
+--------------+
However, if I change the code to:
customer_details = customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", None))
it replaces all values in the column:
+------------+
|phone_number|
+------------+
|        null|
|        null|
|        null|
|        null|
|        null|
|        null|
+------------+
Recommended answer
Try this:
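Scala
import org.apache.spark.sql.functions.{col, when}
// Set the value to null only for rows that start with DEL_; keep all others.
df.withColumn("phone_number", when(col("phone_number").rlike("^DEL_.*"), null)
    .otherwise(col("phone_number"))
)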
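Python
from pyspark.sql.functions import col, when
# Set the value to null only for rows that start with DEL_; keep all others.
df.withColumn("phone_number", when(col("phone_number").rlike("^DEL_.*"), None)
    .otherwise(col("phone_number"))
)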
Update
Question:
Can you explain why my original solution doesn't work?
customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", None))
Answer: regexp_replace is a ternary expression (a function taking 3 arguments), and ternary expressions in Spark are "null-safe" in the sense that they propagate nulls. If Spark finds that any one of the arguments is null, it returns null without doing any actual processing (e.g. the pattern matching for regexp_replace). Passing None as the replacement therefore makes every row null, whether or not the value matches the pattern.
You may want to look at this piece of the Spark repo:
override def eval(input: InternalRow): Any = {
  val exprs = children
  val value1 = exprs(0).eval(input)
  if (value1 != null) {
    val value2 = exprs(1).eval(input)
    if (value2 != null) {
      val value3 = exprs(2).eval(input)
      if (value3 != null) {
        return nullSafeEval(value1, value2, value3)
      }
    }
  }
  null
}
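To make the effect of that eval logic concrete, here is a minimal pure-Python sketch of the same null propagation. It is only a model, not Spark code: the helper name null_propagating_regexp_replace and the sample phone numbers are made up for illustration, and Python's re.sub stands in for the real pattern matching.

```python
import re


def null_propagating_regexp_replace(value, pattern, replacement):
    """Hypothetical pure-Python model of the eval logic above (not a Spark API)."""
    # If any of the three arguments is None, return None without matching anything.
    if value is None or pattern is None or replacement is None:
        return None
    # Only reached when all three arguments are non-null.
    return re.sub(pattern, replacement, value)


# Made-up sample values: a normal number, a "deleted" one, and a null.
phones = ["00971585059437", "DEL_00971559274811", None]

# Replacement "" only affects the value that matches the pattern.
with_empty = [null_propagating_regexp_replace(p, "DEL_.*", "") for p in phones]
# Replacement None short-circuits for every row, matching or not.
with_none = [null_propagating_regexp_replace(p, "DEL_.*", None) for p in phones]

print(with_empty)  # ['00971585059437', '', None]
print(with_none)   # [None, None, None]
```

Because the replacement is the same None for every row, the null check fails everywhere. That is why the when(...).otherwise(...) form is needed: it decides row by row whether to emit null.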