How to conditionally remove the first two characters from a column
Question
I have the phone records below, and I want to remove the first two characters from each record when they are a country code. How can I do this using Scala, Spark, or Hive?
phone
|917799423934|
|019331224595|
| 8981251522|
|917271767899|
I want the result to be:
phone
|7799423934|
|9331224595|
|8981251522|
|7271767899|
How can I remove the prefix 91 or 01 from each row of this column?
Recommended answer
This could probably be improved, for example by checking against a list of prefixes with contains or an equivalent, but here goes:
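import org.apache.spark.sql.functions._
// Needed for toDS(); spark-shell brings these implicits into scope automatically.
import spark.implicits._

case class Tel(telnum: String)
val ds = Seq(
  Tel("917799423934"),
  Tel("019331224595"),
  Tel("8981251522"),
  Tel("+4553")).toDS()

// Drop the first two characters only when they are "91" or "01"; otherwise keep the value as is.
val ds2 = ds.withColumn(
  "new_telnum",
  when(expr("substring(telnum,1,2)") === "91" || expr("substring(telnum,1,2)") === "01",
       expr("substring(telnum,3,length(telnum)-2)"))
    .otherwise(col("telnum")))
ds2.show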
Returns:
+------------+----------+
| telnum|new_telnum|
+------------+----------+
|917799423934|7799423934|
|019331224595|9331224595|
| 8981251522|8981251522|
| +4553| +4553|
+------------+----------+
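The answer notes that checking against a list of prefixes would be an improvement. A minimal sketch of that idea follows, using `Column.isin` on the first two characters. The `prefixes` list, the `df` and `cleaned` names, and the local `SparkSession` setup are assumptions for illustration, not part of the original answer, and every prefix is assumed to be exactly two characters long.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, substring, when}

// Local session for a standalone run; in spark-shell, `spark` already exists
// and getOrCreate() simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("strip-prefix").getOrCreate()
import spark.implicits._

// Hypothetical list of two-character country-code prefixes to strip.
val prefixes = Seq("91", "01")

val df = Seq("917799423934", "019331224595", "8981251522", "+4553").toDF("telnum")

val cleaned = df.withColumn(
  "new_telnum",
  // If the first two characters are in the list, keep everything from position 3 onward.
  when(substring(col("telnum"), 1, 2).isin(prefixes: _*), expr("substring(telnum, 3)"))
    .otherwise(col("telnum"))
)

cleaned.show()
```

Adding another country code then only requires extending `prefixes`, rather than chaining another `||` condition.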
We may also need to handle a leading +, but the question did not say anything about that.
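If a leading + does need handling, one possible approach is a regular expression instead of `substring`. The sketch below uses `regexp_replace` and assumes that numbers may appear as `+91...` or `+01...` and that the plus sign should be removed together with the prefix. That input format, the sample value `+919331224595`, and the names `df` and `stripped` are hypothetical, since the question does not specify any of them.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

// Local session for a standalone run; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("strip-plus-prefix").getOrCreate()
import spark.implicits._

// Sample data; the "+91..." row is a made-up example of the assumed format.
val df = Seq("917799423934", "+919331224595", "8981251522", "+4553").toDF("telnum")

// Pattern breakdown:
//   ^        anchor at the start of the string
//   \+?      optional leading plus sign (assumed, not stated in the question)
//   (91|01)  one of the prefixes to strip
val stripped = df.withColumn(
  "new_telnum",
  regexp_replace(col("telnum"), "^\\+?(91|01)", "")
)

stripped.show()
```

Values that do not start with one of the prefixes, such as `+4553`, do not match and are left unchanged. Like the `substring` version, this strips any leading 91 or 01, so a local number that happens to begin with those digits would also be shortened unless a length check is added.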