How to conditionally remove the first two characters from a column


Question

I have the phone records below, and I want to remove the first two characters from each record when they are a country code. How can I do this using Scala, Spark, or Hive?

phone
|917799423934|
|019331224595|
|  8981251522|
|917271767899|

I want the result to be:

phone
|7799423934|
|9331224595|
|8981251522|
|7271767899|

How can I remove the prefix 91 or 01 from each record (row) of this column?

Recommended answer

This could probably be improved, for example by checking against a list of prefixes with contains or an equivalent, but here goes:

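import org.apache.spark.sql.functions._
// Outside spark-shell, also import spark.implicits._ so that toDS() is available.

case class Tel(telnum: String)
val ds = Seq(
  Tel("917799423934"),
  Tel("019331224595"),
  Tel("8981251522"),
  Tel("+4553")).toDS()

// Drop the first two characters only when they are "91" or "01"; otherwise keep the value unchanged.
val prefix = expr("substring(telnum, 1, 2)")
val ds2 = ds.withColumn(
  "new_telnum",
  when(prefix === "91" || prefix === "01", expr("substring(telnum, 3, length(telnum) - 2)"))
    .otherwise(col("telnum")))

ds2.show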

Returns:

+------------+----------+
|      telnum|new_telnum|
+------------+----------+
|917799423934|7799423934|
|019331224595|9331224595|
|  8981251522|8981251522|
|       +4553|     +4553|
+------------+----------+
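
The answer mentions that checking against a list of prefixes would be preferable. A minimal sketch of that idea follows, assuming the prefixes are all two characters long and using Column.substr and Column.isin. The names prefixes, df, and result are illustrative and not from the original answer, and the local SparkSession is created only so the snippet runs on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, when}

// Local session for a standalone run; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("strip-prefix-list").getOrCreate()
import spark.implicits._

// Assumption: every country code to strip is exactly two characters long.
val prefixes = Seq("91", "01")

val df = Seq("917799423934", "019331224595", "8981251522", "+4553").toDF("telnum")

// Compare the first two characters against the whole list at once,
// then keep everything from the third character onward when there is a match.
val result = df.withColumn(
  "new_telnum",
  when(col("telnum").substr(1, 2).isin(prefixes: _*), expr("substring(telnum, 3)"))
    .otherwise(col("telnum")))

result.show()
```

Adding another two-character code then only means extending prefixes; the when/otherwise expression stays the same.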

We may also need to handle a leading +, but the question did not say anything about that.
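
If a leading + should count as part of the country code, one possible approach is a single anchored regular expression with regexp_replace. This is only a sketch under that assumption: the question does not define how + should be treated, and the sample value "+919331224595" is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

// Local session for a standalone run; in spark-shell getOrCreate() reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("strip-plus-prefix").getOrCreate()
import spark.implicits._

// "+919331224595" is a hypothetical input showing a + in front of the country code.
val df = Seq("917799423934", "+919331224595", "8981251522", "+4553").toDF("telnum")

// ^        anchor at the start of the value
// \+?      optional leading plus sign
// (91|01)  one of the two country codes to strip
val cleaned = df.withColumn(
  "new_telnum",
  regexp_replace(col("telnum"), "^\\+?(91|01)", ""))

cleaned.show()
```

Values that do not start with 91 or 01 (with or without a +) are left unchanged, so "+4553" and "8981251522" pass through as they are. As with the original answer, a number without a country code that happens to begin with 91 or 01 would also be shortened, so a length check may be worth adding for real data.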
