Strip or Regex function in Spark 1.3 Dataframe

Published: 2021/11/14 22:12:33 | Tags: regex, apache-spark, dataframe, pyspark, apache-spark-sql


Problem description


I have some code from PySpark 1.5 that I unfortunately have to backport to Spark 1.3. I have a column whose elements are alphanumeric, but I only want the digits. An example of an element in 'old_col' of 'df' is:

 '125 Bytes'


In Spark 1.5 I was able to use

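from pyspark.sql import functions as F
# Remove every run of non-digit characters, then cast the remaining digits to long
df.withColumn('new_col', F.regexp_replace('old_col', r'(\D+)', '').cast("long"))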
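To make the Spark 1.5 expression's behavior concrete, here is a plain-Python sketch (no Spark needed) that approximates `regexp_replace(old_col, '(\D+)', '')` followed by a cast to long. It assumes Python's `re` with `re.ASCII` is close enough to the JVM regex engine for this simple pattern, and it models a failed cast as `None`, roughly like Spark returning null; `strip_non_digits` is a hypothetical helper name.

```python
import re


def strip_non_digits(value):
    """Approximate mirror of regexp_replace(value, '(\\D+)', '') + cast("long").

    Every run of non-digit characters is deleted, so ALL digits in the
    value end up concatenated. Returns None when no digits remain
    (assumption: roughly like Spark's cast yielding null).
    """
    # re.ASCII makes \D mean "not 0-9", like Java's default \D
    digits = re.sub(r"(\D+)", "", value, flags=re.ASCII)
    return int(digits) if digits else None


print(strip_non_digits("125 Bytes"))  # 125
```

Note that this approach keeps every digit in the string, which matters if a value ever contains more than one number.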


However, I can't come up with a solution using the older 1.3 methods such as SUBSTR or RLIKE. The reason is that the number of digits in front of "Bytes" varies in length, so what I really need is the 'replace' or 'strip' functionality, which I can't find in Spark 1.3. Any suggestions?
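A quick illustration of why a fixed-width SUBSTR does not work here: SQL SUBSTR takes a fixed 1-based start position and a fixed length, so any single length is wrong for some inputs when the digit count varies. The sketch below is a hypothetical plain-Python mirror of `SUBSTR(value, pos, length)` for `pos >= 1`, not Spark's own implementation.

```python
def hive_substr(value, pos, length):
    """Hypothetical mirror of SQL SUBSTR(value, pos, length) for pos >= 1.

    SQL positions are 1-based, so position 1 maps to Python index 0.
    """
    start = pos - 1
    return value[start:start + length]


# A length of 3 only happens to fit three-digit sizes
for sample in ["125 Bytes", "1024 Bytes", "7 Bytes"]:
    print(repr(sample), "->", repr(hive_substr(sample, 1, 3)))
# '125 Bytes' -> '125'
# '1024 Bytes' -> '102'
# '7 Bytes' -> '7 B'
```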

Recommended answer


As long as you use a HiveContext, you can call the corresponding Hive UDFs, either with selectExpr:

df.selectExpr("regexp_extract(old_col,'([0-9]+)', 1)")
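For intuition about what this Hive UDF returns, here is a plain-Python approximation of `regexp_extract(value, pattern, idx)`: group `idx` of the first match, or an empty string when nothing matches (my understanding of Hive's behavior; treat it as an assumption). Unlike the `regexp_replace` approach from the question, it keeps only the first run of digits, and the result is still a string.

```python
import re


def regexp_extract(value, pattern, idx):
    """Approximate Python mirror of Hive's regexp_extract(value, pattern, idx).

    Returns capture group `idx` of the FIRST match, or '' if there is no match.
    """
    match = re.search(pattern, value)
    if match is None:
        return ""
    return match.group(idx)


print(regexp_extract("125 Bytes", "([0-9]+)", 1))  # 125
print(regexp_extract("1 KB 024", "([0-9]+)", 1))   # 1 (first run of digits only)
print(repr(regexp_extract("Bytes", "([0-9]+)", 1)))  # ''
```

Because the extracted value is a string, matching the `long` column from the 1.5 code would need a cast, for example wrapping the expression in `CAST(... AS BIGINT)`; as far as I know that is valid HiveQL, but check it against your 1.3 setup.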

or with plain SQL:

# sqlContext must be a HiveContext so that the Hive UDF regexp_extract resolves in Spark 1.3
df.registerTempTable("df")
sqlContext.sql("SELECT regexp_extract(old_col,'([0-9]+)', 1) FROM df")
