PySpark: Convert column to lowercase

Views: 251, published 2021/6/24 20:34:02, tag: pyspark

Question

I want to convert the values in a column to lowercase. Currently, if I call lower() as a method on the column, it complains that Column objects are not callable. Since SQL has a function called lower(), I assume there is a native Spark solution that involves neither a UDF nor writing any SQL.

Recommended answer

Import lower alongside col:

from pyspark.sql.functions import lower, col

Combine them as lower(col("bla")). In a complete query:

spark.table('bla').select(lower(col('bla')).alias('bla'))

This is equivalent to the SQL query:

SELECT lower(bla) AS bla FROM bla

To keep the other columns, use withColumn:

spark.table('foo').withColumn('bar', lower(col('bar')))

Needless to say, this approach is better than using a UDF, because a UDF has to call out to Python (the round trip is slow, and Python itself is slow), whereas lower() is a built-in Spark function. It is also more elegant than writing the query in SQL.
