pySpark (v2.4) DataFrameReader adds leading whitespace to column names


Problem description

Here is a snippet of a CSV file that I have:

"Index", "Living Space (sq ft)", "Beds", "Baths", "Zip", "Year", "List Price ($)"
 1,       2222,                   3,      3.5,    32312, 1981,    250000
 2,       1628,                   3,      2,      32308, 2009,    185000
 3,       3824,                   5,      4,      32312, 1954,    399000
 4,       1137,                   3,      2,      32309, 1993,    150000
 5,       3560,                   6,      4,      32309, 1973,    315000

Oddly, when I run the following pySpark (v2.4) statements, the header column names (all except the first) have leading whitespace. I've tried different quote and escape options, but to no avail.

Does anyone know why this is happening and how to strip the extra whitespace on load? Thank you in advance!

>>> csv_file = '/tmp/file.csv'

>>> spark_reader.format('csv')

>>> spark_reader.option("inferSchema", "true")
>>> spark_reader.option("header", "true")
>>> spark_reader.option("quote", '"')

>>> df = spark_reader.load(csv_file)

>>> df.columns
['Index', ' "Living Space (sq ft)"', ' "Beds"', ' "Baths"', ' "Zip"', ' "Year"', ' "List Price ($)"']

Recommended answer

According to the docs for pyspark.sql.DataFrameReader, you can use the ignoreLeadingWhiteSpace parameter.

ignoreLeadingWhiteSpace – A flag indicating whether or not leading whitespaces from values being read should be skipped. If None is set, it uses the default value, false.
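As an analogy only (this uses Python's standard csv module, not Spark), the sketch below shows how a space after the delimiter can prevent a quote character from being treated as a quote. The resulting header matches the column names shown in the question, which suggests Spark's reader follows a similar rule, but that is an inference, not something the docs quoted above state. The names SAMPLE and read_header are made up for the illustration.

```python
import csv
import io

# First two lines of the CSV from the question, shortened to three columns.
SAMPLE = (
    '"Index", "Living Space (sq ft)", "Beds"\n'
    ' 1,       2222,                   3\n'
)


def read_header(skip_leading_space):
    """Return the header row as parsed by Python's csv module."""
    reader = csv.reader(io.StringIO(SAMPLE), skipinitialspace=skip_leading_space)
    return next(reader)


# Without skipping, the space starts an unquoted field, so the quotes stay.
print(read_header(False))  # ['Index', ' "Living Space (sq ft)"', ' "Beds"']

# With skipping, the quote is the first character seen and is honoured.
print(read_header(True))   # ['Index', 'Living Space (sq ft)', 'Beds']
```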

In your case, you just need to add:

spark_reader.option("ignoreLeadingWhiteSpace", "true")
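Put together with the options from the question, the whole load might look like the sketch below. It assumes an existing SparkSession is passed in (the question does not show how spark_reader was created, so spark.read is used here as a guess), and read_csv_trimmed and main are hypothetical names chosen for the example.

```python
def read_csv_trimmed(spark, path):
    """Load a CSV with a header row, skipping leading whitespace in values.

    `spark` is assumed to be a pyspark.sql.SparkSession.
    """
    return (
        spark.read.format("csv")
        .option("inferSchema", "true")
        .option("header", "true")
        .option("quote", '"')
        # The option suggested in the answer above.
        .option("ignoreLeadingWhiteSpace", "true")
        .load(path)
    )


def main():
    # Imported here so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-whitespace").getOrCreate()
    df = read_csv_trimmed(spark, "/tmp/file.csv")
    print(df.columns)


# main()  # uncomment to run against a local Spark installation
```

Chaining the calls is only a style choice. Setting the same options one statement at a time on spark_reader, as in the question, should behave the same way.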
