LEFT and RIGHT functions in PySpark SQL


Question

I am new to PySpark. I read a CSV file using pandas and created a temp table with the registerTempTable function.

from pyspark.sql import SQLContext
from pyspark.sql import Row
import pandas as pd
sqlc = SQLContext(sc)

aa1 = pd.read_csv(r"D:\mck1.csv")

aa2 = sqlc.createDataFrame(aa1)

aa2.show()

+--------+-------+----------+------------+---------+------------+-------------------+
|    City|     id|First_Name|Phone_Number|new_date|new      code|           New_date|
+--------+-------+----------+------------+---------+------------+-------------------+
|KOLKATTA|9000007|       AAA|  1111119411| 20080714|          13|2016-08-16 00:00:00|
|KOLKATTA|9000007|       BBB|  1111119421| 20080714|          13|2016-08-06 00:00:00|
|KOLKATTA|9000007|       CCC|  1111119461| 20080714|          13|2016-08-13 00:00:00|
|KOLKATTA|9000007|       DDD|  1111119471| 20080714|          13|2016-08-27 00:00:00|
|KOLKATTA|9000007|       EEE|  1111119491| 20080714|          13|2016-08-15 00:00:00|
|KOLKATTA|9111147|       FFF|  1111119401| 20080714|          13|2016-08-24 00:00:00|
|KOLKATTA|9585458|   FORMULA|  1111110112| 19990930|          13|2016-08-16 00:00:00|
|KOLKATTA|9569878|   APPLEII|  1111110132| 19990930|          13|2016-08-06 00:00:00|

aa2.registerTempTable("mytable1")

sqlc.sql(""" select right(phone_number,4) from mytable1 """).show()

Now I am trying to pull the last four characters of the phone number using right(phone_number,4), and I get the following error:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-18-07f08e3d0a8f> in <module>()
----> 1 sqlc.sql(""" select right(Phone_number,4) from mytable1 """).show()

C:\spark-1.4.1-bin-hadoop2.6\python\pyspark\sql\context.pyc in sql(self, sqlQuery)
    500         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
    501         """
--> 502         return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
    503 
    504     @since(1.0)

C:\spark-1.4.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py in __call__(self, *args)
    536         answer = self.gateway_client.send_command(command)
    537         return_value = get_return_value(answer, self.gateway_client,
--> 538                 self.target_id, self.name)
    539 
    540         for temp_arg in temp_args:

C:\spark-1.4.1-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    298                 raise Py4JJavaError(
    299                     'An error occurred while calling {0}{1}{2}.\n'.
--> 300                     format(target_id, '.', name), value)
    301             else:
    302                 raise Py4JError(

Py4JJavaError: An error occurred while calling o55.sql.
: java.lang.RuntimeException: [1.9] failure: ``union'' expected but `right' found

 select right(Phone_number,4) from mytable1 
        ^
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
    at org.apache.spark.sql.catalyst.DefaultParserDialect.parse(ParserDialect.scala:67)
    at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:145)

Why does PySpark not support the RIGHT and LEFT functions? How can I take the rightmost four characters of a column?

Recommended answer

Looking at the documentation, have you tried the substring function?

pyspark.sql.functions.substring(str, pos, len)
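
To make the pos and len arguments concrete, here is a rough plain-Python model of how substring is understood to behave: positions are 1-based, and a negative pos counts back from the end of the string. The name substring_model is hypothetical, and this is only a sketch for building intuition, not Spark's implementation; edge cases may differ.

```python
def substring_model(s, pos, length):
    """Rough model of substring(str, pos, len) on a plain Python string."""
    if pos > 0:
        start = pos - 1        # 1-based position counted from the left
    elif pos < 0:
        start = len(s) + pos   # negative position counted from the end
    else:
        start = 0              # assumption: pos == 0 behaves like pos == 1
    end = start + length
    if end <= start:
        return ""
    # Clamp to the start of the string so an oversized negative pos does not wrap.
    return s[max(start, 0):max(end, 0)]


print(substring_model("abcdefg", 1, 4))   # LEFT-style: first four characters
print(substring_model("abcdefg", -4, 4))  # RIGHT-style: last four characters
```

Under this model, LEFT(s, n) corresponds to substring(s, 1, n) and RIGHT(s, n) to substring(s, -n, n).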

Edit

Per your comment, you can get the last four characters like this:

from pyspark.sql.functions import substring

df = sqlContext.createDataFrame([('abcdefg',)], ['s',])
df.select(substring(df.s, -4, 4).alias('s')).collect()
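
If the goal is to stay inside a SQL string, as in the question, one possible workaround is to register small Python helpers as SQL functions. The sketch below is an assumption-laden illustration, not the only approach: left_chars, right_chars and register_helpers are hypothetical names, the helpers are written for Python 3, and the registration relies on SQLContext.registerFunction, which exists in the Spark 1.x API used in the question (newer versions expose spark.udf.register instead). The values are converted with str() because Phone_Number appears to be loaded as a number rather than a string.

```python
def left_chars(value, n):
    """Return the first n characters of value, or None for a NULL input."""
    if value is None:
        return None
    return str(value)[:max(n, 0)]


def right_chars(value, n):
    """Return the last n characters of value, or None for a NULL input."""
    if value is None:
        return None
    text = str(value)
    # text[-0:] would return the whole string, so handle n <= 0 explicitly.
    return text[-n:] if n > 0 else ""


def register_helpers(sqlc):
    """Register both helpers as SQL functions on an existing SQLContext.

    sqlc is assumed to be the SQLContext created in the question.
    """
    # registerFunction defaults to a string return type.
    sqlc.registerFunction("left_chars", left_chars)
    sqlc.registerFunction("right_chars", right_chars)
```

After calling register_helpers(sqlc), the query from the question could be written as sqlc.sql("select right_chars(Phone_Number, 4) from mytable1").show(). The helper names deliberately avoid plain left and right, since the parser error suggests that right is treated as a keyword in that Spark version. Python UDFs are generally slower than built-in functions, so substring remains the better choice when the DataFrame API is an option.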
