How to delete rows in a table created from a Spark dataframe?

Question

Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:

pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"

This is the script that I'm using:

sq = SparkSession.builder.config('spark.rpc.message.maxSize','1536').config("spark.sql.shuffle.partitions",str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice , sep=";", header=True)
adsquare_grid = adsqaureJoined.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")   

sql = """
    DELETE a.* FROM adsquare a
    INNER JOIN codepoint c ON a.grid_id = c.grid_explode
    WHERE dis2 > 1 """

sq.sql(sql)

Note: The codepoint table is created during execution.

Is there another way to delete the rows that match the conditions above?

Recommended answer

You cannot delete rows from a DataFrame: DataFrames are immutable, and a temporary view created from one does not accept DELETE. What you can do is create a new DataFrame that excludes the unwanted records.

sql = """
    SELECT a.* FROM adsquare a
    INNER JOIN codepoint c ON a.grid_id = c.grid_explode
    WHERE dis2 <= 1 """

# sq.sql returns a new DataFrame; the original adsquare view is left unchanged
filtered = sq.sql(sql)

This gives you a new DataFrame. Note that I used the inverse condition, dis2 <= 1, so the query keeps the rows you did not want to delete.
