How to delete rows in a table created from a Spark DataFrame?

Question

Basically, I would like to do a simple delete using a SQL statement, but when I execute the SQL script it throws the following error:

pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"

This is the script I'm using:

sq = SparkSession.builder.config('spark.rpc.message.maxSize','1536').config("spark.sql.shuffle.partitions",str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice , sep=";", header=True)
adsquare_grid = adsqaureJoined.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")   

sql = """
    DELETE a.* FROM adsquare a
    INNER JOIN codepoint c ON a.grid_id = c.grid_explode
    WHERE dis2 > 1 """

sq.sql(sql)

Note: The codepoint table is created during execution.

Is there any other way to delete the rows that match the above conditions?

Recommended answer

You cannot delete rows from a DataFrame: DataFrames are immutable, and a temporary view created from one does not support DELETE. You can, however, create a new DataFrame that excludes the unwanted records.

sql = """
    SELECT a.* FROM adsquare a
    INNER JOIN codepoint c ON a.grid_id = c.grid_explode
    WHERE dis2 <= 1 """

# sq.sql() returns a new DataFrame, so keep a reference to the result
filtered = sq.sql(sql)

This gives you a new DataFrame. Note that I used the inverted condition, dis2 <= 1.
