How to delete rows in a table created from a Spark dataframe?
Problem description
Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:
pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"
This is the script I'm using:
sq = SparkSession.builder.config('spark.rpc.message.maxSize','1536').config("spark.sql.shuffle.partitions",str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice , sep=";", header=True)
adsquare_grid = adsquare.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")
sql = """
DELETE a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 > 1 """
sq.sql(sql)
Note: The codepoint table is created during the execution.
Is there any other way I can delete the rows with the above conditions?
Recommended answer
You cannot delete rows from a Spark DataFrame; DataFrames are immutable. But you can create a new DataFrame that excludes the unwanted records:
sql = """
Select a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 <= 1 """
adsquare_filtered = sq.sql(sql)
In this way you can create a new DataFrame. Here I used the reverse condition, dis2 <= 1, so the result keeps exactly the rows the DELETE would have left behind.