How to write PySpark dataframe to DynamoDB table?
Question
How do I write a PySpark dataframe to a DynamoDB table? I did not find much info on this. As per my requirement, I have to write a PySpark dataframe to a DynamoDB table. Overall, I need to read from and write to DynamoDB from my PySpark code.
Thanks.
Answer
Ram, there's no way to do that directly from PySpark. If you have pipeline software running, it can be done in a series of steps. Here is how:
Create a temporary Hive table, for example:
CREATE TABLE TEMP(
column1 type,
column2 type...)
STORED AS ORC;
Run your PySpark job and write your data into it:
dataframe.createOrReplaceTempView("df")
spark.sql("INSERT OVERWRITE TABLE temp SELECT * FROM df")
Create the DynamoDB connector table:
CREATE TABLE TEMPTODYNAMO(
column1 type,
column2 type...)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "temp-to-dynamo",
"dynamodb.column.mapping" = "column1:column1,column2:column2...");
Overwrite that table from the temp table:
INSERT OVERWRITE TABLE TEMPTODYNAMO SELECT * FROM TEMP;
More info here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/EMR_Hive_Commands.html
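The steps above can be sketched as a small helper that builds the Hive statements for a PySpark job to run one by one with spark.sql() on an EMR cluster. This is a minimal sketch, not a definitive implementation: the function name, the column list, and the DynamoDB table name "temp-to-dynamo" are placeholders you would replace with your own.

```python
# Sketch of the four Hive steps as parameterized statements. Assumes an EMR
# cluster with Hive and the DynamoDB storage handler available; table names,
# column names/types, and the DynamoDB target are illustrative placeholders.

def dynamo_write_statements(hive_temp="TEMP",
                            connector="TEMPTODYNAMO",
                            dynamo_table="temp-to-dynamo",
                            columns=(("column1", "string"),
                                     ("column2", "bigint"))):
    """Build the Hive DDL/DML for staging a dataframe into DynamoDB."""
    col_defs = ",\n".join(f"{name} {ctype}" for name, ctype in columns)
    mapping = ",".join(f"{name}:{name}" for name, _ in columns)
    return [
        # 1. Temporary staging table, stored as ORC.
        f"CREATE TABLE {hive_temp}(\n{col_defs})\nSTORED AS ORC",
        # 2. Load the staging table (run after
        #    dataframe.createOrReplaceTempView("df") in the PySpark job).
        f"INSERT OVERWRITE TABLE {hive_temp} SELECT * FROM df",
        # 3. Connector table backed by the DynamoDB storage handler.
        f"CREATE TABLE {connector}(\n{col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'\n"
        f'TBLPROPERTIES ("dynamodb.table.name" = "{dynamo_table}",\n'
        f'"dynamodb.column.mapping" = "{mapping}")',
        # 4. Push the staged rows out to DynamoDB.
        f"INSERT OVERWRITE TABLE {connector} SELECT * FROM {hive_temp}",
    ]
```

On the cluster, after registering the dataframe as the temp view "df", you would run `for stmt in dynamo_write_statements(): spark.sql(stmt)`.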