Writing to Cosmos DB Graph API from Databricks (Apache Spark)

Problem Description

I have a DataFrame in Databricks which I want to use to create a graph in Cosmos, with one row in the DataFrame equating to 1 vertex in Cosmos.

When I write to Cosmos I can't see any properties on the vertices, just a generated id.

Getting the data:

data = spark.sql("select * from graph.testgraph")

Configuration:

writeConfig = {
  "Endpoint": "******",
  "Masterkey": "******",
  "Database": "graph",
  "Collection": "TestGraph",
  "Upsert": "true",
  "query_pagesize": "100000",
  "bulkimport": "true",            # use the bulk import path for faster writes
  "WritingBatchSize": "1000",      # documents written per batch
  "ConnectionMaxPoolSize": "100",
  "partitionkeydefinition": "/id"  # must match the collection's partition key path
}

Writing to Cosmos:

data.write \
  .format("com.microsoft.azure.cosmosdb.spark") \
  .options(**writeConfig) \
  .save()

Recommended Answer

Below is working code that inserts records into Cosmos DB. Go to the site below, click the download option, and select the uber JAR: https://search.maven.org/artifact/com.microsoft.azure/azure-cosmosdb-spark_2.3.0_2.11/1.2.2/jar. Then add it as a dependency when launching spark-shell (in Databricks, you would attach the JAR to your cluster as a library instead):

spark-shell --master yarn --executor-cores 5 --executor-memory 10g --num-executors 10 --driver-memory 10g --jars "path/to/jar/dependency/azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar" --packages "com.google.guava:guava:18.0,com.google.code.gson:gson:2.3.1,com.microsoft.azure:azure-documentdb:1.16.1"

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val data = Seq(
  Row(2, "Abb"),
  Row(4, "Bcc"),
  Row(6, "Cdd")
)

val schema = List(
  StructField("partitionKey", IntegerType, true),
  StructField("name", StringType, true)
)

val DF = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)

val writeConfig = Map(
  "Endpoint"               -> "https://*******.documents.azure.com:443/",
  "Masterkey"              -> "**************",
  "Database"               -> "db_name",
  "Collection"             -> "collection_name",
  "Upsert"                 -> "true",
  "query_pagesize"         -> "100000",
  "bulkimport"             -> "true",          // use the bulk import path for faster writes
  "WritingBatchSize"       -> "1000",          // documents written per batch
  "ConnectionMaxPoolSize"  -> "100",
  "partitionkeydefinition" -> "/partitionKey"  // must match the collection's partition key path
)

DF.write.format("com.microsoft.azure.cosmosdb.spark").mode("overwrite").options(writeConfig).save()
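
To verify what actually landed in the collection, the same connector can read the documents back. A minimal sketch, assuming the connector's query_custom read option (the endpoint, key, and names below are placeholders):

val readConfig = Map(
  "Endpoint"     -> "https://*******.documents.azure.com:443/",
  "Masterkey"    -> "**************",
  "Database"     -> "db_name",
  "Collection"   -> "collection_name",
  "query_custom" -> "SELECT c.id, c.partitionKey, c.name FROM c"  // optional projection
)

val readDF = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(readConfig).load()
readDF.show()

Note that this writes plain JSON documents, which is also why the original question saw no properties on the vertices: the Gremlin API stores a vertex as a document in which label and the partition key are plain fields, while every other property is an array of { id, _value } objects. A hedged sketch of reshaping the DataFrame into that form before writing (the person label and the internal property format are assumptions; compare against a vertex created through Gremlin in your own account):

import org.apache.spark.sql.functions._

val vertexDF = DF
  .withColumn("id", col("partitionKey").cast(StringType))  // vertex id must be a string
  .withColumn("label", lit("person"))                      // hypothetical vertex label
  .withColumn("name", array(struct(
    expr("uuid()").alias("id"),   // per-value id expected by the graph storage format
    col("name").alias("_value")   // the actual property value
  )))

vertexDF.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(writeConfig).save()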
