How to use OrientDB ETL to create edges only
Question
I have two CSV files:
The first contains ~500M records in the following format:
id,name
10000023432,Tom User
13943423235,Blah Person
The second contains ~1.5B friend relationships in the following format:
fromId,toId
10000023432,13943423235
I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish the friendship connections between them.
I have tried multiple configurations of the ETL JSON file so far, the latest being this one:
{
"config": {"parallel": true},
"source": { "file": { "path": "path_to_file" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "from",
"lookup": "Person.id",
"unresolvedLinkAction": "SKIP",
"targetVertexFields":{
"id": "${input.to}"
},
"direction": "out"
}
},
{ "code": { "language": "Javascript",
"code": "print('Current record: ' + record); record;"}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:<DB connection string>",
"dbType": "graph",
"classes": [
{"name": "FriendsWith", "extends": "E"}
], "indexes": [
{"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
]
}
}
}
Unfortunately, in addition to creating the edge, this also creates a vertex with "from" and "to" properties.
When I try removing the vertex transformer, the ETL process throws an error:
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
... 2 more
What am I missing here?
Recommended answer
You can import the edges with these ETL transformers:
"transformers": [
{ "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "toId",
"lookup": "Person.id",
"direction": "out"
}
},
{ "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]
The "merge" transformer joins the current CSV line with the related Person record (this is a bit strange, but for some reason it is necessary in order to associate fromId with the source person).
The "field" transformer removes the CSV fields added by the merge section. You can also try the import without the "field" transformer to see the difference.
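For context, here is how those transformers might slot into a complete configuration file. This is only a sketch under a few assumptions: the "source", "extractor" and "loader" sections are copied from the question, "path_to_file" is assumed to point at the relationships CSV (the one with the fromId,toId header), and the connection string placeholder is left exactly as in the question. Only the sections the pipeline needs are shown, so treat it as a starting point rather than a verified configuration.

```json
{
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": { "class": "Person", "skipDuplicates": true } },
    { "edge": {
        "class": "FriendsWith",
        "joinFieldName": "toId",
        "lookup": "Person.id",
        "direction": "out"
      }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        { "name": "FriendsWith", "extends": "E" }
      ],
      "indexes": [
        { "class": "Person", "fields": ["id:long"], "type": "UNIQUE" }
      ]
    }
  }
}
```

The order of the transformers matters here: "merge" first resolves each row to the existing Person whose id equals fromId, "vertex" then hands that record on as a vertex, and "edge" links it to the Person whose id equals toId.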