How to use OrientDB ETL to create edges only

Question

I have two CSV files:

The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235
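For reference, here is a small Python sketch (not part of the original question) that parses the sample rows above with the standard `csv` module. It shows that each extracted record is keyed by the header names (`id`, `name`, `fromId`, `toId`), which are the names an ETL `joinFieldName` has to match, and that the sample ids exceed the 32-bit integer range, consistent with the `id:long` index in the configuration below. The inline sample strings are stand-ins for the real files, whose names the question does not give.

```python
import csv
import io

# Sample rows copied from the question; in practice these would come from
# the two CSV files on disk (their names are not given in the question).
PERSONS_CSV = """id,name
10000023432,Tom User
13943423235,Blah Person
"""

FRIENDS_CSV = """fromId,toId
10000023432,13943423235
"""


def read_rows(text):
    """Parse CSV text into dicts keyed by the header row, with ids as ints."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Columns named "id", "fromId" or "toId" hold numeric person ids.
        parsed = {k: int(v) if k.lower().endswith("id") else v
                  for k, v in row.items()}
        rows.append(parsed)
    return rows


persons = read_rows(PERSONS_CSV)
friends = read_rows(FRIENDS_CSV)

INT32_MAX = 2**31 - 1
# Every sample id is larger than a 32-bit int can hold, so a 64-bit
# ("long") id field is needed.
assert all(p["id"] > INT32_MAX for p in persons)

print(friends)  # [{'fromId': 10000023432, 'toId': 13943423235}]
```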

I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish friendship connections between them.

I have tried multiple configurations of the ETL JSON file so far; the latest is this one:

{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "from",
                    "lookup": "Person.id",
                    "unresolvedLinkAction": "SKIP",
                    "targetVertexFields":{
                        "id": "${input.to}"
                    },
                    "direction": "out"
                  }
        },
        { "code": { "language": "Javascript",
                    "code": "print('Current record: ' + record);  record;"}
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ], "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}

Unfortunately, in addition to creating the edge, this also creates a vertex with "from" and "to" properties.

When I try removing the vertex transformer, the ETL process throws an error:

Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more

What am I missing here?

Answer

You can import the edges with these ETL transformers:

"transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

The "merge" transformer joins the current CSV line with the related Person record (this is a bit strange, but for some reason it is necessary in order to associate fromId with the source person).
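As a rough mental model (a sketch, not OrientDB's actual implementation), the merge step can be pictured as below: it looks up the Person whose `id` equals the row's `fromId` and copies the CSV fields onto that record, so the transformers that follow work on the existing Person rather than on a bare CSV document. The dict-based `person_index` and the `merge` function are hypothetical stand-ins for the `Person.id` index and the transformer; returning `None` for an unknown id is this model's choice, not a documented OrientDB default.

```python
# Hypothetical in-memory stand-in for the Person class, keyed by id
# (plays the role of the "Person.id" lookup index).
person_index = {
    10000023432: {"@class": "Person", "id": 10000023432, "name": "Tom User"},
    13943423235: {"@class": "Person", "id": 13943423235, "name": "Blah Person"},
}


def merge(row, join_field_name, index):
    """Model of the merge step: find the record whose id equals
    row[join_field_name] and copy the CSV row's fields onto it.
    Returns None when no record matches (an unresolved link in this model)."""
    target = index.get(row[join_field_name])
    if target is None:
        return None
    merged = dict(target)  # this model works on a copy of the stored record
    merged.update(row)     # the CSV fields (fromId, toId) are added to it
    return merged


row = {"fromId": 10000023432, "toId": 13943423235}
source = merge(row, "fromId", person_index)
print(source)
# {'@class': 'Person', 'id': 10000023432, 'name': 'Tom User',
#  'fromId': 10000023432, 'toId': 13943423235}
```

In this model the merged record still carries `fromId` and `toId`, which matches the answer's reason for adding a transformer that removes those fields afterwards.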

The "field" transformer removes the CSV fields added by the merge section. You can also try the import without the "field" transformer to see the difference.
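The answer only shows the `transformers` array. The Python sketch below shows one way to assemble it into a complete ETL file, reusing the `config`, `source`, `extractor` and `loader` sections from the question unchanged. The placeholders `path_to_file` and `remote:<DB connection string>` are copied verbatim and must be replaced. The output file name `friends-etl.json` is hypothetical.

```python
import json

# Placeholders copied verbatim from the question; replace them with the real
# path of the friendships CSV and the real connection string.
SOURCE_PATH = "path_to_file"
DB_URL = "remote:<DB connection string>"

# Transformers exactly as given in the answer.
transformers = [
    {"merge": {"joinFieldName": "fromId", "lookup": "Person.id"}},
    {"vertex": {"class": "Person", "skipDuplicates": True}},
    {"edge": {"class": "FriendsWith",
              "joinFieldName": "toId",
              "lookup": "Person.id",
              "direction": "out"}},
    {"field": {"fieldNames": ["fromId", "toId"], "operation": "remove"}},
]

config = {
    "config": {"parallel": True},  # copied from the question
    "source": {"file": {"path": SOURCE_PATH}},
    "extractor": {"csv": {}},
    "transformers": transformers,
    # Loader section reused unchanged from the question's configuration.
    "loader": {
        "orientdb": {
            "dbURL": DB_URL,
            "dbType": "graph",
            "classes": [{"name": "FriendsWith", "extends": "E"}],
            "indexes": [{"class": "Person", "fields": ["id:long"],
                         "type": "UNIQUE"}],
        }
    },
}

# Hypothetical output file name.
with open("friends-etl.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)
```

The resulting file is then passed to the ETL launcher that ships with OrientDB (in 2.x distributions this is typically `bin/oetl.sh friends-etl.json`).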
