OrientDB ETL loading CSV with vertices in one file and edges in another

Question

I have some data in 2 CSV files: one contains the vertices and the other contains the edges. I'm working out how to set this up using ETL and am close but not quite there yet. It mostly works, but my edges have properties and I'm not sure that they're loading right. This question was helpful but I'm still missing something...

Here is my data:

vertices.csv :

label,data,date
v01,0.1234,2015-01-01
v02,0.5678,2015-01-02
v03,0.9012,2015-01-03

edges.csv :

u,v,weight,date
v01,v02,12.4,2015-06-17
v02,v03,17.9,2015-09-14

I import my vertices using this:

commonVertices.json :

{
"begin": [ 
             { "let": { "name":       "$filePath",  
                        "expression": "$fileDirectory.append($fileName)" 
                      } 
             }
         ],
"config": { "log": "info"},
"source": { "file": { "path": "$filePath" } },
"extractor": { "csv": { "ignoreEmptyLines": true,
                        "nullValue": "N/A",
                        "dateFormat": "yyyy-mm-dd"
                      }
             },
"transformers": [
                    { "vertex": { "class": "myVertex" } },
                    { "code":   { "language": "Javascript",
                                  "code":     "print('    Current record: ' + record); record;" }
                    }
                ],
"loader": { "orientdb": {
            "dbURL": "plocal:my_orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [ { "name": "myVertex", "extends": "V" }
                       ],
            "indexes": []
            }
          }
}

vertices.json :

{ "config": { "log":           "info",
              "fileDirectory": "./",
              "fileName":      "vertices.csv"
            }
}

commonEdges.json :

{
    "begin": [
        { "let": { "name": "$filePath",
                   "expression": "$fileDirectory.append($fileName )"
                 }
        }
    ],

    "config": { "log": "info"
              },

    "source": { "file": { "path": "$filePath" } },

    "extractor": { "csv": { "ignoreEmptyLines": true,
                            "nullValue": "N/A",
                            "dateFormat": "yyyy-mm-dd"
                          }
                 },

    "transformers": [
            { "merge":  { "joinFieldName": "u", "lookup": "myVertex.label" } },
            { "edge":   { "class":         "myEdge",
                          "joinFieldName": "v",
                          "lookup":        "myVertex.label",
                          "direction":     "out",
                          "unresolvedLinkAction": "NOTHING"
                        }
            },
            { "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
        ],

    "loader": {
        "orientdb": {
            "dbURL": "plocal:my_orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "useLightweightEdges": false,
            "classes": [
                { "name": "myEdge",   "extends": "E" }
            ],
            "indexes": []
        }
    }
}

edges.json :

{
    "config": {
        "log": "info",
        "fileDirectory": "./",
        "fileName": "edges.csv"
    }
}

I am running it with oetl.sh like this:

$ oetl.sh vertices.json commonVertices.json
$ oetl.sh edges.json commonEdges.json

Everything runs, but I'm new to OrientDB, so maybe the properties are being stored on my edges and I just can't see them. When I query the edges, the weight and date fields don't appear:

orientdb {db=my_orientdb}> SELECT FROM myEdge
+----+-----+------+-----+-----+
|#   |@RID |@CLASS|out  |in   |
+----+-----+------+-----+-----+
|0   |#33:0|myEdge|#25:0|#26:0|
|1   |#34:0|myEdge|#26:0|#27:0|
+----+-----+------+-----+-----+

The vertex table now contains the weight field from my edges.csv, and the date field is getting clobbered in a weird way. The day of the month is overwritten with the day from the edges.csv file, which is undesirable, but it's odd to me that the month itself isn't also getting changed:

orientdb {db=my_orientdb}> SELECT FROM myVertex
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|#   |@RID |@CLASS  |data  |date               |label|weight|out_myEdge|in_myEdge|
+----+-----+--------+------+-------------------+-----+------+----------+---------+
|0   |#25:0|myVertex|0.1234|2015-01-17 00:06:00|v01  |12.4  |[#33:0]   |         |
|1   |#26:0|myVertex|0.5678|2015-01-14 00:09:00|v02  |17.9  |[#34:0]   |[#33:0]  |
|2   |#27:0|myVertex|0.9012|2015-01-03 00:01:00|v03  |      |          |[#34:0]  |
+----+-----+--------+------+-------------------+-----+------+----------+---------+

I'm sure it's probably a simple tweak. Any help would be great!

Recommended answer

In the edge transformer, use edgeFields to bind properties to the edges. Example:

"transformers": [
    { "merge":  { "joinFieldName": "u", "lookup": "myVertex.label" } },
    { "edge":   { "class":         "myEdge",
                  "joinFieldName": "v",
                  "lookup":        "myVertex.label",
                  "edgeFields":    { "weight": "${input.weight}", "date": "${input.date}" },
                  "direction":     "out",
                  "unresolvedLinkAction": "NOTHING"
                }
    },
    { "field": { "fieldNames": ["u", "v"], "operation": "remove" } }
],
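
To see why the original pipeline left weight and date on the vertices instead of the edges, here is a toy Python model of the three transformers. It is not OrientDB code: the function names `load_rows` and `run_pipeline` and the dict-based records are assumptions made for illustration. The behaviour is simplified to what the query output in the question suggests: merge copies the CSV row's fields onto the vertex looked up by u, the edge step links that vertex to the one looked up by v, and edgeFields (when given) copies the named values onto the edge.

```python
import csv
import io

VERTICES_CSV = """label,data,date
v01,0.1234,2015-01-01
v02,0.5678,2015-01-02
v03,0.9012,2015-01-03
"""

EDGES_CSV = """u,v,weight,date
v01,v02,12.4,2015-06-17
v02,v03,17.9,2015-09-14
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def run_pipeline(vertex_rows, edge_rows, edge_fields=None):
    """Toy stand-in for the merge -> edge -> field(remove) transformers."""
    # Index vertices by label, playing the role of the myVertex.label lookup.
    vertices = {row["label"]: dict(row) for row in vertex_rows}
    edges = []
    for row in edge_rows:
        # merge: copy every field of the CSV row onto the vertex found via "u".
        source = vertices[row["u"]]
        source.update(row)
        # edge: link the merged vertex to the vertex found via "v".
        edge = {"out": source["label"], "in": vertices[source["v"]]["label"]}
        # edgeFields: copy the named values from the current record to the edge.
        for name in edge_fields or []:
            edge[name] = source[name]
        edges.append(edge)
        # field remove: drop the join columns from the current record.
        for name in ("u", "v"):
            source.pop(name, None)
    return vertices, edges


vertex_rows = load_rows(VERTICES_CSV)
edge_rows = load_rows(EDGES_CSV)

# Without edgeFields the edges carry only their endpoints.
_, edges_without = run_pipeline(vertex_rows, edge_rows)
# With edgeFields the named values are copied onto each edge.
vertices_after, edges_with = run_pipeline(vertex_rows, edge_rows, ["weight", "date"])

print(edges_without[0])
print(edges_with[0])
print(vertices_after["v01"])
```

In this model the edges only gain weight and date once they are listed in edge_fields, which mirrors what edgeFields does in the configuration above. The merged values also remain on the source vertex in the model, matching the weight column that showed up in the myVertex query.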

Hope that helps.
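
The odd dates in the vertex table (2015-01-17 00:06:00 for an input of 2015-06-17) look like an effect of the dateFormat pattern rather than of the merge itself. Assuming the CSV extractor passes the pattern to Java's SimpleDateFormat, lowercase mm means minutes and uppercase MM means month, so yyyy-mm-dd would read 06 as minutes and leave the month at its default of January. If so, "dateFormat": "yyyy-MM-dd" is the pattern to try. The sketch below reproduces the effect with Python's strptime, used only as a stand-in because it draws the same month/minute distinction (%m versus %M).

```python
from datetime import datetime

raw = "2015-06-17"  # date value from the first row of edges.csv

# %M is "minute": analogous to the lowercase mm in yyyy-mm-dd.
as_minutes = datetime.strptime(raw, "%Y-%M-%d")

# %m is "month": analogous to the uppercase MM in yyyy-MM-dd.
as_month = datetime.strptime(raw, "%Y-%m-%d")

print(as_minutes)  # 2015-01-17 00:06:00
print(as_month)    # 2015-06-17 00:00:00
```

The first result matches the clobbered value stored on v01 in the question's output, which is why the month appeared not to change.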
