Easiest way to import a simple CSV file to a graph with OrientDB ETL

Problem description

I would like to import a very simple directed graph file in csv to OrientDB. Concretely, the file is the roadNet-PA dataset from the SNAP collection https://snap.stanford.edu/data/roadNet-PA.html. The first lines of the file are as follows:

# Directed graph (each unordered pair of nodes is saved once)
# Pennsylvania road network
# Nodes: 1088092 Edges: 3083796
# FromNodeId    ToNodeId
0       1
0       6309
0       6353
1       0
6353    0
6353    6354

There is only one type of vertex (a road intersection), and the edges carry no information (I suppose OrientDB lightweight edges are the best option for this). Note also that the two vertex ids on each line are separated by a tab.

I've tried to create a simple etl to import the file with no success. Here is the etl:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roadNet-PA.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "   ", "skipFrom": 1, "skipTo": 4 } },
    { "vertex": { "class": "Intersection" } },
    { "edge": { "class": "Road" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 

The ETL runs, but it does not import the file as I expect. I suppose the problem is in the transformers. My idea is to read the csv line by line and create an edge connecting both vertices, but I'm not sure how to express this in an ETL file. Any ideas?

Recommended answer

Try this:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roadNet-PA.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t", "skipFrom": 1, "skipTo": 4,
               "columnsOnFirstLine": false, 
               "columns":["id", "to"] } },
    { "vertex": { "class": "Intersection" } },
    { "merge": { "joinFieldName":"id", "lookup":"Intersection.id" } },
    { "edge": {
       "class": "Road",
       "joinFieldName": "to",
       "lookup": "Intersection.id",
       "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "wal": false,
       "batchCommit": 1000,
       "tx": true,
       "txUseLog": false,
       "useLightweightEdges" : true,
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 
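One way to sanity-check the import is to count the vertices and edges in the file beforehand and compare them with the file's own header line (`# Nodes: 1088092 Edges: 3083796`) and with what ends up in the database. The sketch below mirrors the csv transformer settings above (skip lines 1 to 4, split on a tab, two columns). The function name `summarize_edge_list` is hypothetical and is not part of OrientDB; this is only a rough check, not a replacement for the ETL.

```python
import io


def summarize_edge_list(lines, skip_to=4, separator="\t"):
    """Return (distinct vertex count, edge count) for a SNAP-style edge list.

    `lines` is any iterable of text lines. The first `skip_to` lines are
    treated as header comments, as in the csv transformer settings above.
    """
    vertices = set()
    edges = 0
    for line_no, line in enumerate(lines, start=1):
        if line_no <= skip_to:
            continue  # header comment lines
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        source, target = line.split(separator)
        vertices.update((int(source), int(target)))
        edges += 1
    return len(vertices), edges


# The first lines of roadNet-PA quoted in the question, with tab separators.
sample = io.StringIO(
    "# Directed graph (each unordered pair of nodes is saved once)\n"
    "# Pennsylvania road network\n"
    "# Nodes: 1088092 Edges: 3083796\n"
    "# FromNodeId\tToNodeId\n"
    "0\t1\n"
    "0\t6309\n"
    "0\t6353\n"
    "1\t0\n"
    "6353\t0\n"
    "6353\t6354\n"
)
print(summarize_edge_list(sample))  # (5, 6)
```

To run it on the real file, pass an open file handle, for example `with open("/tmp/roadNet-PA.csv") as f: print(summarize_edge_list(f))`.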

To speed up loading, I suggest shutting down the server and running the ETL import with "plocal:" instead of "remote:". For example, replace the existing dbURL with:

       "dbURL": "plocal:/orientdb/databases/roads",
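If you would rather keep a single config file and derive the plocal variant from it, a small script can make the substitution. This is a minimal sketch: the helper name `use_plocal` is hypothetical, and it assumes the config is strict JSON, since Python's `json.loads` raises an error on things like trailing commas.

```python
import json


def use_plocal(config_text, db_path):
    """Return the ETL config as JSON text with dbURL pointing at a plocal path."""
    config = json.loads(config_text)  # strict JSON parsing
    config["loader"]["orientdb"]["dbURL"] = "plocal:" + db_path
    return json.dumps(config, indent=2)


# Trimmed-down config containing only the keys needed for the illustration.
remote_config = """
{
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/roads",
      "dbType": "graph"
    }
  }
}
"""
print(use_plocal(remote_config, "/orientdb/databases/roads"))
```

All other loader settings are passed through unchanged; only `dbURL` is rewritten.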
