neo4j - batch insertion using neo4j rest graph db

Question

I'm using version 2.0.1.

I have hundreds of thousands of nodes that need to be inserted. My neo4j graph db is on a standalone server, and I'm using the REST API through the neo4j rest graph db library to achieve this.

However, the performance is slow. I've chopped my queries into batches, sending 500 cypher statements in a single http call. The log output I'm getting looks like this:

10:38:10.984 INFO commit
10:38:13.161 INFO commit
10:38:13.277 INFO commit
10:38:15.132 INFO commit
10:38:15.218 INFO commit
10:38:17.288 INFO commit
10:38:19.488 INFO commit
10:38:22.020 INFO commit
10:38:24.806 INFO commit
10:38:27.848 INFO commit
10:38:31.172 INFO commit
10:38:34.767 INFO commit
10:38:38.661 INFO commit

And so on. The query that I'm using is as follows:

MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);

My code looks like this:

private RestAPI restAPI;
private RestCypherQueryEngine engine;
private GraphDatabaseService graphDB = new RestGraphDatabase("http://localdomain.com:7474/db/data/");

...

restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
engine = new RestCypherQueryEngine(restAPI);

...

    Transaction tx = graphDB.getRestAPI().beginTx();

    try {
        int ctr = 0;
        while (isExists) {
            ctr++;
            // execute query here through engine.query()
            if (ctr % 500 == 0) {
                tx.success();
                tx.close();
                tx = graphDB.getRestAPI().beginTx();
                LOGGER.info("commit");
            }
        }
        tx.success();
    } catch (FileNotFoundException | NumberFormatException | ArrayIndexOutOfBoundsException e) {
        tx.failure();
    } finally {
        tx.close();            
    }

Thanks!

UPDATED BENCHMARK. Sorry for the confusion: the benchmark that I posted isn't accurate, and is not for 500 queries. My ctr variable doesn't actually refer to the number of cypher queries.

So now I'm getting about 500 queries per 3 seconds, and those 3 seconds keep increasing as well. It's still far slower than embedded neo4j.
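
To put numbers on the slowdown, the gap between consecutive commits can be computed from log lines in the format shown earlier. This is only a minimal sketch: `CommitGaps` and `gapsMillis` are hypothetical names introduced for illustration, and it assumes every line starts with an `HH:mm:ss.SSS` timestamp followed by a space.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any neo4j library): measures the time
// between consecutive "commit" log lines.
final class CommitGaps {

    // Returns the gap in milliseconds between each pair of consecutive lines.
    // Assumes each line begins with a timestamp such as "10:38:10.984".
    static List<Long> gapsMillis(final List<String> logLines) {
        final List<Long> gaps = new ArrayList<>();
        LocalTime previous = null;
        for (final String line : logLines) {
            // The timestamp is everything before the first space.
            final LocalTime current = LocalTime.parse(line.substring(0, line.indexOf(' ')));
            if (previous != null) {
                gaps.add(Duration.between(previous, current).toMillis());
            }
            previous = current;
        }
        return gaps;
    }
}
```

Feeding the logged lines into `CommitGaps.gapsMillis` makes the trend easy to see: a steadily growing gap means each batch costs more than the previous one.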

Recommended answer

If you have the ability to use Neo4j 2.1.0-M01 (don't use it in prod yet!!), you could benefit from new features. If you create/generate a CSV file like this:

val1,val2,val3
a_value,another_value,yet_another_value
a,b,c
....

you'd only need to launch the following code:

final GraphDatabaseService graphDB = new RestGraphDatabase("http://server:7474/db/data/");
final RestAPI restAPI = ((RestGraphDatabase) graphDB).getRestAPI();
final RestCypherQueryEngine engine = new RestCypherQueryEngine(restAPI);
final String filePath = "file://C:/your_file_path.csv";
engine.query("USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM '" + filePath
    + "' AS csv MERGE (a{main:csv.val1,prop2:csv.val2}) MERGE (b{main:csv.val3})"
    + " CREATE UNIQUE (a)-[r:relationshipname]->(b);", null);

You'd have to make sure that the file can be accessed from the machine where your server is installed.
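
Generating the CSV file itself can be done with plain Java. The following is a minimal sketch, not part of the original answer: `CsvExport` is a hypothetical helper, it assumes each row holds exactly the three values `val1`, `val2`, `val3`, and it rejects values containing commas, quotes, or line breaks rather than attempting to escape them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes rows in the val1,val2,val3 layout shown above.
final class CsvExport {

    static final String HEADER = "val1,val2,val3";

    // Builds the CSV lines: the header first, then one line per row.
    static List<String> toLines(final List<String[]> rows) {
        final List<String> lines = new ArrayList<>();
        lines.add(HEADER);
        for (final String[] row : rows) {
            if (row.length != 3) {
                throw new IllegalArgumentException("expected 3 values per row, got " + row.length);
            }
            for (final String field : row) {
                // No escaping in this sketch: refuse values that would break the format.
                if (field.contains(",") || field.contains("\"")
                        || field.contains("\n") || field.contains("\r")) {
                    throw new IllegalArgumentException("unsupported character in value: " + field);
                }
            }
            lines.add(String.join(",", row));
        }
        return lines;
    }

    // Writes the CSV to the given path as UTF-8, replacing any existing file.
    static void write(final Path target, final List<String[]> rows) throws IOException {
        Files.write(target, toLines(rows), StandardCharsets.UTF_8);
    }
}
```

The file written by `CsvExport.write` still has to end up at a location the Neo4j server can read, as noted above.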

Take a look at my server plugin that does this for you on the server. If you build it and put it in the plugins folder, you could use the plugin in Java as follows:

final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
final RequestResult result = restAPI.execute(RequestType.POST, "ext/CSVBatchImport/graphdb/csv_batch_import",
    new HashMap<String, Object>() {
        {
            put("path", "file://C:/.../neo4j.csv");
        }
    });

You can also use a BatchCallback in the Java REST wrapper to boost performance; it removes the transactional boilerplate code as well. You could write your code similar to:

final RestAPI restAPI = new RestAPIFacade("http://server:7474/db/data");
int counter = 0;
List<Map<String, Object>> statements = new ArrayList<>();
while (isExists) {
    statements.add(new HashMap<String, Object>() {
        {
            put("val1", "abc");
            put("val2", "abc");
            put("val3", "abc");
        }
    });
    if (++counter % 500 == 0) {
        restAPI.executeBatch(new Process(statements));
        statements = new ArrayList<>();
    }
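}
// Flush the remaining statements that did not fill a complete batch of 500.
if (!statements.isEmpty()) {
    restAPI.executeBatch(new Process(statements));
}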

static class Process implements BatchCallback<Object> {

    private static final String QUERY = "MERGE (a{main:{val1},prop2:{val2}}) MERGE (b{main:{val3}}) CREATE UNIQUE (a)-[r:relationshipname]-(b);";

    private List<Map<String, Object>> params;

    Process(final List<Map<String, Object>> params) {
        this.params = params;
    }

    @Override
    public Object recordBatch(final RestAPI restApi) {
        for (final Map<String, Object> param : params) {
            restApi.query(QUERY, param);
        }
        return null;
    }    
}
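
The batching step can also be separated from the REST calls, which makes it easier to verify that no trailing statements are dropped. This is a minimal sketch under that assumption: `Batches.partition` is a hypothetical helper introduced here, not part of the neo4j rest graph db library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive batches of at most
// batchSize elements; the last batch holds whatever is left over.
final class Batches {

    static <T> List<List<T>> partition(final List<T> items, final int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            final int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch returned by `Batches.partition(allParams, 500)` could then be handed to `restAPI.executeBatch(new Process(batch))` as in the code above, where `allParams` stands for the full list of parameter maps.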
