Neo4j: Inserting 7k nodes is slow (Spring Data Neo4j / SpringRestGraphDatabase)


Question

I'm building an application where my users can manage dictionaries. One feature is uploading a file to initialize or update the dictionary's content.

The part of the structure I'm focusing on for a start is Dictionary -[:CONTAINS]->Word. Starting from an empty database (Neo4j 1.9.4, but I also tried 2.0.0M5), accessed via Spring Data Neo4j 2.3.1 in a distributed environment (hence SpringRestGraphDatabase, though I tested against localhost), I'm trying to load 7k words into one dictionary. However, I can't get it done in under 8-9 minutes on a Linux machine with a Core i7, 8 GB of RAM, and an SSD (ulimit raised to 40000).

I've read lots of posts about loading/inserting performance over REST, and I've tried to apply the advice I found, but with no luck. The BatchInserter tool doesn't seem to be a good option for me because of my application's constraints.

Can I hope to load 10k nodes in a matter of seconds rather than minutes?

Here is the code I came up with after all that reading:

// Create the dictionary node with its properties
Map<String, Object> dicProps = new HashMap<String, Object>();
dicProps.put("locale", locale);
dicProps.put("category", category);
Dictionary dictionary = template.createNodeAs(Dictionary.class, dicProps);

// Create one Word node per entry and link it to the dictionary
Map<String, Object> wordProps = new HashMap<String, Object>();
Set<Word> words = readFile(filename);
for (Word gw : words) {
  wordProps.put("txt", gw.getTxt());
  Word w = template.createNodeAs(Word.class, wordProps);
  template.createRelationshipBetween(dictionary, w, Contains.class, "CONTAINS", true);
}
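A rough back-of-envelope calculation shows why this loop is slow: each word triggers at least two REST calls (one for `createNodeAs`, one for `createRelationshipBetween`), so 7k words means roughly 14k HTTP round trips. The 30 ms per-call latency below is an assumption for illustration, not a measured figure:

```java
public class RestOverheadEstimate {
    /** Estimated total minutes, given a call count and an assumed per-call latency. */
    static double estimatedMinutes(int calls, double latencyMs) {
        return calls * latencyMs / 1000.0 / 60.0;
    }

    public static void main(String[] args) {
        int words = 7_000;
        int callsPerWord = 2;    // createNodeAs + createRelationshipBetween
        double latencyMs = 30.0; // assumed round-trip latency per REST call
        int totalCalls = words * callsPerWord;
        System.out.printf("~%.0f minutes for %d REST calls%n",
                estimatedMinutes(totalCalls, latencyMs), totalCalls);
    }
}
```

Under that assumption the loop alone costs about 7 minutes of pure network latency, which is in line with the 8-9 minutes observed.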

Recommended Answer

I solved this kind of problem by creating CSV files and then having Neo4j read them. The following steps are needed:

  1. Write a class that takes the input data and creates the CSV files from it (this can be one file per node kind, or you can even create a file that will be used to build relationships).

In my case, I also created a servlet that allows Neo4j to read that file over HTTP.
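Step 1 might look like the following minimal sketch, using only the JDK. The `words.csv` file name and the single `txt` column are assumptions chosen to match the question's data model:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordCsvWriter {
    /** Escape a value for CSV: wrap in quotes and double any embedded quotes. */
    static String escape(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Write one CSV file with a header row and one word per line. */
    static void writeCsv(Path target, List<String> words) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add("txt"); // header row, referenced as line.txt in LOAD CSV
        for (String w : words) {
            lines.add(escape(w));
        }
        Files.write(target, lines);
    }

    public static void main(String[] args) throws IOException {
        writeCsv(Paths.get("words.csv"), Arrays.asList("apple", "banana", "cherry"));
    }
}
```

The relationship file mentioned above would be built the same way, with one column per endpoint key.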

  2. Create proper Cypher statements to read and parse that CSV file. Here are some samples of what I use (if you use Spring Data, also remember the labels):

  • simple one:

load csv with headers from {fileUrl} as line 
   merge (:UserProfile:_UserProfile {email: line.email})

  • more complicated:

    load csv with headers from {fileUrl} as line
         match (c:Calendar {calendarId: line.calendarId})
         merge (a:Activity:_Activity {eventId: line.eventId})
    on create set a.eventSummary = line.eventSummary,
         a.eventDescription = line.eventDescription,
         a.eventStartDateTime = toInt(line.eventStartDateTime),
         a.eventEndDateTime = toInt(line.eventEndDateTime),
         a.eventCreated = toInt(line.eventCreated),
         a.recurringId = line.recurringId
    merge (a)-[r:EXPORTED_FROM]->(c)
    return count(r)
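To run such a statement remotely, one option is to POST it to Neo4j's transactional Cypher endpoint (`POST /db/data/transaction/commit`) with the statement and its `fileUrl` parameter in a JSON body. The sketch below only builds that request payload; the file URL is a hypothetical example:

```java
public class CypherPayloadBuilder {
    /** Minimal JSON string escaping for backslashes, quotes, and newlines. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Build the body for POST /db/data/transaction/commit with one statement. */
    static String buildPayload(String cypher, String fileUrl) {
        return "{\"statements\":[{\"statement\":\"" + jsonEscape(cypher)
                + "\",\"parameters\":{\"fileUrl\":\"" + jsonEscape(fileUrl) + "\"}}]}";
    }

    public static void main(String[] args) {
        String cypher = "load csv with headers from {fileUrl} as line "
                + "merge (:UserProfile:_UserProfile {email: line.email})";
        // Hypothetical URL served by the servlet from step 1
        System.out.println(buildPayload(cypher, "http://localhost:8080/export/users.csv"));
    }
}
```

Sending this body with any HTTP client (Content-Type: application/json) executes the whole LOAD CSV import in a single round trip, instead of one request per node.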
    
