Cypher load CSV eager and long action duration


Problem Description

I'm loading a file with 85K lines (19 MB). The server has 2 cores and 14 GB RAM, running CentOS 7.1 and Oracle JDK 8, and the load can take 5-10 minutes with the following server config:

dbms.pagecache.memory=8g                  
cypher_parser_version=2.0  
wrapper.java.initmemory=4096  
wrapper.java.maxmemory=4096

Disk mounted in /etc/fstab:

UUID=fc21456b-afab-4ff0-9ead-fdb31c14151a /mnt/neodata            
ext4    defaults,noatime,barrier=0      1  2

Added this to /etc/security/limits.conf:

*                soft      memlock         unlimited
*                hard      memlock         unlimited
*                soft      nofile          40000
*                hard      nofile          40000

Added this to /etc/pam.d/su:

session         required        pam_limits.so

Added this to /etc/sysctl.conf:

vm.dirty_background_ratio = 50
vm.dirty_ratio = 80

Disabled journaling by running:

 sudo e2fsck /dev/sdc1
 sudo tune2fs /dev/sdc1
 sudo tune2fs -o journal_data_writeback /dev/sdc1
 sudo tune2fs -O ^has_journal /dev/sdc1
 sudo e2fsck -f /dev/sdc1
 sudo dumpe2fs /dev/sdc1

Besides that, when running the profiler I get lots of "Eager" operators, and I really can't understand why:

 PROFILE LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
 FIELDTERMINATOR '|'
 WITH line limit 0
 MERGE (session :Session { wz_session:line.wz_session })
 MERGE (page :Page { page_key:line.domain+line.page }) 
   ON CREATE SET page.name=line.page, page.domain=line.domain, 
 page.protocol=line.protocol,page.file=line.file


Compiler CYPHER 2.3

Planner RULE

Runtime INTERPRETED

+---------------+------+---------+---------------------+--------------------------------------------------------+
| Operator      | Rows | DB Hits | Identifiers         | Other                                                  |
+---------------+------+---------+---------------------+--------------------------------------------------------+
| +EmptyResult  |    0 |       0 |                     |                                                        |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph  |    9 |       9 | line, page, session | MergeNode; Add(line.domain,line.page); :Page(page_key) |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Eager        |    9 |       0 | line, session       |                                                        |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph  |    9 |       9 | line, session       | MergeNode; line.wz_session; :Session(wz_session)       |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +ColumnFilter |    9 |       0 | line                | keep columns line                                      |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Filter       |    9 |       0 | anon[181], line     | anon[181]                                              |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Extract      |    9 |       0 | anon[181], line     | anon[181]                                              |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +LoadCSV      |    9 |       0 | line                |                                                        |
+---------------+------+---------+---------------------+--------------------------------------------------------+

All the labels and properties have indices/constraints. Thanks for the help, Lior

Recommended Answer

Hi Lior,

We tried to explain Eager loading here:

And Mark's original blog post is here: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

Rik tried to explain it in easier terms:

http://blog.bruggen.com/2015/07/loading-belgian-corporate-registry-into_20.html

I had read about this before, but did not really understand it until Andres explained it to me again: in all normal operations, Cypher loads data lazily. See, for example, this page in the manual - it basically loads as little as possible into memory when doing an operation. This laziness is usually a really good thing, but it can also get you into a lot of trouble, as Michael explained to me:

"Cypher tries to honor the contract that the different operations within a statement do not affect each other. Otherwise you might end up with non-deterministic behavior or endless loops. Imagine a statement like this:

MATCH (n:Foo) WHERE n.value > 100 CREATE (m:Foo { value: n.value + 100 });

If the two statements were not isolated, each node the CREATE generates would cause the MATCH to match again, and so on - an endless loop. That's why in such cases Cypher eagerly runs all MATCH statements to exhaustion, so that all the intermediate results are accumulated and kept (in memory).

Usually that's not an issue for most operations, as we mostly match only a few hundred thousand elements at most.

With data imports using LOAD CSV, however, this operation will pull in ALL the rows of the CSV (which might be millions), execute all operations eagerly (which might be millions of creates/merges/matches), and also keep the intermediate results in memory to feed the next operations in line.

This also effectively disables PERIODIC COMMIT, because by the time we get to the end of the statement execution all the create operations have already happened and a gigantic tx-state has accumulated."

So that's what was going on in my LOAD CSV queries. MATCH/MERGE/CREATE caused an Eager pipe to be added to the execution plan, and it effectively disabled the batching of my operations with USING PERIODIC COMMIT. Apparently quite a few users run into this issue even with seemingly simple LOAD CSV statements. Very often you can avoid it, but sometimes you can't.
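The usual workaround, which Mark's blog post above walks through, is to split the import into one pass per MERGE: each statement then has only a single update operation, no Eager pipe is added, and periodic commit can batch again. A sketch against the question's CSV (file path, field names, and the batch size of 1000 are taken from or assumed for this question; Cypher 2.3 syntax as used above):

```cypher
// Pass 1: merge sessions only -- a single MERGE keeps the plan Eager-free,
// so USING PERIODIC COMMIT can flush the transaction every 1000 rows
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
MERGE (session:Session { wz_session: line.wz_session });

// Pass 2: merge pages in a separate statement over the same file
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
MERGE (page:Page { page_key: line.domain + line.page })
  ON CREATE SET page.name = line.page, page.domain = line.domain,
                page.protocol = line.protocol, page.file = line.file;
```

Running PROFILE on each pass should show the Eager operator gone from both plans; the cost is reading the CSV twice, which is usually far cheaper than accumulating the whole tx-state in memory.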
