Cypher LOAD CSV eager and long action duration

Problem description

I'm loading a file with 85K lines (19 MB). The server has 2 cores and 14 GB RAM, running CentOS 7.1 and Oracle JDK 8, and the load can take 5-10 minutes with the following server config:

dbms.pagecache.memory=8g                  
cypher_parser_version=2.0  
wrapper.java.initmemory=4096  
wrapper.java.maxmemory=4096

disk mounted in /etc/fstab:

UUID=fc21456b-afab-4ff0-9ead-fdb31c14151a /mnt/neodata            
ext4    defaults,noatime,barrier=0      1  2

added this to /etc/security/limits.conf:

*                soft      memlock         unlimited
*                hard      memlock         unlimited
*                soft      nofile          40000
*                hard      nofile          40000

added this to /etc/pam.d/su

session         required        pam_limits.so

added this to /etc/sysctl.conf:

vm.dirty_background_ratio = 50
vm.dirty_ratio = 80

disabled journaling by running:

 # check the filesystem, switch the journal to writeback, then drop it entirely
 sudo e2fsck /dev/sdc1
 sudo tune2fs /dev/sdc1
 sudo tune2fs -o journal_data_writeback /dev/sdc1
 sudo tune2fs -O ^has_journal /dev/sdc1
 # force a re-check and dump the superblock to verify the change
 sudo e2fsck -f /dev/sdc1
 sudo dumpe2fs /dev/sdc1

besides that, when running a profiler, I get lots of "Eagers", and I really can't understand why:

 PROFILE LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
 FIELDTERMINATOR '|'
 WITH line limit 0
 MERGE (session :Session { wz_session:line.wz_session })
 MERGE (page :Page { page_key:line.domain+line.page })
   ON CREATE SET page.name=line.page, page.domain=line.domain,
                 page.protocol=line.protocol, page.file=line.file


Compiler CYPHER 2.3

Planner RULE

Runtime INTERPRETED

+---------------+------+---------+---------------------+--------------------------------------------------------+
| Operator      | Rows | DB Hits | Identifiers         | Other                                                  |
+---------------+------+---------+---------------------+--------------------------------------------------------+
| +EmptyResult  |    0 |       0 |                     |                                                        |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph  |    9 |       9 | line, page, session | MergeNode; Add(line.domain,line.page); :Page(page_key) |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Eager        |    9 |       0 | line, session       |                                                        |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph  |    9 |       9 | line, session       | MergeNode; line.wz_session; :Session(wz_session)       |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +ColumnFilter |    9 |       0 | line                | keep columns line                                      |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Filter       |    9 |       0 | anon[181], line     | anon[181]                                              |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Extract      |    9 |       0 | anon[181], line     | anon[181]                                              |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +LoadCSV      |    9 |       0 | line                |                                                        |
+---------------+------+---------+---------------------+--------------------------------------------------------+

all the labels and properties have indexes / constraints. Thanks for the help, Lior

Answer

Hey Lior,

we tried to explain the Eager Loading here:

And Mark's original blog post is here: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

Rik tried to explain it in easier terms:

http://blog.bruggen.com/2015/07/loading-belgian-corporate-registry-into_20.html

I had read about this before, but did not really understand it until Andres explained it to me again: in all normal operations, Cypher loads data lazily. See for example this page in the manual - it basically just loads as little as possible into memory when doing an operation. This laziness is usually a really good thing. But it can get you into a lot of trouble as well - as Michael explained it to me:

"Cypher tries to honor the contract that the different operations within a statement are not affecting each other. Otherwise you might end up with non-deterministic behavior or endless loops. Imagine a statement like this:
MATCH (n:Foo) WHERE n.value > 100 CREATE (m:Foo {value: n.value + 100});

If the two statements were not isolated, then each node the CREATE generates would cause the MATCH to match again, and so on: an endless loop. That's why in such cases Cypher eagerly runs all MATCH statements to exhaustion, so that all the intermediate results are accumulated and kept (in memory).

Usually with most operations that's not an issue as we mostly match only a few hundred thousand elements max.

With data imports using LOAD CSV, however, this operation will pull in ALL the rows of the CSV (which might be millions), execute all operations eagerly (which might be millions of creates/merges/matches), and also keep the intermediate results in memory to feed the next operations in line.

This also effectively disables PERIODIC COMMIT, because by the time we get to the end of the statement execution, all create operations will already have happened and the gigantic tx-state will have accumulated."
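For reference, the batching that the Eager operator defeats is the standard USING PERIODIC COMMIT prologue. A minimal sketch against the same file as in the question (the batch size of 10000 is an arbitrary example):

```cypher
// batched import: commit every 10000 rows instead of building one huge transaction
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
MERGE (session:Session { wz_session: line.wz_session });
```

Once the planner inserts an Eager operator, all rows must be materialized before any write completes, so these intermediate commits can no longer happen.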

So that's what's going on in my LOAD CSV queries. MATCH/MERGE/CREATE caused an eager pipe to be added to the execution plan, and it effectively disables the batching of my operations using PERIODIC COMMIT. Apparently quite a few users run into this issue, even with seemingly simple LOAD CSV statements. Very often you can avoid it, but sometimes you can't.
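The usual way to avoid it, as Mark's post above describes, is to split the statement so that each LOAD CSV pass performs only a single updating operation. A sketch of the query from the question rewritten that way (same file path as above, run as two separate statements):

```cypher
// pass 1: merge sessions only — a single MergeNode, so no Eager is needed
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
MERGE (session:Session { wz_session: line.wz_session });

// pass 2: merge pages only, in a second run over the same file
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
MERGE (page:Page { page_key: line.domain + line.page })
  ON CREATE SET page.name = line.page, page.domain = line.domain,
                page.protocol = line.protocol, page.file = line.file;
```

With only one updating operation per statement, the planner has no cross-operation dependency to isolate, drops the Eager pipe, and PERIODIC COMMIT batching works again.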
