Neo4j bulk import “neo4j-admin import” OutOfMemoryError: Java heap space and OutOfMemoryError: GC overhead limit exceeded


Question

The resources available on my single machine are:

Total machine memory: 2.00 TB
Free machine memory: 1.81 TB
Max heap memory : 910.50 MB
Processors: 192
Configured max memory: 1.63 TB

My file1.csv file is 600 GB.

Number of entries in my CSV file: 3 000 000 000

Header structure

Attempt 1
item_col1:ID(label),item_col2,item_col3:IGNORE,item_col4:IGNORE,item_col5,item_col6,item_col7,item_col8:IGNORE
Attempt 2
item_col1:ID,item_col2,item_col3:IGNORE,item_col4:IGNORE,item_col5,item_col6,item_col7,item_col8:IGNORE
Attempt 3
item_col1:ID,item_col2,item_col3:IGNORE,item_col4:IGNORE,item_col5:LABEL,item_col6,item_col7,item_col8:IGNORE

Neo4j version: 3.2.1

Tried with configuration combination 1

 cat ../conf/neo4j.conf | grep "memory"
 dbms.memory.heap.initial_size=16000m
 dbms.memory.heap.max_size=16000m
 dbms.memory.pagecache.size=40g

Tried with configuration combination 2

cat ../conf/neo4j.conf | grep "memory"
dbms.memory.heap.initial_size=900m
dbms.memory.heap.max_size=900m
dbms.memory.pagecache.size=4g

Tried with configuration combination 3

dbms.memory.heap.initial_size=1000m
dbms.memory.heap.max_size=1000m
dbms.memory.pagecache.size=1g

Tried with configuration combination 4

dbms.memory.heap.initial_size=10g
dbms.memory.heap.max_size=10g 
dbms.memory.pagecache.size=10g

Tried with configuration combination 5 (settings commented out; no output)

   # dbms.memory.heap.initial_size=10g
   # dbms.memory.heap.max_size=10g 
   # dbms.memory.pagecache.size=10g

Commands tried

kaushik@machine1:/neo4j/import$ cl
kaushik@machine1:/neo4j/import$ rm -r ../data/databases/
kaushik@machine1:/neo4j/import$ mkdir ../data/databases/
kaushik@machine1:/neo4j/import$ cat ../conf/neo4j.conf | grep active
dbms.active_database=graph.db


kaushik@machine1:/neo4j/import$ ../bin/neo4j-admin import --mode csv --database social.db --nodes head.csv,file1.csv
Neo4j version: 3.2.1
Importing the contents of these files into /neo4j/data/databases/social.db:
Nodes:
  /neo4j/import/head.csv
  /neo4j/import/file1.csv



Available resources:
Total machine memory: 2.00 TB
Free machine memory: 1.79 TB
Max heap memory : 910.50 MB
Processors: 192
Configured max memory: 1.61 TB

Error 1

Nodes, started 2017-07-14 05:32:51.736+0000
[*NODE:7.63 MB---------------------------------------------------|PROPERTIE|LABEL SCAN--------]    0 ?    0
Done in 40s 439ms
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.neo4j.csv.reader.Extractors$StringArrayExtractor.extract0(Extractors.java:739)
at org.neo4j.csv.reader.Extractors$ArrayExtractor.extract(Extractors.java:680)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:239)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.deserializeNextFromSource(InputEntityDeserializer.java:138)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:77)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:41)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(ParallelInputEntityDeserializer.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer$$Lambda$150/1372918763.apply(Unknown Source)
at org.neo4j.unsafe.impl.batchimport

Error 2

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.neo4j.csv.reader.Extractors$StringArrayExtractor.extract0(Extractors.java:739)
at org.neo4j.csv.reader.Extractors$ArrayExtractor.extract(Extractors.java:680)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:239)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.deserializeNextFromSource(InputEntityDeserializer.java:138)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:77)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:41)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(ParallelInputEntityDeserializer.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer$$Lambda$150/1372918763.apply(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing.lambda$submit$0(TicketedProcessing.java:110)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing$$Lambda$154/1949503798.run(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)

Error 3

Nodes, started 2017-07-14 05:39:48.602+0000
[NODE:7.63 MB-----------------------------------------------|PROPER|*LABEL SCAN---------------]    0 ?    0
Done in 42s 140ms
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at org.neo4j.csv.reader.Extractors$StringExtractor.extract0(Extractors.java:328)
at org.neo4j.csv.reader.Extractors$AbstractSingleValueExtractor.extract(Extractors.java:287)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:239)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.deserializeNextFromSource(InputEntityDeserializer.java:138)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:77)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:41)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(ParallelInputEntityDeserializer.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer$$Lambda$150/310855317.apply(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing.lambda$submit$0(TicketedProcessing.java:110)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing$$Lambda$154/679112060.run(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)

Error 4

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.neo4j.csv.reader.Extractors$StringExtractor.extract0(Extractors.java:328)
at org.neo4j.csv.reader.Extractors$AbstractSingleValueExtractor.extract(Extractors.java:287)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:239)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.deserializeNextFromSource(InputEntityDeserializer.java:138)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:77)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:41)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(ParallelInputEntityDeserializer.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer$$Lambda$118/69048864.apply(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing.lambda$submit$0(TicketedProcessing.java:110)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing$$Lambda$122/951451297.run(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237) 

Error 5

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at org.neo4j.csv.reader.Extractors$StringExtractor.extract0(Extractors.java:328)
at org.neo4j.csv.reader.Extractors$AbstractSingleValueExtractor.extract(Extractors.java:287)
at org.neo4j.csv.reader.BufferedCharSeeker.tryExtract(BufferedCharSeeker.java:239)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.deserializeNextFromSource(InputEntityDeserializer.java:138)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:77)
at org.neo4j.unsafe.impl.batchimport.input.csv.InputEntityDeserializer.fetchNextOrNull(InputEntityDeserializer.java:41)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer.lambda$new$0(ParallelInputEntityDeserializer.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.ParallelInputEntityDeserializer$$Lambda$118/950986004.apply(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing.lambda$submit$0(TicketedProcessing.java:110)
at org.neo4j.unsafe.impl.batchimport.staging.TicketedProcessing$$Lambda$122/151277029.run(Unknown Source)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:237)

In general, it would help a lot of beginners if you could explain Chapter 9 (Performance), section 9.1 (Memory tuning), with an example: https://neo4j.com/docs/operations-manual/current/performance/

Could you give an example of how to calculate dbms.memory.heap.initial_size, dbms.memory.heap.max_size, and dbms.memory.pagecache.size for a sample data set of 500 GB with 3 billion entries, each having 10 columns of equal size, on a machine with 1 TB of RAM and 100 processors?

Recommended answer

The calculation is actually pretty simple if you're only importing nodes:

3 * 10^9 * 20 / 1024^3

That works out to roughly 55.9, so I would go with a heap size of at least 55 GB. Can you try that?
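The arithmetic can be reproduced in the shell. This is a minimal sketch rather than part of the original answer: it assumes the factor 20 stands for roughly 20 bytes of heap per node, which the answer does not state explicitly, and the variable names are made up for illustration. It also assumes a shell with 64-bit integer arithmetic.

```shell
# Assumption: about 20 bytes of heap per imported node (the factor 20 above).
nodes=3000000000
bytes_per_node=20

# Total bytes, then integer division by 1024^3 to get GiB (rounds down).
heap_bytes=$((nodes * bytes_per_node))
heap_gib=$((heap_bytes / 1024 / 1024 / 1024))

echo "estimated heap: ${heap_gib} GiB"
```

Because the division rounds down, the script prints 55; rounding up to 56 leaves a small margin over the raw estimate.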

Regards, Tom
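One detail from the question is worth checking when applying this advice: the resource reports printed by the import tool show `Max heap memory : 910.50 MB`, far below the estimate of at least 55 GB. If that figure does not change after editing neo4j.conf, the tool may be taking its heap size from somewhere else. The sketch below assumes that your `neo4j-admin` script honors a `HEAP_SIZE` environment variable; confirm this against the output of `neo4j-admin help` for your version before relying on it. The value `56g` is simply the answer's estimate rounded up, and the command line is copied from the question. The script only prints the command instead of running it.

```shell
# Assumption: neo4j-admin reads the JVM heap size from HEAP_SIZE.
# Verify with `neo4j-admin help` for your version.
heap_size="56g"   # the answer's "at least 55 GB" estimate, rounded up

# Build the import command from the question and print it, so the
# sketch can be tried without a Neo4j installation.
import_cmd="HEAP_SIZE=${heap_size} ../bin/neo4j-admin import --mode csv --database social.db --nodes head.csv,file1.csv"
echo "$import_cmd"
```

If the variable is honored, the `Max heap memory` line in the tool's "Available resources" report should reflect the new value, which is a quick way to confirm the setting took effect before the import starts reading the 600 GB file.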
