导入关系的阶段出了什么问题? [英] What's going wrong with the phase of importing relationships?
问题描述
我终于克服了导入节点的阶段.现在,我正在尝试导入关系.可能存在1B关系.
I finally conquer the phase of importing nodes. Now I am trying to import relationships. There might be 1B relationships.
#!/bin/bash
cd /home/luning/neo4j-enterprise-2.2.0-RC01-unix/neo4j-enterprise-2.2.0-RC01/bin
users="/data/weibo/user-header.csv"
for i in /data/weibo/users/*
do
users=$users,$i
done
edges=/data/weibo/edge-header.csv,/data/weibo/ego/000000_0
./neo4j-import --stacktrace --into ../data/weibo_bak.db --nodes:User $users --relationships:Follow $edges --delimiter TAB --quote \' --bad-tolerance 50000 --id-type STRING
但是总是说节点丢失.难以理解的是,为两个试验导入相同的文件,这给了我不同的丢失节点. 1.第一次
But there always says node missing. Unintelligibly, with importing same file for two trials, it gave me different missing node. 1. First Time
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:56)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.call(ExecutorServiceStep.java:96)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
2.第二次
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:59)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.call(ExecutorServiceStep.java:96)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep$2.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
但是对于这两个节点1587438071和2765561213,我可以确保它们都在我的文件中.因为我可以找到它们.
But for these two nodes 1587438071 and 2765561213, I can make sure their are in my files. Because i can find them.
[luning@pinnacle data]$ grep 1587438071 /data/weibo/users/*
/data/weibo/users/000024_0:1587438071 琬童沛胜 浙江 杭州 http://tp4.sinaimg.cn/1587438071/50/40024579617/0 f 147 60 272 false LV2 31 一举成名| 正常 80 2014-02-17 04:17:38
[luning@pinnacle data]$ grep 1589699375 /data/weibo/users/*
/data/weibo/users/000010_0:1589699375 在行动Isabella 吉林 http://tp4.sinaimg.cn/1589699375/50/5633181098/0 女 297 438 4729 1981-01-17 false LV7 2014-08-13 21:43:34 2014-01-28 10:18:52
那么,有谁能弄清楚它将如何发生?
So, anybody who can figure it out how it would happen?
推荐答案
可能是您的节点输入文件包含未正确关闭引号的字段,其中某些行被其他行吞噬"了,有效地不导入那些节点(如果字段的对齐恰好那样结束,否则抛出异常).或者面对这些汉字,解析器可能出了点问题.
Could be that your node input file(s) contains fields that doesn't close their quotes properly, which would have some lines "eaten" by other lines, effectively not importing those nodes (if the alignment of the fields would happen to end up like that, otherwise throw exception). Or it could be something wrong with the parser in the face of these chinese characters.
是否有机会与我(解析器的主要作者和导入工具)共享您的输入数据以进行调查?
Any chance you could share you input data with me (the main author of parser and the import tool) for investigation?
这篇关于导入关系的阶段出了什么问题?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!