What is the fastest way to load a large CSV file into Core Data


Problem description

Conclusion
Problem closed, I think.
It looks like the problem had nothing to do with the methodology: Xcode did not clean the project correctly between builds.
After all those tests, the sqlite file being used was still the very first one, which was not indexed.
Beware of Xcode 4.3.2: I have had nothing but problems with Clean not cleaning, and with files added to the project not being added automatically to the bundle resources.
Thanks for the different answers.
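One way to catch a stale seed file like this is to log the size and modification date of the bundled store and of the copy in Documents, then compare them with the file you expect to ship. The following is only a sketch: LogFileInfo is a hypothetical helper, not something from the original project.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: logs the size and modification date of the file at
// `path`, so the bundled seed file and the Documents copy can be compared.
static void LogFileInfo(NSString *label, NSString *path) {
    NSError *error = nil;
    NSDictionary *attributes = nil;
    if (path != nil) {
        attributes = [[NSFileManager defaultManager] attributesOfItemAtPath:path
                                                                      error:&error];
    }
    if (attributes == nil) {
        // A nil path means the resource is not in the bundle at all.
        NSLog(@"%@: not found (%@)", label, error);
        return;
    }
    NSLog(@"%@: %llu bytes, modified %@",
          label, [attributes fileSize], [attributes fileModificationDate]);
}
```

Calling it as LogFileInfo(@"bundle", [[NSBundle mainBundle] pathForResource:@"myProject" ofType:@"sqlite"]) before the copy step would show right away if the bundle still holds an older file, or no file at all.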

Update 3
Since I invite anybody to try the same steps and see whether they get the same results, let me detail what I did:
I started with a blank project.
I defined a data model with one entity and 3 attributes (2 strings, 1 float).
The first string is indexed.

In application:didFinishLaunchingWithOptions:, I call:

[self performSelectorInBackground:@selector(populateDB) withObject:nil];

The code for populateDB is below:

-(void)populateDB{
    // Called via performSelectorInBackground:, so this thread needs its own
    // autorelease pool and its own managed object context.
    @autoreleasepool {
        NSLog(@"start");
        NSPersistentStoreCoordinator *coordinator = [self persistentStoreCoordinator];
        if (coordinator == nil) {
            NSLog(@"No persistent store coordinator, nothing imported");
            return;
        }
        NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
        [context setPersistentStoreCoordinator:coordinator];

        NSString *filePath = [[NSBundle mainBundle] pathForResource:@"input" ofType:@"txt"];
        NSString *myText = nil;
        if (filePath) {
            myText = [[NSString alloc] initWithContentsOfFile:filePath
                                                     encoding:NSUTF8StringEncoding
                                                        error:nil];
        }
        if (myText) {
            __block int count = 0;

            [myText enumerateLinesUsingBlock:^(NSString * line, BOOL * stop) {
                // Fields may be separated by a tab or by a single space.
                line=[line stringByReplacingOccurrencesOfString:@"\t" withString:@" "];
                NSArray *lineComponents=[line componentsSeparatedByString:@" "];
                // Skip any line that does not have exactly 3 fields.
                if([lineComponents count]==3){
                    float f=[[lineComponents objectAtIndex:0] floatValue];
                    NSNumber *number=[NSNumber numberWithFloat:f];
                    NSString *string1=[lineComponents objectAtIndex:1];
                    NSString *string2=[lineComponents objectAtIndex:2];
                    NSManagedObject *object=[NSEntityDescription insertNewObjectForEntityForName:@"Bigram" inManagedObjectContext:context];
                    [object setValue:number forKey:@"number"];
                    [object setValue:string1 forKey:@"string1"];
                    [object setValue:string2 forKey:@"string2"];
                    count++;
                    // Save in batches of 1000 inserted objects.
                    if(count>=1000){
                        NSError *error = nil;
                        if (![context save:&error]) {
                            NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]);
                        }
                        count=0;
                    }
                }
            }];
            NSLog(@"done importing");
            // Save whatever is left over from the last partial batch.
            NSError *error = nil;
            if (![context save:&error]) {
                NSLog(@"Whoops, couldn't save: %@", [error localizedDescription]);
            }
        }
        NSLog(@"end");
    }
}


Everything else is the default Core Data code, nothing added.
I run that in the simulator.
I go to ~/Library/Application Support/iPhone Simulator/5.1/Applications//Documents
The generated sqlite file is there.

I take that file and copy it into my bundle.

I comment out the call to populateDB.

I edit persistentStoreCoordinator to copy the sqlite file from the bundle to Documents on first run:

- (NSPersistentStoreCoordinator *)persistentStoreCoordinator 
{
    @synchronized (self)
    {
        if (__persistentStoreCoordinator != nil)
            return __persistentStoreCoordinator;

        NSString *defaultStorePath = [[NSBundle mainBundle] pathForResource:@"myProject" ofType:@"sqlite"];
        NSString *storePath = [[[self applicationDocumentsDirectory] path] stringByAppendingPathComponent: @"myProject.sqlite"];

        NSError *error = nil;
        // On first run, seed Documents with the pre-populated store from the bundle.
        if (![[NSFileManager defaultManager] fileExistsAtPath:storePath]) 
        {
            if (defaultStorePath == nil)
                NSLog(@"myProject.sqlite is missing from the bundle resources");
            else if ([[NSFileManager defaultManager] copyItemAtPath:defaultStorePath toPath:storePath error:&error])
                NSLog(@"Copied starting data to %@", storePath);
            else 
                NSLog(@"Error copying default DB to %@ (%@)", storePath, error);
        }

        NSURL *storeURL = [NSURL fileURLWithPath:storePath];

        __persistentStoreCoordinator = [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:[self managedObjectModel]];

        // Allow lightweight migration if the model version has changed.
        NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                                 [NSNumber numberWithBool:YES], NSMigratePersistentStoresAutomaticallyOption,
                                 [NSNumber numberWithBool:YES], NSInferMappingModelAutomaticallyOption, nil];

        if (![__persistentStoreCoordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil URL:storeURL options:options error:&error]) 
        {
            NSLog(@"Unresolved error %@, %@", error, [error userInfo]);
            abort();
        }

        return __persistentStoreCoordinator;
    }
}




I remove the app from the simulator and check that ~/Library/Application Support/iPhone Simulator/5.1/Applications/ has been removed.
I rebuild and launch again.
As expected, the sqlite file is copied over to ~/Library/Application Support/iPhone Simulator/5.1/Applications//Documents


However, the file is significantly smaller than the one in the bundle! Also, a simple query with a predicate such as predicate = [NSPredicate predicateWithFormat:@"string1 == %@", string1]; clearly shows that string1 is no longer indexed.

Following that, I create a new version of the data model with a meaningless change, just to force a lightweight migration.
When run on the simulator, the migration takes a few seconds, the database doubles in size, and the same query now returns in less than a second instead of minutes.
Forcing a migration would solve my problem, but that same migration takes 3 minutes on the iPad and happens in the foreground.
So that's where I am right now. The best solution for me would still be to prevent the indexes from being removed; any other import solution at launch time simply takes too much time.
Let me know if you need more clarification.
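For anyone repeating these steps, the difference between an indexed and an unindexed lookup can be measured rather than judged by feel. This is a minimal sketch that reuses the Bigram entity and string1 attribute from the code above; TimeLookup is a hypothetical helper and must be called on the thread that owns the context.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper: runs the equality lookup on string1 and returns the
// elapsed wall-clock time in seconds.
static NSTimeInterval TimeLookup(NSManagedObjectContext *context, NSString *string1) {
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Bigram"];
    [request setPredicate:[NSPredicate predicateWithFormat:@"string1 == %@", string1]];

    NSDate *start = [NSDate date];
    NSError *error = nil;
    NSArray *results = [context executeFetchRequest:request error:&error];
    NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:start];

    if (results == nil) {
        NSLog(@"Fetch failed: %@", error);
    } else {
        NSLog(@"%lu matches in %.3f s", (unsigned long)[results count], elapsed);
    }
    return elapsed;
}
```

Launching the app with the argument -com.apple.CoreData.SQLDebug 1 additionally logs the SQL that Core Data issues, which helps when comparing runs.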


Update 2
So the best result I have had so far is to seed the Core Data database with a sqlite file produced by a quick tool that uses a similar data model, but without the indexes set when producing the file. Then I import this sqlite file into the Core Data app, which has the indexes set, and allow a lightweight migration. For 2 million records on the new iPad, this migration still takes 3 minutes. The final app should have 5 times that number of records, so we are still looking at a very long processing time. If I go that route, the new question would be: can a lightweight migration be performed in the background?
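On the background question, one approach that may work is to add the store, and so trigger the automatic migration, on a background queue and notify the main queue when it finishes. The sketch below assumes nothing touches the coordinator's data until the completion block runs; AddStoreInBackground is a hypothetical helper, not part of the original project.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper: adds the SQLite store (running any lightweight
// migration) off the main thread, then calls `completion` on the main queue.
// `error` is nil on success.
static void AddStoreInBackground(NSPersistentStoreCoordinator *coordinator,
                                 NSURL *storeURL,
                                 void (^completion)(NSError *error)) {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                                 [NSNumber numberWithBool:YES], NSMigratePersistentStoresAutomaticallyOption,
                                 [NSNumber numberWithBool:YES], NSInferMappingModelAutomaticallyOption, nil];
        NSError *error = nil;
        NSPersistentStore *store = [coordinator addPersistentStoreWithType:NSSQLiteStoreType
                                                             configuration:nil
                                                                       URL:storeURL
                                                                   options:options
                                                                     error:&error];
        NSError *result = (store != nil) ? nil : error;
        dispatch_async(dispatch_get_main_queue(), ^{
            if (completion) completion(result);
        });
    });
}
```

Until the completion block fires, the coordinator has no store attached, so the UI would need to show a waiting state (a tutorial, for instance) instead of fetching. This keeps the main thread responsive but does not make the migration itself any shorter.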

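Update
My question is not how to create a tool that populates a Core Data database and then import the resulting sqlite file into my app.
I know how to do that; I have done it countless times. But until now I had not realized that this approach could have a side effect: in my case, an indexed attribute in the database clearly becomes 'unindexed' when the sqlite file is imported this way.

If you have been able to verify that indexed data is still indexed after such a transfer, I am interested to know how you proceed; otherwise, what would be the most efficient strategy for seeding the data?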
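One direct way to check whether indexes survived is to open the store file with the SQLite C API and list the indexes recorded in sqlite_master. The sketch below is illustrative only: IndexNamesInStore is a hypothetical helper, the target must link against libsqlite3, and the exact index names Core Data generates (likely something resembling ZBIGRAM_ZSTRING1_INDEX for this model) are an assumption to verify against your own file.

```objc
#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Hypothetical helper: returns the names of all indexes in the SQLite file
// at `path`, or nil if the file cannot be opened or queried.
static NSArray *IndexNamesInStore(NSString *path) {
    sqlite3 *db = NULL;
    if (sqlite3_open_v2([path fileSystemRepresentation], &db,
                        SQLITE_OPEN_READONLY, NULL) != SQLITE_OK) {
        sqlite3_close(db);   // safe on a failed open, releases the handle
        return nil;
    }

    sqlite3_stmt *statement = NULL;
    const char *sql = "SELECT name FROM sqlite_master WHERE type = 'index'";
    if (sqlite3_prepare_v2(db, sql, -1, &statement, NULL) != SQLITE_OK) {
        sqlite3_close(db);
        return nil;
    }

    NSMutableArray *names = [NSMutableArray array];
    while (sqlite3_step(statement) == SQLITE_ROW) {
        const unsigned char *name = sqlite3_column_text(statement, 0);
        if (name != NULL) {
            [names addObject:[NSString stringWithUTF8String:(const char *)name]];
        }
    }
    sqlite3_finalize(statement);
    sqlite3_close(db);
    return names;
}
```

Running it against both the bundled file and the copy in Documents, and logging the two arrays, shows whether the index on string1 is present in each. It should be called before Core Data opens the store, or on a copy of the file.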



Original


I have a large CSV file (millions of lines) with 4 columns of strings and floats. This is for an iOS app.

I need this to be loaded into Core Data the first time the app is launched.

The app is pretty much non-functional until the data is available, so loading time matters: a first-time user obviously does not want the app to take 20 minutes to load before being able to use it.

Right now, my current code takes 20 minutes on the new iPad to process a 2-million-line CSV file.

I am using a background context so as not to lock the UI, and I save the context every 1,000 records.

The first idea I had was to generate the database on the simulator, then copy it into the Documents folder at first launch, as this is the common unofficial way of seeding a large database. Unfortunately, the indexes don't seem to survive such a transfer: although the database was available after just a few seconds, performance was terrible because my indexes were lost. I have already posted a question about the indexes, but there doesn't seem to be a good answer to it.

So what I am looking for is any of the following:



  • A way to improve performance when loading millions of records into Core Data.
  • If the database is pre-loaded and moved at first startup, a way to keep my indexes.
  • Best practices for handling this kind of scenario. I don't remember using any app that required me to wait for x minutes before first use (except maybe The Daily, and that was a terrible experience).
  • Any creative way to make the user wait without realizing it: background import while going through a tutorial, etc.
  • Not using Core Data?
  • ...

Recommended answer


Pre-generate your database using an offline application (say, a command-line utility) written in Cocoa that runs on OS X and uses the same Core Data framework that iOS uses. You don't need to worry about "indexes surviving" or anything: the output is a Core Data-generated .sqlite database file, directly and immediately usable by an iOS app.


As long as you can do the DB generation offline, it's the best solution by far. I have successfully used this technique to pre-generate databases for iOS deployment myself. Check my previous questions/answers for a bit more detail.
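To make the suggestion concrete, here is a minimal sketch of what such a command-line seeding tool might look like. It is not the answerer's actual tool. It assumes the compiled model from the iOS project is passed in, so that the store matches the app's model, and it reuses the Bigram entity and the three-field line layout from the question; the argument order and file names are illustrative.

```objc
// Build (assumed): clang -fobjc-arc seed.m -framework Foundation -framework CoreData -o seed
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        if (argc != 4) {
            fprintf(stderr, "usage: %s <compiled model> <input.txt> <output.sqlite>\n", argv[0]);
            return 1;
        }
        NSURL *modelURL = [NSURL fileURLWithPath:[NSString stringWithUTF8String:argv[1]]];
        NSString *inputPath = [NSString stringWithUTF8String:argv[2]];
        NSURL *storeURL = [NSURL fileURLWithPath:[NSString stringWithUTF8String:argv[3]]];

        // Load the same compiled model the iOS app uses.
        NSManagedObjectModel *model = [[NSManagedObjectModel alloc] initWithContentsOfURL:modelURL];
        if (model == nil) { fprintf(stderr, "could not load model\n"); return 1; }

        // Keep everything in one file: with write-ahead logging, part of the
        // data could live in -wal/-shm sidecar files instead.
        NSDictionary *pragmas = [NSDictionary dictionaryWithObject:@"DELETE" forKey:@"journal_mode"];
        NSDictionary *options = [NSDictionary dictionaryWithObject:pragmas forKey:NSSQLitePragmasOption];

        NSPersistentStoreCoordinator *coordinator =
            [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
        NSError *error = nil;
        if (![coordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil
                                                 URL:storeURL options:options error:&error]) {
            NSLog(@"Could not create store: %@", error);
            return 1;
        }
        NSManagedObjectContext *context =
            [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
        [context setPersistentStoreCoordinator:coordinator];
        [context setUndoManager:nil];   // no undo tracking needed for a bulk import

        NSString *text = [NSString stringWithContentsOfFile:inputPath
                                                   encoding:NSUTF8StringEncoding error:&error];
        if (text == nil) { NSLog(@"Could not read input: %@", error); return 1; }

        __block NSUInteger pending = 0;
        __block BOOL failed = NO;
        [text enumerateLinesUsingBlock:^(NSString *line, BOOL *stop) {
            @autoreleasepool {
                NSArray *fields = [line componentsSeparatedByCharactersInSet:
                                   [NSCharacterSet whitespaceCharacterSet]];
                if ([fields count] != 3) return;   // skip malformed lines
                NSManagedObject *object =
                    [NSEntityDescription insertNewObjectForEntityForName:@"Bigram"
                                                  inManagedObjectContext:context];
                [object setValue:[NSNumber numberWithFloat:[[fields objectAtIndex:0] floatValue]]
                          forKey:@"number"];
                [object setValue:[fields objectAtIndex:1] forKey:@"string1"];
                [object setValue:[fields objectAtIndex:2] forKey:@"string2"];
                if (++pending >= 1000) {
                    NSError *saveError = nil;
                    if (![context save:&saveError]) {
                        NSLog(@"Save failed: %@", saveError);
                        failed = YES;
                        *stop = YES;
                    }
                    [context reset];   // drop saved objects to keep memory flat
                    pending = 0;
                }
            }
        }];
        // Save the last partial batch.
        if (!failed && ![context save:&error]) {
            NSLog(@"Final save failed: %@", error);
            failed = YES;
        }
        return failed ? 1 : 0;
    }
}
```

The resulting .sqlite file is then added to the app's bundle resources and copied to Documents on first launch, as in the persistentStoreCoordinator code in the question. Running the tool twice against the same output path appends to the existing store, so the output file should be deleted between runs.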
