How can I read a large UTF-8 file on an iPhone?


Question

My app downloads a file in UTF-8 format that is too large to read using the NSString initWithContentsOfFile method. The problem I have is that the NSFileHandle readDataOfLength method reads a specified number of bytes, so I may end up reading only part of a UTF-8 character. What is the best solution here?

Later:

Let it be recorded in the ship's log that the following code works:

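// map the file into memory instead of loading it
NSData *buf = [NSData dataWithContentsOfFile:path
                                     options:NSDataReadingMappedIfSafe
                                       error:nil];

// wrap the mapped bytes without copying them; buf must outlive the string
NSString *data = [[[NSString alloc]
                   initWithBytesNoCopy:(void *)buf.bytes
                   length:buf.length
                   encoding:NSUTF8StringEncoding
                   freeWhenDone:NO] autorelease];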

My main problem was actually to do with the encoding, not the task of reading the file.

Recommended answer

You can use NSData's +dataWithContentsOfFile:options:error: with the NSDataReadingMappedIfSafe option to map your file into memory rather than loading it. That uses the virtual memory manager in iOS to ensure that bits of the file are swapped in and out of RAM in the same way that a desktop OS handles its on-disk virtual memory file. So you don't need enough RAM to keep the entire file in memory at once; you just need the file to be small enough to fit in the processor's address space (so, gigabytes). You'll get an object that acts exactly like a normal NSData, which should save you most of the hassle of using an NSFileHandle and streaming manually.
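For readers curious what mapping means at the system level, here is a minimal sketch in plain C, which can be compiled as part of an Objective-C project. It assumes that NSDataReadingMappedIfSafe behaves broadly like a read-only POSIX mmap; that is an assumption about the implementation, not something the answer states. The helpers map_file_readonly and unmap_file are hypothetical names introduced only for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only. Returns NULL on failure
   (including for an empty file, which cannot be mapped). */
const void *map_file_readonly(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* Pages are read from disk lazily, as they are first touched. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: release a mapping created above. */
void unmap_file(const void *p, size_t len)
{
    munmap((void *)p, len);
}
```

The point of the sketch is that the call returns almost immediately regardless of file size, because nothing is read until the bytes are actually accessed.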

You'll probably then need to convert portions to NSString, since you can realistically expect NSString to convert from UTF-8 to another internal format. It might not, though: it's worth having a go with -initWithData:encoding: and seeing whether NSString is smart enough just to keep a reference to the original data and to expand from UTF-8 on demand. I think that conversion is what your question is really getting at.

I'd suggest you use -initWithBytes:length:encoding: to convert a reasonable number of bytes to a string. You can then use -lengthOfBytesUsingEncoding: to find out how many bytes it actually made sense of and advance your read pointer appropriately. This relies on the assumption that NSString will discard any partial character at the end of the bytes you provide. Verify that on your target OS: if the initializer instead returns nil for a chunk that ends mid-character, you need to trim the chunk back to a character boundary yourself before converting.
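If you would rather not depend on how NSString treats a truncated character, one option is to find the boundary yourself before converting. The following is a minimal sketch in plain C (callable from Objective-C), not part of the original answer. The function name utf8_complete_prefix is hypothetical, and the code assumes the buffer is otherwise valid UTF-8 that starts on a character boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the length of the longest prefix of buf[0..len) that does not end
   in the middle of a UTF-8 sequence. Malformed input is returned unchanged
   so that the decoder can report the error. */
size_t utf8_complete_prefix(const uint8_t *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Step back over at most three trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) return len; /* empty buffer, or no lead byte in range */

    uint8_t lead = buf[i - 1];
    size_t need;
    if (lead < 0x80)                need = 1; /* ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return len;                          /* not a valid lead byte */

    size_t have = len - (i - 1); /* bytes present from the lead byte onward */
    return (have >= need) ? len : i - 1;
}
```

Passing the returned length, rather than the raw chunk size, to -initWithBytes:length:encoding: means every chunk ends on a whole character, and the read pointer can be advanced by exactly that many bytes.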

EDIT: so, e.g.:

// map the file, rather than loading it
NSData *data = [NSData dataWithContentsOfFile:...whatever...
                         options:NSDataReadingMappedIfSafe
                         error:&youdDoSomethingSafeHere];

// we'll maintain a read pointer to our current location in the data
NSUInteger readPointer = 0;

// continue while data remains
while(readPointer < [data length])
{
    // work out how many bytes are remaining
    NSUInteger distanceToEndOfData = [data length] - readPointer;

    // grab at most 16kb of them, being careful not to read too many
    NSString *newPortion =
         [[NSString alloc] initWithBytes:(const uint8_t *)[data bytes] + readPointer
                 length:distanceToEndOfData > 16384 ? 16384 : distanceToEndOfData
                 encoding:NSUTF8StringEncoding];

    // stop if nothing could be decoded (nil or empty result); otherwise the
    // read pointer would never advance and the loop would spin forever
    NSUInteger bytesRead = [newPortion lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    if(bytesRead == 0)
    {
        [newPortion release];
        break;
    }

    // do whatever we want with the string
    [self doSomethingWithFragment:newPortion];

    // advance our read pointer by the number of bytes actually read, and
    // clean up
    readPointer += bytesRead;
    [newPortion release];
}

Of course, an implicit assumption is that all UTF-8 encodings are unique, which I have to admit I'm not knowledgeable enough to state with absolute certainty.
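On the uniqueness point: well-formed UTF-8 as defined by RFC 3629 requires the shortest possible encoding, so each code point has exactly one valid byte sequence, and its length depends only on the code point's value. The sketch below, in plain C, shows that mapping. The function name utf8_encoded_length is hypothetical. How NSString treats ill-formed input such as overlong sequences is not covered here and would need testing.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes in the shortest-form (and only valid) UTF-8 encoding of a
   code point, or 0 if the value is not an encodable Unicode scalar value. */
size_t utf8_encoded_length(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not encodable */
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0; /* beyond the Unicode range */
}
```

For valid input, then, re-encoding a decoded string yields the same number of bytes that were consumed, which is what advancing the read pointer by -lengthOfBytesUsingEncoding: depends on.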
