How to efficiently write large files to disk on background thread (Swift)


Update

I have resolved and removed the distracting error. Please read the entire post and feel free to leave comments if any questions remain.

Background

I am attempting to write relatively large files (video) to disk on iOS using Swift 2.0, GCD, and a completion handler. I would like to know if there is a more efficient way to perform this task. The task needs to be done without blocking the Main UI, while using completion logic, and also ensuring that the operation happens as quickly as possible. I have custom objects with an NSData property so I am currently experimenting using an extension on NSData. As an example an alternate solution might include using NSFilehandle or NSStreams coupled with some form of thread safe behavior that results in much faster throughput than the NSData writeToURL function on which I base the current solution.

What's wrong with NSData Anyway?

Please note the following discussion taken from the NSData Class Reference, (Saving Data). I do perform writes to my temp directory however the main reason that I am having an issue is that I can see a noticeable lag in the UI when dealing with large files. This lag is precisely because NSData is not asynchronous (and Apple Docs note that atomic writes can cause performance issues on "large" files ~ > 1mb). So when dealing with large files one is at the mercy of whatever internal mechanism is at work within the NSData methods.

I did some more digging and found this info from Apple..."This method is ideal for converting data:// URLs to NSData objects, and can also be used for reading short files synchronously. If you need to read potentially large files, use inputStreamWithURL: to open a stream, then read the file a piece at a time." (NSData Class Reference, Objective-C, +dataWithContentsOfURL). This info seems to imply that I could try using streams to write the file out on a background thread if moving the writeToURL to the background thread (as suggested by @jtbandes) is not sufficient.

The NSData class and its subclasses provide methods to quickly and easily save their contents to disk. To minimize the risk of data loss, these methods provide the option of saving the data atomically. Atomic writes guarantee that the data is either saved in its entirety, or it fails completely. The atomic write begins by writing the data to a temporary file. If this write succeeds, then the method moves the temporary file to its final location.

While atomic write operations minimize the risk of data loss due to corrupt or partially-written files, they may not be appropriate when writing to a temporary directory, the user’s home directory or other publicly accessible directories. Any time you work with a publicly accessible file, you should treat that file as an untrusted and potentially dangerous resource. An attacker may compromise or corrupt these files. The attacker can also replace the files with hard or symbolic links, causing your write operations to overwrite or corrupt other system resources.

Avoid using the writeToURL:atomically: method (and the related methods) when working inside a publicly accessible directory. Instead initialize an NSFileHandle object with an existing file descriptor and use the NSFileHandle methods to securely write the file.
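As a rough illustration of the NSFileHandle approach the docs recommend, a non-atomic chunked write might look like the following (a minimal Swift 2.0-style sketch; the helper name, chunk size, and error handling are my assumptions, not part of the original post):

```swift
import Foundation

// Hypothetical helper: writes `data` to `fileURL` in chunks via NSFileHandle,
// avoiding the single large atomic write that NSData.writeToURL performs.
func writeDataInChunks(data: NSData, toURL fileURL: NSURL, chunkSize: Int = 1_048_576) -> Bool {
    guard let path = fileURL.path else { return false }
    // Create an empty file first so a file handle can be opened on it.
    NSFileManager.defaultManager().createFileAtPath(path, contents: nil, attributes: nil)
    guard let handle = NSFileHandle(forWritingAtPath: path) else { return false }
    defer { handle.closeFile() }

    var offset = 0
    while offset < data.length {
        let length = min(chunkSize, data.length - offset)
        let chunk = data.subdataWithRange(NSRange(location: offset, length: length))
        handle.writeData(chunk)  // appends this chunk at the current file offset
        offset += length
    }
    return true
}
```

To follow the docs' advice literally you would initialize the NSFileHandle from an existing file descriptor rather than a path; the path-based initializer is used here only to keep the sketch short.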

Other Alternatives

One article on Concurrent Programming at objc.io provides interesting options on "Advanced: File I/O in the Background". Some of the options involve use of an InputStream as well. Apple also has some older references to reading and writing files asynchronously. I am posting this question in anticipation of Swift alternatives.

Example of an appropriate answer

Here is an example of an appropriate answer that might satisfy this type of question. (Taken from the Stream Programming Guide, Writing To Output Streams)

Using an NSOutputStream instance to write to an output stream requires several steps:

  1. Create and initialize an instance of NSOutputStream with a repository for the written data. Also set a delegate.
  2. Schedule the stream object on a run loop and open the stream.
  3. Handle the events that the stream object reports to its delegate.
  4. If the stream object has written data to memory, obtain the data by requesting the NSStreamDataWrittenToMemoryStreamKey property.
  5. When there is no more data to write, dispose of the stream object.
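The steps above might be sketched roughly as follows (a Swift 2.0-era illustration; the class name and buffer handling are assumptions, and step 4 only applies to memory-backed streams, so it is omitted for this file-backed case):

```swift
import Foundation

class StreamWriter: NSObject, NSStreamDelegate {
    let stream: NSOutputStream
    let data: NSData
    var offset = 0

    init(url: NSURL, data: NSData) {
        // Step 1: create the output stream with a repository (the file URL).
        self.stream = NSOutputStream(URL: url, append: false)!
        self.data = data
        super.init()
        stream.delegate = self  // ...and set a delegate.
    }

    func start() {
        // Step 2: schedule on a run loop and open the stream.
        stream.scheduleInRunLoop(NSRunLoop.currentRunLoop(), forMode: NSDefaultRunLoopMode)
        stream.open()
    }

    // Step 3: handle the events the stream reports to its delegate.
    func stream(aStream: NSStream, handleEvent eventCode: NSStreamEvent) {
        if eventCode == NSStreamEvent.HasSpaceAvailable {
            if offset < data.length {
                let written = stream.write(
                    UnsafePointer<UInt8>(data.bytes) + offset,
                    maxLength: data.length - offset)
                if written > 0 { offset += written }
            } else {
                // Step 5: no more data to write - dispose of the stream.
                stream.close()
                stream.removeFromRunLoop(NSRunLoop.currentRunLoop(), forMode: NSDefaultRunLoopMode)
            }
        }
    }
}
```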

I am looking for the most proficient algorithm that applies to writing extremely large files to iOS using Swift, APIs, or possibly even C/ObjC would suffice. I can transpose the algorithm into appropriate Swift compatible constructs.

Nota Bene

I understand the informational error below. It is included for completeness. This question is asking whether or not there is a better algorithm to use for writing large files to disk with a guaranteed dependency sequence (e.g. NSOperation dependencies). If there is please provide enough information (description/sample for me to reconstruct pertinent Swift 2.0 compatible code). Please advise if I am missing any information that would help answer the question.

Note on the extension

I've added a completion handler to the base writeToURL to ensure that no unintended resource sharing occurs. My dependent tasks that use the file should never face a race condition.

extension NSData {

    func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {

        let filePath = NSTemporaryDirectory() + named
        let tmpURL = NSURL(fileURLWithPath: filePath)
        weak var weakSelf = self

        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), {
            // write to URL atomically
            if weakSelf!.writeToURL(tmpURL, atomically: true) {
                if NSFileManager.defaultManager().fileExistsAtPath(filePath) {
                    completion(result: true, url: tmpURL)
                } else {
                    completion(result: false, url: tmpURL)
                }
            }
        })
    }
}

This method is used to process the custom object's data from a controller using:

var items = [AnyObject]()
if let video = myCustomClass.data {

    // video is of type NSData
    video.writeToURL("shared.mp4", completion: { (result, url) -> Void in
        if result {
            items.append(url!)
            if items.count > 0 {

                let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)

                self.presentViewController(sharedActivityView, animated: true) { () -> Void in
                    // finished
                }
            }
        }
    })
}

Conclusion

The Apple Docs on Core Data Performance provide some good advice on dealing with memory pressure and managing BLOBs. This is really one heck of an article with a lot of clues to behavior and how to moderate the issue of large files within your app. Now although it is specific to Core Data and not files, the warning on atomic writing does tell me that I ought to implement methods that write atomically with great care.

With large files, the only safe way to manage writing seems to be adding in a completion handler (to the write method) and showing an activity view on the main thread. Whether one does that with a stream or by modifying an existing API to add completion logic is up to the reader. I've done both in the past and am in the midst of testing for best performance.

Until then, I'm changing the solution to remove all binary data properties from Core Data and replacing them with strings to hold asset URLs on disk. I am also leveraging the built in functionality from Assets Library and PHAsset to grab and store all related asset URLs. When or if I need to copy any assets I will use standard API methods (export methods on PHAsset/Asset Library) with completion handlers to notify user of finished state on the main thread.

(Really useful snippets from the Core Data Performance article)

Reducing Memory Overhead

It is sometimes the case that you want to use managed objects on a temporary basis, for example to calculate an average value for a particular attribute. This causes your object graph, and memory consumption, to grow. You can reduce the memory overhead by re-faulting individual managed objects that you no longer need, or you can reset a managed object context to clear an entire object graph. You can also use patterns that apply to Cocoa programming in general.

You can re-fault an individual managed object using NSManagedObjectContext’s refreshObject:mergeChanges: method. This has the effect of clearing its in-memory property values thereby reducing its memory overhead. (Note that this is not the same as setting the property values to nil—the values will be retrieved on demand if the fault is fired—see Faulting and Uniquing.)

When you create a fetch request you can set includesPropertyValues to NO to reduce memory overhead by avoiding creation of objects to represent the property values. You should typically only do so, however, if you are sure that either you will not need the actual property data or you already have the information in the row cache, otherwise you will incur multiple trips to the persistent store.

You can use the reset method of NSManagedObjectContext to remove all managed objects associated with a context and "start over" as if you'd just created it. Note that any managed object associated with that context will be invalidated, and so you will need to discard any references to and re-fetch any objects associated with that context in which you are still interested. If you iterate over a lot of objects, you may need to use local autorelease pool blocks to ensure temporary objects are deallocated as soon as possible.

If you do not intend to use Core Data’s undo functionality, you can reduce your application's resource requirements by setting the context’s undo manager to nil. This may be especially beneficial for background worker threads, as well as for large import or batch operations.
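For example, on a background worker context this is a one-line change (a sketch; the context setup is assumed, and note that on iOS a context's undo manager is typically nil by default anyway, so this matters most on OS X):

```swift
import CoreData

// Hypothetical background context for a large import; disabling undo
// avoids the cost of recording every change.
let backgroundContext = NSManagedObjectContext(concurrencyType: .PrivateQueueConcurrencyType)
backgroundContext.undoManager = nil
```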

Finally, Core Data does not by default keep strong references to managed objects (unless they have unsaved changes). If you have lots of objects in memory, you should determine the owning references. Managed objects maintain strong references to each other through relationships, which can easily create strong reference cycles. You can break cycles by re-faulting objects (again by using the refreshObject:mergeChanges: method of NSManagedObjectContext).

Large Data Objects (BLOBs)

If your application uses large BLOBs ("Binary Large OBjects" such as image and sound data), you need to take care to minimize overheads. The exact definition of "small", "modest", and "large" is fluid and depends on an application’s usage. A loose rule of thumb is that objects in the order of kilobytes in size are of a "modest" sized and those in the order of megabytes in size are "large" sized. Some developers have achieved good performance with 10MB BLOBs in a database. On the other hand, if an application has millions of rows in a table, even 128 bytes might be a "modest" sized CLOB (Character Large OBject) that needs to be normalized into a separate table.

In general, if you need to store BLOBs in a persistent store, you should use an SQLite store. The XML and binary stores require that the whole object graph reside in memory, and store writes are atomic (see Persistent Store Features) which means that they do not efficiently deal with large data objects. SQLite can scale to handle extremely large databases. Properly used, SQLite provides good performance for databases up to 100GB, and a single row can hold up to 1GB (although of course reading 1GB of data into memory is an expensive operation no matter how efficient the repository).

A BLOB often represents an attribute of an entity—for example, a photograph might be an attribute of an Employee entity. For small to modest sized BLOBs (and CLOBs), you should create a separate entity for the data and create a to-one relationship in place of the attribute. For example, you might create Employee and Photograph entities with a one-to-one relationship between them, where the relationship from Employee to Photograph replaces the Employee's photograph attribute. This pattern maximizes the benefits of object faulting (see Faulting and Uniquing). Any given photograph is only retrieved if it is actually needed (if the relationship is traversed).

It is better, however, if you are able to store BLOBs as resources on the filesystem, and to maintain links (such as URLs or paths) to those resources. You can then load a BLOB as and when necessary.

Note:

I've moved the logic below into the completion handler (see the code above) and I no longer see any error. As mentioned before this question is about whether or not there is a more performant way to process large files in iOS using Swift.

When attempting to process the resulting items array to pass to a UIActivityViewController, using the following logic:

if items.count > 0 {
    let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)
    self.presentViewController(sharedActivityView, animated: true) { () -> Void in
        // finished
    }
}

I am seeing the following error: Communications error: { count = 1, contents = "XPCErrorDescription" => { length = 22, contents = "Connection interrupted" } }> (please note, I am looking for a better design, not an answer to this error message)

Solution

Performance depends on whether or not the data fits in RAM. If it does, then you should use NSData writeToURL with the atomically feature turned on, which is what you're doing.

Apple's notes about this being dangerous when "writing to a public directory" are completely irrelevant on iOS because there are no public directories. That section only applies to OS X. And frankly it's not really important there either.

So, the code you've written is as efficient as possible as long as the video fits in RAM (about 100MB would be a safe limit).

For files that don't fit in RAM, you need to use a stream or your app will crash while holding the video in memory. To download a large video from a server and write it to disk, you should use NSURLSessionDownloadTask.
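A minimal sketch of that approach (the URL, destination filename, and error handling here are placeholders, not from the original answer):

```swift
import Foundation

// Hypothetical example: NSURLSessionDownloadTask streams the response straight
// to a temporary file, so the full video never has to fit in RAM.
let videoURL = NSURL(string: "https://example.com/large-video.mp4")!  // placeholder URL
let task = NSURLSession.sharedSession().downloadTaskWithURL(videoURL) { location, response, error in
    guard let location = location where error == nil else { return }
    let destination = NSURL(fileURLWithPath: NSTemporaryDirectory())
        .URLByAppendingPathComponent("shared.mp4")
    // Move the downloaded temp file into place; this is a rename, not a copy.
    _ = try? NSFileManager.defaultManager().moveItemAtURL(location, toURL: destination)
}
task.resume()
```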

In general, streaming (including NSURLSessionDownloadTask) will be orders of magnitude slower than NSData.writeToURL(). So don't use a stream unless you need to. All operations on NSData are extremely fast, it is perfectly capable of dealing with files that are multiple terabytes in size with excellent performance on OS X (iOS obviously can't have files that large, but it's the same class with the same performance).


There are a few issues in your code.

This is wrong:

let filePath = NSTemporaryDirectory() + named

Instead always do:

let filePath = NSTemporaryDirectory().stringByAppendingPathComponent(named)

But that's not ideal either, you should avoid using paths (they are buggy and slow). Instead use a URL like this:

let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory()) as NSURL!
let fileURL = tmpDir.URLByAppendingPathComponent(named)

Also, you're using a path to check if the file exists... don't do this:

if NSFileManager.defaultManager().fileExistsAtPath( filePath ) {

Instead use NSURL to check if it exists:

if fileURL.checkResourceIsReachableAndReturnError(nil) {
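Putting the answer's corrections together, the question's extension might be revised along these lines (a sketch combining the fixes above; the behavior is otherwise unchanged, and the weak self dance is dropped since the data object must outlive the write anyway):

```swift
import Foundation

extension NSData {

    func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {
        // Build the destination as a URL rather than by string concatenation.
        let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory())
        let fileURL = tmpDir.URLByAppendingPathComponent(named)

        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)) {
            let success = self.writeToURL(fileURL, atomically: true)
            // Check reachability on the URL instead of fileExistsAtPath.
            let exists = fileURL.checkResourceIsReachableAndReturnError(nil)
            completion(result: success && exists, url: fileURL)
        }
    }
}
```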
