Win32 API FindFirstFile and FindNextFile performance vs command line

Question

We have encountered an unexpected performance issue when traversing directories looking for files using a wildcard pattern.

We have 180 folders, each containing 10,000 files. A command line search using dir <pattern> /s completes almost instantly (under 0.25 seconds). From our application, however, the same search takes 3 to 4 seconds.

We initially tried using System.IO.DirectoryInfo.GetFiles() with SearchOption.AllDirectories and have now tried the Win32 API calls FindFirstFile() and FindNextFile().

Profiling our code indicates that the vast majority of execution time is spent in these calls.

Our code is based on the following blog post:

http://codebetter.com/blogs/matthew.podwysocki/archive/2008/10/16/functional-net-fighting-friction-in-the-bcl-with-directory-getfiles.aspx

We found this to be slow, so we updated the GetFiles function to take a string search pattern rather than a predicate.
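For readers who want to reproduce the shape of this search outside .NET, here is a minimal sketch in Python, not the asker's code. It walks a directory tree and filters names with a wildcard pattern. The function name `find_files` is hypothetical. It is relevant here because, on Windows, CPython documents `os.scandir` as using the Win32 FindFirstFileW and FindNextFileW functions.

```python
import fnmatch
import os
from typing import Iterator


def find_files(root: str, pattern: str) -> Iterator[str]:
    """Yield paths under root whose file name matches a wildcard pattern.

    Hypothetical helper for illustration; it is not the code from the
    question or from the linked blog post.
    """
    stack = [root]  # directories still to visit (iterative, not recursive)
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif fnmatch.fnmatch(entry.name, pattern):
                        yield entry.path
        except OSError:
            # Skip directories that cannot be opened or read.
            continue
```

A call such as `list(find_files(r"d:\somedir", "*.txt"))` would collect every matching path. Whether this performs closer to dir /s or to the .NET call on a given machine is something to measure rather than assume.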

Can anyone shed any light on what might be wrong with our approach?

Recommended answer

A simple test with Process Monitor shows that the cmd.exe dir command and Directory.GetFiles() behave significantly differently. Here is what .NET Directory.GetFiles() does for a single directory:

"CreateFile","d:\somedir","SUCCESS","Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Complete If Oplocked, Open For Backup, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened"
"SetBasicInformationFile","d:\somedir","SUCCESS","CreationTime: 1/1/1601 1:59:59 AM, LastAccessTime: 1/1/1601 1:59:59 AM, LastWriteTime: 1/1/1601 1:59:59 AM, ChangeTime: 1/1/1601 1:59:59 AM, FileAttributes: n/a"
"QueryFileInternalInformationFile","d:\somedir","SUCCESS","IndexNumber: 0x4000000000030"
"FileSystemControl","d:\somedir","END OF FILE","Control: FSCTL_FILE_PREFETCH"
"CloseFile","d:\somedir","SUCCESS",""

On the other hand cmd.exe behaves like this:

"CreateFile","d:\somedir","SUCCESS","Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened"
"QueryDirectory","d:\somedir\*","SUCCESS","Filter: *, 1: ."
"QueryDirectory","d:\somedir","SUCCESS"
"QueryDirectory","d:\somedir","NO MORE FILES",""
"CloseFile","d:\somedir","SUCCESS",""
"CreateFile","d:\somedir","SUCCESS","Desired Access: Read Data/List Directory, Synchronize, Disposition: Open, Options: Directory, Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened"
"QueryDirectory","d:\somedir\*","SUCCESS","Filter: *, 1: ."
"QueryDirectory","d:\somedir","SUCCESS"
"QueryDirectory","d:\somedir","NO MORE FILES",""
"CloseFile","d:\somedir","SUCCESS",""

Although cmd.exe seems to do twice the work in terms of the number of operations, it does not appear to call NtSetBasicInformationFile, NtQueryFileInternalInformationFile, or NtFileSystemControl. It uses only NtQueryDirectoryFile to get the information it wants.
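Comparing two traces by eye gets tedious on larger captures. As a rough sketch, the Python below tallies operations per type from rows shaped like the ones above. It assumes the operation name is the first CSV field, as in these traces. The function name `count_operations` is hypothetical, and the sample rows are copied from the traces with the detail column blanked for brevity.

```python
import csv
from collections import Counter
from typing import Iterable


def count_operations(lines: Iterable[str]) -> Counter:
    """Count how often each operation (first CSV field) appears."""
    counts: Counter = Counter()
    for row in csv.reader(lines):
        if row:  # ignore blank lines
            counts[row[0]] += 1
    return counts


# Rows abbreviated from the traces above; the detail field is left empty.
dotnet_trace = [
    r'"CreateFile","d:\somedir","SUCCESS",""',
    r'"SetBasicInformationFile","d:\somedir","SUCCESS",""',
    r'"QueryFileInternalInformationFile","d:\somedir","SUCCESS",""',
    r'"FileSystemControl","d:\somedir","END OF FILE",""',
    r'"CloseFile","d:\somedir","SUCCESS",""',
]
cmd_trace = [
    r'"CreateFile","d:\somedir","SUCCESS",""',
    r'"QueryDirectory","d:\somedir\*","SUCCESS",""',
    r'"QueryDirectory","d:\somedir","SUCCESS"',
    r'"QueryDirectory","d:\somedir","NO MORE FILES",""',
    r'"CloseFile","d:\somedir","SUCCESS",""',
]

if __name__ == "__main__":
    print("Directory.GetFiles:", dict(count_operations(dotnet_trace)))
    print("cmd.exe dir:", dict(count_operations(cmd_trace)))
```

Operation counts say nothing about cost on their own, so they are best read next to the duration data that Process Monitor can also record.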

The most suspect API is NtSetBasicInformationFile, which sets a "LastAccessTime", something cmd.exe does not bother to do. As the trace shows, this requires a write operation to file system structures and might be where the actual overhead is incurred.

However my research is incomplete:

  • I didn't verify if .NET is really slower than cmd.exe. I just compared their operations.

  • I'm not sure whether the asker took process startup time into account when comparing the dir command with a standalone executable.

  • Some references say FindFirstFile uses NtQueryDirectoryFile, but I didn't verify this against Microsoft resources.

Someone needs to go through Process Monitor stack traces to find out which specific Win32 APIs are used and run tests using them instead.
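On the process startup caveat above: one way to keep launch cost out of a comparison is to time the search in-process and to measure a do-nothing child process as a separate baseline. The sketch below is in Python, assumed for illustration only. The helper names `time_call` and `time_process_startup` are hypothetical, and the baseline launches a Python interpreter rather than cmd.exe or a .NET executable, so the absolute figures will not carry over.

```python
import subprocess
import sys
import time
from typing import Callable


def time_call(func: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by a single call to func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def time_process_startup() -> float:
    """Time a child interpreter that does no work, as a startup baseline."""
    return time_call(
        lambda: subprocess.run([sys.executable, "-c", "pass"], check=True)
    )


if __name__ == "__main__":
    # Subtract this baseline from any timing taken by launching a process.
    print(f"process startup baseline: {time_process_startup():.3f} s")
```

Repeating each measurement several times also helps, since the first pass over a directory tree may be slower than later passes once the file system cache is warm.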
