Is there a faster way to get file metadata than by using the shell COM component?

Problem description

Reading various answers here and elsewhere, I pieced together this bit to get the file metadata that I need:

Public Class windows_metadata_helper
    Public Shared shell As New Shell32.Shell
    Public Shared indices_of_interest As New Dictionary(Of Integer, String)
    Public Shared path_index As Integer

    Shared Sub New()
        ' snipped a long piece of code that figures out the indices of the attributes I need; they are stored in indices_of_interest, for example 0:Name
    End Sub

    Public Shared Function get_interesting_data(path) As Dictionary(Of String, String)
        Dim fi As New IO.FileInfo(path)
        Dim f_dir = shell.NameSpace(fi.DirectoryName)
        Dim data As New Dictionary(Of String, String)

        For Each item In f_dir.Items()
            If f_dir.GetDetailsOf(item, path_index) = fi.FullName Then
                For Each kvp In indices_of_interest
                    Dim val = f_dir.GetDetailsOf(item, kvp.Key)
                    If Not String.IsNullOrEmpty(val) Then data.Add(kvp.Value, val)
                Next
                Exit For
            End If
        Next

        Return data
    End Function
End Class

It's not the most efficient code in the world: it reads the path attribute of every file in the directory to identify the one I'm actually interested in. Optimizing this to read each file's path attribute only once makes it around 50% faster (tested by letting it take the first file it finds, whether or not it's the right one), but regardless, it's far slower than expected.

It needs to fetch 24 attributes from each file, and it needs to find around 20k files out of ~100k. Currently this takes an entire hour.

Profiling tells me that the CPU is the bottleneck, but I can't see what is using the cycles, because 99% of the time is spent inside the Shell32.Folder.GetDetailsOf method.

Is there a faster way to get the metadata? The answer doesn't have to be VB or .NET specific.

Recommended answer

Since you are seeking maximum speed, I suggest that you enable Option Strict for your code and make the necessary modifications that will be suggested by the IDE. This will eliminate unnecessary type conversions.

For example,

Public Shared Function get_interesting_data(path) As Dictionary(Of String, String)

should be:

Public Shared Function get_interesting_data(path As String) As Dictionary(Of String, String)

Instead of enumerating the Shell32.Folder.Items collection, use the Shell32.Folder.ParseName method to retrieve the FolderItem object directly. This object can be cast to a Shell32.ShellFolderItem, which exposes the ShellFolderItem.ExtendedProperty method.

There are two ways to specify a property. The first is to pass the property's well-known name, such as "Author" or "Date", as sPropName (the parameter of ExtendedProperty). However, each property is also a member of a Component Object Model (COM) property set and can be identified by specifying its format ID (FMTID) and property ID (PID). An FMTID is a GUID that identifies the property set, and a PID is an integer that identifies a particular property within the property set.

Specifying a property by its FMTID/PID values is usually more efficient than using its name. To use a property's FMTID/PID values with ExtendedProperty, they must be combined into an SCID. An SCID is a string that contains the FMTID/PID values in the form "FMTID PID" (the FMTID, a space, then the PID), where the FMTID is the string form of the property set's GUID. For example, the SCID of the summary information property set's author property is "{F29F85E0-4FF9-1068-AB91-08002B27B3D9} 4".
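You do not have to type SCID strings by hand. You can build them from a GUID and a PID. The helper below, MakeScid, is hypothetical and is not part of Shell32. It assumes the "{GUID} PID" form shown in the Author example above. Because the FMTID is parsed with New Guid(...), a mistyped GUID raises a FormatException at that point instead of being passed on to ExtendedProperty.

```vbnet
Imports System
Imports System.Globalization

' Hypothetical helper, not part of Shell32: builds an SCID string
' ("{FMTID} PID") for use with ShellFolderItem.ExtendedProperty.
Public Module ScidHelper
    Public Function MakeScid(fmtid As Guid, pid As Integer) As String
        ' "B" format = 32 hex digits with hyphens, wrapped in braces.
        ' Upper-casing only mirrors the style used in the Windows documentation.
        Dim guidPart As String = fmtid.ToString("B").ToUpperInvariant()
        Return guidPart & " " & pid.ToString(CultureInfo.InvariantCulture)
    End Function
End Module
```

For example, MakeScid(New Guid("F29F85E0-4FF9-1068-AB91-08002B27B3D9"), 4) yields "{F29F85E0-4FF9-1068-AB91-08002B27B3D9} 4", which is the Author SCID quoted above.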

Many FMTID/PID values can be found through the links on the Windows Properties documentation page.
You can find the full property table here (scroll down).

Putting this together for some selected properties:

Public Shared Function get_interesting_data(path As String) As Dictionary(Of String, String)
    Dim fi As New IO.FileInfo(path)
    Dim f_dir As Shell32.Folder = shell.NameSpace(fi.DirectoryName)
    Dim data As New Dictionary(Of String, String)
    If f_dir Is Nothing Then Return data ' directory not found

    ' instead of enumerating f_dir.Items to find the file of interest,
    ' directly retrieve the item reference (Nothing if the file is not found)
    Dim item As Shell32.ShellFolderItem = DirectCast(f_dir.ParseName(fi.Name), Shell32.ShellFolderItem)
    If item Is Nothing Then Return data

    Dim scid_Bitrate As String = "{64440490-4C8B-11D1-8B70-080036B11A03} 4"    ' Audio: System.Audio.EncodingBitrate
    Dim scid_Title As String = "{F29F85E0-4FF9-1068-AB91-08002B27B3D9} 2"      ' Core: System.Title
    Dim scid_Created As String = "{B725F130-47EF-101A-A5F1-02608C9EEBAC} 15"   ' Core: System.DateCreated
    Dim scid_Copyright As String = "{64440492-4C8B-11D1-8B70-080036B11A03} 11" ' Core: System.Copyright
    Dim scid_Publisher As String = "{64440492-4C8B-11D1-8B70-080036B11A03} 30" ' Media: System.Media.Publisher
    Dim scid_FullDetails As String = "{C9944A21-A406-48FE-8225-AEC7E24C211B} 2" ' PropList: System.PropList.FullDetails

    Dim bitrate As Object = item.ExtendedProperty(scid_Bitrate)
    Dim title As Object = item.ExtendedProperty(scid_Title)
    Dim created As Object = item.ExtendedProperty(scid_Created)
    Dim copyright As Object = item.ExtendedProperty(scid_Copyright)
    Dim publisher As Object = item.ExtendedProperty(scid_Publisher)
    Dim fullDetails As Object = item.ExtendedProperty(scid_FullDetails)

    ' save the retrieved properties, skipping any that came back empty (Nothing)
    If bitrate IsNot Nothing Then data.Add("Bitrate", bitrate.ToString())
    If title IsNot Nothing Then data.Add("Title", title.ToString())
    If created IsNot Nothing Then data.Add("Created", created.ToString())
    If copyright IsNot Nothing Then data.Add("Copyright", copyright.ToString())
    If publisher IsNot Nothing Then data.Add("Publisher", publisher.ToString())
    If fullDetails IsNot Nothing Then data.Add("FullDetails", fullDetails.ToString())

    Return data
End Function

I do not know whether this technique of retrieving the properties is faster than your current use of GetDetailsOf, but the other changes should bring some improvement.
