How to access tag information on Office files via C#

Question

I would like to write a simple bit of code that extracts only the tag information from a set of Office (docx, pptx, etc.) files in a directory, so that the tags can be indexed and searched easily.

When I say "tag", I mean the tag info that you have been able to add to a file since Windows Vista. It's typically added using Explorer. For example, the pptx file in the screenshot below has the tag "bubble" attached.

But searching those tags is already built into Windows, you say? Why, yes, but I need to index only the tags, and I need to expose that information through an intranet rather than inside Windows.

I have found that inside the Office file package, the actual information is stored in the cp:keywords element of the /docProps/core.xml file. I do realize that, in code, I could unzip the file, open that part, and extract what I need. However, I'm hoping there's a pre-abstracted solution out there somewhere. I seriously doubt that's what Windows does to index the same information (but admittedly, I can't find any good information on it).
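For reference, the manual route described above (open the package as a zip, read docProps/core.xml, pull out cp:keywords) might look roughly like the following. This is a minimal sketch, not code from the original post: `OfficeTagReader` is a hypothetical helper name, it assumes the core properties part sits at the conventional `docProps/core.xml` path (strictly, the part is located via the package relationships in `_rels/.rels`), and it assumes multiple tags are stored as a single semicolon-separated string, which you should verify against your own files.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads the cp:keywords value from an OOXML package (docx, pptx, xlsx).
public static class OfficeTagReader
{
    // Namespace bound to the "cp:" prefix in docProps/core.xml.
    static readonly XNamespace Cp =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";

    public static string[] ReadTags(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            return ReadTags(stream);
        }
    }

    public static string[] ReadTags(Stream packageStream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: conventional part name, no relationship lookup.
            var entry = zip.GetEntry("docProps/core.xml");
            if (entry == null) return new string[0];

            using (var coreStream = entry.Open())
            {
                var doc = XDocument.Load(coreStream);
                var keywords = doc.Root?.Element(Cp + "keywords")?.Value;
                if (string.IsNullOrWhiteSpace(keywords)) return new string[0];

                // Assumption: tags are separated by ';' (e.g. "bubble; demo").
                return keywords.Split(';')
                               .Select(t => t.Trim())
                               .Where(t => t.Length > 0)
                               .ToArray();
            }
        }
    }
}
```

Calling `OfficeTagReader.ReadTags(@"C:\path\to\file.pptx")` for each file in `Directory.EnumerateFiles(...)` would give you the raw material for an index. This avoids the Shell entirely, so it also works on a server without Explorer, but it only covers OOXML packages, not other file types.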

I have also found some discussions about IFilters. However, an IFilter extracts the text content of the file, so I don't see how an IFilter helps solve this particular problem.

Can anyone point me in the right direction on this one?

Answer

I don't have Word installed, but I'd guess the tags are accessible through the standard Windows property system as Keywords entries, just like the tags on a JPG picture.

If you want to know exactly how it's done, I experimented with the Shell COM API, and there is a full code sample in a Gist: FileTags.cs. That was just for fun, though; you should use the Microsoft Windows API Code Pack, as its implementation is a lot cleaner.

To get the tags (called keywords internally), reference Microsoft.WindowsAPICodePack.Shell.dll, then:

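```csharp
using System;
using Microsoft.WindowsAPICodePack.Shell;

class Program
{
    static void Main()
    {
        // ShellFile wraps a Shell item; dispose it to release the underlying COM object.
        using (var shellFile = ShellFile.FromFilePath(@"C:\path\to\some\file.jpg"))
        {
            // System.Keywords is the property Explorer displays as "Tags".
            var tags = (string[])shellFile.Properties.System.Keywords.ValueAsObject;

            // The value is null when the file has no tags.
            tags = tags ?? new string[0];
            Console.WriteLine("Tags: {0}", String.Join("; ", tags));
        }
        Console.ReadLine();
    }
}
```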

If they didn't mess it up, it should work starting from Windows XP SP2. (Mine should work from SP1, since I avoided PropVariantGetStringElem, but it's really annoying to do without those helpers.)