How do I make PowerShell search a Word document for wildcards and return the word it found?


Question

I am searching a very large number of Word documents (5000) for a very large number of strings (3000). I know how to do this in a PowerShell script, but it takes an extremely long time. Fortunately, most of these strings share common text in their first 3 or 4 characters, so I can narrow the list down to roughly 300 strings if I use wildcard searches in a find.execute statement. If I search for (cod)* from strings.txt and it finds results such as "code", "coding", "coded", etc. in the Word doc, I need those results written to a text file. However, I'm not having much luck.

$filePath = "C:\files\"
$textPath = "C:\strings.txt"
$outputPath = "C:\output.txt"
$findTexts = (Get-Content $textPath)
$docs = Get-childitem -path $filePath -Recurse -Include *.docx 
$application = New-Object -comobject word.application 
Foreach ($doc in $docs)
{
   $document = $application.documents.open("$doc", $false, $true)
   $application.visible = $False
   $matchCase = $false 
   $matchWholeWord = $false 
   $matchWildCards = $true 
   $matchSoundsLike = $false 
   $matchAllWordForms = $false 
   $forward = $true 
   $wrap = 1
   $range = $document.content
   $null = $range.movestart()

   Foreach ($findtext in $findTexts)
   {
       $wordFound = $range.find.execute($findText,$matchCase,$matchWholeWord,$matchWildCards,$matchSoundsLike, $matchAllWordForms,$forward,$wrap) 
       if ($wordFound) 
       { 
           $docName = $doc.Name
           #Output search results and file name to a tab-delimited file
           "$findText`t$docName" | Out-File -append $outputPath   
       } #end if $wordFound 

   } #end foreach $findText
   $document.close()
} #end foreach $doc
$application.quit()

If I have a Word doc with the word "coding" in it, this script produces an output.txt containing the (cod)* wildcard and the filename, because $findText = (cod)*. Is there any way to get the matched word "coding" written to the file instead?

Recommended answer

Instead of using Word's wildcard search, why not run a PowerShell regex against all of the text in the document? Something like this:

if ($document.Content.Text -match "\b$($findText)\w+\b") 
{ 
  $docName = $doc.Name
  "$($matches[0])`t$docName" | Out-File -append $outputPath   
}
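
The snippet above reports only the first match per search string, and it seems to assume that $findText holds a bare prefix such as cod rather than the Word pattern (cod)*. As a rough sketch under that assumption, the prefixes could be combined into one regex so each document's text is scanned once and every matching word is returned. Find-PrefixWords is a hypothetical helper name, not something from the original post, and it uses \w* instead of \w+ so that a word equal to the prefix itself also counts:

```powershell
# Hypothetical helper (not from the original post): returns every distinct word
# in $Text that starts with one of the literal prefixes in $Prefixes.
function Find-PrefixWords {
    param(
        [string]   $Text,
        [string[]] $Prefixes
    )

    # Escape each prefix so characters such as '(' or '*' are treated literally,
    # then join them into one alternation, e.g. \b(?:cod|prog)\w*
    $escaped = $Prefixes | ForEach-Object { [regex]::Escape($_) }
    $pattern = '\b(?:' + ($escaped -join '|') + ')\w*'

    # IgnoreCase mirrors the default behaviour of the -match operator.
    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

    [regex]::Matches($Text, $pattern, $options) |
        ForEach-Object { $_.Value } |
        Sort-Object -Unique
}

# Sample text stands in for $document.Content.Text
$sample = 'She was coding while he coded; the code compiled.'
Find-PrefixWords -Text $sample -Prefixes 'cod'
```

Inside the document loop, the same call could presumably take $document.Content.Text as -Text and the lines of strings.txt as -Prefixes, with each returned word written to the output file alongside $doc.Name. That part is untested here because it needs Word installed.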
