PowerShell: search for a matching string in a Word document

Problem description

I have a simple requirement. I need to search for a string in a Word document and, as the result, get the matching line or a few words of surrounding text from the document.

So far, I can successfully search for a string across a folder of Word documents, but the search only returns True / False depending on whether the string was found.

#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\MORLAB"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\wordfiletry.txt"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "CRHPCD01"

Function getStringMatch
{
  # Loop through all *.doc files in the $path directory
  Foreach ($file In $files)
  {
   $document = $application.documents.open($file.FullName,$false,$true)
   $range = $document.content
   $wordFound = $range.find.execute($findText)

   if($wordFound) 
    { 
     # $() is needed so the FullName property is expanded inside the string
     "$($file.FullName) has $wordfound" | Out-File $output -Append
    }

   # Close each document before opening the next one
   $document.close()
  }
$application.quit()
}

getStringMatch

Recommended answer

#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path     = "c:\Temp"
$files    = Get-Childitem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
$output   = "c:\temp\wordfiletry.csv"
$application = New-Object -comobject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30
# Use an array (not a hashtable) so += can append one object per match
$results = @()

Function getStringMatch
{
    # Loop through all *.doc files in the $path directory
    Foreach ($file In $files)
    {
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content

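        # Escape the search text and allow up to $charactersAround characters on each side,
        # so matches near the start or end of the document are not missed
        If($range.Text -match ".{0,$($charactersAround)}$([regex]::Escape($findtext)).{0,$($charactersAround)}"){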
             $properties = @{
                File = $file.FullName
                Match = $findtext
                TextAround = $Matches[0] 
             }
             $results += New-Object -TypeName PsCustomObject -Property $properties
        }
        # Close each document before opening the next one
        $document.close()
    }

    If($results){
        $results | Export-Csv $output -NoTypeInformation
    }

    $application.quit()
}

getStringMatch

import-csv $output

There are a couple of ways to get what you want. A simple approach: since you already have the text of the document, run a regex match against it and return the matched text together with its surrounding context. This addresses the requirement of getting some words around the match in the document.
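To try the regex idea without opening Word, here is a minimal sketch that runs against a plain string. The helper name Get-MatchContext and its parameters are assumptions made for illustration and are not part of the answer's script. It escapes the search term and allows up to N characters on each side of it.

```powershell
# Hypothetical helper: returns the first occurrence of $Term in $Text
# together with up to $CharactersAround characters on each side.
function Get-MatchContext {
    param(
        [string]$Text,
        [string]$Term,
        [int]$CharactersAround = 30
    )
    # Escape the term so characters such as '.' or '(' are matched literally.
    $pattern = ".{0,$CharactersAround}" + [regex]::Escape($Term) + ".{0,$CharactersAround}"
    # -match is case-insensitive by default and fills the automatic $Matches variable.
    if ($Text -match $pattern) {
        $Matches[0]
    }
}

$sample  = "Bradley Air Services Limited dba First Air meets or exceeds all terms"
$context = Get-MatchContext -Text $sample -Term "First" -CharactersAround 10
$context
```

When the term is not found the function returns nothing, so the caller can test the result with a plain If.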

The variable $charactersAround sets how many characters to capture on each side of $findtext. I also thought the output was a better fit for a CSV file, so $results collects one object per matching document, each built from a hashtable of properties, and those objects are written to a CSV file at the end.
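The collect-then-export pattern can be sketched on its own, again without Word. The sample texts, file names, and the temporary CSV path below are made up for illustration. The point is that $results starts as an empty array, so += appends one object per matching document.

```powershell
$findtext         = 'First'
$charactersAround = 5
$results          = @()   # empty array: += appends objects to it

# Made-up sample data standing in for the text of two Word documents.
$samples = @{
    'a.docx' = 'The First draft was sent on Monday.'
    'b.docx' = 'Nothing relevant in this one.'
}

foreach ($name in $samples.Keys) {
    $pattern = ".{0,$charactersAround}" + [regex]::Escape($findtext) + ".{0,$charactersAround}"
    if ($samples[$name] -match $pattern) {
        # One object per matching document, built from a hashtable of properties.
        $results += [pscustomobject]@{
            File       = $name
            Match      = $findtext
            TextAround = $Matches[0]
        }
    }
}

# Write the collected objects to a CSV file in the temp folder and read them back.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'wordmatches-example.csv'
$results | Export-Csv -Path $csv -NoTypeInformation
$rows = @(Import-Csv -Path $csv)
$rows
```

Each CSV row then carries the file name, the search term, and the surrounding text, which is the same shape as the answer's output.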

Be sure to change the variables for your own testing. Because the matches are now located with a regex, the pattern can be adapted to many other kinds of search.
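One such adaptation, sketched here as an assumption and not as part of the original answer, is to return every occurrence in a text instead of only the first. The -match operator stops at the first hit, while the .NET method [regex]::Matches returns all non-overlapping matches. The sample text is made up.

```powershell
$text             = 'First Air meets all terms. Payment is due on the first day of each month.'
$findtext         = 'first'
$charactersAround = 6

$pattern = ".{0,$charactersAround}" + [regex]::Escape($findtext) + ".{0,$charactersAround}"
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

# @() forces an array even when there is only one hit.
$hits = @([regex]::Matches($text, $pattern, $options) | ForEach-Object { $_.Value })
$hits
```

Because the matches do not overlap, two occurrences that sit closer together than the context width can end up in a single hit or be skipped, so a smaller $charactersAround is safer for dense text.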

Sample output

Match TextAround                                                        File                          
----- ----------                                                        ----                          
First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx
