How to match multiple regex patterns in multiple files and write something to a log file?

Question

I want to search for some regex patterns in the *.txt files inside a folder whose path I've given in a text box. The folder contains sub-folders, with the txt files at paths of the form 12345\2031\30201\txt\120.txt. If a pattern matches in even one file, a string should be written to a log file that is created inside the folder given in the text box, and then the program should move on to the next regex, and so on. What I've done so far is:

Dim tLoc As String = TextBox1.Text
        Dim txtFilesArray = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt")
        Dim fileLoc As String = tLoc & "\Checklist.log"
        Dim fs As FileStream = Nothing
        If (Not File.Exists(fileLoc)) Then
            fs = File.Create(fileLoc)
            Using fs

            End Using
        End If
        For Each tFile In txtFilesArray
            Dim input As String = File.ReadAllText(tFile)
            Dim pattern1 As New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")
            Dim pattern2 As New Regex("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")
            If pattern1.IsMatch(input) Then
                FileOpen(1, fileLoc, OpenMode.Append)
                PrintLine(1, "Check figure link")
                FileClose()
            End If
            If pattern2.IsMatch(input) Then
                FileOpen(1, fileLoc, OpenMode.Append)
                PrintLine(1, "Check table link")
                FileClose()
            End If

        Next

But the problems are:

1) Even if pattern1 matches in multiple files, I want the string Check figure link written to the log file only once, not every time a match is found in a different file, and the same for pattern2 ... patternN. Furthermore, I want the program to move on to the next regex pattern the moment pattern1 matches in one file (there is no need to look for the same pattern in the other files).

2) I have around a hundred regex patterns that I want to use in this program. Can anyone tell me how to shorten the code?

Recommended answer

You can put the patterns in some kind of collection and then remove each one from it once it has been found:

' Requires: Imports System.IO, System.Linq, System.Text.RegularExpressions
Dim re = Function(p$) New Regex(p, RegexOptions.Compiled)
' Log message -> pattern; an entry is removed once any file matches it
Dim patterns = New Dictionary(Of String, Regex) From {
    {"Check figure link", re("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")},
    {"Check table link", re("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")}
}
Dim output = New List(Of String)
Dim tLoc = TextBox1.Text
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)

For Each tFile In txtFiles
    If patterns.Count = 0 Then Exit For ' every pattern has already been found
    If Not tFile Like "*\#*\#*\#*\txt\#*.txt" Then Continue For
    Dim input = File.ReadAllText(tFile)

    ' Collect every pattern this file matches, not just the first one.
    ' ToList is needed because the dictionary cannot be modified while it is enumerated.
    Dim matched = patterns.Where(Function(kv) kv.Value.IsMatch(input)).
                           Select(Function(kv) kv.Key).ToList()
    For Each key In matched
        output.Add(key)
        patterns.Remove(key)
    Next
Next
File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)

If you instead check each pattern against all the files, it is easier to parallelize (run on multiple processors at the same time), because there is no need to remove patterns from the collection:

Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}

' Read the text box once, on the UI thread; worker threads should not touch controls
Dim tLoc = TextBox1.Text
Dim logPath = tLoc.TrimEnd("\"c) & "\Checklist.log"
Dim logLock = New Object

Parallel.ForEach(patterns,
    Sub(pattern)
        Dim regEx = New Regex(pattern(1), RegexOptions.Compiled)

        For Each tFile In Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
            If tFile Like "*\#*\#*\#*\txt\#*.txt" AndAlso regEx.IsMatch(File.ReadAllText(tFile)) Then
                ' Serialize the writes so two threads never open the log at the same time
                SyncLock logLock
                    File.AppendAllLines(logPath, {pattern(0)})
                End SyncLock
                Exit For ' first match is enough for this pattern
            End If
        Next
    End Sub)

Or this shorter, more complicated version:

Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}

' Read the text box once, on the UI thread, before the parallel query runs
Dim tLoc = TextBox1.Text

' Any(...) stops at the first matching file for each pattern.
' (A trailing "Take 1" would cap the whole query at a single result, not one per pattern.)
Dim output = From pattern In patterns.AsParallel
             Let regEx = New Regex(pattern(1), RegexOptions.Compiled)
             Where Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Any(
                 Function(f) f Like "*\#*\#*\#*\txt\#*.txt" AndAlso regEx.IsMatch(File.ReadAllText(f)))
             Select pattern(0)

File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)
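
With around a hundred patterns, as the question mentions, the collection initializers above would get long. One option is to keep the patterns in a plain text file and load them at startup. The following is only a sketch under assumptions that are not in the original post: the helper name LoadPatterns, the file layout (one "message<TAB>regex" pair per line, with lines starting with # treated as comments), and the choice of a dictionary as the return type are all hypothetical.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternLoader
    ' Hypothetical helper: reads "message<TAB>regex" lines from a text file
    ' and returns them as log message -> compiled Regex.
    Function LoadPatterns(path As String) As Dictionary(Of String, Regex)
        Dim patterns = New Dictionary(Of String, Regex)
        For Each line In File.ReadLines(path)
            ' Skip blank lines and comment lines starting with #
            If line.Trim() = "" OrElse line.StartsWith("#") Then Continue For
            ' Split on the first tab only, so the regex itself may contain tabs
            Dim parts = line.Split({ControlChars.Tab}, 2)
            If parts.Length < 2 Then Continue For ' malformed line, ignore it
            patterns(parts(0)) = New Regex(parts(1), RegexOptions.Compiled)
        Next
        Return patterns
    End Function
End Module
```

The returned dictionary could then stand in for the hard-coded patterns dictionary in the first snippet, so adding a pattern would mean adding a line to the text file rather than changing the code.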
