How to match multiple regex patterns in multiple files and write something to a log file?
Problem description
I want to search for some regex patterns in the text files (*.txt) inside a folder whose path I've given in a text box. The folder contains sub-folders with txt files in the form 12345\2031\30201\txt\120.txt. If a pattern matches in even one file, a string should be written to a log file, which is created inside the folder whose path I've given in the text box, and the program should then move on to the next regex, and so on. What I've done so far is:
Dim tLoc As String = TextBox1.Text
Dim txtFilesArray = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt")
Dim fileLoc As String = tLoc & "\Checklist.log"
Dim fs As FileStream = Nothing
If (Not File.Exists(fileLoc)) Then
    fs = File.Create(fileLoc)
    Using fs
    End Using
End If
For Each tFile In txtFilesArray
    Dim input As String = File.ReadAllText(tFile)
    Dim pattern1 As New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")
    Dim pattern2 As New Regex("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")
    If pattern1.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check figure link")
        FileClose()
    End If
    If pattern2.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check table link")
        FileClose()
    End If
Next
But the problems are:
1) Even if pattern1 matches in multiple files, I want the string Check figure link written to the log file only once, not every time a match is found in a different file, and the same goes for pattern2 ... patternN. Furthermore, I want the program to move on to the next regex pattern the moment pattern1 matches in one file (there is no need to look for the same pattern in other files).
2) I have around a hundred regex patterns that I want to use in this program. Can anyone tell me how to shorten the code?
Recommended answer
You can put the patterns in some kind of collection and remove each one from it once it has been found:
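' Requires Imports System.IO, System.Linq and System.Text.RegularExpressions.
Dim re = Function(p$) New Regex(p, RegexOptions.Compiled)
' Log message -> pattern that triggers it.
Dim patterns = New Dictionary(Of String, Regex) From {
    {"Check figure link", re("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")},
    {"Check table link", re("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")}
}
Dim output = New List(Of String)
Dim tLoc = TextBox1.Text
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
For Each tFile In txtFiles
    If patterns.Count = 0 Then Exit For ' every pattern has already been found
    If Not tFile Like "*\#*\#*\#*\txt\#*.txt" Then Continue For
    Dim input = File.ReadAllText(tFile)
    ' Collect every remaining pattern that this file matches, so a file that
    ' matches several patterns logs all of them. ToList() takes a snapshot,
    ' which makes it safe to modify the dictionary afterwards.
    Dim matched = patterns.Where(Function(p) p.Value.IsMatch(input)).Select(Function(p) p.Key).ToList()
    For Each message In matched
        output.Add(message)
        patterns.Remove(message) ' found once: never test or log it again
    Next
Next
File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)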
If you instead compare each pattern against all the files, it becomes easier to parallelize (run on multiple processors at the same time), because there is no need to remove patterns from the collection:
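' Requires Imports System.Collections.Concurrent, System.IO,
' System.Text.RegularExpressions and System.Threading.Tasks.
Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}
' Read the text box once, on the UI thread, instead of from the worker threads.
Dim tLoc = TextBox1.Text
' Thread-safe collection: workers add messages here rather than appending
' to the log file concurrently. The order of the messages is not guaranteed.
Dim output = New ConcurrentBag(Of String)
Parallel.ForEach(patterns,
    Sub(pattern)
        Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
        Dim regEx = New Regex(pattern(1), RegexOptions.Compiled)
        For Each tFile In txtFiles
            If tFile Like "*\#*\#*\#*\txt\#*.txt" Then
                Dim input = File.ReadAllText(tFile)
                If regEx.IsMatch(input) Then
                    output.Add(pattern(0))
                    Exit For ' first matching file is enough for this pattern
                End If
            End If
        Next
    End Sub)
' Write the log once, after all workers have finished.
File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)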
Or this shorter but more complicated version:
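' Requires Imports System.IO, System.Linq and System.Text.RegularExpressions.
Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}
Dim tLoc = TextBox1.Text ' read once on the UI thread
' Any() stops at the first matching file, so each pattern yields at most one message.
' (A Take 1 after the nested From would cap the whole query at a single result.)
Dim output = From pattern In patterns.AsParallel()
             Let regEx = New Regex(pattern(1), RegexOptions.Compiled)
             Where Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Any(
                 Function(tFile) tFile Like "*\#*\#*\#*\txt\#*.txt" AndAlso regEx.IsMatch(File.ReadAllText(tFile)))
             Select pattern(0)
File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)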