Select text between key words

Question

This is a follow-up question to "Select block of text and merge into new document".

I have an SGM document to which I have added start/stop comments. I need to extract the strings between the start/stop comments so I can put them in a temporary file for modification. Right now my code selects everything, including the start/stop comments themselves and the data outside of them.

Dim DirFolder As String = txtDirectory.Text
Dim Directory As New IO.DirectoryInfo(DirFolder)
Dim allFiles As IO.FileInfo() = Directory.GetFiles("*.sgm")
Dim singleFile As IO.FileInfo
Dim Prefix As String
Dim newMasterFilePath As String
Dim masterFileName As String
Dim newMasterFileName As String
Dim startMark As String = "<!--#start#-->"
Dim stopMark As String = "<!--#stop#-->"
searchDir = txtDirectory.Text
Prefix = txtBxUnique.Text

For Each singleFile In allFiles
    If File.Exists(singleFile.FullName) Then
        Dim fileName = singleFile.FullName
        Debug.Print("file name : " & fileName)
        ' A backup first    
        Dim backup As String = fileName & ".bak"
        File.Copy(fileName, backup, True)

        ' Load lines from the source file in memory
        Dim lines() As String = File.ReadAllLines(backup)

        ' Now re-create the source file and start writing lines inside a block
        ' Evaluate all the lines in the file.
        ' Set insideBlock to false
        Dim insideBlock As Boolean = False
        Using sw As StreamWriter = File.CreateText(backup)
            For Each line As String In lines
                If line = startMark Then
                    ' start writing at the line below
                    insideBlock = True
                    ' Evaluate if the next line is <!Stop>
                ElseIf line = stopMark Then
                    ' Stop writing
                    insideBlock = False
                ElseIf insideBlock = True Then
                    ' Write the current line in the block
                    sw.WriteLine(line)
                End If
            Next
        End Using
    End If

Next

This is the example text to test on.

<chapter id="Chapter_Overview"> <?Pub Lcl _divid="500" _parentid="0"> 
<title>Learning how to gather data</title>
<!--#start#-->
<section>
<title>ALTERNATE MISSION EQUIPMENT</title>
<para0 verdate="18 Jan 2019" verstatus="ver">
<title>
<applicabil applicref="xxx">
</applicabil>Three-Button Trackball Mouse</title>
<para>This is the example to grab all text between start and stop comments. 
</para></para0>
</section>
<!--#stop#-->

Things to note: the start and stop comments ALWAYS fall on a new line, and a document can have multiple start/stop sections.

I thought about maybe using a regex for this:

(<section>[\w+\w]+.*?<\/section>)\R(<\?Pub _gtinsert.*>\R<pgbrk pgnum.*?>\R<\?Pub /_gtinsert>)*

Or maybe use IndexOf and LastIndexOf, but I couldn't get that working.

Recommended answer

You can read the entire file and split it into an array, using the string array {"<!--#start#-->", "<!--#stop#-->"} as the delimiters. That gives you:

  • Element 0: Text before "<!--#start#-->"
  • Element 1: Text between "<!--#start#-->" and "<!--#stop#-->"
  • Element 2: Text after "<!--#stop#-->"

Then take element 1 and write it to your backup.

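' StringSplitOptions.None keeps empty pieces, so element 1 is the text between
' the markers even when the file begins with the start marker.
Dim text = File.ReadAllText(backup).Split({startMark, stopMark}, StringSplitOptions.None)(1)
Using sw As StreamWriter = File.CreateText(backup)
    sw.Write(text)
End Using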
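
To see what the split produces, here is a minimal sketch on an in-memory string. The sample string, the module name SplitDemo, and the helper SplitOnMarkers are made up for illustration and are not part of the original code.

```vbnet
Imports System

Module SplitDemo
    ' Hypothetical helper: splits content on the two marker strings.
    ' Empty pieces are kept so that the element positions stay stable.
    Function SplitOnMarkers(content As String, startMark As String, stopMark As String) As String()
        Return content.Split({startMark, stopMark}, StringSplitOptions.None)
    End Function

    Sub Main()
        ' Made-up sample with exactly one start/stop pair.
        Dim sample As String = "before<!--#start#-->inside<!--#stop#-->after"
        Dim parts = SplitOnMarkers(sample, "<!--#start#-->", "<!--#stop#-->")

        ' Prints:
        ' Element 0: before
        ' Element 1: inside
        ' Element 2: after
        For i As Integer = 0 To parts.Length - 1
            Console.WriteLine("Element " & i & ": " & parts(i))
        Next
    End Sub
End Module
```

In a real file the markers sit on their own lines, so element 1 will begin and end with a line break.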

Edit to address the comments

I did make the original code a little compact. It can be expanded into the following, which lets you add some validation:

Dim text = File.ReadAllText(backup)
' StringSplitOptions.None keeps empty pieces, so exactly one start/stop pair
' always yields three elements: before, between, after.
Dim split = text.Split({startMark, stopMark}, StringSplitOptions.None)
If split.Length <> 3 Then Throw New Exception("File didn't contain exactly one start/stop pair.")
text = split(1)
Using sw As StreamWriter = File.CreateText(backup)
    sw.Write(text)
End Using
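
The question notes that a document can contain several start/stop sections, while the split above expects exactly one pair. A possible way to handle several sections is to scan with IndexOf, as the question suggested. The following is only a sketch under assumptions: the helper name ExtractSections is hypothetical, surrounding whitespace is trimmed from each section, and a start marker without a matching stop marker is simply ignored.

```vbnet
Imports System
Imports System.Collections.Generic

Module MultiSectionDemo
    ' Hypothetical helper: returns the text of every start/stop section in content.
    Function ExtractSections(content As String, startMark As String, stopMark As String) As List(Of String)
        Dim sections As New List(Of String)
        Dim position As Integer = 0
        Do
            ' Find the next start marker at or after the current position.
            Dim startIndex = content.IndexOf(startMark, position, StringComparison.Ordinal)
            If startIndex < 0 Then Exit Do

            ' The section body begins right after the start marker.
            Dim bodyStart = startIndex + startMark.Length
            Dim stopIndex = content.IndexOf(stopMark, bodyStart, StringComparison.Ordinal)
            If stopIndex < 0 Then Exit Do ' Assumption: ignore an unmatched start marker.

            ' Trim removes the line breaks that surround the markers.
            sections.Add(content.Substring(bodyStart, stopIndex - bodyStart).Trim())

            ' Continue searching after the stop marker.
            position = stopIndex + stopMark.Length
        Loop
        Return sections
    End Function

    Sub Main()
        ' Made-up sample with two sections.
        Dim sample As String = "a<!--#start#-->one<!--#stop#-->b<!--#start#-->two<!--#stop#-->c"
        For Each section In ExtractSections(sample, "<!--#start#-->", "<!--#stop#-->")
            Console.WriteLine(section) ' Prints "one", then "two".
        Next
    End Sub
End Module
```

Each returned string could then be written to the temporary file in turn, for example with sw.WriteLine inside the existing Using block.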
