Find string and move it up to the top of the document
Problem description
I've searched everywhere for this answer and I'm sorry if it's basic. I'm still very new to VB.Net. I appreciate everyone's help.
My problem is that my script processes a large file containing <Entity...> strings, which are followed by content strings such as <body...>. The <Entity> strings are scattered all over the file. What I need to do is gather up all the <Entity> strings and move them up to the document prologue at the top, so that they sit between the [ ]. Essentially, the code needs to find each match of the regex ^<ENTITY.*$, cut it, go to the "[", and paste it there.
Any help you can give me would be great.
I've tried creating an array to do this and failed. I then thought of using a regex to grab the <Entity strings, but that failed too.
I then tried file.Append and that didn't work.
This is the code I've come up with, but it's not working. In fact it takes a long time to build.
Dim regex = New Regex("<Entity.*$")
Dim lines As String() = File.ReadAllLines(fileName)
Dim arrEntity(0 To -1) As String
Dim regexMatches = regex.Matches(fileName)
Dim i As Integer = 0
For Each match As Match In regexMatches
    'If <!ENTITY.*> is found write it to an array
    Dim entityLine = match.ToString
    finalValue.Append(arrEntity(i))
    i += 1
Next
'Go to top of document and write the entity list between []
The expected result is that all the <Entity...> lines in the fileName document are moved up between the [ ] at the top of the document. There should be no other <Entity strings anywhere in the document except in the top prologue.
Sample SGM file
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
<!ENTITY cdcs_5-35.wmf SYSTEM "graphics\CDCS_5-35.wmf" NDATA wmf>
<!ENTITY cdcs_2-2a.wmf SYSTEM "graphics\CDCS_2-2A.wmf" NDATA wmf>
<doc service="xs" docid="BKw46" docstat="formal" verstatpg="ver" cycle="1" chglevel="1">
<front numcols="1">
<idinfo>
<?Pub Lcl _divid="100" _parentid="0">
<tmidno>Life with Pets</tmidno>
<chgnum>Change 1</chgnum>
<chgdate>2 August 2018</chgdate>
<chghistory>
<chginfo>
<chgtxt>Change 1</chgtxt>
<date>2 August 2018</date>
</front>
<!ENTITY cdcs_2-19.wmf SYSTEM "graphics\CDCS_2-19.wmf" NDATA wmf>
<!ENTITY cdcs_3-5.wmf SYSTEM "graphics\CDCS_3-5.wmf" NDATA wmf>
<body numcols="1">
<chapter>
<title>This is chapter 1</title>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<para0>
<title>Climb the ladder immedietly</title>
<para>Retrieve the cat.</para></para0></chapter>
<chapter>
<title>Don't forget to feed the dog</title>
<para0>
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<title>Prep for puppies</title>
<para>Puppies are cute</para></para0>
</chapter>
</body>
</doc>
Recommended answer
Well, I tested this code with the sample text you have posted:
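' Requires: Imports System.IO, System.Linq, System.Text.RegularExpressions
Dim largeFilePath As String = "largeFilePath"
Dim lines = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
' Lazily select every line that holds a NOTATION or ENTITY declaration
Dim entities = From line In lines
               Where reg.IsMatch(line)
' Map each matching line's index to its text
Dim dictionary As New Dictionary(Of Integer, String)
Dim idx = -1
For Each s In entities
    idx = lines.IndexOf(s, idx + 1)
    dictionary.Add(idx, s)
Next
' Remove the matched lines; each removal shifts the later indexes down by one
Dim deletedItems = 0
For Each itm In dictionary
    lines.RemoveAt(itm.Key - deletedItems)
    deletedItems += 1
Next
' Reinsert right after the first line (the DOCTYPE line ending in "[");
' inserting at index 1 every time reverses the original order
For Each s In dictionary.Values
    lines.Insert(1, s)
Next
Using sw As New System.IO.StreamWriter("newfile.txt")
    For Each line As String In lines
        sw.WriteLine(line)
    Next
End Using ' End Using flushes and closes the writer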
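If the declarations should keep their original order (the code above reinserts them in reverse), a single-pass variant might look like the sketch below. It is only an illustration under a few assumptions: each declaration occupies a whole line, as in the sample file; the prologue opens on the first non-declaration line containing "["; and the names EntityMover and MoveDeclarationsToProlog are hypothetical, not part of any library.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module EntityMover
    ' Hypothetical helper: returns a new list in which every ENTITY/NOTATION
    ' declaration line sits directly after the first line containing "[".
    Function MoveDeclarationsToProlog(lines As IEnumerable(Of String)) As List(Of String)
        ' Assumption: a declaration starts its line (optionally after whitespace).
        Dim declPattern As New Regex("^\s*<!(ENTITY|NOTATION)\b", RegexOptions.IgnoreCase)
        Dim declarations As New List(Of String)
        Dim others As New List(Of String)

        ' Single pass: split the input into declarations and everything else.
        For Each line As String In lines
            If declPattern.IsMatch(line) Then
                declarations.Add(line)
            Else
                others.Add(line)
            End If
        Next

        ' Locate the line that opens the internal subset ("[").
        Dim prologIndex As Integer = others.FindIndex(Function(l) l.Contains("["))
        If prologIndex < 0 Then
            Throw New InvalidDataException("No line containing '[' was found.")
        End If

        ' InsertRange keeps the declarations in their original order.
        others.InsertRange(prologIndex + 1, declarations)
        Return others
    End Function

    Sub Main()
        ' Both paths are placeholders, as in the code above.
        Dim result = MoveDeclarationsToProlog(File.ReadLines("largeFilePath"))
        File.WriteAllLines("newfile.txt", result)
    End Sub
End Module
```

Splitting the lines into two lists avoids the index bookkeeping needed when removing items from the list one at a time, and it touches each line only once.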
This is the final result:
<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_3-5.wmf SYSTEM "graphics\CDCS_3-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-19.wmf SYSTEM "graphics\CDCS_2-19.wmf" NDATA wmf>
<!ENTITY cdcs_2-2a.wmf SYSTEM "graphics\CDCS_2-2A.wmf" NDATA wmf>
<!ENTITY cdcs_5-35.wmf SYSTEM "graphics\CDCS_5-35.wmf" NDATA wmf>
<doc service="xs" docid="BKw46" docstat="formal" verstatpg="ver" cycle="1" chglevel="1">
<front numcols="1">
<idinfo>
<?Pub Lcl _divid="100" _parentid="0">
<tmidno>Life with Pets</tmidno>
<chgnum>Change 1</chgnum>
<chgdate>2 August 2018</chgdate>
<chghistory>
<chginfo>
<chgtxt>Change 1</chgtxt>
<date>2 August 2018</date>
</front>
<body numcols="1">
<chapter>
<title>This is chapter 1</title>
<para0>
<title>Climb the ladder immedietly</title>
<para>Retrieve the cat.</para></para0></chapter>
<chapter>
<title>Don't forget to feed the dog</title>
<para0>
<title>Prep for puppies</title>
<para>Puppies are cute</para></para0>
</chapter>
</body>
</doc>
The code has been updated and tested on a 100 MB file and the processing took only 2 seconds!