How to split a sentence in a paragraph with a certain Regex match in VB.NET
INPUT
2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here
I need to split this as
2015-04-22 JV RM - Save your list here
2014-12-28 SV See Word - Image in the mail
2014-12-21 SV See word document0
2014-12-15 SV See word
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here. See the pdf in attach-
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here. See the pdf in attach-
2014-12-15 SV See word
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here
This is the output I need. Here is what I have tried:
Private Sub Cmd_Rem_Click(sender As Object, e As EventArgs) Handles Cmd_Rem.Click
    Dim input As String = Txt_Commantaarintern.Text
    Dim result As String() = Regex.Split(input, "(?<=['""A-Za-z0-9][\.\!\?])\s+(?=[A-Z])")
    For Each s As String In result
        Console.WriteLine(s)
        Txt_After_Remove.Text = s
    Next
End Sub
Please guide me on where I am stuck.
Regex.Split probably isn't going to help you much: it removes the text it matches and discards it. Since your "new line" data contains info you want to keep, that's a problem.
I'd use a regex like this:
\d{4}-\d\d-\d\d
and then use the Index property (https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.capture.index(v=vs.110).aspx) of each match to tell me where each line starts.
I could then split out each individual line using string.Substring.
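The approach described above could be sketched in VB.NET roughly as follows. This is a minimal sketch, not the answerer's code: the module name `SplitByDateSketch` and the helper `SplitAtDates` are hypothetical, and it assumes that every line starts with a date and that any text before the first date can be dropped.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SplitByDateSketch
    ' Hypothetical helper: cuts the input at the start of every date-like match.
    Function SplitAtDates(input As String) As List(Of String)
        Dim lines As New List(Of String)()
        Dim matches As MatchCollection = Regex.Matches(input, "\d{4}-\d\d-\d\d")
        For i As Integer = 0 To matches.Count - 1
            Dim startIndex As Integer = matches(i).Index
            ' A line ends where the next date starts, or at the end of the input.
            Dim endIndex As Integer = If(i + 1 < matches.Count, matches(i + 1).Index, input.Length)
            lines.Add(input.Substring(startIndex, endIndex - startIndex).Trim())
        Next
        Return lines
    End Function

    Sub Main()
        Dim sample As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document0"
        For Each line As String In SplitAtDates(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Because the pattern is matched rather than used as a separator, each date stays at the front of its line. In a button handler, the lines could be joined with `String.Join(Environment.NewLine, lines)` before being assigned to a text box.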
In addition to OriginalGriff's answer, here is an implementation of his suggestion ;)
string s = @"2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here";
System.Text.RegularExpressions.Regex searchTerm =
    new System.Text.RegularExpressions.Regex(@"(\d{4}-\d{2}-\d{2})");
// Collect the text and start position of every date
var matches = from System.Text.RegularExpressions.Match match in searchTerm.Matches(s)
              select new { match.Value, match.Index };
// see result #1
// RowNo is the zero-based position of the match; the next match is at RowNo + 1
var matchedValues = matches.Select((Record, RowNo) => new
{
    Index = RowNo,
    Date = Record.Value,
    GetTextFrom = Record.Index,
    GetTextLength = matches.Skip(RowNo + 1).Take(1).Select(a => a.Index - Record.Index).FirstOrDefault()
});
// see result #2
List<string> lines = new List<string>();
foreach (var line in matchedValues)
{
    // A length of 0 marks the last match: take everything up to the end of the string
    lines.Add(s.Substring(line.GetTextFrom, line.GetTextLength == 0 ? s.Length - line.GetTextFrom : line.GetTextLength));
}
// see result #3
Dim s As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here"
Dim searchTerm As New System.Text.RegularExpressions.Regex("(\d{4}-\d{2}-\d{2})")
' Typing the range variable as Match keeps Value and Index strongly typed
Dim matches = From match As System.Text.RegularExpressions.Match In searchTerm.Matches(s) _
              Select New With { _
                  match.Value, _
                  match.Index _
              }
' RowNo is the zero-based position of the match; the next match is at RowNo + 1
Dim matchedValues = matches.Select(Function(Record, RowNo) New With { _
    Key .Index = RowNo, _
    Key .[Date] = Record.Value, _
    Key .GetTextFrom = Record.Index, _
    Key .GetTextLength = matches.Skip(RowNo + 1).Take(1).Select(Function(a) a.Index - Record.Index).FirstOrDefault() _
})
Dim lines As New List(Of String)()
For Each line In matchedValues
    ' A length of 0 marks the last match: take everything up to the end of the string
    lines.Add(s.Substring(line.GetTextFrom, If(line.GetTextLength = 0, s.Length - line.GetTextFrom, line.GetTextLength)))
Next
Result #1 (matches)
Value       Index
2015-04-22  0
2014-12-28  39
2014-12-21  81
2014-12-15  113
2014-11-09  135
2014-11-09  215
2014-11-09  296
2014-11-09  376
2014-12-15  457
2014-11-09  479
2014-11-09  559
Result #2 (matchedValues)
Index  Date        GetTextFrom  GetTextLength
0      2015-04-22  0            39
1      2014-12-28  39           42
2      2014-12-21  81           32
3      2014-12-15  113          22
4      2014-11-09  135          80
5      2014-11-09  215          81
6      2014-11-09  296          80
7      2014-11-09  376          81
8      2014-12-15  457          22
9      2014-11-09  479          80
10     2014-11-09  559          0
Result #3 (lines)
2015-04-22 JV RM - Save your list here
2014-12-28 SV See Word - Image in the mail
2014-12-21 SV See word document0
2014-12-15 SV See word
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here. See the pdf in attach-
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here. See the pdf in attach-
2014-12-15 SV See word
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here
[EDIT]
Oops... I missed that the question is tagged VB.NET. I'll improve my answer later ;)
Done (VB.NET code has been added)!
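As a side note on the earlier remark that Regex.Split discards what it matches: that only costs you the dates when the pattern consumes them. A zero-width lookahead matches a position rather than text, so splitting on it keeps each date. The sketch below illustrates that alternative; it is not taken from either answer, and the names `LookaheadSplitSketch` and `SplitBeforeDates` are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module LookaheadSplitSketch
    ' Hypothetical helper: splits in front of each date without consuming it.
    Function SplitBeforeDates(input As String) As String()
        ' (?=...) is a zero-width lookahead, so the date stays in the output.
        Dim parts As String() = Regex.Split(input, "(?=\d{4}-\d{2}-\d{2})")
        ' A match at position 0 produces an empty first element; drop empty entries.
        Return parts.Select(Function(p) p.Trim()).Where(Function(p) p.Length > 0).ToArray()
    End Function

    Sub Main()
        Dim sample As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document0"
        For Each line As String In SplitBeforeDates(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Unlike the Index/Substring approach, this one keeps any text that appears before the first date as its own element, which may or may not be what you want.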
As per the OP's comments on Solution 1:
Sub Main()
    Dim s As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here"
    Dim searchTerm As New System.Text.RegularExpressions.Regex("(\d{4}-\d{2}-\d{2})")
    ' match is late-bound here (Option Strict Off), so GetTextLength below is Nothing for the last match
    Dim matches = From match In searchTerm.Matches(s) Select New With { _
        match.Value, _
        match.Index _
    }
    Dim matchedValues = From m In matches _
        Let RowNo = increment() _
        Select New With { _
            Key .Index = RowNo, _
            Key .[Date] = m.Value, _
            Key .GetTextFrom = m.Index, _
            Key .GetTextLength = matches.Skip(RowNo).Take(1).[Select](Function(a) a.Index - m.Index).FirstOrDefault() _
        }
    Dim lines As New List(Of MyMessage)()
    For Each line In matchedValues
        ' The date is always the first 10 characters of the chunk
        Dim sDate As String = s.Substring(line.GetTextFrom, 10)
        'Console.WriteLine("{0} - {1}", line.GetTextFrom, If(line.GetTextLength Is Nothing, s.Length - line.GetTextFrom, line.GetTextLength)-10)
        ' The message is the rest of the chunk, with hyphens removed and whitespace trimmed
        Dim sMsg As String = s.Substring(line.GetTextFrom + 10, If(line.GetTextLength Is Nothing, s.Length - line.GetTextFrom, line.GetTextLength) - 10).Replace("-", "").Trim()
        Dim oMsg = New MyMessage(sDate, sMsg)
        ' Contains uses MyMessage.Equals, so messages with the same text are added only once
        If Not lines.Contains(oMsg) Then lines.Add(oMsg)
    Next
End Sub

' Define other methods and classes here

' Returns 1, 2, 3, ... on successive calls
Public Shared Function increment() As Integer
    Static i As Integer
    i = i + 1
    Return i
End Function

Public Class MyMessage
    Implements IEquatable(Of MyMessage)

    Dim sDate As String = String.Empty
    Dim sMessage As String = String.Empty

    Public Sub New(_Date As String, _Message As String)
        sDate = _Date
        sMessage = _Message
    End Sub

    Public Property aDate As String
        Get
            Return sDate
        End Get
        Set(value As String)
            sDate = value
        End Set
    End Property

    Public Property aMessage As String
        Get
            Return sMessage
        End Get
        Set(value As String)
            sMessage = value
        End Set
    End Property

    Public Overrides Function Equals(obj As Object) As Boolean
        If obj Is Nothing Then
            Return False
        End If
        Dim objMyMessage As MyMessage = TryCast(obj, MyMessage)
        If objMyMessage Is Nothing Then
            Return False
        Else
            Return Equals(objMyMessage)
        End If
    End Function

    ' Hash on the message text so that it stays consistent with Equals
    Public Overrides Function GetHashCode() As Integer
        Return aMessage.GetHashCode()
    End Function

    ' Two messages are equal when their text is equal; the date is ignored
    Public Overloads Function Equals(other As MyMessage) As Boolean _
        Implements IEquatable(Of MyMessage).Equals
        If other Is Nothing Then
            Return False
        End If
        Return (Me.aMessage.Equals(other.aMessage))
    End Function
End Class
Result:
aDate       aMessage
2015-04-22  JV RM Save your list here
2014-12-28  SV See Word Image in the mail
2014-12-21  SV See word document0
2014-12-15  SV See word
2014-11-09  SV First small items to start programming here. See the pdf in attach
2014-11-09  SV First small items to start programming here