How to split a sentence in a paragraph with certain Regex match in VB.NET


Problem description


INPUT
2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here

I need to split this as:
2015-04-22 JV RM - Save your list here
2014-12-28 SV See Word - Image in the mail
2014-12-21 SV See word document0
2014-12-15 SV See word
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here. See the pdf in attach-
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here. See the pdf in attach-
2014-12-15 SV See word
2014-11-09 SV First small items to start programming here. See the pdf in attach
2014-11-09 SV First small items to start programming here

This is what I need. My code so far:

    Private Sub Cmd_Rem_Click(sender As Object, e As EventArgs) Handles Cmd_Rem.Click
        Dim input As String = Txt_Commantaarintern.Text
        Dim result As String() = Regex.Split(input, "(?<=['""A-Za-z0-9][\.\!\?])\s+(?=[A-Z])")
        For Each s As String In result
            Console.WriteLine(s)
            Txt_After_Remove.Text = s
        Next
    End Sub


Please guide me: where am I stuck?

Solution

Regex.Split probably isn't going to help you much: it removes the text it matches and discards it. Since your "new line" data (the date) contains info you want to keep, that's a problem.
I'd use a regex like this:

\d{4}-\d\d-\d\d

and then use the Capture.Index property (https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.capture.index(v=vs.110).aspx) to tell me where each line starts.
I could then split out each individual line using String.Substring.
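
The idea above can be sketched in a few lines of VB.NET. This is only a minimal illustration, assuming every record starts with a yyyy-MM-dd date; the module name DateSplitter and the function SplitByDate are made-up names, not something from the thread:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module DateSplitter
    ' Hypothetical helper: cuts the input into pieces that each start at a date.
    ' Any text in front of the first date is not returned.
    Function SplitByDate(input As String) As List(Of String)
        Dim lines As New List(Of String)()
        Dim matches As MatchCollection = Regex.Matches(input, "\d{4}-\d\d-\d\d")
        For i As Integer = 0 To matches.Count - 1
            ' Match.Index (inherited from Capture) is where this piece starts.
            Dim startAt As Integer = matches(i).Index
            ' A piece stops where the next date begins, or at the end of the input.
            Dim stopAt As Integer = If(i + 1 < matches.Count, matches(i + 1).Index, input.Length)
            lines.Add(input.Substring(startAt, stopAt - startAt))
        Next
        Return lines
    End Function

    Sub Main()
        Dim sample As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word2014-12-15 SV See word"
        For Each line As String In SplitByDate(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Running Main on the shortened sample writes three lines, one per date. Working from the start offsets means the dates themselves stay in the output, which is the point of the suggestion.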


In addition to OriginalGriff's answer, here is an implementation of his words ;)


string s = @"2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here";

// Needs: using System.Collections.Generic; using System.Linq;
System.Text.RegularExpressions.Regex searchTerm =
  new System.Text.RegularExpressions.Regex(@"(\d{4}-\d{2}-\d{2})");

// Every date match marks the start of a line; keep its text and its position
var matches = from System.Text.RegularExpressions.Match match in searchTerm.Matches(s)
              select new { match.Value, match.Index };
//see result #1

var matchedValues = matches.Select((Record, RowNo) => new
{
	Index = RowNo,
	Date = Record.Value,
	GetTextFrom = Record.Index,
	// Distance to the next match; FirstOrDefault() yields 0 for the last match
	GetTextLength = matches.Skip(RowNo + 1).Take(1).Select(a => a.Index - Record.Index).FirstOrDefault()
});
//see result #2

List<string> lines = new List<string>();
foreach (var line in matchedValues)
{
	// The last line has no successor, so it runs to the end of the string
	lines.Add(s.Substring(line.GetTextFrom, line.GetTextLength == 0 ? s.Length - line.GetTextFrom : line.GetTextLength));
}
//see result #3



Dim s As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here"

' Needs: Imports System.Collections.Generic and Imports System.Linq
Dim searchTerm As New System.Text.RegularExpressions.Regex("(\d{4}-\d{2}-\d{2})")

' "As Match" keeps Value and Index strongly typed instead of late-bound Object
Dim matches = From match As System.Text.RegularExpressions.Match In searchTerm.Matches(s) Select New With { _
	match.Value, _
	match.Index _
}

' RowNo is the zero-based position of the match; RowNo + 1 is the next match
Dim matchedValues = matches.[Select](Function(Record, RowNo) New With { _
	Key .Index = RowNo, _
	Key .[Date] = Record.Value, _
	Key .GetTextFrom = Record.Index, _
	Key .GetTextLength = matches.Skip(RowNo + 1).Take(1).[Select](Function(a) a.Index - Record.Index).FirstOrDefault() _
})

Dim lines As New List(Of String)()
For Each line In matchedValues
	' GetTextLength is 0 only for the last match, which runs to the end of the string
	lines.Add(s.Substring(line.GetTextFrom, If(line.GetTextLength = 0, s.Length - line.GetTextFrom, line.GetTextLength)))
Next



Result #1 (matches)

Value      Index
2015-04-22 0 
2014-12-28 39 
2014-12-21 81 
2014-12-15 113 
2014-11-09 135 
2014-11-09 215 
2014-11-09 296 
2014-11-09 376 
2014-12-15 457 
2014-11-09 479 
2014-11-09 559



Result #2 (matchedValues)

Index Date       GetTextFrom GetTextLength
0     2015-04-22 0           39 
1     2014-12-28 39          42 
2     2014-12-21 81          32 
3     2014-12-15 113         22 
4     2014-11-09 135         80 
5     2014-11-09 215         81 
6     2014-11-09 296         80 
7     2014-11-09 376         81 
8     2014-12-15 457         22 
9     2014-11-09 479         80 
10    2014-11-09 559         0



Result #3 (lines)

2015-04-22 JV RM - Save your list here  
2014-12-28 SV See Word - Image in the mail 
2014-12-21 SV See word document0 
2014-12-15 SV See word 
2014-11-09 SV First small items to start programming here. See the pdf in attach 
2014-11-09 SV First small items to start programming here. See the pdf in attach- 
2014-11-09 SV First small items to start programming here. See the pdf in attach 
2014-11-09 SV First small items to start programming here. See the pdf in attach- 
2014-12-15 SV See word 
2014-11-09 SV First small items to start programming here. See the pdf in attach 
2014-11-09 SV First small items to start programming here 



[EDIT]
Oops...
I missed that the question is tagged as VB.NET. I'll improve my answer later ;)

Done (VB.NET code has been added)!
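
A side note on the remark that Regex.Split discards what it matches: that applies to text the pattern consumes. A zero-width lookahead consumes nothing, so splitting in front of each date keeps the date in the output. The sketch below is an alternative technique, not the method of either answer, and the names LookaheadSplit and SplitBeforeDates are hypothetical:

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module LookaheadSplit
    ' Hypothetical helper: splits in front of every yyyy-MM-dd date without removing it.
    Function SplitBeforeDates(input As String) As String()
        ' (?=...) matches the empty position in front of a date, so nothing is consumed.
        Dim parts As String() = Regex.Split(input, "(?=\d{4}-\d{2}-\d{2})")
        ' Drop empty entries, which can appear when the input itself starts with a date.
        Dim nonEmpty = parts.Where(Function(part) part.Length > 0)
        Return nonEmpty.ToArray()
    End Function

    Sub Main()
        Dim sample As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word2014-12-15 SV See word"
        For Each line As String In SplitBeforeDates(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Unlike the Index and Substring approach, this variant also returns any text that precedes the first date as its own piece, so which one fits depends on how clean the input is.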


As per OP's comments to Solution 1:

Sub Main
	Dim s As String = "2015-04-22 JV RM - Save your list here 2014-12-28 SV See Word - Image in the mail2014-12-21 SV See word document02014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here. See the pdf in attach-2014-12-15 SV See word2014-11-09 SV First small items to start programming here. See the pdf in attach2014-11-09 SV First small items to start programming here"
	 
	Dim searchTerm As New System.Text.RegularExpressions.Regex("(\d{4}-\d{2}-\d{2})")
	 
	' "As Match" keeps Value and Index strongly typed instead of late-bound Object
	Dim matches = From match As System.Text.RegularExpressions.Match In searchTerm.Matches(s) Select New With { _
		match.Value, _
		match.Index _
	}
	 
	' The indexed Select overload supplies RowNo, so no shared counter function is needed
	Dim matchedValues = matches.[Select](Function(m, RowNo) New With { _
		Key .Index = RowNo, _
		Key .[Date] = m.Value, _
		Key .GetTextFrom = m.Index, _
		Key .GetTextLength = matches.Skip(RowNo + 1).Take(1).[Select](Function(a) a.Index - m.Index).FirstOrDefault() _
	})
	
	Dim lines As New List(Of MyMessage)()
	For Each line In matchedValues
		' The date always occupies the first 10 characters of a line
		Dim sDate As String = s.Substring(line.GetTextFrom, 10)
		' GetTextLength is 0 only for the last match, which runs to the end of the string
		Dim length As Integer = If(line.GetTextLength = 0, s.Length - line.GetTextFrom, line.GetTextLength)
		Dim sMsg As String = s.Substring(line.GetTextFrom + 10, length - 10).Replace("-", "").Trim()
		Dim oMsg = New MyMessage(sDate, sMsg)
		' Skip messages that were already added (compared via MyMessage.Equals)
		If Not lines.Contains(oMsg) Then lines.Add(oMsg)
	Next
	
End Sub

' Define other methods and classes here


Public Class MyMessage 
	Implements IEquatable(Of MyMessage)

	Dim sDate As String = String.Empty
	Dim sMessage As String = String.Empty
	
	Public Sub New(_Date as String, _Message As String)
		sDate = _Date
		sMessage = _Message
	End Sub
	
	Public Property aDate As String
		Get
			Return sDate
		End Get
		Set (value As String)
			sDate = value
		End Set
	End Property

	Public Property aMessage As String
		Get
			Return sMessage
		End Get
		Set (value As String)
			sMessage = value
		End Set
	End Property
	
    Public Overrides Function Equals(obj As Object) As Boolean 
        If obj Is Nothing Then 
            Return False 
        End If 
        Dim objMyMessage As MyMessage = TryCast(obj, MyMessage)
        If objMyMessage Is Nothing Then 
            Return False 
        Else 
            Return Equals(objMyMessage)
        End If 
    End Function 
    Public Overrides Function GetHashCode() As Integer 
        ' Hash the same field that Equals compares
        Return aMessage.GetHashCode()
    End Function 
    Public Overloads Function Equals(other As MyMessage) As Boolean _
        Implements IEquatable(Of MyMessage).Equals
        If other Is Nothing Then 
            Return False 
        End If 
        Return (Me.aMessage.Equals(other.aMessage))
    End Function 

End Class



Result:

aDate       aMessage
2015-04-22 JV RM  Save your list here 
2014-12-28 SV See Word  Image in the mail 
2014-12-21 SV See word document0 
2014-12-15 SV See word 
2014-11-09 SV First small items to start programming here. See the pdf in attach 
2014-11-09 SV First small items to start programming here 

