Parse subtitle file using regex C#


Question

I need to find the number, the in and out timecode points, and all lines of the text.

9
00:09:48,347 --> 00:09:52,818
- Let's see... what else she's got?
- Yea... ha, ha.

10
00:09:56,108 --> 00:09:58,788
What you got down there, missy?

11
00:09:58,830 --> 00:10:00,811
I wouldn't do that!

12
00:10:03,566 --> 00:10:07,047
-Shit, that's not enough!
-Pull her back!

I'm currently using this pattern, but it misses every subtitle whose text spans two lines:

(?<Order>\d+)\r\n(?<StartTime>(\d\d:){2}\d\d,\d{3}) --> (?<EndTime>(\d\d:){2}\d\d,\d{3})\r\n(?<Sub>.+)(?=\r\n\r\n\d+|$)

Any help would be appreciated.

Recommended answer

I think there are two problems with the regex. The first is that the . near the end, in (?<Sub>.+), does not match newlines. So you could modify it to:

(?<Sub>(.|[\r\n])+?)

Alternatively, you could pass RegexOptions.Singleline as an option to the regex. The only thing that option does is make the dot match newlines.
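As a small illustration of that option, the sketch below checks whether the dot matches a newline with and without RegexOptions.Singleline. The class and method names are hypothetical and exist only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original question or answer.
public static class DotDemo
{
    // Tests whether 'a', any single character, 'b' matches a string
    // in which the middle character is a newline.
    public static bool DotMatchesNewline(RegexOptions options)
    {
        return Regex.IsMatch("a\nb", "a.b", options);
    }
}
```

With RegexOptions.None the call returns false, because the dot skips the newline; with RegexOptions.Singleline it returns true.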

The second problem is that .+ is greedy and matches as many lines as it can. You can make it non-greedy like this:

(?<Sub>(.|[\r\n])+?(?=\r\n\r\n|$))

This matches the least amount of text that is followed by an empty line or the end of the string.
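Putting both fixes together, a minimal sketch might look like the following. It uses RegexOptions.Singleline with a lazy .+? rather than the (.|[\r\n])+? form. The SubtitleParser class, its Parse method, and the tuple field names are hypothetical, and using \r?\n instead of the question's \r\n is an added assumption so that LF-only files are also accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative, not from the original post.
public static class SubtitleParser
{
    // Singleline lets '.' match newlines; '+?' stops each match at the first
    // blank line (or the end of the input) instead of swallowing later blocks.
    // '\r?\n' is an assumption that tolerates both CRLF and LF line endings.
    private static readonly Regex Block = new Regex(
        @"(?<Order>\d+)\r?\n" +
        @"(?<StartTime>(?:\d\d:){2}\d\d,\d{3}) --> (?<EndTime>(?:\d\d:){2}\d\d,\d{3})\r?\n" +
        @"(?<Sub>.+?)(?=\r?\n\r?\n|$)",
        RegexOptions.Singleline);

    public static List<(int Order, string Start, string End, string Text)> Parse(string srt)
    {
        var result = new List<(int Order, string Start, string End, string Text)>();
        foreach (Match m in Block.Matches(srt))
        {
            result.Add((
                int.Parse(m.Groups["Order"].Value),
                m.Groups["StartTime"].Value,
                m.Groups["EndTime"].Value,
                // Trim removes a stray '\r' that can remain at the very end of the file.
                m.Groups["Sub"].Value.Trim()));
        }
        return result;
    }
}
```

Calling SubtitleParser.Parse on the file contents (for example, the result of File.ReadAllText) should return one entry per numbered block, with two-line subtitles kept together in Text.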
