Formatting sentences in a string using C#
Question
I have a string containing multiple sentences. How do I capitalize the first letter of the first word of every sentence, similar to paragraph formatting in Word?
E.g. "this is some code. the code is in C#." The output must be "This is some code. The code is in C#."
One way would be to split the string on '.', capitalize the first letter of each piece, and then rejoin the pieces.
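For illustration, the split-and-rejoin idea might look roughly like the sketch below. The class name SplitAndRejoin and its method are hypothetical, and the sketch only treats '.' as a sentence terminator, so it is a starting point rather than a complete solution.

```csharp
using System;
using System.Linq;

public static class SplitAndRejoin
{
    // Hypothetical helper: upper-cases the first non-whitespace
    // character of every fragment produced by splitting on '.'.
    public static string Capitalise(string input)
    {
        var parts = input.Split('.').Select(part =>
        {
            // Skip leading whitespace so " the code" becomes " The code".
            int i = 0;
            while (i < part.Length && char.IsWhiteSpace(part[i])) i++;

            // Empty or whitespace-only fragments are returned unchanged.
            if (i == part.Length) return part;

            return part.Substring(0, i)
                 + char.ToUpperInvariant(part[i])
                 + part.Substring(i + 1);
        });

        // Put the '.' separators back.
        return string.Join(".", parts);
    }
}
```

Each fragment allocates new strings, which is part of why the question asks about something better.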
Is there a better solution?
Recommended answer
In my opinion, when it comes to potentially complex rules-based string matching and replacing, you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). As far as I can tell it also offers the best performance and memory efficiency; you'll be surprised at just how fast it is.
I'd use the Regex.Replace overload that accepts an input string, a regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns the replacement string.
Here's the code:
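// requires: using System.Text.RegularExpressions;
public static string Capitalise(string input)
{
    // Upper-case every a-z letter that starts the string or follows
    // '.', ';' or ':' (with optional whitespace in between).
    return Regex.Replace(input, @"(?<=(^|[.;:])\s*)[a-z]",
        (match) => { return match.Value.ToUpper(); });
}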
The regex uses the (?<=) construct (zero-width positive lookbehind) to restrict matches to a-z characters that are preceded by the start of the string, or by one of the punctuation marks you want, followed by optional whitespace. In the [.;:] bit you can add any extra ones you want (e.g. [.;:?"] to add the ? and " characters; inside a verbatim @"..." string the " has to be written as "").
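As a sketch of that extension, the variant below adds ? and " to the character class. The class name SentenceCase is hypothetical, and ToUpperInvariant is used here (instead of the ToUpper in the answer's code) only to keep the result independent of the current culture.

```csharp
using System.Text.RegularExpressions;

public static class SentenceCase
{
    // Same lookbehind as above, with '?' and '"' added to the set of
    // characters that may precede a sentence start. Inside a verbatim
    // string literal the double quote is escaped as "".
    public static string Capitalise(string input)
    {
        return Regex.Replace(input, @"(?<=(^|[.;:?""])\s*)[a-z]",
            match => match.Value.ToUpperInvariant());
    }
}
```

Note that adding " also capitalises a lowercase letter that follows a closing quote, which may or may not be what you want.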
This also means that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).
Everything one of the other answerers mentioned about using RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
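One way to apply the RegexOptions.Compiled advice is to build the Regex once and reuse it, as in this minimal sketch. The class and field names (CachedSentenceCase, FirstLetter) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class CachedSentenceCase
{
    // Constructed once per process and reused for every call;
    // RegexOptions.Compiled costs more up front in exchange for
    // faster matching afterwards.
    private static readonly Regex FirstLetter =
        new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);

    public static string Capitalise(string input)
    {
        // Instance Replace with a MatchEvaluator lambda.
        return FirstLetter.Replace(input, m => m.Value.ToUpperInvariant());
    }
}
```

Whether this beats the static method for a given workload is something to measure rather than assume.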
Like I say, I'll be surprised if any of the non-regex solutions here work better and are as fast.
Edit
I have put this solution up against Ahmad's, as he quite rightly pointed out that a look-around might be less efficient than doing it his way.
Here's the crude benchmark I did:
public string LowerCaseLipsum
{
    get
    {
        //went to lipsum.com and generated 10 paragraphs of lipsum
        //which I then initialised into the backing field with @"[lipsumtext]".ToLower()
        return _lowerCaseLipsum;
    }
}

[TestMethod]
public void CapitaliseAhmadsWay()
{
    List<string> results = new List<string>();
    DateTime start = DateTime.Now;
    Regex r = new Regex(@"(^|\p{P}\s+)(\w+)", RegexOptions.Compiled);
    for (int f = 0; f < 1000; f++)
    {
        results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
            + m.Groups[2].Value.Substring(0, 1).ToUpper()
            + m.Groups[2].Value.Substring(1)));
    }
    TimeSpan duration = DateTime.Now - start;
    Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}

[TestMethod]
public void CapitaliseLookAroundWay()
{
    List<string> results = new List<string>();
    DateTime start = DateTime.Now;
    Regex r = new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);
    for (int f = 0; f < 1000; f++)
    {
        results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
    }
    TimeSpan duration = DateTime.Now - start;
    Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
In a release build, my solution was about 12% faster than Ahmad's (1.48 seconds as opposed to 1.68 seconds).
Interestingly, however, when it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
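If you want to repeat the instance-versus-static comparison yourself, a small harness along these lines could be used. This is only a sketch: it swaps DateTime.Now for System.Diagnostics.Stopwatch, and the names RegexTiming, Time, ViaInstance and ViaStatic are all hypothetical. It is not the benchmark that produced the figures above.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    private const string Pattern = @"(?<=(^|[.;:])\s*)[a-z]";

    // Compiled instance, created once and reused.
    private static readonly Regex Compiled =
        new Regex(Pattern, RegexOptions.Compiled);

    // Replacement through the cached, compiled instance.
    public static string ViaInstance(string text)
    {
        return Compiled.Replace(text, m => m.Value.ToUpperInvariant());
    }

    // Replacement through the static Regex.Replace method.
    public static string ViaStatic(string text)
    {
        return Regex.Replace(text, Pattern, m => m.Value.ToUpperInvariant());
    }

    // Runs `replace` over `text` `iterations` times and returns the elapsed time.
    public static TimeSpan Time(Func<string, string> replace, string text, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            replace(text);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling RegexTiming.Time(RegexTiming.ViaInstance, text, 1000) and RegexTiming.Time(RegexTiming.ViaStatic, text, 1000) on the same input gives two durations to compare; a warm-up call before timing keeps one-off compilation cost out of the numbers.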