Formatting sentences in a string using C#

Question

I have a string containing multiple sentences. How do I capitalize the first letter of the first word in every sentence? Something like paragraph formatting in Word.

E.g. the input "this is some code. the code is in C#." must produce the output "This is some code. The code is in C#."

One way would be to split the string on '.', capitalize the first letter of each piece, and then rejoin the pieces.
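For comparison, the split-and-rejoin idea might look something like the sketch below. The class and method names (SplitCapitaliser, CapitaliseBySplit) are made up for illustration, and it assumes '.' is the only sentence terminator.

```csharp
using System;
using System.Linq;

public static class SplitCapitaliser
{
    // Hypothetical helper: split on '.', upper-case the first
    // non-whitespace character of each piece, then rejoin with '.'.
    public static string CapitaliseBySplit(string input)
    {
        var parts = input.Split('.').Select(part =>
        {
            // Skip leading whitespace to find the first real character.
            int i = 0;
            while (i < part.Length && char.IsWhiteSpace(part[i])) i++;
            if (i == part.Length) return part; // empty or whitespace-only piece
            return part.Substring(0, i) + char.ToUpper(part[i]) + part.Substring(i + 1);
        });
        return string.Join(".", parts);
    }
}
```

Each piece is rebuilt from substrings, so this allocates several intermediate strings per sentence. It also treats every '.' as a sentence boundary, so an abbreviation such as "e.g." gets capitalized as well.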

Is there a better solution?

Recommended answer

In my opinion, when it comes to potentially complex rules-based string matching and replacing, you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). In my opinion this offers the best performance and memory efficiency; you'll be surprised at just how fast it is.

I'd use the Regex.Replace overload that accepts an input string, a regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns the replacement string.

Here's the code:

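using System.Text.RegularExpressions;

public static string Capitalise(string input)
{
  // Upper-case any a-z letter that follows the start of the string
  // or one of . ; : (with optional whitespace in between).
  return Regex.Replace(input, @"(?<=(^|[.;:])\s*)[a-z]",
    match => match.Value.ToUpper());
}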



The regex uses the (?<=...) construct (zero-width positive lookbehind) to restrict matches to a-z characters that are preceded by the start of the string or by one of the punctuation marks you want, optionally followed by whitespace. In the [.;:] bit you can add any extra ones you need (e.g. [.;:?"] to add the ? and " characters; inside a verbatim @"..." string the quote has to be doubled as "").
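As an illustration of extending the [.;:] class, here is a minimal sketch that also treats ? and ! as sentence terminators. The extra characters and the class name SentenceCapitaliser are assumptions for the example, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class SentenceCapitaliser
{
    // Assumed variation on the answer's pattern: '?' and '!' are added
    // to the character class so they also count as sentence terminators.
    public static string Capitalise(string input)
    {
        return Regex.Replace(input, @"(?<=(^|[.;:?!])\s*)[a-z]",
            match => match.Value.ToUpper());
    }
}
```

With this version, Capitalise("what is this? some code! yes. the end") returns "What is this? Some code! Yes. The end".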

This also means that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons): each match is just the single letter to change, so the evaluator only returns its upper-case form.

Everything one of the other answerers mentioned about using RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
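One way to apply RegexOptions.Compiled is to build the Regex once and reuse the instance, roughly as sketched below. The class and field names (CompiledCapitaliser, SentenceStart) are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CompiledCapitaliser
{
    // Constructed once and reused for every call. RegexOptions.Compiled
    // costs more at construction time in exchange for faster matching.
    private static readonly Regex SentenceStart =
        new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);

    public static string Capitalise(string input)
    {
        return SentenceStart.Replace(input, m => m.Value.ToUpper());
    }
}
```

Holding the instance in a static readonly field means the construction cost is paid once per process rather than once per call.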

Like I say, I'd be surprised if any of the non-regex solutions here works better and is as fast.

Edit

I have put this solution up against Ahmad's, as he quite rightly pointed out that a look-around might be less efficient than doing it his way.

Here's the crude benchmark I did:

public string LowerCaseLipsum
{
  get
  {
    //went to lipsum.com and generated 10 paragraphs of lipsum
    //which I then initialised into the backing field with @"[lipsumtext]".ToLower()
    return _lowerCaseLipsum;
  }
}

[TestMethod]
public void CapitaliseAhmadsWay()
{
  List<string> results = new List<string>();
  DateTime start = DateTime.Now;
  Regex r = new Regex(@"(^|\p{P}\s+)(\w+)", RegexOptions.Compiled);
  for (int f = 0; f < 1000; f++)
  {
    results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
      + m.Groups[2].Value.Substring(0, 1).ToUpper()
      + m.Groups[2].Value.Substring(1)));
  }
  TimeSpan duration = DateTime.Now - start;
  Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}

[TestMethod]
public void CapitaliseLookAroundWay()
{
  List<string> results = new List<string>();
  DateTime start = DateTime.Now;
  Regex r = new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);
  for (int f = 0; f < 1000; f++)
  {
    results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
  }
  TimeSpan duration = DateTime.Now - start;
  Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}

In a release build, my solution was about 12% faster than Ahmad's (1.48 seconds as opposed to 1.68 seconds).

Interestingly, however, when the replacement was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
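If you want to repeat the instance-versus-static comparison yourself, something along these lines could work. It is only a sketch: it uses System.Diagnostics.Stopwatch instead of the DateTime.Now timing in the benchmark above, the ReplaceTiming class and its methods are hypothetical, and the static call here passes no RegexOptions, which may not match how the original measurement was made.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ReplaceTiming
{
    private const string Pattern = @"(?<=(^|[.;:])\s*)[a-z]";

    // Hypothetical helper: runs the action the given number of times
    // and returns the total elapsed wall-clock time.
    public static TimeSpan Time(Action action, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Times a reused compiled instance against the static Regex.Replace call.
    public static void Compare(string text, int iterations)
    {
        var compiled = new Regex(Pattern, RegexOptions.Compiled);
        TimeSpan viaInstance = Time(
            () => compiled.Replace(text, m => m.Value.ToUpper()), iterations);
        TimeSpan viaStatic = Time(
            () => Regex.Replace(text, Pattern, m => m.Value.ToUpper()), iterations);
        Console.WriteLine("Instance: {0} s, static: {1} s",
            viaInstance.TotalSeconds, viaStatic.TotalSeconds);
    }
}
```

Timings will vary with the runtime version, the input text and the machine, so treat any numbers it prints as relative rather than absolute.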
