Formatting sentences in a string using C#

Views: 194 | Published: 2016/9/19 11:34:25 | Tags: c#, string, formatting, paragraph, text-segmentation


Question

I have a string with multiple sentences. How do I capitalize the first letter of the first word in every sentence? Something like paragraph formatting in Word.

For example, given "this is some code. the code is in C#.", the output must be "This is some code. The code is in C#."

One way would be to split the string on '.', capitalize the first letter of each piece, and then rejoin the pieces.
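The split-and-rejoin idea described above might look roughly like the sketch below. The class and method names are hypothetical, and it assumes '.' is the only sentence terminator, so abbreviations such as "e.g." would be capitalized too.

```csharp
using System;
using System.Text;

public static class SplitCapitaliser
{
    // Hypothetical helper: splits on '.', uppercases the first
    // non-whitespace character of each piece, then rejoins with '.'.
    public static string CapitaliseBySplitting(string input)
    {
        string[] parts = input.Split('.');
        var result = new StringBuilder(input.Length);

        for (int i = 0; i < parts.Length; i++)
        {
            string part = parts[i];

            // Skip leading whitespace to find the first real character.
            int first = 0;
            while (first < part.Length && char.IsWhiteSpace(part[first]))
            {
                first++;
            }

            if (first < part.Length)
            {
                part = part.Substring(0, first)
                     + char.ToUpper(part[first])
                     + part.Substring(first + 1);
            }

            // Put back the '.' that Split removed.
            if (i > 0)
            {
                result.Append('.');
            }
            result.Append(part);
        }

        return result.ToString();
    }
}
```

Note how much string slicing and joining this needs for a single terminator character.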

Is there a better solution?

Recommended answer

In my opinion, when it comes to potentially complex rules-based string matching and replacing, you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). This offers the best performance and memory efficiency, in my opinion; you'll be surprised at just how fast this will be.

I'd use the Regex.Replace overload that accepts an input string, a regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns the replacement string.

Here's the code:

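using System.Text.RegularExpressions;

public static string Capitalise(string input)
{
  // Uppercase every a-z letter that follows the start of the string
  // or one of . ; : (with optional whitespace in between).
  return Regex.Replace(input, @"(?<=(^|[.;:])\s*)[a-z]",
    match => match.Value.ToUpper());
}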

The regex uses the (?<=...) construct (zero-width positive lookbehind) to restrict matches to a-z characters preceded by the start of the string or by one of the punctuation marks you want, optionally followed by whitespace. In the [.;:] bit you can add any extra ones you want (e.g. [.;:?"] to add the ? and " characters; inside a C# verbatim string the quote is written as "").
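As a rough illustration of extending the character class, the sketch below adds ? and ! to the lookbehind. The class and field names are made up for this example, and the choice of extra terminators is an assumption.

```csharp
using System.Text.RegularExpressions;

public static class SentenceCase
{
    // Same lookbehind as in the answer, with ? and ! added as terminators.
    private static readonly Regex SentenceStart =
        new Regex(@"(?<=(^|[.;:?!])\s*)[a-z]");

    public static string Capitalise(string input)
    {
        // The match is a single letter, so the evaluator only uppercases it.
        return SentenceStart.Replace(input, m => m.Value.ToUpper());
    }
}
```

One thing to watch: because \s* also allows zero whitespace, a letter directly after a dot is capitalized too, so "example.com" becomes "example.Com". If that matters for your text, requiring at least one whitespace character after the punctuation is one possible adjustment.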

This also means that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).

Everything one of the other answerers mentioned about using RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
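To make the two options concrete, here is a small sketch of the same replacement done through a reusable compiled Regex instance and through the static Regex.Replace method. The class and member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CapitaliseVariants
{
    private const string Pattern = @"(?<=(^|[.;:])\s*)[a-z]";

    // Built once with RegexOptions.Compiled and reused for every call.
    private static readonly Regex Compiled =
        new Regex(Pattern, RegexOptions.Compiled);

    public static string WithCompiledInstance(string input)
    {
        return Compiled.Replace(input, m => m.Value.ToUpper());
    }

    public static string WithStaticCall(string input)
    {
        // The static method takes the pattern on every call and relies
        // on the framework's internal regex cache.
        return Regex.Replace(input, Pattern, m => m.Value.ToUpper());
    }
}
```

Both variants return the same string; any difference is in speed, which is worth measuring on your own input.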

Like I say, I'll be surprised if any of the other non-regex solutions here work better and are as fast.

Edit

I have put this solution up against Ahmad's, as he quite rightly pointed out that a look-around might be less efficient than doing it his way.

Here's the crude benchmark I did:

public string LowerCaseLipsum
{
  get
  {
    //went to lipsum.com and generated 10 paragraphs of lipsum
    //which I then initialised into the backing field with @"[lipsumtext]".ToLower()
    return _lowerCaseLipsum;
  }
}

[TestMethod]
public void CapitaliseAhmadsWay()
{
  List<string> results = new List<string>();
  DateTime start = DateTime.Now;
  Regex r = new Regex(@"(^|\p{P}\s+)(\w+)", RegexOptions.Compiled);
  for (int f = 0; f < 1000; f++)
  {
    results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
                     + m.Groups[2].Value.Substring(0, 1).ToUpper()
                     + m.Groups[2].Value.Substring(1)));
  }
  TimeSpan duration = DateTime.Now - start;
  Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}

[TestMethod]
public void CapitaliseLookAroundWay()
{
  List<string> results = new List<string>();
  DateTime start = DateTime.Now;
  Regex r = new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);
  for (int f = 0; f < 1000; f++)
  {
    results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
  }
  TimeSpan duration = DateTime.Now - start;
  Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}

In a release build, my solution was about 12% faster than Ahmad's (1.48 seconds as opposed to 1.68 seconds).

Interestingly, however, when it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
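If you want to repeat the comparison yourself, a sketch along these lines may help. It is not the benchmark used above: it swaps DateTime.Now for System.Diagnostics.Stopwatch, which is designed for measuring elapsed time, and the class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RoughBenchmark
{
    // Hypothetical helper: runs `capitalise` over `text` `iterations`
    // times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<string, string> capitalise, string text, int iterations)
    {
        capitalise(text); // warm-up call, so one-off setup cost is not timed

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            capitalise(text);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times the look-behind pattern as a compiled instance and through
    // the static Regex.Replace method, then prints both durations.
    public static void Report(string text, int iterations)
    {
        const string pattern = @"(?<=(^|[.;:])\s*)[a-z]";
        var compiled = new Regex(pattern, RegexOptions.Compiled);

        TimeSpan instanceTime = Time(
            s => compiled.Replace(s, m => m.Value.ToUpper()), text, iterations);
        TimeSpan staticTime = Time(
            s => Regex.Replace(s, pattern, m => m.Value.ToUpper()), text, iterations);

        Console.WriteLine("Compiled instance: {0} ms", instanceTime.TotalMilliseconds);
        Console.WriteLine("Static method:     {0} ms", staticTime.TotalMilliseconds);
    }
}
```

Timings depend heavily on the runtime version, build configuration and input text, so treat any numbers you get as specific to your own setup.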
