Split sentence into words but having trouble with the punctuations in C#

Question

I have seen a few similar questions, but I am still struggling to achieve this.

Given a string, str = "The moon is our natural satellite, i.e. it rotates around the Earth!", I want to extract the words and store them in an array. The expected array elements would be:

the 
moon 
is 
our 
natural 
satellite 
i.e. 
it  
rotates 
around 
the 
earth

I tried using String.Split(',', '\t', '\r') but this does not work correctly. I also tried removing '.', ',' and other punctuation marks, but I want a string like "i.e." to be parsed out as a word too. What is the best way to achieve this? I also tried using Regex.Split, to no avail:

string[] words = Regex.Split(line, @"\W+");

I would appreciate a nudge in the right direction.

Recommended answer

A regex solution:

(\b[^\s]+\b)
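
To see what this first pattern does on its own, here is a minimal sketch that runs it over the sentence from the question. The class and method names (FirstPatternDemo, ExtractWords) are made up for illustration and are not part of the original answer. Each match is a run of non-whitespace characters trimmed back to word boundaries, so surrounding punctuation is dropped, and so is the final dot of "i.e.".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class FirstPatternDemo
{
    // Returns every match of \b[^\s]+\b in order of appearance.
    public static string[] ExtractWords(string input)
    {
        return Regex.Matches(input, @"\b[^\s]+\b")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        var input = "The moon is our natural satellite, i.e. it rotates around the Earth!";

        // Prints: The|moon|is|our|natural|satellite|i.e|it|rotates|around|the|Earth
        Console.WriteLine(string.Join("|", ExtractWords(input)));
    }
}
```

Note that "i.e" comes back without its trailing dot, because there is no word boundary between that dot and the following space.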

And if you really want to keep that last . on i.e., you could use this:

((\b[^\s]+\b)((?<=\.\w).)?)

Here's the code I'm using.

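  using System;
  using System.Text.RegularExpressions;

  var input = "The moon is our natural satellite, i.e. it rotates around the Earth!";
  var matches = Regex.Matches(input, @"((\b[^\s]+\b)((?<=\.\w).)?)");

  // Iterate as Match so the matched text is available through match.Value.
  foreach (Match match in matches)
  {
     Console.WriteLine(match.Value);
  }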

Result:

The
moon
is
our
natural
satellite
i.e.
it
rotates
around
the
Earth
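
Since the question asks for an array with lowercase entries, the following is a minimal sketch of one way to wrap the pattern in a helper. It makes two changes that are assumptions, not part of the answer above. First, the trailing dot is escaped (\. instead of .): the unescaped . matches any character, so an input such as "a.b c" would yield "a.b " with a trailing space. Second, an optional lowerCase flag is added. The names WordSplitter and SplitWords are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class WordSplitter
{
    // Same idea as the pattern above, but the optional trailing character
    // is an escaped dot, so only a literal '.' can be appended to a word.
    private static readonly Regex WordPattern =
        new Regex(@"\b[^\s]+\b(?:(?<=\.\w)\.)?");

    // Returns the words of the sentence as an array, optionally lowercased.
    public static string[] SplitWords(string sentence, bool lowerCase = false)
    {
        return WordPattern.Matches(sentence)
                          .Cast<Match>()
                          .Select(m => lowerCase ? m.Value.ToLowerInvariant() : m.Value)
                          .ToArray();
    }

    public static void Main()
    {
        var input = "The moon is our natural satellite, i.e. it rotates around the Earth!";
        string[] words = SplitWords(input, lowerCase: true);

        // Prints: the|moon|is|our|natural|satellite|i.e.|it|rotates|around|the|earth
        Console.WriteLine(string.Join("|", words));
    }
}
```

One limitation to be aware of: the lookbehind only checks that the word ends in a dot followed by a single word character, so a sentence ending in something like "1.5." would keep its final period as part of the token.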
