Tokenizing strings using regular expressions in JavaScript

Question

Suppose I have a long string containing newlines and tabs:

var x = "This is a long string.\n\t This is another one on next line.";

How can I split this string into tokens using a regular expression?

I don't want to use .split(' ') because I want to learn Javascript's Regex.
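As an aside, String.prototype.split itself accepts a regular expression, so splitting on runs of whitespace (rather than on a single space) is already a small regex exercise. A minimal sketch using the question's x; the name xTokens is chosen here for illustration:

```javascript
const x = "This is a long string.\n\t This is another one on next line.";

// \s matches any whitespace character (space, tab, newline, ...) and +
// collapses a run of them into one separator, so "\n\t " yields no empty tokens.
// (x has no leading or trailing whitespace, so no empty strings appear at the ends.)
const xTokens = x.split(/\s+/);

console.log(xTokens);
```

Punctuation stays attached to the words here ("string.", "line."), which is why matching the words themselves, rather than splitting on separators, is the more direct route to the arrays the question asks for.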

A more complicated string could be this:

var y = "This @is a #long $string. Alright, lets split this.";

Now I want to extract only the valid words from these strings, without special characters or punctuation, i.e. I want these:

var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];

var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];

Answer

Here is a jsfiddle example of what you asked: http://jsfiddle.net/ayezutov/BjXw5/1/

Basically, the code is simple:

var y = "This @is a #long $string. Alright, lets split this.";
// [^\s]+ means "one or more characters that are not whitespace";
// the g (global) flag makes match() return every match, not just the first.
var regex = /[^\s]+/g;

// match() returns null when nothing matches, so fall back to an empty array.
var match = y.match(regex) || [];
for (var i = 0; i < match.length; i++) {
    document.write(match[i]);
    document.write('<br>');
}
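To see what the g flag changes, here is a small sketch that runs under Node (it logs with console.log instead of document.write; the names firstOnly, allTokens and positioned are illustrative). Without g, match() stops at the first match; with g, it returns every match. Because the pattern only excludes whitespace, symbols and punctuation stay attached to the tokens:

```javascript
const y = "This @is a #long $string. Alright, lets split this.";

// Without the g flag, match() returns only the first match
// (an array of length 1, with extra index and input properties).
const firstOnly = y.match(/[^\s]+/);
console.log(firstOnly[0]); // "This"

// With the g flag, match() returns every match as a plain array of strings.
const allTokens = y.match(/[^\s]+/g) || [];
console.log(allTokens);

// matchAll() (ES2020, requires the g flag) also exposes where each token
// starts, which is useful when a tokenizer needs positions as well as text.
const positioned = Array.from(y.matchAll(/[^\s]+/g), (m) => [m.index, m[0]]);
console.log(positioned.slice(0, 3)); // [[0, "This"], [5, "@is"], [9, "a"]]
```

Here allTokens contains "@is", "#long", "$string." and "Alright,", which is why the updates below narrow the pattern.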

UPDATE: You can expand the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/

var regex = /[^\s\.,!?]+/g;
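Running the expanded pattern on the question's y shows what it fixes and what it leaves behind: the trailing "." and "," are gone, but "@", "#" and "$" are not in the excluded set, so they remain part of the tokens. (Inside a character class "." is already literal, so /[^\s.,!?]+/g would behave the same.) A minimal sketch; tokens is an illustrative name:

```javascript
const y = "This @is a #long $string. Alright, lets split this.";

// Whitespace plus ".", ",", "!" and "?" are excluded from tokens,
// so all of them now act as separators.
const tokens = y.match(/[^\s\.,!?]+/g) || [];
console.log(tokens);
// tokens: ["This", "@is", "a", "#long", "$string", "Alright", "lets", "split", "this"]
```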

UPDATE 2: Word characters only (\w matches letters, digits and the underscore): http://jsfiddle.net/ayezutov/BjXw5/3/

var regex = /\w+/g;
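In JavaScript, \w is shorthand for [A-Za-z0-9_], so /\w+/g keeps maximal runs of ASCII letters, digits and underscores and drops everything else. The sketch below checks it against the xwords and ywords arrays from the question. The sample string and the letters-only and Unicode-aware variants at the end are additions for comparison, not part of the original answer:

```javascript
const x = "This is a long string.\n\t This is another one on next line.";
const y = "This @is a #long $string. Alright, lets split this.";

// \w+ takes maximal runs of [A-Za-z0-9_]; the g flag returns all of them.
const xwords = x.match(/\w+/g) || [];
const ywords = y.match(/\w+/g) || [];
console.log(xwords, ywords);

// Caveats: \w also matches digits and "_", and it does not match accented letters.
const sample = "snake_case v2 caf\u00e9"; // "café" with a precomposed é
const wordChars = sample.match(/\w+/g);         // ["snake_case", "v2", "caf"]
const lettersOnly = sample.match(/[A-Za-z]+/g); // ["snake", "case", "v", "caf"]
// \p{L} (any Unicode letter) needs the u flag (ES2018+).
const unicodeLetters = sample.match(/\p{L}+/gu); // ["snake", "case", "v", "café"]
console.log(wordChars, lettersOnly, unicodeLetters);
```

Which variant fits depends on what counts as a "word" in your input; for plain English text like the question's, /\w+/g already produces the requested arrays.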
