Regular expression to extract words inside nested parentheses


Problem description




I'm looking for a regex that can do the following task.

Message body input: Test1 (Test2) (test3) (ti,ab(text(text here(possible text)text(possible text(more text))))) end (text)

Desired result: (text(text here(possible text)text(possible text(more text))))

I want to collect everything that is inside ti,ab(................)

var messageBody = message.getPlainBody();
var ssFile = DriveApp.getFileById(id);
DriveApp.getFolderById(folder.getId()).addFile(ssFile);
var ss = SpreadsheetApp.open(ssFile);
var sheet = ss.getSheets()[0];
sheet.insertColumnAfter(sheet.getLastColumn());
SpreadsheetApp.flush();
sheet = ss.getSheets()[0];
var range = sheet.getRange(1, 1, sheet.getLastRow(), sheet.getLastColumn() + 1);
var values = range.getValues();

values[0][sheet.getLastColumn()] = "Search Strategy";

for (var i = 1; i < values.length; i++) {
    // here is my regexp (the part that does not work as intended)
    var y = messageBody.match(/\((ti,ab.*)\)/ig);
    if (y) {
        // match() returns null when nothing is found, so only write on a hit
        values[i][values[i].length - 1] = y.toString();
    }
}

range.setValues(values);

Solution

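JavaScript regular expressions have no recursion or balancing constructs, so a single regex cannot match arbitrarily nested parentheses. The only solution you may use here is to extract all substrings inside parentheses and then filter them to get all those that start with ti,ab: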

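var a = [], r = [], result;   // a: stack of "(" positions, r: extracted substrings
var txt = "Test1  (Test2) (test3) (ti,ab(text(text here(possible text)text(possible text(more text))))) end (text)";
for (var i = 0; i < txt.length; i++) {
    if (txt.charAt(i) == '(') {
        a.push(i);              // remember where this group opens
    }
    if (txt.charAt(i) == ')') {
        // pair with the most recent "(" and keep the text between the two
        r.push(txt.substring(a.pop() + 1, i));
    }
}
// keep only groups that start with "ti,ab(", then strip that 6-character prefix and the final ")"
result = r.filter(function(x) { return /^ti,ab\(/.test(x); })
          .map(function(y) { return y.substring(6, y.length - 1); });
console.log(result);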

The nested-parentheses loop is borrowed from the question "Nested parentheses get string one by one". The /^ti,ab\(/ regex matches ti,ab( at the start of the string.
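As a rough sketch, the same stack idea can be wrapped in reusable functions. The names extractParenthesized and contentsAfterPrefix are hypothetical (they do not appear in the original answer), and the shortened sample string is only for illustration. The last lines contrast the result with the greedy regex from the question, which runs on to the final closing parenthesis in the input.

```javascript
// Hypothetical helper: collect the contents of every balanced (...) pair.
function extractParenthesized(text) {
  var opens = [];   // stack of indices of still-unmatched "("
  var found = [];
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch === '(') {
      opens.push(i);
    } else if (ch === ')' && opens.length > 0) {
      // a stray ")" with nothing on the stack is simply ignored
      found.push(text.substring(opens.pop() + 1, i));
    }
  }
  return found;
}

// Hypothetical helper: keep groups that look like prefix(...) and strip that wrapper.
function contentsAfterPrefix(text, prefix) {
  var head = prefix + '(';
  return extractParenthesized(text)
    .filter(function (s) {
      return s.indexOf(head) === 0 && s.charAt(s.length - 1) === ')';
    })
    .map(function (s) {
      return s.substring(head.length, s.length - 1);
    });
}

var sample = "Test1 (Test2) (ti,ab(text(more text))) end (text)";
console.log(contentsAfterPrefix(sample, "ti,ab"));
// prints [ 'text(more text)' ]

// For comparison, the greedy pattern from the question overshoots:
console.log(sample.match(/\((ti,ab.*)\)/ig));
// prints [ '(ti,ab(text(more text))) end (text)' ]
```

Like the original loop, this assumes the text after the prefix forms a single group, so an input such as (ti,ab(x) y (z)) would need extra handling.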

The above solution extracts every parenthesized substring, including those nested inside other parentheses. If you only need the top-level groups, use

var txt = "Test1 (Test2) ((ti,ab(text(text here))) AND ab(test3) Near Ti(test4) NOT ti,ab,su(test5) NOT su(Test6))";
var start = 0, r = [], level = 0;
for (var j = 0; j < txt.length; j++) {
  if (txt.charAt(j) == '(') {
    if (level === 0) start = j;   // a new top-level group opens here
    ++level;
  }
  if (txt.charAt(j) == ')') {
    if (level > 0) {
      --level;
      // back at depth 0: the top-level group is complete
      if (level === 0) {
        r.push(txt.substring(start, j + 1));
      }
    }
    // a ")" seen at depth 0 is unbalanced and is skipped
  }
}
console.log("r: ", r);
var rx = "\\b(?:ti|ab|su)(?:,(ti|ab|su))*\\(";
// keep the groups that contain a field tag, then replace each "tag(" with "("
var result = r.filter(function(y) { return new RegExp(rx, "i").test(y); })
  .map(function(x) {
    return x.replace(new RegExp(rx, "ig"), '(');
  });
console.log("Result:", result);

The pattern used to filter the groups and remove the unnecessary words:

\b(?:ti|ab|su)(?:,(ti|ab|su))*\(

Details

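  • \b - a word boundary
  • (?:ti|ab|su) - one of the three alternatives: ti, ab, or su
  • (?:,(ti|ab|su))* - zero or more repetitions of a comma followed by one of the three alternatives
  • \( - a literal ( character.

Each match is replaced with ( so that the opening parenthesis consumed by the pattern is restored in the result.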
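To see what the pattern does and does not match, here is a small sketch that applies it to a few strings. The helper name stripFieldTags and the sample strings (including tab(x) and title(x), which are not in the original post) are assumptions made for illustration.

```javascript
// Same pattern as above, written as a regex literal with the i and g flags.
var tagRx = /\b(?:ti|ab|su)(?:,(ti|ab|su))*\(/ig;

// Hypothetical helper: replace every "tag(" or "tag,tag(" with a plain "(".
function stripFieldTags(s) {
  return s.replace(tagRx, "(");
}

var samples = ["ti,ab(text)", "Ti(test4)", "ti,ab,su(test5)", "tab(x)", "title(x)"];
samples.forEach(function (s) {
  console.log(s + " -> " + stripFieldTags(s));
});
// ti,ab(text) -> (text)
// Ti(test4) -> (test4)
// ti,ab,su(test5) -> (test5)
// tab(x) -> tab(x)       (\b prevents matching "ab" inside "tab")
// title(x) -> title(x)   ("ti" is not followed by "," or "(")
```

The answer's code builds a fresh RegExp for each test() call. That sidesteps a known pitfall: a regex with the g flag keeps its lastIndex between test() calls, so reusing one global regex for filtering can skip matches.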
