Remove a specific word from a string


Question

I am using Oracle 10g.

I want to remove all occurrences of a particular word from a sentence, but I don't want to remove any other word in which it is joined to additional letters (a-z or A-Z).

For example, the following is a sentence from which I want to remove some:

some text, 123 someone, another text some1

Expected output:

 text, 123 someone, another text

Note that I also want to remove a word if it consists of some with any characters other than A-Z and a-z directly before or after it.

This is what I have tried so far:

select replace('some text, 123 someone, another text some1','some','') 
from dual;

I am getting this output:

 text, 123 one, another text 1

In the output above, I expect someone not to be touched, and some1 to be removed entirely.

How should I achieve this? Any suggestions will be appreciated.

Edit: For clarity, this is another example of what I am looking for:

some other text someone other text, someB some1 some.

The output should be:

 other text someone other text, someB 

In the sentence above, someB is not removed because some is followed by a letter (a-z or A-Z). some1 and some. are removed because they contain no additional letters.

Edit2

If I use a regex:

select REGEXP_REPLACE('some text, 123 someone, another text some1','[^a-zA-Z]','')
from dual

I am getting this output:

sometextsomeoneanothertextsome

Expected output:

sometextsomeoneanothertext

Note that I also want some1 removed from the string, as it contains a character other than A-Z.

Answers using regex are also welcome.

Recommended answer

Due to the lack of support for lookbehind/lookahead and word boundaries (\b) in Oracle's implementation of regular expressions, it seems impossible to meet all the requirements in a single REGEXP_REPLACE call. This is especially true for the case pointed out by Egor Skriptunoff: matches that follow one another with only a single separator between them, such as some some some some ....

Leaving that case aside, it is possible to match all such strings with this call:

regexp_replace(
  source_string,                                       -- source string
  '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',  -- pattern
  '\1\5',                                              -- leave separators in place
  1,                                                   -- start from beginning
  0,                                                   -- replace all occurrences
  'im'                                                 -- case-insensitive and multiline 
);

Parts of the pattern:

(                -- start of Group #1
  [^[:alnum:]]   -- any non-alphanumeric character 
  |              -- or 
  ^              -- start of string or start of line 
)                -- end of Group #1
(                -- start of Group #2
  (              -- start of Group #3 
    \d           -- any digit
  )              -- end of Group #3
  *              -- repeat Group #3 zero or more times (any run of digits)
  some           -- core string to match
  (              -- start of group #4
    \d           -- any digit
  )              -- end of group #4  
  *              -- repeat Group #4 zero or more times (any run of digits)
)                -- end of Group #2
(                -- start of Group #5
  [^[:alnum:]]   -- any non-alphanumeric character 
  |              -- or
  $              -- end of string or end of line
)                -- end of Group #5
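
As a quick check, the call can be run against the first sentence from the question. This is only a sketch: it assumes a release in which the Perl-style \d shorthand is available (as far as I know, Oracle 10g Release 2 or later; [[:digit:]] could be substituted otherwise), and the column alias result_string is just illustrative.

```sql
-- Apply the single-call pattern to the question's first sentence.
-- The leading "some" and the trailing "some1" should be dropped,
-- while "someone" should stay because a letter follows "some".
SELECT regexp_replace(
         'some text, 123 someone, another text some1',
         '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',
         '\1\5',   -- put the surrounding separators back
         1,        -- start position
         0,        -- 0 = replace every occurrence
         'im'      -- case-insensitive, multiline
       ) AS result_string
FROM dual;
```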

Because the separators used for matching (Group #1 and Group #5) are part of the matched pattern, they are removed from the source string on a successful match, so we need to restore them by referencing them in the third regexp_replace parameter.
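
The same separator handling is what makes adjacent occurrences a problem: a separator consumed as Group #5 of one match cannot also serve as Group #1 of the next. A minimal sketch of that situation, using a made-up three-word input and an illustrative alias one_pass:

```sql
-- Three occurrences separated by single spaces.
-- The space after the first "some" is consumed as Group #5 of the first
-- match, so it is no longer available as Group #1 for the second "some".
SELECT regexp_replace(
         'some some some',
         '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',
         '\1\5', 1, 0, 'im'
       ) AS one_pass
FROM dual;
```

A single pass is expected to leave the middle some in place; applying the same replacement to the result again should remove it.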

Based on this solution, it is possible to replace all occurrences, even repeated ones, by running the replacement in a loop.

For example, you can define a function like this:

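create or replace function delete_str_with_digits(
  pSourceString in varchar2, 
  pReplacePart  in varchar2  -- base string (like 'some' in question)
)
  return varchar2
is
  C_PATTERN_START constant varchar2(100) := '([^[:alnum:]]|^)((\d)*';
  C_PATTERN_END   constant varchar2(100) := '(\d)*)([^[:alnum:]]|$)';
  C_FLAGS         constant varchar2(2)   := 'im';  -- case-insensitive, multiline

  vPattern         varchar2(4000);
  vCurValue        varchar2(4000);
  vPatternPosition binary_integer;
begin

  -- pReplacePart is inserted as-is, so regex metacharacters in it are not escaped
  vPattern := C_PATTERN_START || pReplacePart || C_PATTERN_END;
  vCurValue := pSourceString;

  -- search with the same flags as the replacement, so the loop condition
  -- and regexp_replace agree on what counts as a match
  vPatternPosition := regexp_instr(vCurValue, vPattern, 1, 1, 0, C_FLAGS);

  -- each pass removes the matches that do not share a separator;
  -- repeat until nothing matches
  while (vPatternPosition > 0) loop
    vCurValue := regexp_replace(vCurValue, vPattern, '\1\5', 1, 0, C_FLAGS);
    vPatternPosition := regexp_instr(vCurValue, vPattern, 1, 1, 0, C_FLAGS);
  end loop;

  return vCurValue;  

end;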

and use it from SQL or other PL/SQL code:

SELECT 
  delete_str_with_digits(
    'some text, -> awesome <- 123 someone, 3some3
     line of 7 :> some some some some some some some <
222some  another some1? some22 text 0some000', 
    'some'
  )  as result_string
FROM 
  dual

SQLFiddle example
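
For comparison with the question, the function can also be tried on the second example sentence. This is a sketch that assumes delete_str_with_digits has already been created as shown above; the alias result_string is illustrative.

```sql
-- Second example sentence from the question.
-- "someone" and "someB" should survive; "some1" and the bare "some"
-- occurrences should be removed.
SELECT delete_str_with_digits(
         'some other text someone other text, someB some1 some.',
         'some'
       ) AS result_string
FROM dual;
```

Note that the period after the last some is matched as a separator (Group #5) and put back by '\1\5', so it stays in the result. If the exact output from the question is required, trailing punctuation would have to be stripped in a separate step.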
