Remove a specific word from a string
Question
I am using Oracle 10g.
I want to remove all occurrences of a particular word from a sentence, but I don't want to remove any other word that contains additional characters in the range a-z or A-Z.
For example, the following is a sentence from which I want to remove "some":
some text, 123 someone, another text some1
Expected output:
text, 123 someone, another text
Note that I also want to remove the whole word when "some" is preceded or followed by any characters other than A-Z and a-z.
This is what I have tried so far:
select replace('some text, 123 someone, another text some1','some','')
from dual;
I am getting this output:
text, 123 one, another text 1
In the above output I expect "someone" not to be replaced, and "some1" to be removed entirely.
How should I achieve this? Any suggestions will be appreciated.
Edit: For clarity, this is another example of what I am looking for:
some other text someone other text, someB some1 some.
The output should be:
other text someone other text, someB
In the sentence above, "someB" is not removed because its extra character is in the range a-z/A-Z, while "some1" and "some." are removed because their extra characters are not in that range.
Edit 2: If I use a regex:
select REGEXP_REPLACE('some text, 123 someone, another text some1','[^a-zA-Z]','')
from dual
I am getting this output:
sometextsomeoneanothertextsome
Expected output:
sometextsomeoneanothertext
Note that I want "some1" to be removed from the string as well, because it contains a character other than A-Z.
Answers using regex are also welcome.
Recommended answer
Oracle's regular expression implementation does not support lookbehind/lookahead or the word boundary (\b), so it seems impossible to meet all the requirements in a single REGEXP_REPLACE call. This is especially true for the case pointed out by Egor Skriptunoff: matches that follow one another with only a single separator between them, such as some some some some ... The separator consumed as the end of one match cannot also serve as the start of the next match, so every second occurrence is skipped.
Leaving that case aside, all such strings can be matched with this call:
regexp_replace(
  source_string,                                        -- source string
  '([^[:alnum:]]|^)((\d)*some(\d)*)([^[:alnum:]]|$)',  -- pattern
  '\1\5',                                               -- leave separators in place
  1,                                                    -- start from beginning
  0,                                                    -- replace all occurrences
  'im'                                                  -- case-insensitive and multiline
);
Pattern parts:
( -- start of Group #1
[^[:alnum:]] -- any non-alphanumeric character
| -- or
^ -- start of string or start of line
) -- end of Group #1
( -- start of Group #2
( -- start of Group #3
\d -- any digit
) -- end of Group #3
* -- repeat Group #3 zero or more times (any run of consecutive digits)
some -- core string to match
( -- start of group #4
\d -- any digit
) -- end of group #4
* -- repeat Group #4 zero or more times (any run of consecutive digits)
) -- end of Group #2
( -- start of Group #5
[^[:alnum:]] -- any non-alphanumeric character
| -- or
$ -- end of string or end of line
) -- end of Group #5
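To see which words the pattern picks up without running it in a database, here is a minimal Python sketch. It is an approximation, not Oracle's engine: Python's re module has no POSIX classes, so [^[:alnum:]] is written as [^A-Za-z0-9], and the 'im' match parameter is assumed to correspond to re.IGNORECASE | re.MULTILINE. The names PATTERN and is_removed and the sample words are illustrative only.

```python
import re

# Assumed Python equivalent of the Oracle pattern above
# ([^[:alnum:]] rewritten as [^A-Za-z0-9], 'im' as IGNORECASE | MULTILINE).
PATTERN = re.compile(
    r"([^A-Za-z0-9]|^)((\d)*some(\d)*)([^A-Za-z0-9]|$)",
    re.IGNORECASE | re.MULTILINE,
)


def is_removed(word):
    """Return True if the pattern matches somewhere in `word`."""
    return PATTERN.search(word) is not None


for word in ["some", "SOME", "some1", "3some3", "someone", "awesome", "someB"]:
    print(f"{word!r}: {'removed' if is_removed(word) else 'kept'}")
```

Under these assumptions the bare word and its digit-decorated variants match, while "someone", "awesome" and "someB" do not, because a letter sits directly next to "some".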
Because the separators used for matching (Group #1 and Group #5) are part of the match pattern, they are removed from the source string on a successful match, so we need to restore them by referencing both groups in the third regexp_replace parameter.
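The effect of restoring the separators can be checked with a small Python sketch. It uses an assumed Python translation of the pattern ([^A-Za-z0-9] in place of [^[:alnum:]]), and the sample string is made up for illustration.

```python
import re

# Assumed Python equivalent of the Oracle pattern.
PATTERN = re.compile(
    r"([^A-Za-z0-9]|^)((\d)*some(\d)*)([^A-Za-z0-9]|$)",
    re.IGNORECASE | re.MULTILINE,
)

text = "a,some,b"

# Replacing with an empty string also deletes both commas.
without_separators = PATTERN.sub("", text)

# Re-emitting Group #1 and Group #5 keeps the commas in place.
with_separators = PATTERN.sub(r"\1\5", text)

print(without_separators)  # ab
print(with_separators)     # a,,b
```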
Based on this solution, it is possible to replace all occurrences, even consecutive ones, by repeating the call in a loop.
For example, you can define a function like this:
create or replace function delete_str_with_digits(
  pSourceString in varchar2,
  pReplacePart  in varchar2 -- base string (like 'some' in question)
)
return varchar2
is
  C_PATTERN_START constant varchar2(100) := '([^[:alnum:]]|^)((\d)*';
  C_PATTERN_END   constant varchar2(100) := '(\d)*)([^[:alnum:]]|$)';
  vPattern         varchar2(4000);
  vCurValue        varchar2(4000);
  vPatternPosition binary_integer;
begin
  vPattern  := C_PATTERN_START || pReplacePart || C_PATTERN_END;
  vCurValue := pSourceString;
  -- pass 'im' here too, so the loop test matches the same way as regexp_replace
  vPatternPosition := regexp_instr(vCurValue, vPattern, 1, 1, 0, 'im');
  -- repeat until nothing matches: one pass skips matches that share a separator
  while (vPatternPosition > 0) loop
    vCurValue := regexp_replace(vCurValue, vPattern, '\1\5', 1, 0, 'im');
    vPatternPosition := regexp_instr(vCurValue, vPattern, 1, 1, 0, 'im');
  end loop;
  return vCurValue;
end;
and use it from SQL or other PL/SQL code:
SELECT
delete_str_with_digits(
'some text, -> awesome <- 123 someone, 3some3
line of 7 :> some some some some some some some <
222some another some1? some22 text 0some000',
'some'
) as result_string
FROM
dual
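The reason for the loop can be illustrated with a Python sketch that mirrors the function. This is a hypothetical counterpart, not the PL/SQL itself: the helper names build_pattern, remove_once and remove_all are made up, [^[:alnum:]] is written as [^A-Za-z0-9], and the base word is passed through re.escape, which the PL/SQL version does not do.

```python
import re


def build_pattern(part):
    """Assumed Python translation of the Oracle pattern for a given base word."""
    return re.compile(
        r"([^A-Za-z0-9]|^)((\d)*" + re.escape(part) + r"(\d)*)([^A-Za-z0-9]|$)",
        re.IGNORECASE | re.MULTILINE,
    )


def remove_once(source, part):
    """Single pass, like one regexp_replace call."""
    return build_pattern(part).sub(r"\1\5", source)


def remove_all(source, part):
    """Repeat the replacement until the pattern no longer matches."""
    pattern = build_pattern(part)
    current = source
    while pattern.search(current):
        current = pattern.sub(r"\1\5", current)
    return current


repeated = "some some some some"
# One pass leaves every second occurrence, because the separator that ends
# one match cannot also start the next one.
print(repr(remove_once(repeated, "some")))
# Repeating the pass clears the remaining occurrences.
print(repr(remove_all(repeated, "some")))
print(repr(remove_all("some text, 123 someone, another text some1", "some")))
```

The loop terminates because every successful pass removes at least one copy of the base word, so the string gets strictly shorter each time.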