在Oracle SQL中删除HTML标记的更好方法 [英] Better way to remove HTML tags in Oracle SQL
问题描述
我有一个comments
列,添加到版本中的注释以富文本格式存储在comments
列中.现在,我正在尝试处理这些数据并获得人类可读的输出.我正在提供我正在尝试处理的oracle SQL
数据库中的2个示例comment
数据.
I have a comments
column and the comments added to release are stored as rich text in comments
column. Now i'm trying to process this data and get an human readable output. I'm providing 2 sample comment
data i have in my oracle SQL
db which i'm trying to process.
示例1:
<html>
<body>
<div align="left"><font face="Arial Unicode MS"><span style="font-size:8pt">Display the frulog on the count values</span></font></div>
</body>
</html>
示例2:<not implemented in this release>
我使用以下查询来处理html字符
i used the below query to process the html characters
Select (REGEXP_REPLACE(comments),'<.+?>') from test_table;
注意:考虑示例1和示例2中提供的值在上述SQL命令中作为列comments
传入.
Note: consider values provided in Example 1 and Example 2 are passed in as column comments
in the above SQL command.
Example 1
的查询结果是Display the frulog on the count values
,这是我期望的结果. Example 2
的结果为''
. Example 2
中的值不是html标记,但仍替换了标记.我怎样才能使replace语句更智能.
the query result for Example 1
was Display the frulog on the count values
which is what i'm expecting. result for Example 2
was ''
. Value in Example 2
was not an html tag but it still replaced the tags. How can i make the replace statement smart.
随时放弃您的建议.
推荐答案
Oracle 11g R2架构设置:
CREATE TABLE comments ( value ) AS
SELECT '<html>
<body>
<div align="left">
<font face="Arial Unicode MS">
<span style="font-size:8pt">
Display the frulog on the count values
</span>
</font>
</div>
</body>
</html>' FROM DUAL UNION ALL
SELECT '<not implemented in this release>' FROM DUAL UNION ALL
SELECT '<test a="1"
b=''2''
c = 3
d
e = ">" >test</test>' FROM DUAL;
查询1 :
SELECT value,
REGEXP_REPLACE(
value,
'\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*',
NULL,
1,
0,
'im'
) AS replaced
FROM comments
结果 :
Results:
| VALUE | REPLACED |
|------------------------------------------|----------------------------------------|
| <html> | Display the frulog on the count values |
| <body> | |
| <div align="left"> | |
| <font face="Arial Unicode MS"> | |
| <span style="font-size:8pt"> | |
| Display the frulog on the count values | |
| </span> | |
| </font> | |
| </div> | |
| </body> | |
| </html> | |
|------------------------------------------|----------------------------------------|
| <not implemented in this release> | (null) |
|------------------------------------------|----------------------------------------|
| <test a="1" | test |
| b='2' | |
| c = 3 | |
| d | |
| e = ">" >test</test> | |
注意:<not implemented in this release>
是有效的HTML 具有标签名称not
和属性implemented
,in
,this
和release
的自定义元素.
Note: <not implemented in this release>
is a valid HTML custom element with tag name not
and attributes implemented
, in
, this
and release
.
如果您只想替换特定的HTML元素,则在正则表达式的开头列出它们:
If you only want to replace specific HTML elements then list them at the start of the regular expression:
\s*</?(a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|command|content|data|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|element|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|head|header|hgroup|hr|html|i|iframe|image|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|multicol|nav|nextid|nobr|noembed|noframes|noscript|object|ol|optgroup|option|output|p|param|picture|plaintext|pre|progress|q|rp|rt|rtc|ruby|s|samp|script|section|select|shadow|slot|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|template|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*
这篇关于在Oracle SQL中删除HTML标记的更好方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!