在Oracle SQL中删除HTML标记的更好方法 [英] Better way to remove HTML tags in Oracle SQL

查看:55
本文介绍了在Oracle SQL中删除HTML标记的更好方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个comments列,添加到版本中的注释以富文本格式存储在comments列中.现在,我正在尝试处理这些数据并获得人类可读的输出.我正在提供我正在尝试处理的oracle SQL数据库中的2个示例comment数据.

I have a comments column and the comments added to release are stored as rich text in comments column. Now i'm trying to process this data and get an human readable output. I'm providing 2 sample comment data i have in my oracle SQL db which i'm trying to process.

示例1:

    <html>
<body>
<div align="left"><font face="Arial Unicode MS"><span style="font-size:8pt">Display the frulog on the count values</span></font></div>
</body>
</html>

示例2:<not implemented in this release>

我使用以下查询来处理html字符

i used the below query to process the html characters

Select (REGEXP_REPLACE(comments),'<.+?>') from test_table;

注意:考虑示例1和示例2中提供的值在上述SQL命令中作为列comments传入.

Note: consider values provided in Example 1 and Example 2 are passed in as column comments in the above SQL command.

Example 1的查询结果是Display the frulog on the count values,这是我期望的结果. Example 2的结果为''. Example 2中的值不是html标记,但仍替换了标记.我怎样才能使replace语句更智能.

the query result for Example 1 was Display the frulog on the count values which is what i'm expecting. result for Example 2 was ''. Value in Example 2 was not an html tag but it still replaced the tags. How can i make the replace statement smart.

随时放弃您的建议.

推荐答案

SQL提琴

Oracle 11g R2架构设置:

CREATE TABLE comments ( value ) AS
SELECT '<html>
<body>
<div align="left">
<font face="Arial Unicode MS">
<span style="font-size:8pt">
  Display the frulog on the count values
</span>
</font>
</div>
</body>
</html>' FROM DUAL UNION ALL
SELECT '<not implemented in this release>' FROM DUAL UNION ALL
SELECT '<test a="1"
  b=''2''
  c = 3
  d
  e = ">" >test</test>' FROM DUAL;

查询1 :

SELECT value,
       REGEXP_REPLACE(
         value,
         '\s*</?\w+((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*',
         NULL,
         1,
         0,
         'im'
       ) AS replaced
FROM   comments

结果 :

Results:

|                                    VALUE |                               REPLACED |
|------------------------------------------|----------------------------------------|
| <html>                                   | Display the frulog on the count values |
| <body>                                   |                                        |
| <div align="left">                       |                                        |
| <font face="Arial Unicode MS">           |                                        |
| <span style="font-size:8pt">             |                                        |
| Display the frulog on the count values   |                                        |
| </span>                                  |                                        |
| </font>                                  |                                        |
| </div>                                   |                                        |
| </body>                                  |                                        |
| </html>                                  |                                        |
|------------------------------------------|----------------------------------------|
| <not implemented in this release>        |                                 (null) |
|------------------------------------------|----------------------------------------|
| <test a="1"                              |                                   test | 
|   b='2'                                  |                                        |
|   c = 3                                  |                                        |
|   d                                      |                                        |
|   e = ">" >test</test>                   |                                        |

注意:<not implemented in this release>是有效的HTML 具有标签名称not和属性implementedinthisrelease的自定义元素.

Note: <not implemented in this release> is a valid HTML custom element with tag name not and attributes implemented, in, this and release.

如果您只想替换特定的HTML元素,则在正则表达式的开头列出它们:

If you only want to replace specific HTML elements then list them at the start of the regular expression:

\s*</?(a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|command|content|data|datalist|dd|del|details|dfn|dialog|dir|div|dl|dt|element|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|head|header|hgroup|hr|html|i|iframe|image|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|multicol|nav|nextid|nobr|noembed|noframes|noscript|object|ol|optgroup|option|output|p|param|picture|plaintext|pre|progress|q|rp|rt|rtc|ruby|s|samp|script|section|select|shadow|slot|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|template|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)((\s+\w+(\s*=\s*(".*?"|''.*?''|[^''">\s]+))?)+\s*|\s*)/?>\s*

这篇关于在Oracle SQL中删除HTML标记的更好方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆