PostgreSQL - Replace HTML Entities

Question

I have just set about the task of stripping out HTML entities from our database, as we do a lot of crawling and some of the crawlers didn't do this at input time :(

So I started writing a bunch of queries that look like this:

UPDATE nodes SET name=regexp_replace(name, '&#xe0;', 'à', 'g') WHERE name LIKE '%#xe0%';
UPDATE nodes SET name=regexp_replace(name, '&#xe1;', 'á', 'g') WHERE name LIKE '%#xe1%';
UPDATE nodes SET name=regexp_replace(name, '&#xe2;', 'â', 'g') WHERE name LIKE '%#xe2%';

This is clearly a pretty naive approach. I've been trying to figure out whether there is something clever I can do with the decode function: maybe grab the HTML entity with a regex like /&#x(..);/, pass just the captured hex part (the %1 group) to a decoder, and rebuild the string... or something.
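
For the hexadecimal entities shown in the queries above, the regex idea can be done in plain PL/pgSQL without Perl. The function below is a hypothetical sketch (the name decode_hex_entities is made up, not from the answer): it collects each distinct &#x..; entity with regexp_matches, turns the hex digits into an integer with the common bit(32) cast trick, and substitutes chr() of that code point. It assumes a UTF8 database, and it only handles hexadecimal numeric entities, not decimal ones such as &#233; or named ones such as &eacute;.

```sql
-- Hypothetical helper, not part of the original answer: decodes only
-- hexadecimal numeric entities (e.g. &#xe0; -> à) using built-in PL/pgSQL.
-- Assumes a UTF8 database, so chr() maps the number to a Unicode code point.
CREATE OR REPLACE FUNCTION decode_hex_entities(s text) RETURNS text AS $$
DECLARE
    result text := s;
    m      text[];  -- m[1] = whole entity, m[2] = its hex digits
    code   int;
BEGIN
    -- Each distinct entity is handled once; replace() fixes every occurrence.
    FOR m IN SELECT DISTINCT regexp_matches(s, '(&#[xX]([0-9a-fA-F]{1,6});)', 'g') LOOP
        -- Hex text -> integer: left-pad to 8 digits and cast through bit(32).
        code := ('x' || lpad(m[2], 8, '0'))::bit(32)::int;
        -- Leave NUL, surrogates and values above U+10FFFF untouched (chr() rejects them).
        IF (code BETWEEN 1 AND 1114111) AND (code NOT BETWEEN 55296 AND 57343) THEN
            result := replace(result, m[1], chr(code));
        END IF;
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
```

With this sketch, a single UPDATE nodes SET name = decode_hex_entities(name) WHERE name LIKE '%&#x%'; would stand in for the whole series of per-entity queries.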

Shall I just press on with the queries? There will probably only be 40 or so of them.

Answer

Write a function in PL/PerlU that uses the HTML::Entities module: https://metacpan.org/pod/HTML::Entities

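Of course, you need Perl installed on the database server, PL/Perl available, and the HTML::Entities module (part of the HTML-Parser distribution on CPAN) installed for that Perl.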

1) First, create the procedural language PL/PerlU (it is an untrusted language, so this requires superuser privileges):

CREATE EXTENSION plperlu;
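
Before writing the function, it may be worth confirming that the server's Perl can actually load HTML::Entities. A minimal sketch, assuming you are a superuser and the extension above was created successfully; it uses an anonymous DO block in plperlu and PL/Perl's elog() to print the module version:

```sql
-- Assumption: run as a superuser right after CREATE EXTENSION plperlu.
-- If the module is missing, this fails with Perl's "Can't locate HTML/Entities.pm" error.
DO $$
    use HTML::Entities;
    elog(NOTICE, "HTML::Entities $HTML::Entities::VERSION is available");
$$ LANGUAGE plperlu;
```

If this raises an error, install HTML-Parser for the Perl that PostgreSQL's plperl was built against, not just any Perl on the machine.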

2) Then create a function like this:

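-- Decodes named (&eacute;) and numeric (&#xe0;, &#233;) HTML entities via Perl's HTML::Entities.
-- STRICT returns NULL for NULL input; IMMUTABLE because the output depends only on the input.
CREATE FUNCTION decode_html_entities(text) RETURNS text AS $$
    use HTML::Entities;
    return decode_entities($_[0]);
$$ LANGUAGE plperlu IMMUTABLE STRICT;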

3) Then you can use it like this:

select decode_html_entities('aaabbb&amp;.... asasdasdasd &#x2026;');
   decode_html_entities    
---------------------------
 aaabbb&.... asasdasdasd …
(1 row)
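
To apply the function to the data from the question, something along these lines should work. The table and column (nodes, name) come from the question's queries; the LIKE '%&%;%' pre-filter is an assumption that simply skips rows containing no '&...;' sequence so they are not rewritten. Previewing a sample first is a cheap safeguard before an in-place update.

```sql
-- Preview a sample of rows that look like they contain entities.
SELECT name, decode_html_entities(name) AS decoded
FROM   nodes
WHERE  name LIKE '%&%;%'
LIMIT  20;

-- Then decode in place, inside a transaction so it can be rolled back if needed.
BEGIN;
UPDATE nodes
SET    name = decode_html_entities(name)
WHERE  name LIKE '%&%;%';
COMMIT;
```

Because HTML::Entities also decodes named entities such as &amp; and &eacute;, this one statement covers more than the hex entities the original queries targeted.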
