PostgreSQL - Replace HTML Entities

Views: 159 Published: 2020/5/29 20:02:39 Tags: sql regex postgresql replace


Question

I have just set about the task of stripping HTML entities out of our database. We do a lot of crawling, and some of the crawlers didn't do this at input time :(

So I started writing a bunch of queries that look like this:

UPDATE nodes SET name=regexp_replace(name, '&#xe0;', 'à', 'g') WHERE name LIKE '%#xe0%';
UPDATE nodes SET name=regexp_replace(name, '&#xe1;', 'á', 'g') WHERE name LIKE '%#xe1%';
UPDATE nodes SET name=regexp_replace(name, '&#xe2;', 'â', 'g') WHERE name LIKE '%#xe2%';

This is clearly a pretty naive approach. I've been trying to figure out whether there is something clever I can do with the decode function: maybe grab the HTML entity with a regex like /&#x(..);/, pass just the captured part (the hex digits) to the ASCII decoder, and reconstruct the string... or something...

Should I just press on with the queries? There will probably only be 40 or so of them.

Recommended answer

Write a function in PL/PerlU (the untrusted variant of PL/Perl) and use the HTML::Entities module: https://metacpan.org/pod/HTML::Entities

Of course, you need to have Perl installed and PL/Perl available.
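One way to check that prerequisite from inside the database is to query the pg_available_extensions system view. This is only a quick sketch: it assumes a PostgreSQL version with extension support, and it tells you nothing about whether the HTML::Entities Perl module itself is installed on the server.

```sql
-- Lists plperlu if its extension files are present on the server.
-- installed_version is NULL until CREATE EXTENSION has been run
-- in the current database.
SELECT name, default_version, installed_version
FROM   pg_available_extensions
WHERE  name = 'plperlu';
```

If the query returns no rows, the PL/Perl package for your PostgreSQL build is probably not installed on the server. Note also that plperlu is an untrusted language, so creating functions in it requires superuser privileges.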

1) First of all, create the procedural language PL/PerlU:

CREATE EXTENSION plperlu;

2) Then create a function like this:

CREATE FUNCTION decode_html_entities(text) RETURNS TEXT AS $$
    use HTML::Entities;
    return decode_entities($_[0]);
$$ LANGUAGE plperlu;

3) Then you can use it like this:

select decode_html_entities('aaabbb&amp;.... asasdasdasd &hellip;');
   decode_html_entities    
---------------------------
 aaabbb&.... asasdasdasd …
(1 row)
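To apply this to the table from the question, the separate per-entity UPDATE statements could be collapsed into a single one. The following is a sketch rather than a tested migration: it assumes the decode_html_entities function from step 2 exists and reuses the nodes table and name column from the question. The regular expression is only a rough pre-filter for rows that look like they contain an entity, so that untouched rows are not rewritten.

```sql
-- Preview a sample of the changes before modifying anything.
SELECT name, decode_html_entities(name) AS decoded
FROM   nodes
WHERE  name ~ '&#?[0-9A-Za-z]+;'
LIMIT  20;

-- Decode every entity in the affected rows in one pass.
UPDATE nodes
SET    name = decode_html_entities(name)
WHERE  name ~ '&#?[0-9A-Za-z]+;';
```

Running the UPDATE inside a transaction and checking a few rows before committing is a sensible precaution, since the decoding is not reversible.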

