mysql substitute html special chars with UTF equivalents
Problem description
I have a database where some of the elements consist of HTML special characters:
| Universidad Tecnológica Nacional - UTN |
| Instituto Tecnológico de Buenos Aires |
| Instituto Superior del Profesorado "Dr. Joaquín V. González" |
| Escuela Nacional de Náutica "Manuel Belgrano" |
| Conservatorio Nacional de Música "Carlos López Buchardo" |
| Instituto Argentino de Computacion - IAC |
| Conservatorio de Superior de Música "Manuel de Falla" |
I need to convert it to a proper UTF format. Can I do better than just iterating through the database, and having a mapping from each code to the equivalent symbol?
&#225; -> 'á'
&quot; -> '"'
...
Answer
As mentioned in my comment above, it's terribly unclear what you're trying to do in your own case.
Can I do better than just iterating through the database, and having a mapping from each code to the equivalent symbol?
Well, yes. You can replace character code entities (e.g. &#123; and &#x1ab;) with their replacement characters without having to look up the character code in a "mapping". But named entities (e.g. &quot;) will always need to be looked up.
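To illustrate that distinction outside MySQL, here is a minimal Python sketch (not part of the original answer). Numeric references are decoded arithmetically with chr(), while named ones need a table; Python's html.entities.name2codepoint stands in for that table here. The regex and the names ENTITY, decode_entity and unescape_once are illustrative, and unlike browsers this sketch only recognizes references terminated by ';'.

```python
import re
from html.entities import name2codepoint

# &#123; (decimal), &#x1ab; (hex) or &name; -- the trailing ';' is required here
ENTITY = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));")


def decode_entity(m: re.Match) -> str:
    dec, hexa, name = m.groups()
    if name is not None:
        cp = name2codepoint.get(name)  # named entity: must be looked up
    else:
        # character code entity: plain arithmetic, no mapping needed
        cp = int(dec) if dec is not None else int(hexa, 16)
    if cp is None or cp > 0x10FFFF:
        return m.group(0)  # unknown name or out-of-range code: leave as-is
    return chr(cp)


def unescape_once(text: str) -> str:
    # re.sub does not rescan replaced text, so one call removes one layer
    return ENTITY.sub(decode_entity, text)
```

For example, unescape_once("&amp;quot;") returns "&quot;", because only the outer layer is decoded.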
Here's my attempt to solve the general case:
Create a table to store named character entities defined in HTML:
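    CREATE TABLE ents (
      ref VARCHAR(8) NOT NULL COLLATE utf8_bin,  -- entity name; case-sensitive ("Aacute" vs "aacute")
      rep CHAR(1) CHARACTER SET utf8 NOT NULL,   -- the character it stands for
      PRIMARY KEY (ref)
    );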
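A quick way to sanity-check those column sizes is Python's html.entities.name2codepoint, which holds the HTML 4 named entities (the same set PHP's table covers under its HTML 4.01 flags, as far as I can tell). This sketch, an illustration rather than part of the answer, checks that the longest name fits VARCHAR(8) and that every replacement is a single Basic Multilingual Plane character, so MySQL's 3-byte utf8 can store it in CHAR(1).

```python
from html.entities import name2codepoint

# Longest entity name: must fit in ref VARCHAR(8)
longest = max(name2codepoint, key=len)

# Every replacement is one code point <= U+FFFF: fits rep CHAR(1) in 3-byte utf8
all_bmp = all(cp <= 0xFFFF for cp in name2codepoint.values())

print(longest, len(longest), all_bmp, len(name2codepoint))
```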
Populate this table - I suggest using a script, for example from PHP:
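    // charset=utf8 so the characters reach MySQL as UTF-8
    $dbh = new PDO("mysql:dbname=$dbname;charset=utf8", $username, $password);
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, FALSE);
    $ins = $dbh->prepare('INSERT INTO ents (ref, rep) VALUES (?, ?)');

    // Maps each character to its entity, e.g. 'á' => '&aacute;'
    $t = get_html_translation_table(HTML_ENTITIES);
    foreach ($t as $k => $v) {
        // strip the leading '&' and trailing ';' to store just the name
        $ins->execute([substr($v, 1, -1), $k]);
    }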
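The same population step can be sketched in Python so it runs anywhere. This is a stand-in under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database instead of MySQL, and html.entities.name2codepoint instead of PHP's get_html_translation_table.

```python
import sqlite3
from html.entities import name2codepoint

# In-memory SQLite stands in for the MySQL database (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ents (ref VARCHAR(8) NOT NULL PRIMARY KEY, rep CHAR(1) NOT NULL)"
)

# One (ref, rep) row per named entity, e.g. ('aacute', 'á')
rows = [(name, chr(cp)) for name, cp in name2codepoint.items()]
conn.executemany("INSERT INTO ents (ref, rep) VALUES (?, ?)", rows)
conn.commit()

# SQLite's default BINARY collation keeps the lookup case-sensitive, like utf8_bin
print(conn.execute("SELECT rep FROM ents WHERE ref = ?", ("aacute",)).fetchone())
```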
Define an SQL function to perform entity replacements (using this table where applicable, or else by character code):
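    DELIMITER ;;
    CREATE FUNCTION dhe(s TEXT CHARACTER SET utf8) RETURNS TEXT CHARACTER SET utf8
    READS SQL DATA
    BEGIN
      -- n: position of '&'; p: start of the reference; i: its length
      -- t: 1 = named, 2 = decimal, 3 = hex; e: 1 if the reference ends in ';'
      DECLARE n, p, i, t, e INT DEFAULT 0;
      DECLARE r VARCHAR(12);              -- character class allowed in the reference
      DECLARE c TEXT CHARACTER SET utf8;  -- replacement text (NULL if unknown)

      entity_search: LOOP
        SET n := LOCATE('&', s, n + 1);
        IF n = 0 THEN
          LEAVE entity_search;
        END IF;

        IF SUBSTRING(s, n + 1, 1) = '#' THEN
          CASE
            WHEN SUBSTRING(s, n + 2, 1) RLIKE '[[:digit:]]' THEN
              SET t := 2, p := n + 2, r := '[[:digit:]]';
            WHEN SUBSTRING(s, n + 2, 1) = 'x' THEN
              SET t := 3, p := n + 3, r := '[[:xdigit:]]';
            ELSE ITERATE entity_search;
          END CASE;
        ELSE
          SET t := 1, p := n + 1, r := '[[:alnum:]_]';
        END IF;

        -- Measure the reference; skip malformed ones and anything over 8 characters
        SET i := 0;
        reference: LOOP
          IF SUBSTRING(s, p + i, 1) NOT RLIKE r THEN
            IF SUBSTRING(s, p + i, 1) RLIKE '[[:alnum:]_]' THEN
              ITERATE entity_search;
            END IF;
            LEAVE reference;
          END IF;
          IF i = 8 THEN ITERATE entity_search; END IF;
          SET i := i + 1;
        END LOOP reference;

        SET e := IF(SUBSTRING(s, p + i, 1) = ';', 1, 0);
        SET c := CASE t
          WHEN 1 THEN (SELECT rep FROM ents WHERE ref = SUBSTRING(s, p, i) COLLATE utf8_bin)
          -- CHAR() returns a binary string unless USING is given
          WHEN 2 THEN CONVERT(CHAR(SUBSTRING(s, p, i) USING utf32) USING utf8)
          WHEN 3 THEN CONVERT(CHAR(CONV(SUBSTRING(s, p, i), 16, 10) USING utf32) USING utf8)
        END;

        -- Unknown references are kept verbatim, including the leading '&'
        SET s := CONCAT(
          LEFT(s, n - 1),
          COALESCE(c, SUBSTRING(s, n, p - n + i + e)),
          SUBSTRING(s, p + i + e)
        );
      END LOOP entity_search;

      RETURN s;
    END;;
    DELIMITER ;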
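To check what dhe() should do to a given string without a MySQL server, here is a hypothetical Python model of the same single pass: find each '&', classify it as named, decimal or hex, measure at most 8 reference characters, substitute, and keep anything unrecognized verbatim. It approximates MySQL's POSIX classes with ASCII sets and uses name2codepoint in place of the ents table, so treat it as a reference sketch rather than an exact replica.

```python
import string
from html.entities import name2codepoint

# Stand-in for the ents table: entity name -> character (HTML 4 set)
NAMED = {name: chr(cp) for name, cp in name2codepoint.items()}

# ASCII approximations of [[:digit:]], [[:xdigit:]] and [[:alnum:]_]
DIGIT = set(string.digits)
XDIGIT = set(string.hexdigits)
WORD = set(string.ascii_letters + string.digits + "_")


def dhe(s: str, ents: dict = NAMED) -> str:
    """Decode one layer of HTML entities, following the SQL dhe() loop."""
    n = -1
    while True:
        n = s.find("&", n + 1)  # LOCATE('&', s, n+1)
        if n < 0:
            return s
        if s[n + 1:n + 2] == "#":
            nxt = s[n + 2:n + 3]
            if nxt in DIGIT:
                kind, p, allowed = "dec", n + 2, DIGIT
            elif nxt in ("x", "X"):
                kind, p, allowed = "hex", n + 3, XDIGIT
            else:
                continue  # "&#" not followed by a reference
        else:
            kind, p, allowed = "named", n + 1, WORD

        # Measure the reference; give up on malformed or over-long ones
        i, skip = 0, False
        while True:
            c = s[p + i:p + i + 1]
            if c not in allowed:
                skip = c in WORD  # e.g. "&#12a;" is left alone
                break
            if i == 8:
                skip = True  # more than 8 characters
                break
            i += 1
        if skip:
            continue

        e = 1 if s[p + i:p + i + 1] == ";" else 0
        ref = s[p:p + i]
        if kind == "named":
            rep = ents.get(ref)
        else:
            code = int(ref, 10 if kind == "dec" else 16) if ref else -1
            rep = chr(code) if 0 <= code <= 0x10FFFF else None
        if rep is None:
            rep = s[n:p + i + e]  # unknown: keep it verbatim
        s = s[:n] + rep + s[p + i + e:]
```

Because scanning resumes just after the replaced character, "&amp;quot;" becomes "&quot;" in one pass rather than '"', which is why a doubly-encoded value needs two passes.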
Apply this function twice to decode your (apparently) doubly-encoded table:
    UPDATE my_table SET my_column = dhe(dhe(my_column));
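If the decoding can be done in application code instead, Python's html.unescape shows the same two-pass behavior. It is a different tool from the SQL function (it follows HTML5 parsing rules), and the sample value below is hypothetical, written in the doubly-encoded style the answer assumes.

```python
import html

# Hypothetical doubly-encoded value in the style of the question's data
stored = "Escuela Nacional de N&amp;aacute;utica &amp;quot;Manuel Belgrano&amp;quot;"

once = html.unescape(stored)  # removes the outer layer: &amp; -> &
twice = html.unescape(once)   # decodes the now-exposed &aacute; and &quot;

print(once)
print(twice)
```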