â�� in my HTML after purify


Question

I have a database that I am rebuilding. The table structure was crap, so I'm porting some of the data from one table to another. This data appears to have been copy-pasted from an MSO product, so as I'm getting the data I clean it up with htmlpurifier and some str_replace calls in PHP. Here is the clean function:

    function clean_html($html) {
        $config = HTMLPurifier_Config::createDefault();
        $config->set('AutoFormat','RemoveEmpty',true);
        $config->set('HTML','AllowedAttributes','href,src');
        $config->set('HTML','AllowedElements','p,em,strong,a,ul,li,ol,img');
        $purifier = new HTMLPurifier($config);

        $html = $purifier->purify($html);

        $html = str_replace(' ',' ',$html);
        $html = str_replace("\r",'',$html);
        $html = str_replace("\n",'',$html);
        $html = str_replace("\t",'',$html);
        $html = str_replace('  ',' ',$html);
        $html = str_replace('<p> </p>','',$html);
        $html = str_replace(chr(160),' ',$html);

        return trim($html);
    }

However, when I put the results into my new table and output them to CKEditor, I get those three characters.

I then have a JavaScript function that is called to remove special characters from the CKEditor content too. It doesn't clean them up either:

    function remove_special(str) {
        var rExps=[ /[\xC0-\xC2]/g, /[\xE0-\xE2]/g,
        /[\xC8-\xCA]/g, /[\xE8-\xEB]/g,
        /[\xCC-\xCE]/g, /[\xEC-\xEE]/g,
        /[\xD2-\xD4]/g, /[\xF2-\xF4]/g,
        /[\xD9-\xDB]/g, /[\xF9-\xFB]/g,
        /\xD1/,/\xF1/g,
        "/[\u00a0|\u1680|[\u2000-\u2009]|u200a|\u200b|\u2028|\u2029|\u202f|\u205f|\u3000|\xa0]/g",
        /\u000b/g,'/[\u180e|\u000c]/g',
        /\u2013/g, /\u2014/g,
        /\xa9/g,/\xae/g,/\xb7/g,/\u2018/g,/\u2019/g,/\u201c/g,/\u201d/g,/\u2026/g];
        var repChar=['A','a','E','e','I','i','O','o','U','u','N','n',' ','\t','','-','--','(c)','(r)','*',"'","'",'"','"','...'];

        for(var i=0; i<rExps.length; i++) {
            str=str.replace(rExps[i],repChar[i]);
        }

        for (var x = 0; x < str.length; x++) {
            charcode = str.charCodeAt(x);
            if ((charcode < 32 || charcode > 126) && charcode !=10 && charcode != 13) {
                str = str.replace(str.charAt(x), "");
            }
        }
        return str;
    }

Does anyone know offhand what I need to do to get rid of them? I think they may be some sort of quote.

Recommended answer

Your character encodings are all out of whack. Those three characters are indicative to me of a three-byte UTF-8 encoded character.
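To make that concrete, here is a minimal sketch in Node.js of how a single curly quote can turn into three junk characters. It assumes the stored bytes are UTF-8 and that something downstream decodes them as ISO-8859-1 (Latin-1). That is only one plausible mismatch, not a confirmed diagnosis of this database, and the variable names are made up for illustration.

```javascript
// Sketch: how one curly quote turns into three junk characters.
// Assumption: the text is stored as UTF-8 but decoded as ISO-8859-1 (Latin-1).
const original = "\u2019"; // right single quotation mark, common in text pasted from Office

// UTF-8 encodes U+2019 as three bytes.
const utf8Bytes = Buffer.from(original, "utf8");
console.log(utf8Bytes); // <Buffer e2 80 99>

// Reading those bytes as Latin-1 maps every byte to its own character.
const misread = utf8Bytes.toString("latin1");
console.log(misread.length); // 3
console.log(misread.charCodeAt(0).toString(16)); // e2, which is "â"
```

The first misread character is â, and the other two fall in a control-character range that most fonts cannot draw, which would match an â followed by two unprintable characters.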

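Some things you need to find out:

  • What is the encoding of the old table?
  • What is the encoding of the new table?
  • What is the encoding of the page that displays CKEditor?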

It looks like HTMLPurifier's default encoding is UTF-8, so you really need to be aware of the encoding of your data!
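If checking those encodings shows that UTF-8 bytes were read as Latin-1 exactly once, the damage is reversible. The sketch below is a one-off repair under that assumption. `fixMojibake` is a hypothetical helper name, not part of HTMLPurifier or CKEditor, and it will not undo a Windows-1252 misread or a double conversion.

```javascript
// Sketch of a one-off repair, assuming each UTF-8 byte became one Latin-1 character.
// fixMojibake is a hypothetical helper, not an HTMLPurifier or CKEditor API.
function fixMojibake(text) {
  // A plain Latin-1 misread can only produce code points up to 0xFF,
  // so anything larger means this string is not that kind of damage.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return text;
  }
  // Turn each character back into the byte it came from, then decode as UTF-8.
  const bytes = Buffer.from(text, "latin1");
  const decoded = bytes.toString("utf8");
  // U+FFFD in the result means the bytes were not valid UTF-8: leave the input alone.
  return decoded.includes("\ufffd") ? text : decoded;
}

console.log(fixMojibake("it\u00e2\u0080\u0099s") === "it\u2019s"); // true
console.log(fixMojibake("caf\u00e9") === "caf\u00e9"); // true, genuine Latin-1 text is untouched
```

A repair like this only cleans up rows that are already damaged. The lasting fix is still to make the table, the database connection, and the page agree on one encoding.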
