How to remove NBSP?

Problem description

I'm using PHP to run file_get_contents on a *.txt file and then inserting the data into MySQL. Some values are being inserted as null. The nulls are caused by a non-breaking space and a replacement character from an Excel export. I figured that out by copying the characters from the text file into a Unicode inspector. I did the same with the replacement character, then copied the text and pasted it here to confirm.

I tried many str_replace and preg_replace variants with no luck. I tried nearly everything on this SO question and found that this worked:

$some_text_with_non_breaking_spaces = "Christ\xC2\xA0 O'Connory"; // \xC2\xA0 = UTF-8 bytes of a non-breaking space
$clean_text = hex2bin(str_replace('c2a0', '20', bin2hex($some_text_with_non_breaking_spaces)));
echo $clean_text;

But it didn't work when I put it inline with the file_get_contents() call.

Any idea how to resolve the null value with preg_replace, str_replace or other methods?

Here are all the versions I've tried:

$name = str_replace('\A0\00', ' ', $nbsp);
$name = str_replace('c2a0', '20', $nbsp);
$name = str_replace('\xc2\xa0', ' ', $nbsp);
$name = str_replace('~\xc2\xa0~', ' ', $nbsp);
$name = str_replace('\xC2\xA0', ' ',$nbsp);
$name = str_replace('&nbsp;', ' ',$nbsp);
$name = hex2bin(str_replace('c2a0', '20', bin2hex($nbsp)));  // this did work but not when putting inline with original code.
$name = preg_replace('#[A-Za-z\,\.\'\-\_]#', ' ', $nbsp);
$name = preg_replace('\x{00a0}', ' ', $nbsp);
$name = preg_replace('~\x00\xa0~', ' ', $nbsp);
$name = preg_replace('~\xc2\xa0~', ' ', $nbsp);
$name = preg_replace('\s\s+', ' ', $nbsp);
$name = preg_replace('/\s+/', ' ',  $nbsp);
$name = preg_replace('~\x{c2a0}~siu', ' ',  $nbsp);
$name = preg_replace('/\s/u', ' ',  $nbsp);
$name = preg_replace('/[^\w\d\p{L}]/u', ' ',$nbsp);

Here is a snippet of data from the file I was attempting to do a file_get_contents on.

SupervisorGivenName   SupervisorSurName   row_date     logid     item_name           acdcalls   AHT   AvgHoldTime   transferred   CntOBCalls   calls
Ders                  Schmid              09/02/2015   5054589   Christ  O'Connory   26         420   112           4             0            0
Nic                   Flemg               09/02/2015   5054596   Mica  Wit           28         543   32            6             0            0



Insert statement:

$bb_query = "INSERT INTO `tier1_bb_agent_daily` (`date`,`loginID`,`empID`,`firstname`,`lastname`,`supID`, `supName`,`acd_calls`,`paetec_acd_calls`,`aht`,`avg_hold_time`,`transferred`,`outbound_call_count`)
    VALUES ('{$row['date']}','{$row['loginID']}','{$empID}','{$firstname}','{$lastname}','{$supid}','{$newSupName}',{$row['acd_calls']},{$row['paetec_acd_calls']},{$row['aht']},{$row['avg_hold_time']},{$row['transferred']},{$row['outbound_call_count']})
    ON DUPLICATE KEY UPDATE firstname = '{$firstname}', lastname = '{$lastname}',empID = '{$empID}', supID = '{$supid}', supName = '{$newSupName}',acd_calls = {$row['acd_calls']}, aht = {$row['aht']}, paetec_acd_calls = {$row['paetec_acd_calls']}, avg_hold_time = {$row['avg_hold_time']}, transferred = {$row['transferred']}, outbound_call_count = {$row['outbound_call_count']}";
$db->query($bb_query);

Solution

Why most of your attempts failed

$name = str_replace('\A0\00', ' ', $nbsp);
$name = str_replace('c2a0', '20', $nbsp);

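Wrong escape sequences. '\A0\00' is not an escape PHP recognizes, and 'c2a0' is just the four literal characters c2a0, which only show up after the string has been run through bin2hex.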

$name = str_replace('~\xc2\xa0~', ' ', $nbsp);

Delimiters are needed for regexes, not for simple string replacement; str_replace searches for the ~ characters literally.

$name = str_replace('\xc2\xa0', ' ', $nbsp);
$name = str_replace('\xC2\xA0', ' ',$nbsp);

Correct escape sequences, but you need double-quoted strings for escape sequences to work.
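
The quoting difference is easy to check by looking at the length of each literal. A minimal sketch (the variable names below are illustrative, not from the question):

```php
<?php
// Single quotes: \x escapes are NOT interpreted, so this is 8 literal characters.
$single = '\xC2\xA0';
// Double quotes: \xC2\xA0 becomes the two bytes of a UTF-8 non-breaking space.
$double = "\xC2\xA0";

$src = "Christ" . $double . "O'Connory";

// The single-quoted needle never occurs in $src, so nothing changes.
$unchanged = str_replace($single, ' ', $src);
// The double-quoted needle matches the NBSP bytes and replaces them.
$cleaned = str_replace($double, ' ', $src);

echo strlen($single), ' ', strlen($double), "\n"; // 8 2
echo $cleaned, "\n";                               // Christ O'Connory
```

So the needle has to be written in double quotes (or built with chr/hex2bin) for str_replace to see the actual bytes.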

$name = str_replace('&nbsp;', ' ',$nbsp);

Only works for the literal HTML entity &nbsp;, not for the NBSP character itself.

$name = preg_replace('#[A-Za-z\,\.\'\-\_]#', ' ', $nbsp);

Why would you want to replace A-Z with space?

$name = preg_replace('\x{00a0}', ' ', $nbsp);

Missing delimiters and Unicode modifier.
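
For reference, a sketch of the same pattern with both pieces added, assuming the input is valid UTF-8 (the sample string is made up for illustration):

```php
<?php
// Sample input with a UTF-8 non-breaking space (bytes C2 A0) between the words.
$nbsp = "Mica\xC2\xA0Wit";

// /.../ are the delimiters; the u modifier makes PCRE treat pattern and subject
// as UTF-8, so \x{00A0} matches the code point U+00A0 as a whole.
$name = preg_replace('/\x{00A0}/u', ' ', $nbsp);

echo $name, "\n"; // Mica Wit
```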

$name = preg_replace('~\x00\xa0~', ' ', $nbsp);

Tries to match NUL characters, missing Unicode modifier.

$name = preg_replace('~\xc2\xa0~', ' ', $nbsp);

This one should have worked for UTF-8. It's equivalent to the bin2hex hack.
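
One caveat about the bin2hex route, as a side note rather than something the question necessarily ran into: replacing 'c2a0' in the hex text can also match across byte boundaries. A small sketch with made-up input that contains no NBSP at all:

```php
<?php
// "L", "*", TAB are the bytes 4C 2A 09, so the hex text is "4c2a09".
// The substring "c2a0" appears at an odd offset, straddling three bytes.
$text = "L*\tnext";

// Hex-level replace rewrites that accidental match and corrupts the data.
$viaHex = hex2bin(str_replace('c2a0', '20', bin2hex($text)));

// Byte-level replace only matches a real C2 A0 pair, so nothing changes.
$viaBytes = str_replace("\xC2\xA0", ' ', $text);

echo bin2hex($viaHex), "\n";   // 42096e657874  ("B", TAB, "next")
echo bin2hex($viaBytes), "\n"; // 4c2a096e657874
```

Working on the bytes directly avoids that, which is another reason to prefer the plain str_replace or preg_replace forms.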

$name = preg_replace('\s\s+', ' ', $nbsp);

Missing regex delimiters.

$name = preg_replace('/\s+/', ' ',  $nbsp);

Missing Unicode modifier.

$name = preg_replace('~\x{c2a0}~siu', ' ',  $nbsp);

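Wrong escape sequence. \x{...} takes a Unicode code point, not the UTF-8 bytes, so this looks for U+C2A0 rather than U+00A0.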

$name = preg_replace('/\s/u', ' ',  $nbsp);

This one should work but replaces every whitespace character with space.

$name = preg_replace('/[^\w\d\p{L}]/u', ' ',$nbsp);

Should work but also replaces punctuation with space.
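
To see why the broad patterns are risky on a tab-separated export, here is a rough comparison on one made-up line, assuming UTF-8 input:

```php
<?php
// One field with an NBSP, followed by a tab-separated number and a newline.
$line = "Christ\xC2\xA0O'Connory\t26\n";

// \s with the u modifier: the tab and the newline are replaced as well.
$broad = preg_replace('/\s/u', ' ', $line);

// Negated class: anything that is not a letter, digit or underscore is replaced,
// which includes the apostrophe in the name.
$punct = preg_replace('/[^\w\d\p{L}]/u', ' ', $line);

// Targeting U+00A0 only: separators and punctuation are left alone.
$targeted = preg_replace('/\x{00A0}/u', ' ', $line);

var_dump(strpos($broad, "\t"));  // bool(false): the column separator is gone
var_dump(strpos($punct, "'"));   // bool(false): the apostrophe is gone
echo $targeted;                  // Christ O'Connory<TAB>26
```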

How to replace non-breaking space with normal space

If your input is encoded as UTF-8 (which it probably is if the bin2hex hack worked):

$result = str_replace("\xC2\xA0", ' ', $src); # or
$result = preg_replace('/\xC2\xA0/', ' ', $src); # or
$result = preg_replace('/\xA0/u', ' ', $src);

If your input is encoded as ISO-8859-1:

$result = str_replace("\xA0", ' ', $src); # or
$result = preg_replace('/\xA0/', ' ', $src);

The str_replace versions are preferred for performance reasons.
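
If you are not sure which encoding the export uses, the two cases can be combined. This is only a sketch: replace_nbsp is a hypothetical helper name, it assumes the data is either UTF-8 or ISO-8859-1, and it does not touch the replacement character mentioned in the question.

```php
<?php
/**
 * Replace non-breaking spaces with plain spaces.
 * Hypothetical helper; assumes $raw is either UTF-8 or ISO-8859-1.
 */
function replace_nbsp(string $raw): string
{
    // An empty pattern with the u modifier only matches if $raw is valid UTF-8.
    if (preg_match('//u', $raw) === 1) {
        // UTF-8: NBSP is the two-byte sequence C2 A0.
        return str_replace("\xC2\xA0", ' ', $raw);
    }
    // Not valid UTF-8: treat it as ISO-8859-1, where NBSP is the single byte A0.
    return str_replace("\xA0", ' ', $raw);
}

// In real use $raw would come from file_get_contents(); inline samples keep
// this runnable on its own.
$utf8   = replace_nbsp("Christ\xC2\xA0O'Connory");
$latin1 = replace_nbsp("Christ\xA0O'Connory");

echo $utf8, "\n";   // Christ O'Connory
echo $latin1, "\n"; // Christ O'Connory
```

Run it on the whole file contents before splitting into rows and building the INSERT, so every field is cleaned the same way.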
