How to remove NBSP?
I'm using PHP to run file_get_contents on a *.txt file and then inserting the data into MySQL. Some values are being inserted as null. The nulls are caused by a non-breaking space and a replacement character that come from an Excel export. I found this out by copying the characters from the text file into a Unicode inspector, and confirmed it by doing the same with the replacement character.
I tried many str_replace and preg_replace variants with no luck. I tried nearly everything suggested on this SO question and found that this worked:
$some_text_with_non_breaking_spaces = "Christ O'Connory";
$clean_text = hex2bin(str_replace('c2a0', '20', bin2hex($some_text_with_non_breaking_spaces)));
echo $clean_text;
But it did not work when I put it inline with the file_get_contents() call.
Any idea how to resolve the null value with preg_replace, str_replace or other methods?
Here's all the versions I've tried:
$name = str_replace('\A0\00', ' ', $nbsp);
$name = str_replace('c2a0', '20', $nbsp);
$name = str_replace('\xc2\xa0', ' ', $nbsp);
$name = str_replace('~\xc2\xa0~', ' ', $nbsp);
$name = str_replace('\xC2\xA0', ' ',$nbsp);
$name = str_replace('&nbsp;', ' ',$nbsp);
$name = hex2bin(str_replace('c2a0', '20', bin2hex($nbsp))); // this did work but not when putting inline with original code.
$name = preg_replace('#[A-Za-z\,\.\'\-\_]#', ' ', $nbsp);
$name = preg_replace('\x{00a0}', ' ', $nbsp);
$name = preg_replace('~\x00\xa0~', ' ', $nbsp);
$name = preg_replace('~\xc2\xa0~', ' ', $nbsp);
$name = preg_replace('\s\s+', ' ', $nbsp);
$name = preg_replace('/\s+/', ' ', $nbsp);
$name = preg_replace('~\x{c2a0}~siu', ' ', $nbsp);
$name = preg_replace('/\s/u', ' ', $nbsp);
$name = preg_replace('/[^\w\d\p{L}]/u', ' ',$nbsp);
Here is a snippet of data from the file I was attempting to do a file_get_contents on.
SupervisorGivenName SupervisorSurName row_date logid item_name acdcalls AHT AvgHoldTime transferred CntOBCalls calls
Ders Schmid 09/02/2015 5054589 Christ O'Connory 26 420 112 4 0 0
Nic Flemg 09/02/2015 5054596 Mica Wit 28 543 32 6 0 0
Insert statement:
$bb_query = "INSERT INTO `tier1_bb_agent_daily` (`date`,`loginID`,`empID`,`firstname`,`lastname`,`supID`, `supName`,`acd_calls`,`paetec_acd_calls`,`aht`,`avg_hold_time`,`transferred`,`outbound_call_count`)
VALUES ('{$row['date']}','{$row['loginID']}','{$empID}','{$firstname}','{$lastname}','{$supid}','{$newSupName}',{$row['acd_calls']},{$row['paetec_acd_calls']},{$row['aht']},{$row['avg_hold_time']},{$row['transferred']},{$row['outbound_call_count']})
ON DUPLICATE KEY UPDATE firstname = '{$firstname}', lastname = '{$lastname}',empID = '{$empID}', supID = '{$supid}', supName = '{$newSupName}',acd_calls = {$row['acd_calls']}, aht = {$row['aht']}, paetec_acd_calls = {$row['paetec_acd_calls']}, avg_hold_time = {$row['avg_hold_time']}, transferred = {$row['transferred']}, outbound_call_count = {$row['outbound_call_count']}";
$db->query($bb_query);
Why most of your attempts failed
$name = str_replace('\A0\00', ' ', $nbsp);
$name = str_replace('c2a0', '20', $nbsp);
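Wrong escape sequences. PHP has no \A0 escape, and 'c2a0' is plain text that only appears in the output of bin2hex, not in the raw string.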
$name = str_replace('~\xc2\xa0~', ' ', $nbsp);
The delimiters are needed for regexes, not for simple string replacement.
$name = str_replace('\xc2\xa0', ' ', $nbsp);
$name = str_replace('\xC2\xA0', ' ',$nbsp);
Correct escape sequences, but you need double-quoted strings for escape sequences to work.
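The quoting difference is easy to verify by looking at the bytes PHP actually produces. The following is a minimal sketch; the variable names and the sample string (assumed to be UTF-8 with one NBSP between the names) are illustrative and not taken from the original file.

```php
<?php
// Single quotes: PHP keeps the backslashes, so this is 8 literal characters.
$single = '\xC2\xA0';
// Double quotes: PHP converts the escapes into the two bytes 0xC2 0xA0.
$double = "\xC2\xA0";

echo strlen($single), "\n";   // 8
echo strlen($double), "\n";   // 2
echo bin2hex($double), "\n";  // c2a0

// Illustrative input: "Christ", a UTF-8 non-breaking space, "O'Connory".
$nbsp = "Christ\xC2\xA0O'Connory";

// No match: the input contains no literal backslash sequence.
$unchanged = str_replace('\xC2\xA0', ' ', $nbsp);
// Match: the needle is the real two-byte sequence.
$fixed = str_replace("\xC2\xA0", ' ', $nbsp);

var_dump($unchanged === $nbsp);           // bool(true)
var_dump($fixed === "Christ O'Connory");  // bool(true)
```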
$name = str_replace('&nbsp;', ' ',$nbsp);
Only works for HTML entities.
$name = preg_replace('#[A-Za-z\,\.\'\-\_]#', ' ', $nbsp);
Why would you want to replace A-Z with space?
$name = preg_replace('\x{00a0}', ' ', $nbsp);
Missing delimiters and Unicode modifier.
$name = preg_replace('~\x00\xa0~', ' ', $nbsp);
Tries to match NUL characters, missing Unicode modifier.
$name = preg_replace('~\xc2\xa0~', ' ', $nbsp);
This one should have worked for UTF-8. It's equivalent to the bin2hex hack.
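One caveat on that equivalence, offered as a side note rather than as part of the original answer: the bin2hex hack searches a string of hex digits, so 'c2a0' can also match across byte boundaries, whereas the regex only matches the two real bytes. A small sketch with a made-up three-byte input that contains no NBSP at all:

```php
<?php
// Made-up input: form feed, '*', NUL. There is no NBSP in here.
$bytes = "\x0C\x2A\x00";

$hex = bin2hex($bytes);
echo $hex, "\n"; // 0c2a00, which contains "c2a0" starting at offset 1

// The hex-level replace hits the misaligned match and corrupts the data.
$mangled = hex2bin(str_replace('c2a0', '20', $hex));

// The byte-level regex finds nothing to replace and leaves the data alone.
$safe = preg_replace('~\xc2\xa0~', ' ', $bytes);

var_dump($mangled === $bytes); // bool(false)
var_dump($safe === $bytes);    // bool(true)
```

For ordinary text this collision is unlikely, but it is one more reason to replace the bytes directly.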
$name = preg_replace('\s\s+', ' ', $nbsp);
Missing regex delimiters.
$name = preg_replace('/\s+/', ' ', $nbsp);
Missing Unicode modifier.
$name = preg_replace('~\x{c2a0}~siu', ' ', $nbsp);
Wrong escape sequence. \x{c2a0} names the code point U+C2A0, not the UTF-8 bytes C2 A0; the NBSP code point is \x{00a0}.
$name = preg_replace('/\s/u', ' ', $nbsp);
This one should work but replaces every whitespace character with space.
$name = preg_replace('/[^\w\d\p{L}]/u', ' ',$nbsp);
Should work but also replaces punctuation with space.
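Combining the points above, a regex that has delimiters, the u modifier and the right code point touches only the NBSP. This is a sketch under the assumption that the input is valid UTF-8; the sample string is illustrative.

```php
<?php
// Illustrative UTF-8 input: an NBSP between the names, then a tab and a newline.
$nbsp = "Christ\xC2\xA0O'Connory\t26\n";

// \x{00A0} with the u modifier matches the NBSP code point only.
$name = preg_replace('/\x{00A0}/u', ' ', $nbsp);

// \s with the u modifier also rewrites the tab and the newline.
$all = preg_replace('/\s/u', ' ', $nbsp);

var_dump($name === "Christ O'Connory\t26\n"); // bool(true)
var_dump(strpos($all, "\t") === false);       // bool(true)
```

For a tab-separated export the first form is usually the safer choice, because the column separators survive.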
How to replace non-breaking space with normal space
If your input is encoded as UTF-8 (which it probably is if the bin2hex hack worked):
$result = str_replace("\xC2\xA0", ' ', $src); # or
$result = preg_replace('/\xC2\xA0/', ' ', $src); # or
$result = preg_replace('/\xA0/u', ' ', $src);
If your input is encoded as ISO-8859-1:
$result = str_replace("\xA0", ' ', $src); # or
$result = preg_replace('/\xA0/', ' ', $src);
The str_replace versions are preferred for performance reasons.
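If the encoding of the exported file is not known in advance, the two cases can be combined. The sketch below is one possible approach, not the original answer's code: replace_nbsp is a hypothetical helper, the temporary file stands in for the real export, and the input is assumed to be either valid UTF-8 or a single-byte encoding such as ISO-8859-1. It also shows one way a null can appear: with the u modifier, preg_replace returns null when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): replace NBSP with a
// plain space, choosing the byte sequence from the detected encoding.
// Assumption: input is valid UTF-8 or a single-byte encoding where NBSP is 0xA0.
function replace_nbsp(string $src): string
{
    // preg_match with the u modifier returns false on invalid UTF-8, so an
    // empty pattern doubles as a validity check.
    $isUtf8 = preg_match('//u', $src) === 1;

    return $isUtf8
        ? str_replace("\xC2\xA0", ' ', $src)
        : str_replace("\xA0", ' ', $src);
}

// Stand-in for the exported file: one line with an ISO-8859-1 NBSP.
$path = tempnam(sys_get_temp_dir(), 'nbsp');
file_put_contents($path, "Christ\xA0O'Connory\n");

$raw = file_get_contents($path);
unlink($path);

// With the u modifier, preg_replace returns null for this non-UTF-8 input.
var_dump(preg_replace('/\xA0/u', ' ', $raw)); // NULL

$clean = replace_nbsp($raw);
echo bin2hex($raw), "\n"; // the hex dump shows a lone a0 byte, not c2a0
echo $clean;
```

Dumping bin2hex of the raw file contents, as in the last lines, is a quick way to see which of the two encodings the export really uses before choosing a replacement.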