How do you deal with the "special" characters that MS Word adds?

Question

I'm wondering how you clean up the special characters that MS Word adds, such as em and en dashes and curly quotes?

I often find myself copying clients' content from Word and pasting it into a static HTML page, but the content ends up with weird characters because the special characters are not converted to their correct ASCII codes and therefore show up as garbled text. (For these basic websites, I'm using Dreamweaver.)

I have seen a lot of similar problems when clients copy content from Word into text-only fields (mostly textareas). When I put this into a PDF (through PHP), or when it shows up on the page, it has garbled text too.

How do you deal with this? Is there a cleaning service or program you use?

Solution

With regard to clients posting text copied and pasted from Word into textareas:

The most reliable way to ensure that the client sends you text in a particular encoding (and thus, hopefully, does any conversion from CP-1252 [or whatever Word uses] for you) is to add the accept-charset="..." attribute to all your <form>s. E.g.:

<form ... accept-charset="UTF-8">
   ...
</form>

Most browsers will obey that and make sure any "Word-specific" characters are converted to the appropriate character set before they get to your website.
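
To make that conversion concrete, here is a minimal Python sketch (Python is my choice for illustration; the original post contains no such code). It assumes, as the answer tentatively does, that the pasted text is in CP-1252, and prints the bytes a few typical Word characters occupy in CP-1252 versus UTF-8. The helper name describe is hypothetical.

```python
# Characters Word commonly substitutes while typing:
# curly double quotes, curly single quotes, en dash, em dash.
WORD_CHARS = "\u201c\u201d\u2018\u2019\u2013\u2014"


def describe(ch: str) -> tuple:
    """Return the hex bytes of one character in CP-1252 and in UTF-8."""
    return ch.encode("cp1252").hex(), ch.encode("utf-8").hex()


for ch in WORD_CHARS:
    cp1252_hex, utf8_hex = describe(ch)
    print(f"U+{ord(ch):04X}  cp1252={cp1252_hex}  utf-8={utf8_hex}")
```

Each of these characters is a single byte in CP-1252 but three bytes in UTF-8, which is why text stored in one encoding and displayed as the other shows up garbled.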

Once invalid text gets to your website, there's very little you can do to fix it reliably, so it's best to simply check that all input is valid in whatever character set you use, and discard any requests that contain invalid text. This is necessary even with accept-charset, because there are undoubtedly some clients out there that will ignore it.
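
As a rough sketch of that check (in Python rather than the PHP the question mentions, and assuming the site has settled on UTF-8), a strict decode can serve as the validity test. The function names is_valid_utf8 and accept_submission are hypothetical, and a real handler would return an error response rather than None.

```python
from typing import Optional


def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def accept_submission(raw: bytes) -> Optional[str]:
    """Hypothetical handler: decoded text if valid, None to signal a reject."""
    return raw.decode("utf-8") if is_valid_utf8(raw) else None


# Curly quotes sent as UTF-8 pass; the same quotes sent as raw
# CP-1252 bytes (0x93 and 0x94) are rejected.
print(accept_submission("\u201cquoted\u201d".encode("utf-8")) is not None)
print(accept_submission(b"\x93quoted\x94") is None)
```

Rejecting instead of guessing at a repair follows the answer's point that invalid text cannot be fixed reliably once it has arrived.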
