How to convert utf-8 fancy quotes to neutral quotes

Views: 94 Published: 2020/7/13 2:53:48 Tags: python python-2.7 unicode encoding utf-8


Problem description

I'm writing a small Python script that parses Word documents and writes the results to a CSV file. However, some of the documents contain non-ASCII (UTF-8) characters that my script can't process correctly.

Fancy quotes show up quite often (u'\u201c'). Is there a quick, easy (and smart) way to replace them with neutral, ASCII-supported quotes, so that I can just write line.encode('ascii') to the CSV file?

I have tried to find the left quote and replace it:

val = line.find(u'\u201c')
if val >= 0: line[val] = '"'

But to no avail:

TypeError: 'unicode' object does not support item assignment
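
The error arises because Python strings (str and unicode alike) are immutable, so a character cannot be assigned in place. A minimal sketch of the same idea using replace, which returns a new string; the sample text is made up for illustration:

```python
# Sample input containing left and right fancy double quotes (illustrative only).
line = u'\u201cHello, world\u201d'

# replace() returns a new string, so rebind the name instead of assigning to an index.
line = line.replace(u'\u201c', u'"').replace(u'\u201d', u'"')

# With the fancy quotes gone, this particular line is pure ASCII and encodes cleanly.
ascii_bytes = line.encode('ascii')
```

This only covers the two double-quote characters; any other non-ASCII character left in the line would still make encode('ascii') raise UnicodeEncodeError.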

Is what I've described a good strategy? Or should I just set up the CSV to support UTF-8 (though I'm not sure whether the application that will read the CSV accepts UTF-8)?
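
For the second option, here is a rough sketch of writing UTF-8 to a CSV file, assuming every cell is a unicode string. The function name write_rows_utf8 and the temporary output path are hypothetical. The version check is needed because the csv module in Python 2 works on byte strings, whereas in Python 3 it works on text.

```python
import csv
import io
import os
import sys
import tempfile


def write_rows_utf8(path, rows):
    """Write rows of unicode strings to a UTF-8 encoded CSV file."""
    if sys.version_info[0] == 2:
        # Python 2's csv module works on byte strings, so encode each cell.
        with open(path, 'wb') as f:
            writer = csv.writer(f)
            for row in rows:
                writer.writerow([cell.encode('utf-8') for cell in row])
    else:
        # Python 3's csv module works on text; the file object does the encoding.
        with io.open(path, 'w', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(rows)


# Hypothetical output location, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'out.csv')
write_rows_utf8(path, [[u'\u201cquoted\u201d', u'plain']])
```

Whether this is preferable to converting to ASCII depends on what the consuming application can read, which the question leaves open.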

Thanks

Recommended answer

You can use the Unidecode package to automatically convert all Unicode characters to their nearest pure ASCII equivalent.

from unidecode import unidecode
line = unidecode(line)

This handles both opening and closing double quotes, as well as single quotes, em dashes, and other characters that you probably haven't run into yet.
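
If adding a third-party dependency is not an option, or only a known handful of characters needs replacing, a standard-library translation table is one possible alternative. The sketch below is not part of the answer above. The names PUNCTUATION_MAP and neutralize_punctuation are hypothetical, and the choice of replacements (for example two hyphens for an em dash) is an assumption.

```python
# Map Unicode code points to plain ASCII replacements (an illustrative, partial list).
PUNCTUATION_MAP = {
    0x2018: u"'",   # left single quotation mark
    0x2019: u"'",   # right single quotation mark
    0x201C: u'"',   # left double quotation mark
    0x201D: u'"',   # right double quotation mark
    0x2013: u'-',   # en dash
    0x2014: u'--',  # em dash
}


def neutralize_punctuation(text):
    """Return a copy of text with the mapped punctuation replaced by ASCII."""
    return text.translate(PUNCTUATION_MAP)


sample = u'\u201cIt\u2019s fine\u201d \u2014 she said'
result = neutralize_punctuation(sample)
```

Unlike Unidecode, this leaves any character outside the table untouched, so line.encode('ascii') can still fail on text containing other non-ASCII characters.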
