Encoding problems (é and è)

Views: 145 Posted: 2019/6/5 12:01:46 python


Problem description


Hi, I'm making a program for formatting strings.
I've added:

#!/usr/bin/python
# -*- coding: utf-8 -*-

at the beginning of my script, but

str = str.replace('?', 'C')
str = str.replace('é', 'E')
str = str.replace('é', 'E')
str = str.replace('è', 'E')
str = str.replace('è', 'E')
str = str.replace('ê', 'E')

doesn't work: it gives me " and , instead of replacing é by E.
If someone has an idea, that would be great.

Regards,
Bussiere

PS: I've added the whole script below:

__________________________________________________________________________

#!/usr/bin/python
# -*- coding: utf-8 -*-
import fileinput, glob, string, sys, os, re

fichA=raw_input("Entrez le nom du fichier d'entree : ")
print ("\n")
fichC=raw_input("Entrez le nom du fichier de sortie : ")
print ("\n")
normalisation1 = raw_input("Normaliser les adresses 1 (ex : Avenue-> AV) (O/N) ou A pour tout normaliser \n")
normalisation1 = normalisation1.upper()

if normalisation1 != "A":
    print ("\n")
    normalisation2 = raw_input("Normaliser les civilités (ex : Docteur-> DR) (O/N) \n")
    normalisation2 = normalisation2.upper()
    print ("\n")
    normalisation3 = raw_input("Normaliser les Adresses 2 (ex : Place-> PL) (O/N) \n")
    normalisation3 = normalisation3.upper()
    normalisation4 = raw_input("Normaliser les caracteres / et - (ex : / -> ) (O/N) \n" )
    normalisation4 = normalisation4.upper()

if normalisation1 == "A":
    normalisation1 = "O"
    normalisation2 = "O"
    normalisation3 = "O"
    normalisation4 = "O"

fiA=open(fichA,"r")
fiC=open(fichC,"w")
compteur = 0

while 1:
    ligneA=fiA.readline()
    if ligneA == "":
        break
    if ligneA != "":
        str = ligneA
        str = str.replace('a', 'A')
        str = str.replace('b', 'B')
        str = str.replace('c', 'C')
        str = str.replace('d', 'D')
        str = str.replace('e', 'E')
        str = str.replace('f', 'F')
        str = str.replace('g', 'G')
        str = str.replace('h', 'H')
        str = str.replace('i', 'I')
        str = str.replace('j', 'J')
        str = str.replace('k', 'K')
        str = str.replace('l', 'L')
        str = str.replace('m', 'M')
        str = str.replace('n', 'N')
        str = str.replace('o', 'O')
        str = str.replace('p', 'P')
        str = str.replace('q', 'Q')
        str = str.replace('r', 'R')
        str = str.replace('s', 'S')
        str = str.replace('t', 'T')
        str = str.replace('u', 'U')
        str = str.replace('v', 'V')
        str = str.replace('w', 'W')
        str = str.replace('x', 'X')
        str = str.replace('y', 'Y')
        str = str.replace('z', 'Z')

        str = str.replace('?', 'C')
        str = str.replace('?', 'C')
        str = str.replace('é', 'E')
        str = str.replace('é', 'E')
        str = str.replace('è', 'E')
        str = str.replace('è', 'E')
        str = str.replace('ê', 'E')
        str = str.replace('ê', 'E')
        str = str.replace('?', 'E')
        str = str.replace('?', 'E')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('à', 'A')
        str = str.replace('à', 'A')
        str = str.replace('á', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('a', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('ú','U')
        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')

        if normalisation1 == "O":
            str = str.replace('AVENUE', 'AV')
            str = str.replace('BOULEVARD', 'BD')
            str = str.replace('FAUBOURG', 'FBG')
            str = str.replace('GENERAL', 'GAL')
            str = str.replace('COMMANDANT', 'CMDT')
            str = str.replace('MARECHAL', 'MAL')
            str = str.replace('PRESIDENT', 'PRDT')
            str = str.replace('SAINT', 'ST')
            str = str.replace('SAINTE', 'STE')
            str = str.replace('LOTISSEMENT', 'LOT')
            str = str.replace('RESIDENCE', 'RES')
            str = str.replace('IMMEUBLE', 'IMM')
            str = str.replace('IMEUBLE', 'IMM')
            str = str.replace('BATIMENT', 'BAT')

        if normalisation2 == "O":
            str = str.replace('MONSIEUR', 'M')
            str = str.replace('MR', 'M')
            str = str.replace('MADAME', 'MME')
            str = str.replace('MADEMOISELLE', 'MLLE')
            str = str.replace('DOCTEUR', 'DR')
            str = str.replace('PROFESSEUR', 'PR')
            str = str.replace('MONSEIGNEUR', 'MGR')
            str = str.replace('M ME','MME')

        if normalisation3 == "O":
            str = str.replace('PLACE', 'PL')
            str = str.replace('IMPASSE', 'IMP')
            str = str.replace('ESPLANADE', 'ESP')
            str = str.replace('ROND POINT', 'RPT')
            str = str.replace('ROUTE', 'RTE')
            str = str.replace('PASSAGE', 'PAS')
            str = str.replace('SQUARE', 'SQ')
            str = str.replace('ALLEE', 'ALL')
            str = str.replace('ESCALIER', 'ESC')
            str = str.replace('ETAGE', 'ETG')
            str = str.replace('PORTE', 'PTE')
            str = str.replace('APPARTEMENT', 'APT')
            str = str.replace('APARTEMENT', 'APT')
            str = str.replace('AVENUE', 'AV')
            str = str.replace('BOULEVARD', 'BD')
            str = str.replace('ZONE D ACTIVITE', 'ZA')
            str = str.replace('ZONE D ACTIVITEE', 'ZA')
            str = str.replace('ZONE D AMENAGEMENT CONCERTE', 'ZAC')
            str = str.replace('ZONE D AMENAGEMENT CONCERTEE', 'ZAC')
            str = str.replace('ZONE INDUSTRELLE', 'ZI')
            str = str.replace('CENTRE COMMERCIAL', 'CCAL')
            str = str.replace('CENTRE', 'CTRE')
            str = str.replace('C.CIAL','CCAL')
            str = str.replace('CTRE CIAL','CCAL')
            str = str.replace('CTRE CCAL','CCAL')
            str = str.replace('GALERIE','GAL')
            str = str.replace('MARTYR', 'M')
            str = str.replace('ANCIENS', 'AC')
            str = str.replace('ANCIEN', 'AC')
            str = str.replace('REVEREND PERE','R P')

        if normalisation4 == "O":
            str = str.replace(';\"', ' ')
            str = str.replace('\"', ' ')
            str = str.replace('\'', ' ')
            str = str.replace('-', ' ')
            str = str.replace(',', ' ')
            str = str.replace('\\', ' ')
            str = str.replace('\/', ' ')
            str = str.replace('&', ' ')
            str = str.replace('%', ' ')
            str = str.replace('*', ' ')
            str = str.replace(' ', ' ')
            str = str.replace('.', ' ')
            str = str.replace('_', ' ')
            str = str.replace(' ', ' ')
            str = str.replace(' ', ' ')
            str = str.replace('?', ' ')
            str = str.replace('%', ' ')
            str = str.replace('|', ' ')

        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')
        fiC.write(str)
        compteur += 1
        print compteur, "\n"

print "FINIT"
fiA.close()
fiC.close()

Solution

bussiere bussiere wrote:
Hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-

at the beginning of my script, but

str = str.replace('?', 'C')
...
doesn't work: it gives me " and , instead of replacing é by E.

Are you sure your script and your input file are *actually* encoded in
UTF-8? If it does not work as expected, they are probably Latin-1, just
like your posting. Try changing the coding declaration to latin-1. Does it work now?

-- Christoph
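
Christoph's point about a UTF-8 versus Latin-1 mismatch can be illustrated with a small sketch. It assumes Python 3 (the thread itself uses Python 2), and the helper name `replace_e_acute` is hypothetical, introduced only for this illustration. It shows that the same character has different bytes in the two encodings, so a replacement only matches when the data is decoded with the encoding it was really written in.

```python
# Python 3 sketch (assumption: the original thread targets Python 2).
text = "é"

utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # one byte: 0xE9

# UTF-8 bytes misread as Latin-1 turn into two unrelated characters.
misread = utf8_bytes.decode("latin-1")


def replace_e_acute(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the stated encoding, then replace é with E.

    Hypothetical helper: the replacement works only if `encoding`
    is the encoding the bytes were actually written in.
    """
    return raw.decode(encoding).replace("é", "E")


print(repr(misread))
print(replace_e_acute(b"caf\xe9", "latin-1"))
```

If the input file turns out to be Latin-1, decoding it as Latin-1 (or declaring the script as Latin-1, as suggested above) should let the replacements match.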

Seems to work fine for me.

>>> x="éÇ"
>>> x=x.replace('é','E')
>>> x
'E\xc7'
>>> x=x.replace('Ç','C')
>>> x
'EC'

You should also be able to use the .upper() method to
uppercase everything in the string in a single statement:

tstr=ligneA.upper()

Note: you should never use 'str' as a variable name, as
it will mask the built-in str function.

-Larry Bates
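
Both of Larry's suggestions can be sketched briefly. This assumes Python 3, where strings are Unicode and `.upper()` also uppercases accented letters (it does not remove the accents); the variable names and the `shadowing_demo` function are hypothetical, chosen only for this illustration.

```python
# Python 3 sketch; names below are illustrative, not from the original script.
ligne = "avenue du général leclerc\n"

# One call replaces the 26 single-letter replace() statements.
majuscules = ligne.upper()


def shadowing_demo() -> bool:
    """Return True if rebinding the name `str` breaks calls to the built-in."""
    str = "abc"  # local name now hides the built-in str
    try:
        str(5)   # tries to call a string object, which fails
    except TypeError:
        return True
    return False


print(majuscules, end="")
print(shadowing_demo())
```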


On 23/03/2006 10:07 PM, bussiere bussiere wrote:
Hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-

at the beginning of my script, but

str = str.replace('?', 'C')
str = str.replace('é', 'E')
str = str.replace('é', 'E')
str = str.replace('è', 'E')
str = str.replace('è', 'E')
str = str.replace('ê', 'E')

doesn't work: it gives me " and , instead of replacing é by E.
If someone has an idea, that would be great.

Hi, I've added some comments below ... I hope they help.
Cheers,
John

Regards,
Bussiere
PS: I've added the whole script below:
__________________________________________________________________________
[snip]
if ligneA != "":
    str = ligneA
    str = str.replace('a', 'A')
[snip]
    str = str.replace('z', 'Z')

    str = str.replace('?', 'C')
    str = str.replace('?', 'C')
    str = str.replace('é', 'E')
    str = str.replace('é', 'E')
    str = str.replace('è', 'E')
[snip]
    str = str.replace('ú','U')

You can replace ALL of this upshifting and accent removal in one blow by
using the string translate() method with a suitable table.
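
As a rough illustration of the translate() idea, here is a minimal sketch. It assumes Python 3, where the table is built with `str.maketrans` (in the Python 2 of the thread the table would be built differently); the table contents and the function names `fold` and `strip_accents` are hypothetical and cover only a handful of French accented letters.

```python
import unicodedata

# Assumed, incomplete table: uppercase accented letters -> plain ASCII letters.
ACCENT_TABLE = str.maketrans("ÇÉÈÊËÀÂÄÎÏÔÖÙÛÜ", "CEEEEAAAIIOOUUU")


def fold(text: str) -> str:
    """Uppercase the text, then strip the accents listed in ACCENT_TABLE."""
    return text.upper().translate(ACCENT_TABLE)


def strip_accents(text: str) -> str:
    """Alternative: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("général élysée"))
print(strip_accents("Çà et là"))
```

The explicit table keeps full control over which characters are touched; the `unicodedata` variant handles any accented letter that has a canonical decomposition, at the cost of being less selective.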
str = str.replace(' ', ' ')
str = str.replace(' ', ' ')
str = str.replace(' ', ' ')

The standard Python idiom for normalising whitespace is

strg = ' '.join(strg.split())

>>> strg = ' ALLO BUSSIERE\tCA VA? '
>>> strg.split()
['ALLO', 'BUSSIERE', 'CA', 'VA?']
>>> ' '.join(strg.split())
'ALLO BUSSIERE CA VA?'

[snip]
if normalisation2 == "O":
    str = str.replace('MONSIEUR', 'M')
    str = str.replace('MR', 'M')

You need to be very careful with this approach. You are changing EVERY
occurrence of "MR" in the string, not just where it is a whole "word"
meaning "Monsieur".

Constructed example of what can go wrong:

>>> strg = 'MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY'
>>> strg.replace('MR', 'M')
'M IME NAGY, 123 PRIMOSE STREET, SHAMOCK VALLEY'

A real, non-constructed history lesson: A certain database indicated
duplicate records by having the annotation "DUP" in the surname field
e.g. "SMITH DUP". Fortunately it was detected in testing that the
so-called clean-up was causing DUPLESSIS to become PLESSIS and DUPRAT to
become RAT!

Two points here: (1) Split up your strings into "words" or "tokens".
Using strg.split() is a start but you may need something more
sophisticated e.g. "-" as an additional token separator. (2) Instead of
writing out all those lines of code, consider putting those
substitutions in a dictionary:

title_substitution = {
    'MONSIEUR': 'M',
    'MR': 'M',
    'MADAME': 'MME',
    # etc
}

Next level of improvement is to read that stuff from a file.
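
The two points above (work on whole tokens, and keep the substitutions in a dictionary) could be combined roughly as follows. This is a sketch assuming Python 3 and a regex-based tokeniser; `substitute_words` is a hypothetical helper, not code from the thread, and it only handles single-word keys (an entry such as 'ROND POINT' would need extra handling).

```python
import re

# Same idea as the dictionary above; contents are illustrative.
TITLE_SUBSTITUTION = {
    "MONSIEUR": "M",
    "MR": "M",
    "MADAME": "MME",
}


def substitute_words(text: str, table: dict) -> str:
    """Replace whole words found in `table`; leave everything else untouched.

    r"\w+" matches complete runs of word characters, so "MR" inside
    "IMRE" or "PRIMROSE" is never looked up on its own.
    """
    return re.sub(r"\w+", lambda m: table.get(m.group(0), m.group(0)), text)


strg = "MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY"
print(substitute_words(strg, TITLE_SUBSTITUTION))
```

Because punctuation and spacing are not matched by the pattern, they pass through unchanged, which avoids having to re-join tokens by hand.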
[snip]
if normalisation4 == "O":
    str = str.replace(';\"', ' ')
    str = str.replace('\"', ' ')
    str = str.replace('\'', ' ')
    str = str.replace('-', ' ')
    str = str.replace(',', ' ')
    str = str.replace('\\', ' ')
    str = str.replace('\/', ' ')
    str = str.replace('&', ' ')

[snip]

Again, consider the string translate() method.
Also, consider that some of those characters may have some meaning that
you perhaps shouldn't blow away e.g. compare 'SMITH & WESSON' with
'SMITH ET WESSON' :-)
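
A possible shape for the punctuation clean-up with translate(), again as a hedged Python 3 sketch: the set of characters mapped to a space is an assumption loosely based on the script above, '&' is deliberately left out so that names like 'SMITH & WESSON' survive, and `clean_punctuation` is a hypothetical helper name.

```python
# Characters to blank out; '&' is intentionally NOT in this assumed list.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in ";\"'-,\\/%*._|"})


def clean_punctuation(text: str) -> str:
    """Map the listed punctuation to spaces, then normalise whitespace."""
    return " ".join(text.translate(PUNCT_TO_SPACE).split())


print(clean_punctuation("SMITH & WESSON, 12-14 RUE_X"))
```

A single table lookup per character replaces the long chain of replace() calls, and the split/join step collapses the runs of spaces that the substitutions leave behind.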
