Encoding problems (é and è)


Problem description



Hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-

at the beginning of my script, but

str = str.replace('?', 'C')
str = str.replace('é', 'E')
str = str.replace('é', 'E')
str = str.replace('è', 'E')
str = str.replace('è', 'E')
str = str.replace('ê', 'E')
doesn't work: it gives me characters like " and , instead of replacing é by E.
If someone has an idea, it would be great.

Regards
Bussiere
PS: I've added the whole script below:


__________________________________________________ ________________________


#!/usr/bin/python
# -*- coding: utf-8 -*-
import fileinput, glob, string, sys, os, re

fichA=raw_input("Entrez le nom du fichier d'entree : ")
print ("\n")
fichC=raw_input("Entrez le nom du fichier de sortie : ")
print ("\n")
normalisation1 = raw_input("Normaliser les adresses 1 (ex : Avenue-> AV) (O/N) ou A pour tout normaliser \n")
normalisation1 = normalisation1.upper()

if normalisation1 != "A":
    print ("\n")
    normalisation2 = raw_input("Normaliser les civilités (ex : Docteur-> DR) (O/N) \n")
    normalisation2 = normalisation2.upper()
    print ("\n")
    normalisation3 = raw_input("Normaliser les Adresses 2 (ex : Place-> PL) (O/N) \n")
    normalisation3 = normalisation3.upper()
    normalisation4 = raw_input("Normaliser les caracteres / et - (ex : / -> ) (O/N) \n" )
    normalisation4 = normalisation4.upper()

if normalisation1 == "A":
    normalisation1 = "O"
    normalisation2 = "O"
    normalisation3 = "O"
    normalisation4 = "O"
fiA=open(fichA,"r")
fiC=open(fichC,"w")
compteur = 0

while 1:

    ligneA=fiA.readline()

    if ligneA == "":

        break

    if ligneA != "":
        str = ligneA
        str = str.replace('a', 'A')
        str = str.replace('b', 'B')
        str = str.replace('c', 'C')
        str = str.replace('d', 'D')
        str = str.replace('e', 'E')
        str = str.replace('f', 'F')
        str = str.replace('g', 'G')
        str = str.replace('h', 'H')
        str = str.replace('i', 'I')
        str = str.replace('j', 'J')
        str = str.replace('k', 'K')
        str = str.replace('l', 'L')
        str = str.replace('m', 'M')
        str = str.replace('n', 'N')
        str = str.replace('o', 'O')
        str = str.replace('p', 'P')
        str = str.replace('q', 'Q')
        str = str.replace('r', 'R')
        str = str.replace('s', 'S')
        str = str.replace('t', 'T')
        str = str.replace('u', 'U')
        str = str.replace('v', 'V')
        str = str.replace('w', 'W')
        str = str.replace('x', 'X')
        str = str.replace('y', 'Y')
        str = str.replace('z', 'Z')

        str = str.replace('?', 'C')
        str = str.replace('?', 'C')
        str = str.replace('é', 'E')
        str = str.replace('é', 'E')
        str = str.replace('è', 'E')
        str = str.replace('è', 'E')
        str = str.replace('ê', 'E')
        str = str.replace('ê', 'E')
        str = str.replace('?', 'E')
        str = str.replace('?', 'E')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('à', 'A')
        str = str.replace('à', 'A')
        str = str.replace('á', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('a', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('ú','U')
        str = str.replace('  ', ' ')
        str = str.replace('  ', ' ')
        str = str.replace('  ', ' ')

        if normalisation1 == "O":
            str = str.replace('AVENUE', 'AV')
            str = str.replace('BOULEVARD', 'BD')
            str = str.replace('FAUBOURG', 'FBG')
            str = str.replace('GENERAL', 'GAL')
            str = str.replace('COMMANDANT', 'CMDT')
            str = str.replace('MARECHAL', 'MAL')
            str = str.replace('PRESIDENT', 'PRDT')
            str = str.replace('SAINT', 'ST')
            str = str.replace('SAINTE', 'STE')
            str = str.replace('LOTISSEMENT', 'LOT')
            str = str.replace('RESIDENCE', 'RES')
            str = str.replace('IMMEUBLE', 'IMM')
            str = str.replace('IMEUBLE', 'IMM')
            str = str.replace('BATIMENT', 'BAT')

        if normalisation2 == "O":
            str = str.replace('MONSIEUR', 'M')
            str = str.replace('MR', 'M')
            str = str.replace('MADAME', 'MME')
            str = str.replace('MADEMOISELLE', 'MLLE')
            str = str.replace('DOCTEUR', 'DR')
            str = str.replace('PROFESSEUR', 'PR')
            str = str.replace('MONSEIGNEUR', 'MGR')
            str = str.replace('M ME','MME')
        if normalisation3 == "O":
            str = str.replace('PLACE', 'PL')
            str = str.replace('IMPASSE', 'IMP')
            str = str.replace('ESPLANADE', 'ESP')
            str = str.replace('ROND POINT', 'RPT')
            str = str.replace('ROUTE', 'RTE')
            str = str.replace('PASSAGE', 'PAS')
            str = str.replace('SQUARE', 'SQ')
            str = str.replace('ALLEE', 'ALL')
            str = str.replace('ESCALIER', 'ESC')
            str = str.replace('ETAGE', 'ETG')
            str = str.replace('PORTE', 'PTE')
            str = str.replace('APPARTEMENT', 'APT')
            str = str.replace('APARTEMENT', 'APT')
            str = str.replace('AVENUE', 'AV')
            str = str.replace('BOULEVARD', 'BD')
            str = str.replace('ZONE D ACTIVITE', 'ZA')
            str = str.replace('ZONE D ACTIVITEE', 'ZA')
            str = str.replace('ZONE D AMENAGEMENT CONCERTE', 'ZAC')
            str = str.replace('ZONE D AMENAGEMENT CONCERTEE', 'ZAC')
            str = str.replace('ZONE INDUSTRELLE', 'ZI')
            str = str.replace('CENTRE COMMERCIAL', 'CCAL')
            str = str.replace('CENTRE', 'CTRE')
            str = str.replace('C.CIAL','CCAL')
            str = str.replace('CTRE CIAL','CCAL')
            str = str.replace('CTRE CCAL','CCAL')
            str = str.replace('GALERIE','GAL')
            str = str.replace('MARTYR', 'M')
            str = str.replace('ANCIENS', 'AC')
            str = str.replace('ANCIEN', 'AC')
            str = str.replace('REVEREND PERE','R P')

        if normalisation4 == "O":
            str = str.replace(';\"', ' ')
            str = str.replace('\"', ' ')
            str = str.replace('\'', ' ')
            str = str.replace('-', ' ')
            str = str.replace(',', ' ')
            str = str.replace('\\', ' ')
            str = str.replace('\/', ' ')
            str = str.replace('&', ' ')
            str = str.replace('%', ' ')
            str = str.replace('*', ' ')
            str = str.replace(' ', ' ')
            str = str.replace('.', ' ')
            str = str.replace('_', ' ')
            str = str.replace(' ', ' ')
            str = str.replace(' ', ' ')
            str = str.replace('?', ' ')
            str = str.replace('%', ' ')
            str = str.replace('|', ' ')

        str = str.replace('  ', ' ')
        str = str.replace('  ', ' ')
        str = str.replace('  ', ' ')
        fiC.write(str)
        compteur += 1
        print compteur, "\n"
print "FINIT"
fiA.close()
fiC.close()

Solution

bussiere bussiere wrote:

Hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-

at the beginning of my script, but

str = str.replace('?', 'C')
...
doesn't work: it gives me characters like " and , instead of replacing é by E.



Are you sure your script and your input file *are* actually encoded in
utf-8? If it does not work as expected, it is probably latin-1, just
like your posting. Try changing the coding to latin-1. Does it work now?

-- Christoph
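Christoph's point can be checked directly. The sketch below is in Python 3, not the Python 2 used in this thread, and it assumes the input file is Latin-1. The letter é is two bytes in UTF-8 but one byte in Latin-1. A byte-level replace written for one encoding therefore silently misses text stored in the other, and UTF-8 bytes read as Latin-1 turn into two unrelated characters. Decoding with the right encoding first and then replacing on text avoids both problems. The helper names `show_encodings` and `replace_accents_in_bytes` are hypothetical.

```python
# Sketch (Python 3). Assumption: the input bytes are Latin-1 unless stated otherwise.

def show_encodings(ch: str) -> dict:
    """Return the byte representation of one character in two encodings."""
    return {
        "utf-8": ch.encode("utf-8"),
        "latin-1": ch.encode("latin-1"),
    }


def replace_accents_in_bytes(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode first, then replace on text, not on bytes."""
    text = raw.decode(encoding)
    for accented in "éèêë":
        text = text.replace(accented, "E")
    return text


if __name__ == "__main__":
    print(show_encodings("é"))  # {'utf-8': b'\xc3\xa9', 'latin-1': b'\xe9'}

    line = "rue de l'église".encode("latin-1")  # what a Latin-1 file contains
    # A UTF-8 byte pattern never matches Latin-1 bytes, so nothing is replaced:
    print(line.replace("é".encode("utf-8"), b"E") == line)  # True
    # UTF-8 bytes misread as Latin-1 become two unrelated characters:
    print("é".encode("utf-8").decode("latin-1"))  # Ã©
    # Decoding with the correct encoding first makes the replace work:
    print(replace_accents_in_bytes(line, "latin-1"))  # rue de l'Eglise
```

With real files, the equivalent would be to pass the encoding when opening them, e.g. `open(path, encoding="latin-1")`.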


Seems to work fine for me.

>>> x="éÇ"
>>> x=x.replace('é','E')
>>> x
'E\xc7'
>>> x=x.replace('ç','C')
>>> x
'E\xc7'
>>> x=x.replace('Ç','C')
>>> x
'EC'

You should also be able to use the .upper() method to
uppercase everything in the string in a single statement:

tstr=ligneA.upper()
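Here is a minimal Python 3 sketch of Larry's suggestion, assuming the line has already been decoded to text. On Unicode text, `str.upper()` also upshifts accented letters (é becomes É). The accents themselves survive, so a separate accent-stripping step is still needed. The name `upshift` is hypothetical.

```python
def upshift(line: str) -> str:
    """Uppercase a whole line in one call instead of 26 replace() calls."""
    return line.upper()


if __name__ == "__main__":
    print(upshift("12 avenue du général leclerc"))  # 12 AVENUE DU GÉNÉRAL LECLERC
```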

Note: you should never use 'str' as a variable, as
it will mask the built-in str function.
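The masking Larry describes can be shown in a few lines. This is a hypothetical Python 3 demonstration: once `str` names a string inside a function, calling `str(...)` there calls the string object and fails. The script above reuses `str` at module level, so there any later `str(...)` call in the script would fail the same way.

```python
def shadowing_demo() -> str:
    str = "a line of text"  # local name now hides the built-in str type
    try:
        return str(42)      # calls the string object, not the built-in
    except TypeError as exc:
        return f"TypeError: {exc}"


def renamed_demo() -> str:
    ligne = "a line of text"  # a different name leaves str() usable
    return str(len(ligne))


if __name__ == "__main__":
    print(shadowing_demo())  # TypeError: 'str' object is not callable
    print(renamed_demo())    # 14
```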

-Larry Bates




On 23/03/2006 10:07 PM, bussiere bussiere wrote:

Hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-

at the beginning of my script, but

str = str.replace('?', 'C')
str = str.replace('é', 'E')
str = str.replace('é', 'E')
str = str.replace('è', 'E')
str = str.replace('è', 'E')
str = str.replace('ê', 'E')
doesn't work: it gives me characters like " and , instead of replacing é by E.
If someone has an idea, it would be great.

Hi, I've added some comments below... I hope they help.
Cheers,
John

Regards
Bussiere
PS: I've added the whole script below:
__________________________________________________ ________________________
[snip]
if ligneA != "":
    str = ligneA
    str = str.replace('a', 'A')
[snip]
    str = str.replace('z', 'Z')

    str = str.replace('?', 'C')
    str = str.replace('?', 'C')
    str = str.replace('é', 'E')
    str = str.replace('é', 'E')
    str = str.replace('è', 'E')
[snip]
    str = str.replace('ú','U')
You can replace ALL of this upshifting and accent removal in one blow by
using the string translate() method with a suitable table.
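Here is one way John's translate() suggestion could look in Python 3, sketched under a few assumptions. The accented letters listed are assumed to be the ones that occur in the data, and the text is assumed to use precomposed (NFC) characters, so an "e" followed by a combining accent would not match. Letters are mapped to plain letters first, then the whole line is uppercased. The names `ACCENT_TABLE` and `upshift_and_strip` are hypothetical.

```python
# Sketch (Python 3): one translation table replaces the per-character replace() calls.
# Assumption: these accented letters (precomposed, NFC) are the ones in the data.
ACCENTED = ("àâäá" "éèêë" "îï" "ôö" "ûüùú" "ç"
            "ÀÂÄÁ" "ÉÈÊË" "ÎÏ" "ÔÖ" "ÛÜÙÚ" "Ç")
PLAIN    = ("aaaa" "eeee" "ii" "oo" "uuuu" "c"
            "AAAA" "EEEE" "II" "OO" "UUUU" "C")
ACCENT_TABLE = str.maketrans(ACCENTED, PLAIN)  # both strings must be equal length


def upshift_and_strip(text: str) -> str:
    """Remove the listed accents, then uppercase the result."""
    return text.translate(ACCENT_TABLE).upper()


if __name__ == "__main__":
    print(upshift_and_strip("Rue de l'Église, Françoise"))  # RUE DE L'EGLISE, FRANCOISE
```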
    str = str.replace('  ', ' ')
    str = str.replace('  ', ' ')
    str = str.replace('  ', ' ')
The standard Python idiom for normalising whitespace is
strg = ' '.join(strg.split())

>>> strg = ' ALLO BUSSIERE\tCA VA? '
>>> strg.split()
['ALLO', 'BUSSIERE', 'CA', 'VA?']
>>> ' '.join(strg.split())
'ALLO BUSSIERE CA VA?'
[snip]
if normalisation2 == "O":
    str = str.replace('MONSIEUR', 'M')
    str = str.replace('MR', 'M')
You need to be very careful with this approach. You are changing EVERY
occurrence of "MR" in the string, not just where it is a whole "word"
meaning "Monsieur".
Constructed example of what can go wrong:
>>> strg = 'MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY'
>>> strg.replace('MR', 'M')
'M IME NAGY, 123 PRIMOSE STREET, SHAMOCK VALLEY'



A real, non-constructed history lesson: A certain database indicated
duplicate records by having the annotation "DUP" in the surname field
e.g. "SMITH DUP". Fortunately it was detected in testing that the
so-called clean-up was causing DUPLESSIS to become PLESSIS and DUPRAT to
become RAT!

Two points here: (1) Split up your strings into "words" or "tokens".
Using strg.split() is a start but you may need something more
sophisticated e.g. "-" as an additional token separator. (2) Instead of
writing out all those lines of code, consider putting those
substitutions in a dictionary:

title_substitution = {
    'MONSIEUR': 'M',
    'MR': 'M',
    'MADAME': 'MME',
    # etc
}
Next level of improvement is to read that stuff from a file.
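John's two points can be combined in one sketch, in Python 3. Each whole word is looked up in a substitution dictionary, and that dictionary can be loaded from a file. The tab-separated "LONG<TAB>SHORT" file format and the names `substitute_words` and `load_substitutions` are hypothetical. A "word" is taken here to be a regex `\w+` run, which treats "-" and punctuation as separators.

```python
import csv
import io
import re

# Hypothetical table following John's example; extend as needed.
TITLE_SUBSTITUTIONS = {
    "MONSIEUR": "M",
    "MR": "M",
    "MADAME": "MME",
    "MADEMOISELLE": "MLLE",
    "DOCTEUR": "DR",
}

WORD = re.compile(r"\w+")  # a run of letters, digits or underscores


def substitute_words(text: str, table: dict) -> str:
    """Replace whole words only, so 'MR' inside 'PRIMROSE' is left alone."""
    return WORD.sub(lambda m: table.get(m.group(0), m.group(0)), text)


def load_substitutions(lines) -> dict:
    """Read 'LONG<TAB>SHORT' rows (a hypothetical format) into a dict."""
    reader = csv.reader(lines, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) == 2}


if __name__ == "__main__":
    print(substitute_words("MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY",
                           TITLE_SUBSTITUTIONS))
    # M IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY
    sample_file = io.StringIO("AVENUE\tAV\nBOULEVARD\tBD\n")  # stands in for a real file
    print(substitute_words("12 AVENUE FOCH", load_substitutions(sample_file)))
    # 12 AV FOCH
```

Multi-word entries such as 'ZONE D ACTIVITE' or 'ROND POINT' are not handled by a single-word lookup. They would need extra work, for example a regex alternation anchored with `\b`, which this sketch leaves out.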
[snip]
if normalisation4 == "O":
    str = str.replace(';\"', ' ')
    str = str.replace('\"', ' ')
    str = str.replace('\'', ' ')
    str = str.replace('-', ' ')
    str = str.replace(',', ' ')
    str = str.replace('\\', ' ')
    str = str.replace('\/', ' ')
    str = str.replace('&', ' ')


[snip]
Again, consider the string translate() method.
Also, consider that some of those characters may have some meaning that
you perhaps shouldn't blow away, e.g. compare 'SMITH & WESSON' with
'SMITH ET WESSON' :-)
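Both of John's remarks on the punctuation block fit in one translate() call. Here is a Python 3 sketch that maps separator characters to spaces and then collapses the whitespace with the split/join idiom. The separator set is roughly the set of characters the script's normalisation4 block replaces. Mapping "&" to " ET " rather than deleting it is an illustrative choice based on the SMITH & WESSON remark, not something from the thread. The names are hypothetical.

```python
# Sketch (Python 3): separators become spaces in a single translate() call.
# Assumption: roughly the characters the script's normalisation4 block replaces.
SEPARATORS = ";\"'-,\\/%*._?|"
PUNCTUATION_TABLE = {ord(ch): " " for ch in SEPARATORS}
PUNCTUATION_TABLE[ord("&")] = " ET "  # keep the meaning of '&' instead of blowing it away


def clean_punctuation(text: str) -> str:
    """Translate separators, then collapse runs of whitespace."""
    return " ".join(text.translate(PUNCTUATION_TABLE).split())


if __name__ == "__main__":
    print(clean_punctuation("SMITH & WESSON"))        # SMITH ET WESSON
    print(clean_punctuation("12, RUE J.-B. CLEMENT"))  # 12 RUE J B CLEMENT
```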


