Encoding problems (é and è)
hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-
at the beginning of my script, but
str = str.replace('?', 'C')
str = str.replace('é', 'E')
str = str.replace('é', 'E')
str = str.replace('è', 'E')
str = str.replace('è', 'E')
str = str.replace('ê', 'E')
doesn't work: it gives me " and , instead of replacing é by E
if someone has an idea it would be great
regards
Bussiere
ps: I've added the whole script below:
__________________________________________________________________________
#!/usr/bin/python
# -*- coding: utf-8 -*-
import fileinput, glob, string, sys, os, re
fichA = raw_input("Entrez le nom du fichier d'entree : ")
print ("\n")
fichC = raw_input("Entrez le nom du fichier de sortie : ")
print ("\n")
normalisation1 = raw_input("Normaliser les adresses 1 (ex : Avenue-> AV) (O/N) ou A pour tout normaliser \n")
normalisation1 = normalisation1.upper()
if normalisation1 != "A":
    print ("\n")
    normalisation2 = raw_input("Normaliser les civilités (ex : Docteur-> DR) (O/N) \n")
    normalisation2 = normalisation2.upper()
    print ("\n")
    normalisation3 = raw_input("Normaliser les Adresses 2 (ex : Place-> PL) (O/N) \n")
    normalisation3 = normalisation3.upper()
    normalisation4 = raw_input("Normaliser les caracteres / et - (ex : / -> ) (O/N) \n")
    normalisation4 = normalisation4.upper()
if normalisation1 == "A":
    normalisation1 = "O"
    normalisation2 = "O"
    normalisation3 = "O"
    normalisation4 = "O"
fiA = open(fichA, "r")
fiC = open(fichC, "w")
compteur = 0
while 1:
    ligneA = fiA.readline()
    if ligneA == "":
        break
    if ligneA != "":
        str = ligneA
        str = str.replace('a', 'A')
        str = str.replace('b', 'B')
        str = str.replace('c', 'C')
        str = str.replace('d', 'D')
        str = str.replace('e', 'E')
        str = str.replace('f', 'F')
        str = str.replace('g', 'G')
        str = str.replace('h', 'H')
        str = str.replace('i', 'I')
        str = str.replace('j', 'J')
        str = str.replace('k', 'K')
        str = str.replace('l', 'L')
        str = str.replace('m', 'M')
        str = str.replace('n', 'N')
        str = str.replace('o', 'O')
        str = str.replace('p', 'P')
        str = str.replace('q', 'Q')
        str = str.replace('r', 'R')
        str = str.replace('s', 'S')
        str = str.replace('t', 'T')
        str = str.replace('u', 'U')
        str = str.replace('v', 'V')
        str = str.replace('w', 'W')
        str = str.replace('x', 'X')
        str = str.replace('y', 'Y')
        str = str.replace('z', 'Z')
        str = str.replace('?', 'C')
        str = str.replace('?', 'C')
        str = str.replace('é', 'E')
        str = str.replace('é', 'E')
        str = str.replace('è', 'E')
        str = str.replace('è', 'E')
        str = str.replace('ê', 'E')
        str = str.replace('ê', 'E')
        str = str.replace('?', 'E')
        str = str.replace('?', 'E')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('à', 'A')
        str = str.replace('à', 'A')
        str = str.replace('á', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'A')
        str = str.replace('a', 'A')
        str = str.replace('?', 'A')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'I')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('?', 'O')
        str = str.replace('ú', 'U')
        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')
        if normalisation1 == "O":
            str = str.replace('AVENUE', 'AV')
            str = str.replace('BOULEVARD', 'BD')
            str = str.replace('FAUBOURG', 'FBG')
            str = str.replace('GENERAL', 'GAL')
            str = str.replace('COMMANDANT', 'CMDT')
            str = str.replace('MARECHAL', 'MAL')
            str = str.replace('PRESIDENT', 'PRDT')
            str = str.replace('SAINT', 'ST')
            str = str.replace('SAINTE', 'STE')
            str = str.replace('LOTISSEMENT', 'LOT')
            str = str.replace('RESIDENCE', 'RES')
            str = str.replace('IMMEUBLE', 'IMM')
            str = str.replace('IMEUBLE', 'IMM')
            str = str.replace('BATIMENT', 'BAT')
        if normalisation2 == "O":
            str = str.replace('MONSIEUR', 'M')
            str = str.replace('MR', 'M')
            str = str.replace('MADAME', 'MME')
            str = str.replace('MADEMOISELLE', 'MLLE')
            str = str.replace('DOCTEUR', 'DR')
            str = str.replace('PROFESSEUR', 'PR')
            str = str.replace('MONSEIGNEUR', 'MGR')
            str = str.replace('M ME', 'MME')
        if normalisation3 == "O":
            str = str.replace('PLACE', 'PL')
            str = str.replace('IMPASSE', 'IMP')
            str = str.replace('ESPLANADE', 'ESP')
            str = str.replace('ROND POINT', 'RPT')
            str = str.replace('ROUTE', 'RTE')
            str = str.replace('PASSAGE', 'PAS')
            str = str.replace('SQUARE', 'SQ')
            str = str.replace('ALLEE', 'ALL')
            str = str.replace('ESCALIER', 'ESC')
            str = str.replace('ETAGE', 'ETG')
            str = str.replace('PORTE', 'PTE')
            str = str.replace('APPARTEMENT', 'APT')
            str = str.replace('APARTEMENT', 'APT')
            str = str.replace('AVENUE', 'AV')
            str = str.replace('BOULEVARD', 'BD')
            str = str.replace('ZONE D ACTIVITE', 'ZA')
            str = str.replace('ZONE D ACTIVITEE', 'ZA')
            str = str.replace('ZONE D AMENAGEMENT CONCERTE', 'ZAC')
            str = str.replace('ZONE D AMENAGEMENT CONCERTEE', 'ZAC')
            str = str.replace('ZONE INDUSTRELLE', 'ZI')
            str = str.replace('CENTRE COMMERCIAL', 'CCAL')
            str = str.replace('CENTRE', 'CTRE')
            str = str.replace('C.CIAL', 'CCAL')
            str = str.replace('CTRE CIAL', 'CCAL')
            str = str.replace('CTRE CCAL', 'CCAL')
            str = str.replace('GALERIE', 'GAL')
            str = str.replace('MARTYR', 'M')
            str = str.replace('ANCIENS', 'AC')
            str = str.replace('ANCIEN', 'AC')
            str = str.replace('REVEREND PERE', 'R P')
        if normalisation4 == "O":
            str = str.replace(';\"', ' ')
            str = str.replace('\"', ' ')
            str = str.replace('\'', ' ')
            str = str.replace('-', ' ')
            str = str.replace(',', ' ')
            str = str.replace('\\', ' ')
            str = str.replace('\/', ' ')
            str = str.replace('&', ' ')
            str = str.replace('%', ' ')
            str = str.replace('*', ' ')
            str = str.replace(' ', ' ')
            str = str.replace('.', ' ')
            str = str.replace('_', ' ')
            str = str.replace(' ', ' ')
            str = str.replace(' ', ' ')
            str = str.replace('?', ' ')
            str = str.replace('%', ' ')
            str = str.replace('|', ' ')
            str = str.replace(' ', ' ')
            str = str.replace(' ', ' ')
            str = str.replace(' ', ' ')
        fiC.write(str)
        compteur += 1
        print compteur, "\n"
print "FINIT"
fiA.close()
fiC.close()
bussiere bussiere wrote:
hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-
at the beginning of my script, but
str = str.replace('?', 'C')
...
doesn't work: it gives me " and , instead of replacing é by E

Are you sure your script and your input file *are* actually encoded in
utf-8? If it does not work as expected, they are probably latin-1, just
like your posting. Try changing the coding to latin-1. Does it work now?
-- Christoph
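
To illustrate Christoph's point, here is a minimal sketch. It is written for Python 3, which the thread predates, so `bytes` stands in for the byte strings the Python 2 script manipulates. The sample word and the variable names are assumptions chosen for illustration. It shows why a pattern typed in a UTF-8 script never matches text that is really stored as latin-1:

```python
# "é" as it is stored on disk under the two encodings discussed above.
e_utf8 = "é".encode("utf-8")      # two bytes
e_latin1 = "é".encode("latin-1")  # one byte

# A line as it would be read from a latin-1 input file (raw bytes).
line = "civilités".encode("latin-1")

# The UTF-8 byte pattern does not occur in the latin-1 data: nothing changes.
unchanged = line.replace(e_utf8, b"E")

# The latin-1 byte pattern does occur, so the replacement works.
fixed = line.replace(e_latin1, b"E")

print(unchanged == line)
print(fixed)
```

The same mismatch in the other direction (UTF-8 data, latin-1 patterns) replaces only part of each two-byte character, which is one plausible way to end up with stray characters in the output.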
Seems to work fine for me.
>>> x = "éÇ"
>>> x = x.replace('é', 'E')
>>> x
'E\xc7'
>>> x = x.replace('Ç', 'C')
>>> x
'EC'
You should also be able to use the .upper() method to
uppercase everything in the string in a single statement:
tstr = ligneA.upper()
Note: you should never use 'str' as a variable name as
it will mask the built-in str function.
-Larry Bates
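
A small sketch of both of Larry's remarks, assuming Python 3, where strings are Unicode and `.upper()` therefore also uppercases accented letters (a Python 2 byte string would not do this reliably). The sample line and the function name `shadowing_demo` are made up for illustration:

```python
# One call replaces the 26 single-letter replace() calls.
ligneA = "12 avenue du général leclerc\n"
tstr = ligneA.upper()


def shadowing_demo():
    """Show what happens once the name 'str' is rebound to a string."""
    str = "abc"  # hides the built-in str inside this function
    try:
        str(42)  # now tries to call the string "abc"
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


print(tstr, end="")
print(shadowing_demo())
```

Note that `.upper()` keeps the accent (É stays É), so accent removal is still a separate step.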
On 23/03/2006 10:07 PM, bussiere bussiere wrote:
hi, I'm making a program for formatting strings.
I've added:
#!/usr/bin/python
# -*- coding: utf-8 -*-
at the beginning of my script, but
str = str.replace('?', 'C')
str = str.replace('é', 'E')
str = str.replace('é', 'E')
str = str.replace('è', 'E')
str = str.replace('è', 'E')
str = str.replace('ê', 'E')
doesn't work: it gives me " and , instead of replacing é by E
if someone has an idea it would be great

Hi, I've added some comments below ... I hope they help.
Cheers,
John

regards
Bussiere
ps: I've added the whole script below:
__________________________________________________________________________
[snip]
    if ligneA != "":
        str = ligneA
        str = str.replace('a', 'A')
[snip]
        str = str.replace('z', 'Z')
        str = str.replace('?', 'C')
        str = str.replace('?', 'C')
        str = str.replace('é', 'E')
        str = str.replace('é', 'E')
        str = str.replace('è', 'E')
[snip]
        str = str.replace('ú', 'U')

You can replace ALL of this upshifting and accent removal in one blow by
using the string translate() method with a suitable table.
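
A minimal sketch of the table-based approach John describes, assuming Python 3 (where `str.translate()` accepts a dict keyed by code point; the Python 2 API of the time differed). The grouping of accented letters in `ACCENT_GROUPS` and the function name `upshift` are assumptions for illustration, not the poster's full list:

```python
# Accented lowercase letters grouped by the plain capital that replaces them.
ACCENT_GROUPS = {
    "A": "àáâä",
    "C": "ç",
    "E": "éèêë",
    "I": "îï",
    "O": "ôö",
    "U": "ùúûü",
}

# Build one translation table covering both cases of every accented letter.
ACCENT_TABLE = {}
for plain, accented in ACCENT_GROUPS.items():
    for ch in accented:
        ACCENT_TABLE[ord(ch)] = plain          # e.g. é -> E
        ACCENT_TABLE[ord(ch.upper())] = plain  # e.g. É -> E


def upshift(text):
    """Strip the listed accents, then uppercase everything else."""
    return text.translate(ACCENT_TABLE).upper()


print(upshift("Hélène Françoise"))
```

One table lookup per character replaces the long chain of `replace()` calls, and extending the mapping only means editing `ACCENT_GROUPS`.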

        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')
        str = str.replace(' ', ' ')

The standard Python idiom for normalising whitespace is
strg = ' '.join(strg.split())
>>> strg = ' ALLO BUSSIERE\tCA VA? '
>>> strg.split()
['ALLO', 'BUSSIERE', 'CA', 'VA?']
>>> ' '.join(strg.split())
'ALLO BUSSIERE CA VA?'

[snip]
        if normalisation2 == "O":
            str = str.replace('MONSIEUR', 'M')
            str = str.replace('MR', 'M')

You need to be very careful with this approach. You are changing EVERY
occurrence of "MR" in the string, not just where it is a whole "word"
meaning "Monsieur".
Constructed example of what can go wrong:
>>> strg = 'MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY'
>>> strg.replace('MR', 'M')
'M IME NAGY, 123 PRIMOSE STREET, SHAMOCK VALLEY'
A real, non-constructed history lesson: A certain database indicated
duplicate records by having the annotation "DUP" in the surname field
e.g. "SMITH DUP". Fortunately it was detected in testing that the
so-called clean-up was causing DUPLESSIS to become PLESSIS and DUPRAT to
become RAT!
Two points here: (1) Split up your strings into "words" or "tokens".
Using strg.split() is a start but you may need something more
sophisticated e.g. "-" as an additional token separator. (2) Instead of
writing out all those lines of code, consider putting those
substitutions in a dictionary:
title_substitution = {
    'MONSIEUR': 'M',
    'MR': 'M',
    'MADAME': 'MME',
    # etc
}
Next level of improvement is to read that stuff from a file.
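
The two points above can be sketched together as follows. This is only an illustration under simple assumptions: tokens are split on whitespace alone, and the function name `abbreviate` is made up here. A token such as "NAGY," keeps its comma, so real data would need the more sophisticated tokenising John mentions:

```python
TITLE_SUBSTITUTION = {
    "MONSIEUR": "M",
    "MR": "M",
    "MADAME": "MME",
    # etc
}


def abbreviate(strg, substitutions):
    """Replace whole whitespace-separated words found in the dictionary."""
    words = strg.split()
    return " ".join(substitutions.get(word, word) for word in words)


print(abbreviate("MR IMRE NAGY, 123 PRIMROSE STREET, SHAMROCK VALLEY",
                 TITLE_SUBSTITUTION))
```

Because only complete tokens are looked up, IMRE, PRIMROSE and SHAMROCK are left alone, and the `split()`/`join()` pair normalises whitespace as a side effect.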
[snip]
        if normalisation4 == "O":
            str = str.replace(';\"', ' ')
            str = str.replace('\"', ' ')
            str = str.replace('\'', ' ')
            str = str.replace('-', ' ')
            str = str.replace(',', ' ')
            str = str.replace('\\', ' ')
            str = str.replace('\/', ' ')
            str = str.replace('&', ' ')
[snip]

Again, consider the string translate() method.
Also, consider that some of those characters may have some meaning that
you perhaps shouldn't blow away, e.g. compare 'SMITH & WESSON' with
'SMITH ET WESSON' :-)
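
As a rough sketch of that last suggestion, assuming Python 3's `str.maketrans()`: the set of characters in `PUNCTUATION` is an assumption based on the quoted lines, `&` is deliberately left out so that it survives, and the function name `blank_punctuation` is hypothetical.

```python
# Characters to turn into blanks; "&" is intentionally not in the list.
PUNCTUATION = ";\"'-,\\/%*._|"
TO_BLANK = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def blank_punctuation(strg):
    """Replace the listed punctuation by blanks, then normalise whitespace."""
    return " ".join(strg.translate(TO_BLANK).split())


print(blank_punctuation("12-14, RUE DE L'EGLISE"))
print(blank_punctuation("SMITH & WESSON"))
```

Whether a given character belongs in the list is a data decision; the table keeps that decision in one place rather than spread over many `replace()` calls.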