Vcard parser with Python
Problem description
I am parsing my vcard info (copied to a txt file) to extract name:number pairs and put them into a dictionary.
Data sample:
BEGIN:VCARD
VERSION:2.1
N:MEO;Apoio;;;
FN:Apoio MEO
TEL;CELL;PREF:1696
TEL;CELL:162 00
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:estrangeiro;Apoio MEO;no;;
FN:Apoio MEO no estrangeiro
TEL;CELL;PREF:+35196169000
END:VCARD
import re
file = open('Contacts.txt', 'r')
contacts = dict()
for line in file:
name = re.findall('FN:(.*)', line)
nm = ''.join(name)
if len(nm) == 0:
continue
contacts[nm] = contacts.get(nm)
print(contacts)
With this I am getting a dictionary with names, but for the numbers I am getting None: {'name': None, 'name': None}.
Can I do this with re? Can I extract both name and number with the same re.findall expression?
Recommended answer
You'd be better off using an existing library instead of trying to reinvent the wheel:
pip install vobject
Then, in Python:
>>> import vobject
>>> s = """\
... BEGIN:VCARD
... VERSION:2.1
... N:MEO;Apoio;;;
... FN:Apoio MEO
... TEL;CELL;PREF:0123456789
... TEL;CELL:0123456768
... END:VCARD
... BEGIN:VCARD
... VERSION:2.1
... N:estrangeiro;Apoio MEO;no;;
... FN:Apoio MEO no estrangeiro
... TEL;CELL;PREF:+0123456789
... END:VCARD """
>>> vcard = vobject.readOne(s)
>>> vcard.prettyPrint()
VCARD
VERSION: 2.1
TEL: 0123456789
TEL: 0123456768
FN: Apoio MEO
N: Apoio MEO
And you're done!
So if you want to make a dictionary out of that, all you need to do is:
>>> {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']] }
{'Apoio MEO': ['0123456789', '0123456768']}
So you could wrap all of that in a function:
import vobject

def parse_vcard(path):
    # Read the whole file and parse the first vcard it contains.
    with open(path, 'r') as f:
        vcard = vobject.readOne(f.read())
        return {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']]}
From there, you can improve the code to handle multiple vcards in a single file, and update the dict with more phones.
N.B.: I leave it to you as an exercise to change the code above from reading one and only one vcard per file into code that can read several vcards. Hint: read the vobject documentation.
N.B.: I'm using your data, on the assumption that whatever you wrote is meaningless sample data. But just in case, I have modified the phone numbers.
Just for fun, let's have a look at your code. First, there's an indentation issue, but I'll assume this is because of a bad copy/paste ☺.
① import re
② file = open('Contacts.txt', 'r')
③ contacts = dict()
④ for line in file:
⑤     name = re.findall('FN:(.*)', line)
⑥     nm = ''.join(name)
⑦     if len(nm) == 0:
⑧         continue
⑨     contacts[nm] = contacts.get(nm)
⑩ print(contacts)
So first, there are two issues at line ②. You're opening a file using open(), but you're not closing it. If you call this function to open a billion files, you'll exhaust your system's available file descriptors because the files are never closed. As a good habit, you should always use the with construct instead:
with open('...', '...') as f:
    … your code here …
which takes care of the file descriptor for you, and shows more clearly where the opened file can be used.
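As a small illustration of that point, the sketch below writes a throwaway file and checks that the handle is closed as soon as the with block exits. The temporary path and the sample content are made up for the example.

```python
import os
import tempfile

# Hypothetical throwaway file, only used for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'Contacts.txt')
with open(path, 'w') as f:
    f.write('FN:Apoio MEO\n')

with open(path, 'r') as f:
    first_line = f.readline()
    # The file is still open while we are inside the block.
    inside = f.closed

# Leaving the block closes the file, even if an exception was raised inside it.
after = f.closed
print(inside, after)  # False True
```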
The second issue is that you're naming your variable file, which shadows the built-in file type (a Python 2 built-in that Python 3 removed). Fortunately, the file type is very rarely used, but it's a bad habit to have: one day you might not understand a bug that happens because you've shadowed a type with a variable. Just don't do it; it'll save you trouble one day.
Lines ⑤ and ⑥: you're applying a re.findall regex to each line. You'd be better off using re.match(), as you're already iterating line by line and a line won't contain FN: something more than once. That lets you avoid the unnecessary ''.join(name). But instead of using a regex for such a simple thing, you'd better use str.split():
if 'FN:' in line:
    name = line.split(':')[-1]
Line ⑦ is not only superfluous if you use the if above, it is actually wrong: you'll skip every line that does not contain FN:, meaning that you'll never extract the phone numbers, just the names.
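Putting those remarks together, a line-by-line version that keeps the phone numbers could look like the sketch below. It is only an illustration under simplifying assumptions: one FN line per card, no folded (continued) lines, no encoded values. The function name parse_contacts is made up for this example, and a real library handles the cases it ignores.

```python
def parse_contacts(lines):
    """Collect {full name: [phone numbers]} from an iterable of vcard lines."""
    contacts = {}
    name = None
    phones = []
    for raw in lines:
        line = raw.strip()
        if line.startswith('BEGIN:VCARD'):
            # New card: forget whatever the previous card held.
            name, phones = None, []
        elif line.startswith('FN:'):
            name = line.split(':', 1)[1]
        elif line.startswith('TEL'):
            # Everything after the first ':' is the number; the parameters
            # (CELL, PREF, ...) sit before it.
            phones.append(line.split(':', 1)[1])
        elif line.startswith('END:VCARD') and name is not None:
            contacts.setdefault(name, []).extend(phones)
    return contacts


sample = """BEGIN:VCARD
VERSION:2.1
N:MEO;Apoio;;;
FN:Apoio MEO
TEL;CELL;PREF:1696
TEL;CELL:162 00
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:estrangeiro;Apoio MEO;no;;
FN:Apoio MEO no estrangeiro
TEL;CELL;PREF:+35196169000
END:VCARD
"""

contacts = parse_contacts(sample.splitlines())
print(contacts)
# {'Apoio MEO': ['1696', '162 00'], 'Apoio MEO no estrangeiro': ['+35196169000']}
```

With a file, the same function can be fed the open handle directly, since iterating over a file yields its lines.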
Finally, line ⑨ makes absolutely no sense. Basically, what you're doing is equivalent to:
if nm in contacts.keys():
    contacts[nm] = contacts[nm]
else:
    contacts[nm] = None
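This is also where the None values come from: dict.get returns None for a missing key, so the assignment stores None the first time a name is seen and simply writes it back afterwards. A minimal sketch, using a sample name for illustration:

```python
contacts = {}
nm = 'Apoio MEO'  # sample name, for illustration only

# The key is missing, so get() returns its default, None.
contacts[nm] = contacts.get(nm)
first = dict(contacts)

# Now the key exists, and its value (None) is just written back.
contacts[nm] = contacts.get(nm)
print(first, contacts)  # {'Apoio MEO': None} {'Apoio MEO': None}
```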
All in all, all your code does is extract names; it doesn't even look at the telephone numbers. So when you say:
"With this I am getting a dictionary with names but for numbers I am getting None"
it makes no sense, as you're not actually trying to extract the phone numbers.
"Can I do this with re? To extract both name and number with the same re.findall expression?"
Yes, you could, with something like the following (an untested regex that very likely does not work as is; among other things, .* only spans several lines with the re.DOTALL flag), applied over the whole file, or at least to each vcard:
FN:(?P<name>[^\n]*).*TEL[^:]*:(?P<phone>[^\n]*)
But why bother, when you've got a lib that does it perfectly for you!
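For completeness, here is one way the regex idea might be made to work, as a sketch rather than a recommendation. It does not use the single pattern above: it splits the text into cards first, then runs one findall for the phones of each card. It assumes well-formed cards with one FN line each and unfolded TEL lines; the function name parse_with_re and the three patterns are choices made for this example.

```python
import re

# DOTALL lets '.' cross line breaks, so one match covers a whole card.
CARD = re.compile(r'BEGIN:VCARD(.*?)END:VCARD', re.DOTALL)
# MULTILINE makes '^' and '$' match at the start and end of every line.
NAME = re.compile(r'^FN:(.*)$', re.MULTILINE)
PHONE = re.compile(r'^TEL[^:\n]*:(.*)$', re.MULTILINE)


def parse_with_re(text):
    """Return {full name: [phone numbers]} using regular expressions only."""
    contacts = {}
    for card in CARD.findall(text):
        name = NAME.search(card)
        if name is None:
            continue  # no FN line in this card, nothing to key it on
        numbers = [phone.strip() for phone in PHONE.findall(card)]
        contacts.setdefault(name.group(1).strip(), []).extend(numbers)
    return contacts


text = (
    "BEGIN:VCARD\nVERSION:2.1\nN:MEO;Apoio;;;\nFN:Apoio MEO\n"
    "TEL;CELL;PREF:1696\nTEL;CELL:162 00\nEND:VCARD\n"
    "BEGIN:VCARD\nVERSION:2.1\nN:estrangeiro;Apoio MEO;no;;\n"
    "FN:Apoio MEO no estrangeiro\nTEL;CELL;PREF:+35196169000\nEND:VCARD\n"
)
result = parse_with_re(text)
print(result)
# {'Apoio MEO': ['1696', '162 00'], 'Apoio MEO no estrangeiro': ['+35196169000']}
```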