Vcard parser with Python


Question

I am parsing my vcard info (copied to a txt file) to extract name:number pairs and put them into a dictionary.

Sample data:


BEGIN:VCARD  
VERSION:2.1  
N:MEO;Apoio;;;  
FN:Apoio MEO  
TEL;CELL;PREF:1696  
TEL;CELL:162 00  
END:VCARD  
BEGIN:VCARD  
VERSION:2.1  
N:estrangeiro;Apoio MEO;no;;  
FN:Apoio MEO no estrangeiro  
TEL;CELL;PREF:+35196169000  
END:VCARD  

import re
file = open('Contacts.txt', 'r')
contacts = dict()

    for line in file:
            name = re.findall('FN:(.*)', line)
            nm = ''.join(name)

            if len(nm) == 0:
                continue
            contacts[nm] = contacts.get(nm)
    print(contacts)

With this I am getting a dictionary with names but for numbers I am getting None. {'name': None, 'name': None}.

Can I do this with re? To extract both name and number with the same re.findall expression?

Recommended answer

You'd be better off using an existing library instead of trying to reinvent the wheel:

pip install vobject

Then, in Python:

>>> import vobject
>>> s = """\
... BEGIN:VCARD
... VERSION:2.1
... N:MEO;Apoio;;;
... FN:Apoio MEO
... TEL;CELL;PREF:0123456789
... TEL;CELL:0123456768
... END:VCARD
... BEGIN:VCARD
... VERSION:2.1
... N:estrangeiro;Apoio MEO;no;;
... FN:Apoio MEO no estrangeiro
... TEL;CELL;PREF:+0123456789
... END:VCARD """
>>> vcard = vobject.readOne(s)
>>> vcard.prettyPrint()
 VCARD
    VERSION: 2.1
    TEL: 0123456789
    TEL: 0123456768
    FN: Apoio MEO
    N:  Apoio  MEO 

And you're done!

So if you want to make a dictionary out of that, all you need to do is:

>>> {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']] }
{'Apoio MEO': ['0123456789', '0123456768']}

So you could turn all of that into a function:

import vobject

def parse_vcard(path):
    # readOne() parses only the first vcard found in the file.
    with open(path, 'r') as f:
        vcard = vobject.readOne(f.read())
    return {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']]}

From there, you can improve the code to handle multiple vcards in a single vobject file, and update the dict with more phones.

N.B.: I leave it to you as an exercise to change the code above from reading exactly one vcard per file into code that can read several vcards. Hint: read the vobject documentation.

N.B.: I'm using your data, on the assumption that whatever you wrote is meaningless. But, just in case, I have modified the phone numbers.

Just for fun, let's have a look at your code. First, there's an indentation issue, but I'll assume that's due to a bad copy/paste ☺.

① import re
② file = open('Contacts.txt', 'r')
③ contacts = dict()

④ for line in file:
⑤     name = re.findall('FN:(.*)', line)
⑥     nm = ''.join(name)

⑦     if len(nm) == 0:
⑧         continue
⑨     contacts[nm] = contacts.get(nm)

⑩ print(contacts)

So first, there are two issues at line ②. You're opening a file using open(), but you're never closing it. If you called this function to open a billion files, you would exhaust your system's available file descriptors, because none of the files are closed. As a good habit, you should always use the with construct instead:

with open('...', '...') as f:
    … your code here …

That takes care of the file descriptor for you, and better shows where you can make use of your opened file.
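
As a minimal sketch of what the with construct buys you, the following writes a throwaway file (a made-up stand-in for Contacts.txt, not the asker's real data), reads it back, and checks that the file is closed once the block exits:

```python
import os
import tempfile

# Create a throwaway file standing in for Contacts.txt (illustrative data only).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('FN:Apoio MEO\nTEL;CELL:0123456789\n')
    path = tmp.name

# The file is open only inside this block; it is closed on exit,
# even if an exception is raised while reading.
with open(path, 'r') as f:
    lines = f.read().splitlines()

print(f.closed)  # True
print(lines)

os.remove(path)
```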

The second issue is that you're calling your variable file, which shadows the file type (a built-in in Python 2). Fortunately, the file type is very rarely used, but it's a bad habit to have, as one day you might not understand a bug that happens because you've shadowed a type with a variable. Just don't use that name; it'll save you trouble one day.

Lines ⑤ and ⑥: you're applying a re.findall regex to each line. You'd be better off using re.match(), as you're already iterating over each line, and a line won't contain more than one FN: entry. That lets you avoid the unnecessary ''.join(name). But instead of using a regex for such a simple thing, you'd better use str.split():

if 'FN:' in line:
    # strip() drops the trailing newline that comes with each line of the file
    name = line.split(':')[-1].strip()
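
To compare the three approaches on a single line, here is a small sketch; the sample line is taken from the data above, and the variable names are only for illustration:

```python
import re

line = 'FN:Apoio MEO\n'  # one line as read from the file, newline included

# re.findall returns a list, hence the ''.join() in the original code.
found = re.findall('FN:(.*)', line)

# re.match anchors at the start of the line and returns a match object or None.
m = re.match(r'FN:(.*)', line)
matched = m.group(1) if m else None

# Plain string methods: split once on the first colon, strip the newline.
split_name = line.split(':', 1)[1].strip()

print(found, matched, split_name)
```

All three recover the same name here; the string-method version simply avoids the regex machinery.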

Lines ⑦ and ⑧ are not only superfluous if you use the if above, they are actually wrong: you skip every line that does not contain FN:, meaning that you'll never extract the phone numbers, just the names.
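
If you do want to stay with plain string handling, one possible sketch is to look at every line rather than skipping the non-FN: ones. The function name parse_contacts is hypothetical, and the code assumes simple, unfolded lines like the sample data; it does not handle folded lines or encoded values:

```python
def parse_contacts(lines):
    """Map each FN value to the list of TEL values found in the same vcard."""
    contacts = {}
    name, phones = None, []
    for raw in lines:
        line = raw.strip()
        # 'TEL;CELL;PREF:0123' -> key 'TEL;CELL;PREF', value '0123'
        key, _, value = line.partition(':')
        field = key.split(';')[0].upper()
        if field == 'BEGIN':
            name, phones = None, []       # start of a new card
        elif field == 'FN':
            name = value
        elif field == 'TEL':
            phones.append(value)
        elif field == 'END' and name is not None:
            contacts.setdefault(name, []).extend(phones)
    return contacts

sample = """\
BEGIN:VCARD
VERSION:2.1
N:MEO;Apoio;;;
FN:Apoio MEO
TEL;CELL;PREF:0123456789
TEL;CELL:0123456768
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:estrangeiro;Apoio MEO;no;;
FN:Apoio MEO no estrangeiro
TEL;CELL;PREF:+0123456789
END:VCARD
"""
print(parse_contacts(sample.splitlines()))
```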

Finally, line ⑨ makes absolutely no sense. Basically, what you're doing is equivalent to:

if nm in contacts.keys():
    contacts[nm] = contacts[nm]
else:
    contacts[nm] = None
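
A quick sketch of why that stores None, next to one common way of accumulating values instead (the phones dictionary is an illustrative name, not part of the original code):

```python
contacts = {}
nm = 'Apoio MEO'

# .get() returns None for a missing key, so this just stores None.
contacts[nm] = contacts.get(nm)
print(contacts)  # {'Apoio MEO': None}

# Accumulating values instead: setdefault creates the list on first use.
phones = {}
phones.setdefault(nm, []).append('0123456789')
phones.setdefault(nm, []).append('0123456768')
print(phones)
```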

All in all, in your code, all you do is extract names, and you don't even bother with the telephone numbers. So when you say:

With this I am getting a dictionary with names but for numbers I am getting None

it makes no sense, as you're not actually trying to extract phone numbers.

Can I do this with re? To extract both name and number with the same re.findall expression?

Yes, you could, with something that would look like the following (an untested regex that very likely doesn't work), applied over the whole file, or at least to each vcard:

FN:(?P<name>[^\n]*).*TEL[^:]*:(?P<phone>[^\n]*)

But why bother, when you've got a lib that does it perfectly for you!
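
For completeness, a regex-only version might look like the sketch below. It is not the single expression asked about: it assumes one pattern to cut the text into cards and two more to pull out the fields, and the pattern names are made up for illustration. It also ignores folded lines and encoded values:

```python
import re

text = """\
BEGIN:VCARD
VERSION:2.1
N:MEO;Apoio;;;
FN:Apoio MEO
TEL;CELL;PREF:0123456789
TEL;CELL:0123456768
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:estrangeiro;Apoio MEO;no;;
FN:Apoio MEO no estrangeiro
TEL;CELL;PREF:+0123456789
END:VCARD
"""

# DOTALL lets '.' cross newlines so one match spans a whole card.
CARD_RE = re.compile(r'BEGIN:VCARD(.*?)END:VCARD', re.DOTALL)
# MULTILINE makes ^ and $ match at every line boundary inside a card.
FN_RE = re.compile(r'^FN:(.*)$', re.MULTILINE)
TEL_RE = re.compile(r'^TEL[^:\n]*:(.*)$', re.MULTILINE)

contacts = {}
for body in CARD_RE.findall(text):
    name = FN_RE.search(body)
    if name:
        contacts[name.group(1).strip()] = [t.strip() for t in TEL_RE.findall(body)]

print(contacts)
```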
