Checking for membership inside nested dict


Question


This is a follow-up question to this one:

Python DictReader - Skipping rows with missing columns?

Turns out I was being silly, and using the wrong ID field.

I'm using Python 3.x here, btw.

I have a dict of employees, indexed by a string, "directory_id". Each value is a nested dict with employee attributes (phone number, surname etc.). One of these values is a secondary ID, say "internal_id", and another is their manager, call it "manager_internal_id". The "internal_id" field is non-mandatory, and not every employee has one.

{'6443410501': {'manager_internal_id': '989634', 'givenName': 'Mary', 'phoneNumber': '+65 3434 3434', 'sn': 'Jones', 'internal_id': '434214'},
 '8117062158': {'manager_internal_id': '180682', 'givenName': 'John', 'phoneNumber': '+65 3434 3434', 'sn': 'Ashmore', 'internal_id': ''},
 '9227629067': {'manager_internal_id': '347394', 'givenName': 'Wright', 'phoneNumber': '+65 3434 3434', 'sn': 'Earl', 'internal_id': '257839'},
 '1724696976': {'manager_internal_id': '907239', 'givenName': 'Jane', 'phoneNumber': '+65 3434 3434', 'sn': 'Bronte', 'internal_id': '629067'}
}

(I've simplified the fields a little, both to make it easier to read, and also for privacy/compliance reasons).

The issue here is that we index (key) each employee by their directory_id, but when we look up their manager, we need to find managers by their "internal_id".

Before, when our dict was using internal_id as the key, employees.keys() was a list of internal_ids, and I was using a membership check on this. Now, the last part of my if statement won't work, since the internal_ids are part of the dict values, instead of the keys themselves.

def lookup_supervisor(manager_internal_id, employees):
    if manager_internal_id is not None and manager_internal_id != "" and manager_internal_id in employees.keys():
        return (employees[manager_internal_id]['mail'], employees[manager_internal_id]['givenName'], employees[manager_internal_id]['sn'])
    else:
        return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found')

So the first question is, how do I fix the if statement to check whether the manager_internal_id is present in the dict's list of internal_ids?

I've tried substituting employees.keys() with employees.values(), but that didn't work. Also, I'm hoping for something a little more efficient; I'm not sure if there's a way to get a subset of the values, specifically, all the entries for employees[directory_id]['internal_id'].

Hopefully there's some Pythonic way of doing this, without using a massive heap of nested for/if loops.

My second question is, how do I then cleanly return the required employee attributes (mail, givenName, sn etc.)? My for loop is iterating over each employee, and calling lookup_supervisor. I'm feeling a bit stupid/stumped here.

def tidy_data(employees):
    for directory_id, data in employees.items():
        # We really shouldn't be passing employees back and forth like this - hmm, classes?
        data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = lookup_supervisor(data['manager_internal_id'], employees)

Should I redesign my data-structure? Or is there another way?

EDIT: I've tweaked the code slightly, see below:

import csv
import re
import sys

class Employees:

    def import_gd_dump(self, input_file="test.csv"):
        gd_extract = csv.DictReader(open(input_file), dialect='excel')
        self.employees = {row['directory_id']:row for row in gd_extract}

    def write_gd_formatted(self, output_file="gd_formatted.csv"):
        gd_output_fieldnames = ('internal_id', 'mail', 'givenName', 'sn', 'dbcostcenter', 'directory_id', 'manager_internal_id', 'PHFull', 'PHFull_message', 'SupervisorEmail', 'SupervisorFirstName', 'SupervisorSurname')
        try:
            gd_formatted = csv.DictWriter(open(output_file, 'w', newline=''), fieldnames=gd_output_fieldnames, extrasaction='ignore', dialect='excel')
        except IOError:
            print('Unable to open file, IO error (Is it locked?)')
            sys.exit(1)

        headers = {n:n for n in gd_output_fieldnames}
        gd_formatted.writerow(headers)
        for internal_id, data in self.employees.items():
            gd_formatted.writerow(data)

    def tidy_data(self):
        for directory_id, data in self.employees.items():
            data['PHFull'], data['PHFull_message'] = self.clean_phone_number(data['telephoneNumber'])
            data['SupervisorEmail'], data['SupervisorFirstName'], data['SupervisorSurname'] = self.lookup_supervisor(data['manager_internal_id'])

    def clean_phone_number(self, original_telephone_number):
        standard_format = re.compile(r'^\+(?P<intl_prefix>\d{2})\((?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})')
        extra_zero = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})-(?P<local_second_half>\d{4})')
        missing_hyphen = re.compile(r'^\+(?P<intl_prefix>\d{2})\(0(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<local_second_half>\d{4})')
        if standard_format.search(original_telephone_number):
            result = standard_format.search(original_telephone_number)
            return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), ''
        elif extra_zero.search(original_telephone_number):
            result = extra_zero.search(original_telephone_number)
            return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Extra zero in area code - ask user to remediate. '
        elif missing_hyphen.search(original_telephone_number):
            result = missing_hyphen.search(original_telephone_number)
            return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), 'Missing hyphen in local component - ask user to remediate. '
        else:
            return '', "Number didn't match format. Original text is: " + original_telephone_number    

    def lookup_supervisor(self, manager_internal_id):
        if manager_internal_id is not None and manager_internal_id != "":# and manager_internal_id in self.employees.values():
            return (self.employees[manager_internal_id]['mail'], self.employees[manager_internal_id]['givenName'], self.employees[manager_internal_id]['sn'])
        else:
            return ('Supervisor Not Found', 'Supervisor Not Found', 'Supervisor Not Found')

if __name__ == '__main__':
    our_employees = Employees()
    our_employees.import_gd_dump('test.csv')
    our_employees.tidy_data()
    our_employees.write_gd_formatted()

I guess (1) I'm looking for a better way to structure/store Employee/Employees, and (2) I'm having issues in particular with lookup_supervisor().

Should I be creating an Employee Class, and nesting these inside Employees?

And should I even be doing what I'm doing with tidy_data(), and calling clean_phone_number() and lookup_supervisor() on a for loop on the dict's items? Urgh. confused.

Solution

My Python skills are poor, so I am far too ignorant to write out what I have in mind in any kind of reasonable time. But I do know how to do OO decomposition.

Why does the Employees class have to do all the work? There are several types of things that your monolithic Employees class does:

  • Read and write data from a file - aka serialization
  • Manage and access data from individual employees
  • Manage relationships between employees.

I suggest that you create a class to handle each task group listed.

Define an Employee class to keep track of employee data and handle field processing/tidying tasks.

Use the Employees class as a container for employee objects. It can handle tasks like tracking down an Employee's supervisor.
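To make the container idea concrete, here is a minimal sketch, not the asker's actual code. It assumes an Employee holds only a handful of the fields from the question, and that Employees keeps a second index keyed by internal_id so that finding a supervisor is a plain dict lookup rather than a scan over values(). The attribute names (by_directory_id, by_internal_id, supervisor_of), the email addresses, and Mary's manager ID are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Employee:
    directory_id: str
    given_name: str
    sn: str
    mail: str = ""
    internal_id: str = ""          # optional: not every employee has one
    manager_internal_id: str = ""


class Employees:
    """Hypothetical container that indexes employees by both IDs."""

    def __init__(self, employees: Iterable[Employee] = ()):
        self.by_directory_id: Dict[str, Employee] = {}
        self.by_internal_id: Dict[str, Employee] = {}
        for employee in employees:
            self.add(employee)

    def add(self, employee: Employee) -> None:
        self.by_directory_id[employee.directory_id] = employee
        if employee.internal_id:   # blank secondary IDs are not indexed
            self.by_internal_id[employee.internal_id] = employee

    def supervisor_of(self, employee: Employee) -> Optional[Employee]:
        # dict.get returns None when the manager ID is blank or unknown
        return self.by_internal_id.get(employee.manager_internal_id)


# Illustrative data only: Mary's manager ID is changed so that it matches Jane.
staff = Employees([
    Employee("6443410501", "Mary", "Jones", "mary@example.com", "434214", "629067"),
    Employee("1724696976", "Jane", "Bronte", "jane@example.com", "629067", "907239"),
])
mary = staff.by_directory_id["6443410501"]
boss = staff.supervisor_of(mary)
print(boss.given_name if boss else "Supervisor Not Found")
```

Building the second index costs one pass over the data; after that, each membership check is a constant-time dict lookup, and the "Supervisor Not Found" case falls out of a None result instead of a chained if condition.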

Define a virtual base class EmployeeLoader to define an interface (load, store, ?? ). Then implement a subclass for CSV file serialization. (The virtual base class is optional--I'm not sure how Python handles virtual classes, so this may not even make sense.)
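For what it's worth, Python's closest equivalent to a virtual base class is an abstract base class from the standard abc module. The sketch below is one possible shape for the loader interface, under a few assumptions: the method names load and store come from the answer, but their signatures are guesses, records are plain dicts, and the loader works on open file-like objects rather than a file name so the example runs without touching disk.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List, TextIO

Record = Dict[str, str]


class EmployeeLoader(ABC):
    """Interface only: subclasses decide the on-disk format."""

    @abstractmethod
    def load(self, source: TextIO) -> List[Record]:
        ...

    @abstractmethod
    def store(self, records: List[Record], target: TextIO) -> None:
        ...


class EmployeeCSVLoader(EmployeeLoader):
    def __init__(self, fieldnames: List[str]):
        self.fieldnames = fieldnames   # columns to write, in order

    def load(self, source: TextIO) -> List[Record]:
        return list(csv.DictReader(source, dialect="excel"))

    def store(self, records: List[Record], target: TextIO) -> None:
        writer = csv.DictWriter(target, fieldnames=self.fieldnames,
                                extrasaction="ignore", dialect="excel")
        writer.writeheader()
        writer.writerows(records)


# Round trip through in-memory files (sample row taken from the question).
loader = EmployeeCSVLoader(["directory_id", "givenName", "sn"])
records = loader.load(io.StringIO("directory_id,givenName,sn\n6443410501,Mary,Jones\n"))
out = io.StringIO()
loader.store(records, out)
print(out.getvalue())
```

Trying to instantiate EmployeeLoader directly raises TypeError, which is how abc enforces that a subclass implements both methods.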

So:

  • Create an instance of EmployeeCSVLoader with a file name to work with.
  • The loader can then build an Employees object and parse the file.
  • As each record is read, a new Employee object will be created and stored in the Employees object.
  • Now ask the Employees object to populate supervisor links.
  • Iterate over the Employees object's collection of employees and ask each one to tidy itself.
  • Finally, let the serialization object handle updating the data file.
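The sequence of steps above might look roughly like the following. This is a compressed sketch under stated assumptions, not a full design: plain functions named load and store stand in for the loader class, the data comes from an in-memory CSV string with invented rows and email addresses, and the method names tidy and populate_supervisor_links are hypothetical.

```python
import csv
import io

# Invented sample input; Jane has no manager ID.
SAMPLE_CSV = (
    "directory_id,internal_id,manager_internal_id,givenName,sn,mail\n"
    "6443410501,434214,629067,Mary,Jones,mary@example.com\n"
    "1724696976,629067,,Jane,Bronte,jane@example.com\n"
)
NOT_FOUND = "Supervisor Not Found"


class Employee:
    def __init__(self, record):
        self.record = dict(record)
        self.supervisor = None      # filled in by the container

    def tidy(self):
        # Copy supervisor details into this employee's own output fields.
        boss = self.supervisor.record if self.supervisor else {}
        self.record["SupervisorEmail"] = boss.get("mail", NOT_FOUND)
        self.record["SupervisorFirstName"] = boss.get("givenName", NOT_FOUND)
        self.record["SupervisorSurname"] = boss.get("sn", NOT_FOUND)


class Employees:
    def __init__(self):
        self.members = []

    def add(self, employee):
        self.members.append(employee)

    def populate_supervisor_links(self):
        # Index by internal_id once, skipping blank IDs.
        by_internal_id = {e.record["internal_id"]: e
                          for e in self.members if e.record.get("internal_id")}
        for e in self.members:
            e.supervisor = by_internal_id.get(e.record.get("manager_internal_id"))


def load(source):
    employees = Employees()
    for row in csv.DictReader(source):
        employees.add(Employee(row))
    return employees


def store(employees, target, fieldnames):
    writer = csv.DictWriter(target, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for e in employees.members:
        writer.writerow(e.record)


staff = load(io.StringIO(SAMPLE_CSV))          # load and build the container
staff.populate_supervisor_links()              # resolve manager IDs
for member in staff.members:                   # each employee tidies itself
    member.tidy()
output = io.StringIO()
store(staff, output, ["givenName", "sn", "SupervisorEmail"])
print(output.getvalue())
```

Linking supervisors as a separate step after loading matters because a manager's row may appear later in the file than the rows of the people who report to them.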

Why is this design worth the effort?

It makes things easier to understand. Smaller, task-focused objects are easier to create clean, consistent APIs for.

If you find that you need an XML serialization format, it becomes trivial to add the new format. Subclass your virtual loader class to handle the XML parsing/generation. Now you can seamlessly move between CSV and XML formats.

In summary, use objects to simplify and structure your data. Section off common data and behaviors into separate classes. Keep each class tightly focused on a single type of ability. If your class is a collection, accessor, factory, kitchen sink, the API can never be usable: it will be too big and loaded with dissimilar groups of methods. But if your classes stay on topic, they will be easy to test, maintain, use, reuse, and extend.
