How to test if one dict is subset of another?


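Problem description

I have some code that does, essentially, the following:

- gather information on tens of thousands of items (in this case, jobs
  running on a compute cluster)
- store the information as a list (one per job) of Job items
  (essentially wrapped dictionaries mapping attribute names to values)

and then does some computations on the data. One of the things the
code needs to do, very often, is troll through the list and find jobs
of a certain class:

    for j in jobs:
        if j.get('user') == 'jeff' and j.get('state') == 'running':
            do_something()

This operation is ultimately the limiting factor in the performance.
What I would like to try, if it is possible, is instead do something
like this:

    if j.subset_attr({'user': 'jeff', 'state': 'running'}):
        do_something()

where subset_attr would see if the dict passed in was a subset of the
underlying attribute dict of j:

    j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
                  'state' : 'running' }
    j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
                  'state' : 'queued' }

so in the second snippet, if j was j1 then subset_attr would return
true, for j2 the answer would be false (because of the 'state' value
not being the same).

Any suggestions? Constraint: the answer has to work for both Python
2.2 and 2.3 (and preferably all Pythons after that).

JT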
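
The subset test described in the question can be sketched as below. This is only an illustration, not the poster's code: the `Job` class here is a hypothetical stand-in for the wrapped dictionaries, and the sketch is written for current Python 3 rather than the 2.2/2.3 interpreters the question targets. Note that it only tidies up the test itself; every job is still visited once per query.

```python
# Sentinel so that a stored value of None is not mistaken for "key absent".
_MISSING = object()


class Job:
    """Hypothetical stand-in for the Job wrapper described in the question."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def get(self, key, default=None):
        return self._attrs.get(key, default)

    def subset_attr(self, wanted):
        """Return True if every key/value pair in `wanted` is also in this job."""
        for key, value in wanted.items():
            if self._attrs.get(key, _MISSING) != value:
                return False
        return True


j1 = Job({'user': 'jeff', 'start': 43, 'queue': 'qlong', 'state': 'running'})
j2 = Job({'user': 'jeff', 'start': 57, 'queue': 'qlong', 'state': 'queued'})

query = {'user': 'jeff', 'state': 'running'}
print(j1.subset_attr(query))  # True
print(j2.subset_attr(query))  # False
```

On Python 3 the same test can also be written as `wanted.items() <= attrs.items()`, since dictionary item views support set-style comparison.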

Recommended answer

Jay Tee wrote:
> Hi,
>
> I have some code that does, essentially, the following:
>
> - gather information on tens of thousands of items (in this case, jobs
>   running on a compute cluster)
> - store the information as a list (one per job) of Job items
>   (essentially wrapped dictionaries mapping attribute names to values)
>
> and then does some computations on the data. One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
>
>     for j in jobs:
>         if (j.get('user') == 'jeff' and j.get('state') == 'running'):
>             do_something()
>
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
>
>     if j.subset_attr({'user': 'jeff', 'state': 'running'}):
>         do_something()
>
> where subset_attr would see if the dict passed in was a subset of the
> underlying attribute dict of j:



This would still need to run over all items in jobs. No gain.


> j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
>               'state' : 'running' }
> j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
>               'state' : 'queued' }
>
> so in the second snippet, if j was j1 then subset_attr would return
> true, for j2 the answer would be false (because of the 'state' value
> not being the same).




If your jobs dictionary is immutable regarding the key-set (not from
its implementation, but from its usage), the thing you can do to
enhance performance is to create an index. Take a predicate like

    def p(j):
        return j.get('user') == 'jeff'

and build a list

    jeffs_jobs = [j for j in jobs if p(j)]

Then you can test only over these. Alternatively, if you have quite a
few such predicate/action pairs, try to loop once over all jobs,
applying the predicates and actions accordingly.

Diez
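
One way to generalise the index idea above is to group the jobs once by the attributes that are queried most often, so that each later query becomes a dictionary lookup instead of a scan. The sketch below is an assumption-laden illustration, not code from the thread: `build_index` is a hypothetical helper, plain dicts stand in for the Job items, the third job is made-up sample data, and `collections.defaultdict` requires Python 2.5 or later.

```python
from collections import defaultdict


def build_index(jobs, keys):
    """Group jobs by the tuple of their values for `keys` (one pass over jobs)."""
    index = defaultdict(list)
    for job in jobs:
        index[tuple(job.get(k) for k in keys)].append(job)
    return index


# Plain dicts stand in for Job items; the third entry is invented sample data.
jobs = [
    {'user': 'jeff', 'start': 43, 'queue': 'qlong', 'state': 'running'},
    {'user': 'jeff', 'start': 57, 'queue': 'qlong', 'state': 'queued'},
    {'user': 'anna', 'start': 12, 'queue': 'qshort', 'state': 'running'},
]

by_user_state = build_index(jobs, ('user', 'state'))

# .get() avoids creating empty entries for combinations that never occur.
running_jeff = by_user_state.get(('jeff', 'running'), [])
print([j['start'] for j in running_jeff])  # [43]
```

The index only stays valid while the indexed attributes do not change, and the indexed values must be hashable; if jobs are updated, the index has to be rebuilt or maintained alongside them.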


Jay Tee wrote:



> Hi,
>
> I have some code that does, essentially, the following:
>
> - gather information on tens of thousands of items (in this case, jobs
>   running on a compute cluster)
> - store the information as a list (one per job) of Job items
>   (essentially wrapped dictionaries mapping attribute names to values)
>
> and then does some computations on the data. One of the things the
> code needs to do, very often, is troll through the list and find jobs
> of a certain class:
>
>     for j in jobs:
>         if (j.get('user') == 'jeff' and j.get('state') == 'running'):
>             do_something()
>
> This operation is ultimately the limiting factor in the performance.
> What I would like to try, if it is possible, is instead do something
> like this:
>
>     if j.subset_attr({'user': 'jeff', 'state': 'running'}):
>         do_something()
>
> where subset_attr would see if the dict passed in was a subset of the
> underlying attribute dict of j:
>
> j1's dict : { 'user' : 'jeff', 'start' : 43, 'queue' : 'qlong',
>               'state' : 'running' }
> j2's dict : { 'user' : 'jeff', 'start' : 57, 'queue' : 'qlong',
>               'state' : 'queued' }
>
> so in the second snippet, if j was j1 then subset_attr would return
> true, for j2 the answer would be false (because of the 'state' value
> not being the same).
>
> Any suggestions? Constraint : the answer has to work for both python
> 2.2 and 2.3 (and preferably all pythons after that).




Use an RDBMS (a database); they tend to be good at this kind of operation.

Peter
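
For illustration, the database approach might look like the sketch below. It assumes the `sqlite3` module from the standard library (available from Python 2.5), so it does not meet the 2.2/2.3 constraint stated in the question; the table layout and index name are made up for the example.

```python
import sqlite3

# In-memory database; the schema is a hypothetical mapping of the job attributes.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE jobs (user TEXT, start INTEGER, queue TEXT, state TEXT)')
# An index on the queried columns lets the database avoid a full table scan.
conn.execute('CREATE INDEX jobs_user_state ON jobs (user, state)')

conn.executemany(
    'INSERT INTO jobs VALUES (?, ?, ?, ?)',
    [('jeff', 43, 'qlong', 'running'), ('jeff', 57, 'qlong', 'queued')],
)

rows = conn.execute(
    'SELECT start FROM jobs WHERE user = ? AND state = ?',
    ('jeff', 'running'),
).fetchall()
print(rows)  # [(43,)]
conn.close()
```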


On Feb 19, 11:07 am, Peter Otten <__pete...@web.de> wrote:

> Use an RDBMS (a database); they tend to be good at this kind of operation.




yeah, one of the options is metakit ... sqlite and buzhug both looked
promising but the constraint of pythons 2.2 and 2.3 ruled that out.
disadvantage of metakit is that it's not pure python, meaning possible
integration problems. the system has to be deployed at 200+ sites
worldwide on a mix of RHEL 3 and 4 based systems, with some Debian
clusters thrown in, and running real production ...

hence my desire to find a pure-python solution if at all possible.
it's looking grim.

JT

