Generate random JSON structure permutations for a data set

Question

I want to generate many different permutations of JSON structures as a representation of the same data set, preferably without having to hard code the implementation. For example, given the following JSON:

{"name": "smith", "occupation": "agent", "enemy": "humanity", "nemesis": "neo"}

Many different permutations should be produced, such as:

  • change in name: {"name":"smith"} -> {"last_name":"smith"}
  • change in order: {"name":"...","occupation":"..."} -> {"occupation":"...", "name":"..."}
  • change in arrangement: {"name":"...","occupation":"..."} -> {"smith":{"occupation":"..."}}
  • change in template: {"name":"...","occupation":"..."} -> {"status": 200, "data":{"name":"...","occupation":"..."}}
  • etc.

Currently, the implementation is as follows:

I am using itertools.permutations and OrderedDict() to iterate over the possible key and value combinations, as well as the order in which they are returned.

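import itertools
import json
from collections import OrderedDict

# SchemaLike is the asker's own helper; its definition is not shown.
key_permutations = SchemaLike(...).permutate()

all_simulacrums = []
for key_permutation in key_permutations:
    simulacrum = OrderedDict(key_permutation)
    all_simulacrums.append(simulacrum)

for simulacrum in all_simulacrums:
    # Serialize every ordering of this simulacrum's key/value pairs.
    for p in itertools.permutations(simulacrum.items()):
        test_data = json.dumps(OrderedDict(p))
        print(test_data)
        # The round trip must preserve the content, whatever the order.
        assert json.loads(test_data) == simulacrum, 'Oops! {} != {}'.format(test_data, simulacrum)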

My problem occurs when I try to implement the permutations of arrangement and template. I don't know how best to implement this functionality. Any suggestions?

Recommended answer

For ordering, just use ordered dicts:

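>>> import itertools
>>> import json
>>> from collections import OrderedDict
>>> # Keyword-argument order is only preserved on Python 3.6+.
>>> data = OrderedDict(foo='bar', bacon='eggs', bar='foo', eggs='bacon')
>>> for p in itertools.permutations(data.items()):
...     test_data = json.dumps(OrderedDict(p))
...     print(test_data)
...     assert json.loads(test_data) == data, 'Oops! {} != {}'.format(test_data, data)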

{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "bacon", "bar": "foo"}
{"foo": "bar", "bar": "foo", "bacon": "eggs", "eggs": "bacon"}
{"foo": "bar", "bar": "foo", "eggs": "bacon", "bacon": "eggs"}
{"foo": "bar", "eggs": "bacon", "bacon": "eggs", "bar": "foo"}
{"foo": "bar", "eggs": "bacon", "bar": "foo", "bacon": "eggs"}
{"bacon": "eggs", "foo": "bar", "bar": "foo", "eggs": "bacon"}
{"bacon": "eggs", "foo": "bar", "eggs": "bacon", "bar": "foo"}
{"bacon": "eggs", "bar": "foo", "foo": "bar", "eggs": "bacon"}
{"bacon": "eggs", "bar": "foo", "eggs": "bacon", "foo": "bar"}
{"bacon": "eggs", "eggs": "bacon", "foo": "bar", "bar": "foo"}
{"bacon": "eggs", "eggs": "bacon", "bar": "foo", "foo": "bar"}
{"bar": "foo", "foo": "bar", "bacon": "eggs", "eggs": "bacon"}
{"bar": "foo", "foo": "bar", "eggs": "bacon", "bacon": "eggs"}
{"bar": "foo", "bacon": "eggs", "foo": "bar", "eggs": "bacon"}
{"bar": "foo", "bacon": "eggs", "eggs": "bacon", "foo": "bar"}
{"bar": "foo", "eggs": "bacon", "foo": "bar", "bacon": "eggs"}
{"bar": "foo", "eggs": "bacon", "bacon": "eggs", "foo": "bar"}
{"eggs": "bacon", "foo": "bar", "bacon": "eggs", "bar": "foo"}
{"eggs": "bacon", "foo": "bar", "bar": "foo", "bacon": "eggs"}
{"eggs": "bacon", "bacon": "eggs", "foo": "bar", "bar": "foo"}
{"eggs": "bacon", "bacon": "eggs", "bar": "foo", "foo": "bar"}
{"eggs": "bacon", "bar": "foo", "foo": "bar", "bacon": "eggs"}
{"eggs": "bacon", "bar": "foo", "bacon": "eggs", "foo": "bar"}

The same principle can be applied for key/value permutations:

>>> for p in itertools.permutations(data.keys()):
...     test_data = json.dumps(OrderedDict(zip(p, data.values())))
...     print(test_data)
...
{"foo": "bar", "bacon": "eggs", "bar": "foo", "eggs": "bacon"}
{"foo": "bar", "bacon": "eggs", "eggs": "foo", "bar": "bacon"}
{"foo": "bar", "bar": "eggs", "bacon": "foo", "eggs": "bacon"}
{"foo": "bar", "bar": "eggs", "eggs": "foo", "bacon": "bacon"}
{"foo": "bar", "eggs": "eggs", "bacon": "foo", "bar": "bacon"}
{"foo": "bar", "eggs": "eggs", "bar": "foo", "bacon": "bacon"}
{"bacon": "bar", "foo": "eggs", "bar": "foo", "eggs": "bacon"}
{"bacon": "bar", "foo": "eggs", "eggs": "foo", "bar": "bacon"}
{"bacon": "bar", "bar": "eggs", "foo": "foo", "eggs": "bacon"}
{"bacon": "bar", "bar": "eggs", "eggs": "foo", "foo": "bacon"}
{"bacon": "bar", "eggs": "eggs", "foo": "foo", "bar": "bacon"}
{"bacon": "bar", "eggs": "eggs", "bar": "foo", "foo": "bacon"}
{"bar": "bar", "foo": "eggs", "bacon": "foo", "eggs": "bacon"}
{"bar": "bar", "foo": "eggs", "eggs": "foo", "bacon": "bacon"}
{"bar": "bar", "bacon": "eggs", "foo": "foo", "eggs": "bacon"}
{"bar": "bar", "bacon": "eggs", "eggs": "foo", "foo": "bacon"}
{"bar": "bar", "eggs": "eggs", "foo": "foo", "bacon": "bacon"}
{"bar": "bar", "eggs": "eggs", "bacon": "foo", "foo": "bacon"}
{"eggs": "bar", "foo": "eggs", "bacon": "foo", "bar": "bacon"}
{"eggs": "bar", "foo": "eggs", "bar": "foo", "bacon": "bacon"}
{"eggs": "bar", "bacon": "eggs", "foo": "foo", "bar": "bacon"}
{"eggs": "bar", "bacon": "eggs", "bar": "foo", "foo": "bacon"}
{"eggs": "bar", "bar": "eggs", "foo": "foo", "bacon": "bacon"}
{"eggs": "bar", "bar": "eggs", "bacon": "foo", "foo": "bacon"}
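
Note that the loop above reassigns the existing keys to different values. If the goal is instead to rename keys while keeping each value attached to its field, as in the question's name -> last_name example, one possible sketch follows. The `aliases` table (including the "job" alias) and the `renamed_variants` helper are hypothetical names introduced here for illustration; they are not part of the original answer.

```python
import itertools
import json
from collections import OrderedDict

data = OrderedDict([("name", "smith"), ("occupation", "agent")])

# Hypothetical alias table: each key maps to the names it may appear under.
aliases = {
    "name": ["name", "last_name"],
    "occupation": ["occupation", "job"],
}


def renamed_variants(record, aliases):
    """Yield one OrderedDict per combination of key aliases, values untouched."""
    # Keys without an entry in the table keep their original name.
    choices = [aliases.get(key, [key]) for key in record]
    for names in itertools.product(*choices):
        yield OrderedDict(zip(names, record.values()))


for variant in renamed_variants(data, aliases):
    print(json.dumps(variant))
```

Because itertools.product is used, the number of variants is the product of the alias counts, so the table should stay small or be sampled.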

And so on. If you don't need every combination, you can use a predefined set of keys and values. You can also flip a coin with random.choice inside the for loop to skip some combinations, or use random.shuffle at the risk of repeating combinations.
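
The coin flip and the shuffle mentioned above could look roughly like this. It is only a sketch: the helper names `coin_flip_orderings` and `shuffled_orderings`, the `count` parameter, and the seeded generator are assumptions made for the example.

```python
import itertools
import json
import random
from collections import OrderedDict

data = OrderedDict([("foo", "bar"), ("bacon", "eggs"), ("bar", "foo"), ("eggs", "bacon")])


def coin_flip_orderings(record, rng=random):
    """Walk every ordering of `record`, keeping each one on a coin flip."""
    for p in itertools.permutations(record.items()):
        if rng.choice([True, False]):
            yield OrderedDict(p)


def shuffled_orderings(record, count, rng=random):
    """Yield `count` independently shuffled orderings; repeats are possible."""
    items = list(record.items())
    for _ in range(count):
        rng.shuffle(items)
        yield OrderedDict(items)


rng = random.Random(0)  # seeded only so that runs are reproducible
for variant in coin_flip_orderings(data, rng=rng):
    print(json.dumps(variant))
for variant in shuffled_orderings(data, count=3, rng=rng):
    print(json.dumps(variant))
```

The coin flip never repeats an ordering but still iterates over all n! permutations, while the shuffle does constant work per sample and may return the same ordering twice.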

For the templates, I guess you must create a list of different templates (or a list of lists if you want nested structures) and then iterate over it to create your data. To give a better suggestion, we need a more constrained specification of what you want.
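
One way to read the list-of-lists idea is sketched below: each inner list is a stage of interchangeable templates, and itertools.product picks one template per stage. The template functions (`identity`, `envelope`, `keyed_by`), the `stages` list, and `render_all` are all hypothetical names chosen for this example, based on the arrangement and template cases in the question.

```python
import itertools
import json
from collections import OrderedDict

data = OrderedDict([("name", "smith"), ("occupation", "agent"),
                    ("enemy", "humanity"), ("nemesis", "neo")])


def identity(record):
    """Return a copy of the record with its structure unchanged."""
    return OrderedDict(record)


def envelope(record):
    """Wrap the record in a response-style envelope."""
    return OrderedDict([("status", 200), ("data", OrderedDict(record))])


def keyed_by(field):
    """Build a template that nests the record under the value of `field`."""
    def template(record):
        rest = OrderedDict((k, v) for k, v in record.items() if k != field)
        return OrderedDict([(record[field], rest)])
    return template


# Each inner list is one stage; one template is chosen from every stage.
stages = [
    [identity, keyed_by("name")],  # change in arrangement
    [identity, envelope],          # change in template
]


def render_all(record, stages):
    """Yield the record passed through every combination of stage templates."""
    for combo in itertools.product(*stages):
        result = record
        for template in combo:
            result = template(result)
        yield result


for variant in render_all(data, stages):
    print(json.dumps(variant))
```

With two templates in each of the two stages this yields four structures, from the untouched record up to the record keyed by name inside the envelope. Verifying such variants against the original data needs an inverse for each template, since a plain json.loads comparison no longer holds once the structure changes.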

Note that there are several libraries that generate test data in Python:

>>> from faker import Faker
>>> faker = Faker()
>>> faker.credit_card_full().strip().split('\n')
['VISA 13 digit', 'Jerry Gutierrez', '4885274641760 04/24', 'CVC: 583']

Faker has several schemas and it is easy to create your own custom fake data providers.
