How to merge multiple duplicate key names using Python in a dictionary-like format

Problem description

I have data in a dictionary-like format in which the same keys appear many times, each with a list of strings as its value, and the strings themselves can repeat. I want to merge all keys that have the same name, together with their values. The data is not an actual dictionary; I only call it one because of the way it is laid out.

The data looks like this:

"city":["New York", "Paris", "London"],
"country":["India", "France", "Italy"],
"city":["New Delhi", "Tokio", "Wuhan"],
"organisation":["ITC", "Google", "Facebook"],
"country":["Japan", "South Korea", "Germany"],
"organisation":["TATA", "Amazon", "Ford"]

There are thousands of such duplicate keys, with a mix of repeated and unique values, and I want to merge (append) the values by key.

Expected output:

"city":["New York", "Paris", "London", "New Delhi", "Tokio", "Wuhan"],
"country":["India", "France", "Italy", "Japan", "South Korea", "Germany"],
"organisation":["ITC", "Google", "Facebook", "TATA", "Amazon", "Ford"],

Can anyone suggest an approach?
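
As a quick illustration of why this data cannot simply be treated as a dictionary (this snippet is not part of the original question): if the text is wrapped in braces and loaded with Python's standard json module, each repeated key overwrites the value stored for its earlier occurrence, so only the last list for each key survives. The surrounding braces are an assumption; the raw data shown above has none.

```python
import json

# The question's data wrapped in braces so that it parses as JSON
# (assumption: the real input may not come with the braces).
raw = """{
    "city": ["New York", "Paris", "London"],
    "country": ["India", "France", "Italy"],
    "city": ["New Delhi", "Tokio", "Wuhan"],
    "organisation": ["ITC", "Google", "Facebook"],
    "country": ["Japan", "South Korea", "Germany"],
    "organisation": ["TATA", "Amazon", "Ford"]
}"""

# json.loads collects each object into a dict, so a repeated key
# silently replaces the value stored for its earlier occurrence.
naive = json.loads(raw)
print(naive["city"])  # ['New Delhi', 'Tokio', 'Wuhan']
print(len(naive))     # 3
```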

Recommended answer

    • It has been established that this is not a dict; it is text described by an LR(1) grammar similar to the JSON grammar
    • Given that, parse and tokenize it with an LR parser
    • https://lark-parser.readthedocs.io/en/latest/json_tutorial.html shows how to parse JSON with Lark
    • It needs one small adaptation so that duplicate keys survive: each dict is treated as a list of (key, value) pairs instead of a Python dict (see the code)
    • pandas is then used to take the parser's output and reshape it into the required form
    • from lark import Transformer
      from lark import Lark
      import pandas as pd
      json_parser = Lark(r"""
          ?value: dict
                | list
                | string
                | SIGNED_NUMBER      -> number
                | "true"             -> true
                | "false"            -> false
                | "null"             -> null
      
          list : "[" [value ("," value)*] "]"
      
          dict : "{" [pair ("," pair)*] "}"
          pair : string ":" value
      
          string : ESCAPED_STRING
      
          %import common.ESCAPED_STRING
          %import common.SIGNED_NUMBER
          %import common.WS
          %ignore WS
      
          """, start='value', parser='lalr')  # LALR(1) parser instead of Lark's default Earley
      class TreeToJson(Transformer):
          def string(self, s):
              (s,) = s
              return s[1:-1]
          def number(self, n):
              (n,) = n
              return float(n)
      
          list = list
          pair = tuple
          dict = list  # keep each object as a list of (key, value) pairs so repeated keys are not overwritten
      
          null = lambda self, _: None
          true = lambda self, _: True
          false = lambda self, _: False
      
      js = """{
          "city":["New York", "Paris", "London"],
          "country":["India", "France", "Italy"],
          "city":["New Delhi", "Tokio", "Wuhan"],
          "organisation":["ITC", "Google", "Facebook"],
          "country":["Japan", "South Korea", "Germany"],
          "organisation":["TATA", "Amazon", "Ford"]
      }"""    
          
      tree = json_parser.parse(js)
      
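      # one row per (key, list) pair -> one row per value -> group by key,
      # keeping each key's values in first-seen order without repeats
      pd.DataFrame(TreeToJson().transform(tree), columns=["key", "list"]).explode(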
          "list"
      ).groupby("key").agg({"list": lambda s: s.unique().tolist()}).to_dict()["list"]
      

      Output

      {'city': ['New York', 'Paris', 'London', 'New Delhi', 'Tokio', 'Wuhan'],
       'country': ['India', 'France', 'Italy', 'Japan', 'South Korea', 'Germany'],
       'organisation': ['ITC', 'Google', 'Facebook', 'TATA', 'Amazon', 'Ford']}
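
An alternative that needs no third-party packages (a different technique from the answer above, offered only as a sketch): Python's json.loads accepts an object_pairs_hook callable, which receives each JSON object as a list of (key, value) pairs in document order, duplicates included, before any dict is built. The hypothetical merge_pairs hook below merges the lists of repeated keys and drops repeated values while keeping first-seen order. It assumes the input is valid JSON once wrapped in braces, that every value is a list of hashable items such as strings, and that there are no nested objects (the hook is also called for those).

```python
import json


def merge_pairs(pairs):
    """Hypothetical object_pairs_hook: merge the lists of repeated keys.

    Assumes every value is a list of hashable items (strings in the question).
    """
    merged = {}
    for key, values in pairs:
        # A dict's keys serve as an insertion-ordered set of the values seen
        # so far, so repeated values are dropped and first-seen order is kept.
        seen = merged.setdefault(key, {})
        for value in values:
            seen[value] = None
    return {key: list(seen) for key, seen in merged.items()}


# The question's data; the surrounding braces are an assumption.
raw = """{
    "city": ["New York", "Paris", "London"],
    "country": ["India", "France", "Italy"],
    "city": ["New Delhi", "Tokio", "Wuhan"],
    "organisation": ["ITC", "Google", "Facebook"],
    "country": ["Japan", "South Korea", "Germany"],
    "organisation": ["TATA", "Amazon", "Ford"]
}"""

result = json.loads(raw, object_pairs_hook=merge_pairs)
print(result)
```

For the sample input this should print the same mapping as the Output above. Using dict keys as an ordered set keeps each membership check close to constant time, which matters when thousands of keys repeat.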
      
