Pandas table scrape


Problem description

I am trying to work out the best approach for converting a table into JSON records. At present I can produce the output I want, but the layout of the table is puzzling me a little. The example below should explain:

ID   Product        Item_Material   Owner           Interest %
123  Test Item 1    Electric        Elctrotech              60%
null null           null            Spark inc               40%
124  Test Item 2    Wood            TY Toys                 100%
125  Test Item 3    Plastic         NA Materials            100%

The newline-delimited JSON format is what I want, but I would like table rows that belong to a parent row to be nested inside the parent's JSON record. At present the output looks like this:

{"ID":"Test Item 1", "Item_Material":"Electric", "Owner":"Elctrotech","Interest %":"60%"}
{"ID":null, "Item_Material":null, "Owner":"Spark inc","Interest %":"40%"}
{"ID":"Test Item 2", "Item_Material":"Wood", "Owner":"TY Toys","Interest %":"100%"}
{"ID":"Test Item 3","Item_Material":"Plastic","Owner":"NA Materials","Interest %":"100%"}

The aim would be for the first row's JSON to look something like this:

{"ID":"Test Item 1", "Item_Material":"Electric", "Owners": [{"Owner": "Elctrotech", "Interest %":"60%", "Owner":"Spark inc","Interest %":"40%"}]}

The data comes from a table scraped with Beautiful Soup. The rows in the table I have provided are all in separate <tr> tags, so this is how they appear when pulled into a pandas DataFrame. I don't know whether pandas has any functionality for merging a row into the row above so that I get one JSON record per 'Product'. Sometimes an item can have more than two 'Owners'.

Recommended answer

The output dict is not exactly what you expected, but your dict syntax was invalid: a single object cannot repeat the "Owner" and "Interest %" keys. Try this, using only pandas:

import pandas as pd

p = [[123, "Test Item 1", "Electric", "Elctrotech", "60%"],
     [124, "Test Item 2", "Wood", "TY Toys", "100%"],
     [125, "Test Item 1", "Plastic", "NA Materials", "100%"],
     [123, "Test Item 1", "Foo", "Bar", "80%"],
     [123, "Test Item 1", "Electric", "TRY TRY TRY", "70%"]]

x = pd.DataFrame(p, columns=["ID", "Product", "Item_Material", "Owner", "Interest %"])

# Collect the owners and interests of each (Product, Item_Material) group into lists.
x_gb = x.groupby(["Product", "Item_Material"])
grouped_Series_Owner = x_gb["Owner"].apply(list).to_dict()
grouped_Series_Interest = x_gb["Interest %"].apply(list).to_dict()

# `out` was not defined in the original snippet; it is assumed here to hold
# one row per (Product, Item_Material) group, keyed by row index.
out = x.drop_duplicates(["Product", "Item_Material"]).to_dict("index")
for k in out.keys():
    key = (out[k]["Product"], out[k]["Item_Material"])
    # Build a fresh dict per group so earlier records are not overwritten.
    d = {"ID": out[k]["Product"], "Item_Material": out[k]["Item_Material"],
         "Owners": {"Owner": grouped_Series_Owner[key],
                    "Interest %": grouped_Series_Interest[key]}}
    print(d)
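
The snippet above groups rows that already carry their parent values. For the table in the question, where continuation rows hold nulls in the ID, Product and Item_Material columns, one possible approach is to forward-fill those columns first and then build one record per product, with the owners as a list of objects. The following is a minimal sketch, not the answerer's method. It assumes that a continuation row always directly follows its parent and that the parent columns are null only in continuation rows; the names `rows`, `parent_cols`, `records` and `json_lines` are illustrative.

```python
import json
import pandas as pd

# Rows as they might arrive from the scraped table: continuation rows
# carry None in the parent columns (mirrors the table in the question).
rows = [
    [123, "Test Item 1", "Electric", "Elctrotech", "60%"],
    [None, None, None, "Spark inc", "40%"],
    [124, "Test Item 2", "Wood", "TY Toys", "100%"],
    [125, "Test Item 3", "Plastic", "NA Materials", "100%"],
]
df = pd.DataFrame(rows, columns=["ID", "Product", "Item_Material", "Owner", "Interest %"])

parent_cols = ["ID", "Product", "Item_Material"]
# Copy the parent values down into the continuation rows
# (assumption: a continuation row always follows its parent).
df[parent_cols] = df[parent_cols].ffill()

records = []
# sort=False keeps the products in their original table order.
for (id_, product, material), group in df.groupby(parent_cols, sort=False):
    records.append({
        "ID": int(id_),  # the None rows made this column float, so cast back
        "Product": product,
        "Item_Material": material,
        # One {"Owner": ..., "Interest %": ...} object per table row.
        "Owners": group[["Owner", "Interest %"]].to_dict("records"),
    })

# One JSON document per line, one line per product.
json_lines = [json.dumps(r) for r in records]
for line in json_lines:
    print(line)
```

Because each owner becomes its own object inside the "Owners" list, the same code handles items with any number of owners.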
