Use the column of a dataframe that has a list of dictionaries to create other columns for the dataframe


Question

My dataframe has a column of dtype object whose values look like this:

for i in df3['placeholders'][:10]:
    print(i)

Output:
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '1,00,000 - 1,25,000 PA.'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'date', 'label': '08 October - 13 October'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'education', 'label': 'B.Com'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Mumbai Suburbs, Navi Mumbai, Mumbai'}]
[{'type': 'experience', 'label': '0-2 Yrs'}, {'type': 'salary', 'label': '50,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Chennai'}]
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '2,00,000 - 2,25,000 PA.'}, {'type': 'location', 'label': 'Bengaluru(JP Nagar)'}]
[{'type': 'experience', 'label': '0-3 Yrs'}, {'type': 'salary', 'label': '80,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Hyderabad'}]
[{'type': 'experience', 'label': '0-5 Yrs'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Hyderabad'}]
[{'type': 'experience', 'label': '0-1 Yrs'}, {'type': 'salary', 'label': '1,25,000 - 2,00,000 PA.'}, {'type': 'location', 'label': 'Mumbai'}]
[{'type': 'date', 'label': '08 October - 17 October'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Pune(Bavdhan)'}]
[{'type': 'experience', 'label': '0-2 Yrs'}, {'type': 'salary', 'label': 'Not disclosed'}, {'type': 'location', 'label': 'Jaipur'}]
[{'type': 'experience', 'label': '0-0 Yrs'}, {'type': 'salary', 'label': '1,00,000 - 1,50,000 PA.'}, {'type': 'location', 'label': 'Delhi NCR(Sector-81 Noida)'}]

I want to add more columns to my existing dataframe by extracting features from this column, such that:

the value of 'type' = the column name

the value of 'label' = the value under that column

Final expected output:

df.head(3)

Output:

.....  experience  salary                   location                             date                     education
.....  0-1 Yrs     1,00,000 - 1,25,000 PA.  Chennai                              nan                      nan
.....  nan         1,00,000 - 1,25,000 PA.  Chennai                              08 October - 13 October  nan
.....  nan         Not disclosed            Mumbai Suburbs, Navi Mumbai, Mumbai  nan                      B.Com

The first answer worked.

Later, I tried the code suggested in the first answer on a new dataset with the same structure, and got the following error:

<ipython-input-23-ad8e644044af> in <listcomp>(.0)
----> 1 new_columns = set([d['Name'] for l in dfr.RatingDistribution.values for d in l ])
      2 # Make a dict of dicts
      3 col_val_dict = {}
      4 for col_name in new_columns:
      5     col_val_dict[col_name] = {}

TypeError: 'float' object is not iterable

My input column:

RatingDistribution
[{'Name': 'Work-Life Balance', 'count': 5}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 5}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 5}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 4}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 4}, {'Name': 'Job Security', 'count': 4}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 3}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 4}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 4}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 4}, {'Name': 'Work Satisfaction', 'count': 4}]
[{'Name': 'Work-Life Balance', 'count': 5}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 5}, {'Name': 'Company Culture', 'count': 5}, {'Name': 'Career Growth', 'count': 5}, {'Name': 'Work Satisfaction', 'count': 5}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 3}, {'Name': 'Job Security', 'count': 3}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 3}, {'Name': 'Work Satisfaction', 'count': 4}]
[{'Name': 'Work-Life Balance', 'count': 3}, {'Name': 'Skill Development', 'count': 5}, {'Name': 'Salary & Benefits', 'count': 5}, {'Name': 'Job Security', 'count': 1}, {'Name': 'Company Culture', 'count': 3}, {'Name': 'Career Growth', 'count': 1}, {'Name': 'Work Satisfaction', 'count': 1}]

My code:

new_columns = set([d['Name'] for l in dfr.RatingDistribution.values for d in l ])
# Make a dict of dicts 
col_val_dict = {}
for col_name in new_columns:
    col_val_dict[col_name] = {}
    # For each column name look to see if a row has that as a type
    # If so, get the label for that dict
    # otherwise fill it with NaN
    for i,l in enumerate(dfr.placeholders.values):
        the_label = [d['count'] for d in l if d['Name'] == col_name]
        if the_label:
            col_val_dict[col_name][i] = the_label[0]
        else:
            col_val_dict[col_name][i] = np.NaN
            
# Merge this new dfa with the old one
merged_dfa = pd.concat([dfr,pd.DataFrame(col_val_dict)],axis='columns')
dfr.shape

The error is raised on the very first line, and I can't figure out why it complains about a float.

Please help.

Recommended answer

import numpy as np
import pandas as pd

# Get the unique types; these become the new column names
new_columns = {d['type'] for l in df3.placeholders.values for d in l}

# Build a dict of dicts: {column name: {row position: label}}
col_val_dict = {}
for col_name in new_columns:
    col_val_dict[col_name] = {}
    # For each column name, check whether the row has a dict with that type.
    # If so, take the label from that dict; otherwise fill in NaN.
    for i, l in enumerate(df3.placeholders.values):
        the_label = [d['label'] for d in l if d['type'] == col_name]
        if the_label:
            col_val_dict[col_name][i] = the_label[0]
        else:
            col_val_dict[col_name][i] = np.nan  # np.NaN was removed in NumPy 2.0

# Merge the new columns with the old dataframe.
# The keys are row positions, so this lines up only if df3 has the default 0..n-1 index.
merged_df = pd.concat([df3, pd.DataFrame(col_val_dict)], axis='columns')
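
The `TypeError: 'float' object is not iterable` from the follow-up dataset is most likely caused by rows where `RatingDistribution` is missing: pandas stores a missing value as `NaN`, which is a float, so the inner `for d in l` fails on it. The sketch below shows one way to check for such rows and neutralise them before running the code above. The sample frame and the helper name `as_list` are made up for illustration; they are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row is missing, so pandas stores NaN (a float).
dfr = pd.DataFrame({
    "RatingDistribution": [
        [{"Name": "Job Security", "count": 5}, {"Name": "Career Growth", "count": 4}],
        np.nan,
        [{"Name": "Job Security", "count": 3}],
    ]
})

# Flag the rows that are not lists; these are the ones that break the comprehension.
not_a_list = ~dfr["RatingDistribution"].apply(lambda v: isinstance(v, list))
print(not_a_list.sum())


def as_list(value):
    """Return the value unchanged if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


# Replace non-list cells with empty lists so that every row is iterable.
cleaned = dfr["RatingDistribution"].apply(as_list)

# The first line of the question's code now runs without the TypeError.
new_columns = {d["Name"] for row in cleaned for d in row}
```

If the count printed above is zero, the cause lies elsewhere and the column is worth inspecting directly. Note also that the inner loop in the question still iterates over `dfr.placeholders`; for the new dataset it would need to use `RatingDistribution` (or the cleaned series) as well.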

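As a more compact alternative to the nested loops, each row's list can be turned into a single dictionary, and the list of dictionaries handed to the `DataFrame` constructor, which fills absent keys with `NaN`. This is a sketch, not the original answer's method. The function name `expand_dict_list` and its parameters are hypothetical, and it assumes each key appears at most once per row (a later duplicate would overwrite an earlier one, whereas the loop version keeps the first).

```python
import pandas as pd


def expand_dict_list(df, column, key_field, value_field):
    """Add one column per distinct key found in a column of lists of dicts."""
    rows = [
        # Non-list cells (for example NaN) become an empty dict, i.e. an all-NaN row.
        {d[key_field]: d[value_field] for d in cell} if isinstance(cell, list) else {}
        for cell in df[column]
    ]
    # Reuse the original index so the new columns align row by row.
    extracted = pd.DataFrame(rows, index=df.index)
    return pd.concat([df, extracted], axis="columns")


# Small made-up frame shaped like the question's data.
df3 = pd.DataFrame({
    "placeholders": [
        [{"type": "experience", "label": "0-1 Yrs"}, {"type": "location", "label": "Chennai"}],
        [{"type": "date", "label": "08 October - 13 October"}, {"type": "location", "label": "Chennai"}],
        float("nan"),
    ]
})

merged_df = expand_dict_list(df3, "placeholders", "type", "label")
print(merged_df.drop(columns="placeholders"))
```

For the second dataset the same call would be `expand_dict_list(dfr, "RatingDistribution", "Name", "count")`. Because non-list cells are skipped, missing rows produce `NaN` in every new column instead of raising an error.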