Split every row in df and add a value to each element

Question

I have a df that looks like this:

user_index  movie_index  genre_index          cast_index
3590        1514         10|12|17|35          46|534
63          563          4|2|1|8              9|27

It was generated from:

import pandas as pd
df = pd.DataFrame({'user_index': [3590, 63], 'movie_index': [1514, 563],
                   'genre_index': ['10|12|17|35', '4|2|1|8'], 'cast_index': ['46|534', '9|27']})

I need to split every cell on '|' (turning each cell into a list) and add some value to each element. Here, 5 is added element-wise in the 'genre_index' column and 2 is added element-wise in the 'user_index' column, which should give this df:

    user_index  movie_index  genre_index          cast_index
    [3592]      [1514]       [15,17,22,40]        [46,534]
    [65]        [563]        [9,7,6,13]           [9,27]

To achieve this, I wrote a function that takes a column as its argument, splits it, and adds a value element-wise. (It does not take the whole df as an argument, because the value to add differs for each column.) It looks like this:

def df_convertion(input_series, offset):
    column = input_series.str.split('|', expand=False).apply(lambda x: x + offset)
    return (column)

But it doesn't work as intended. When I try it on the 'genre_index' column, it raises this error:

TypeError: can only concatenate list (not "int") to list

Any help in fixing it would be much appreciated!

Recommended answer

This is one of those rare times I'll suggest using apply. Still, try to see whether you can use some other form of representation for your data.

offset_dct = {'user_index': 2, 'genre_index': 5}
df = df.fillna('').astype(str).apply(lambda x: [
    [int(z) + offset_dct.get(x.name, 0) for z in y.split('|')] for y in x])

df
  cast_index       genre_index movie_index user_index
0  [46, 534]  [15, 17, 22, 40]      [1514]     [3592]
1    [9, 27]     [9, 7, 6, 13]       [563]       [65]
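
For anyone who would rather keep the per-column function from the question, here is a minimal sketch of one way to fix it. The original error arises because str.split turns each cell into a list of strings, so x + offset tries to concatenate a list with an int. The fix below converts each piece to int before adding. The name split_and_offset is hypothetical (it is not from the question or the answer), and the astype(str) step is an assumption added so that integer columns such as 'user_index' can go through the same function.

```python
import pandas as pd

df = pd.DataFrame({
    'user_index': [3590, 63],
    'movie_index': [1514, 563],
    'genre_index': ['10|12|17|35', '4|2|1|8'],
    'cast_index': ['46|534', '9|27'],
})

def split_and_offset(input_series, offset=0):
    """Split each cell on '|' and add `offset` to every resulting integer.

    Hypothetical helper: a corrected variant of the question's function.
    """
    return (
        input_series.astype(str)   # integer columns have no .str accessor
        .str.split('|')            # each cell becomes a list of strings
        .apply(lambda parts: [int(p) + offset for p in parts])
    )

# Offsets differ per column, so each column is converted separately.
genre = split_and_offset(df['genre_index'], 5)
user = split_and_offset(df['user_index'], 2)
```

With the sample data, genre.tolist() gives [[15, 17, 22, 40], [9, 7, 6, 13]] and user.tolist() gives [[3592], [65]], matching the corresponding columns in the output above.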
