dfply: Mutating string column: TypeError


Question

My pandas DataFrame contains a column "file" whose values are strings holding file paths. I am trying to use dfply to mutate this column like this:

resultstatsDF.reset_index() >> mutate(dirfile = os.path.join(os.path.basename(os.path.dirname(X.file)),os.path.basename(X.file)))

But I get the error

TypeError: __index__ returned non-int (type Call)

What did I do wrong? How do I do it right?

Recommended answer

Since my question was up-voted, I guess it is still of interest to some people. Having learned quite a bit of Python since then, let me answer it myself; maybe it will help other users.

First, let us import the required packages:

import pandas as pd
from dfply import *
from os.path import basename, dirname, join

and create the required pandas DataFrame:

resultstatsDF = pd.DataFrame({'file': ['/home/user/this/file1.png', '/home/user/that/file2.png']})

which looks like this:

                        file
0  /home/user/this/file1.png
1  /home/user/that/file2.png

We see that we still get an error (though the message has changed as dfply has continued to develop):

resultstatsDF.reset_index() >> \
mutate(dirfile = join(basename(dirname(X.file)), basename(X.file)))

TypeError: __index__ returned non-int (type Intention)

The reason is that mutate works on whole Series, whereas the os.path functions work on single strings, so we need a function that is applied element by element. For this we can use pandas.Series.apply, which calls a given function on every element of a Series. We therefore also need a custom function that can be applied to each element of the file column. Putting everything together, we end up with the code

def extract_last_dir_plus_filename(series_element):
    return join(basename(dirname(series_element)), basename(series_element))

resultstatsDF.reset_index() >> \
mutate(dirfile = X.file.apply(extract_last_dir_plus_filename))

Output:

   index                       file         dirfile
0      0  /home/user/this/file1.png  this/file1.png
1      1  /home/user/that/file2.png  that/file2.png
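
The element-wise nature of the helper can be checked on plain Python strings, outside of any DataFrame. The following is only a minimal sketch: the list `paths` is illustrative, and `posixpath` is used here instead of `os.path` (an assumption, so that the result uses "/" on every platform).

```python
from posixpath import basename, dirname, join

def extract_last_dir_plus_filename(series_element):
    # Operates on ONE path string: last directory name joined with the file name.
    return join(basename(dirname(series_element)), basename(series_element))

# Illustrative sample data, matching the 'file' column used above.
paths = ['/home/user/this/file1.png', '/home/user/that/file2.png']

# Apply the helper to each string in turn, which is what Series.apply does per element.
dirfiles = [extract_last_dir_plus_filename(p) for p in paths]
print(dirfiles)  # ['this/file1.png', 'that/file2.png']
```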

Without dfply's mutate, we could alternatively write

resultstatsDF['dirfile'] = resultstatsDF.file.apply(extract_last_dir_plus_filename)
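
For simple paths like these, the same column could possibly also be built with pandas' vectorized string methods, without a custom function. This is a sketch under assumptions that are not part of the original answer: the separator is always "/", every path has at least one directory component, and the name `df` is purely illustrative.

```python
import pandas as pd

# Illustrative DataFrame with the same 'file' column as above.
df = pd.DataFrame({'file': ['/home/user/this/file1.png',
                            '/home/user/that/file2.png']})

# Split each path on '/', keep the last two components, and join them again.
df['dirfile'] = df['file'].str.split('/').str[-2:].str.join('/')
print(df['dirfile'].tolist())
```

Unlike the os.path-based helper, this variant does no path normalization, so it is only suitable when the paths are known to be well-formed.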
