Best way to handle paths with pandas

Question

When I have a pd.DataFrame with paths, I end up doing a lot of .map(lambda path: Path(path).{method_name}) or .apply(axis=1), e.g.:

import pandas as pd
from pathlib import Path

(
    pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
    .assign(full_path=lambda df: df.apply(lambda row: Path(row.base_dir) / row.file_name, axis=1))
)
  base_dir file_name     full_path
0    dir_A    file_0  dir_A/file_0
1    dir_B    file_1  dir_B/file_1

It seems odd to me, especially because pathlib implements /, so something like df.base_dir / df.file_name would be more Pythonic and natural.
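
A minimal sketch of why the natural expression does not work on plain string columns, assuming the frame is built with dtype=object so the columns hold Python str values (newer pandas versions may infer a dedicated string dtype instead). pandas applies / element by element, and Python defines no / between two strings, so a TypeError is raised; once the columns hold Path objects, the same expression dispatches to Path.__truediv__ for each row.

```python
import pandas as pd
from pathlib import Path

# dtype=object keeps the columns as plain Python str objects (assumption of this sketch)
df = pd.DataFrame(
    {"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]},
    dtype=object,
)

# str / str is undefined in Python, so the element-wise division fails.
try:
    df.base_dir / df.file_name
    str_error = None
except TypeError as exc:
    str_error = exc
print("str columns:", str_error)

# The same expression on columns that already hold Path objects.
path_df = df.apply(lambda col: col.map(Path))
full_path = path_df.base_dir / path_df.file_name
print(full_path.tolist())
```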

I have not found any path dtype implemented in pandas. Is there something I am missing?
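
Since pandas does not appear to offer a path dtype, one workaround (not from the original thread) is a custom Series accessor registered with pd.api.extensions.register_series_accessor. The sketch below is hypothetical: the accessor name path, the class PathAccessor and its name/suffix properties are illustrative choices, and every operation is still an element-wise Python call rather than true vectorization.

```python
import pandas as pd
from pathlib import Path


# "path" is a hypothetical accessor name chosen for this sketch.
@pd.api.extensions.register_series_accessor("path")
class PathAccessor:
    """Expose a few pathlib operations on a Series of path-like values."""

    def __init__(self, series: pd.Series):
        # Convert every element to a Path when the accessor is first used.
        self._paths = series.map(Path)

    def __truediv__(self, other):
        # Convert a right-hand Series too, so each row computes Path / Path.
        if isinstance(other, pd.Series):
            other = other.map(Path)
        return self._paths / other

    @property
    def name(self) -> pd.Series:
        return self._paths.map(lambda p: p.name)

    @property
    def suffix(self) -> pd.Series:
        return self._paths.map(lambda p: p.suffix)


df = pd.DataFrame({"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0.csv", "file_1.txt"]})
full_path = df.base_dir.path / df.file_name
suffixes = full_path.path.suffix
print(full_path.tolist(), suffixes.tolist())
```

With the accessor registered, df.base_dir.path / df.file_name reads close to the df.base_dir / df.file_name the question asks for.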

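I have found it may be better to convert the path columns to Path once and for all (a sort of astype(Path)); then, at least for path concatenation with pathlib, the operation can be written column-wise (pandas still calls Path's / on each element of the object columns, but no explicit lambda is needed):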

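import pandas as pd
from pathlib import Path

(
    pd.DataFrame({'base_dir': ['dir_A', 'dir_B'], 'file_name': ['file_0', 'file_1']})
    # this is where I would expect `astype({'base_dir': Path})`
    # col_name=col_name binds each column name when the lambda is created;
    # without it, every lambda would see only the last name ("file_name")
    .assign(**{col_name: lambda df, col_name=col_name: df[col_name].map(Path)
               for col_name in ["base_dir", "file_name"]})
    .assign(full_path=lambda df: df.base_dir / df.file_name)
)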

Answer

It seems like the easiest way would be:

df.base_dir.map(Path) / df.file_name.map(Path)

It avoids the need for a lambda function, but you still need to map to Path.
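
A sketch comparing the answer's expression with a slightly shorter variant, assuming object-dtype string columns (dtype=object is set explicitly here). Because Path / str is defined, only the left-hand column strictly needs mapping. The final map(str) converts back to strings for code that expects them; the separators then follow the current OS.

```python
import pandas as pd
from pathlib import Path

# dtype=object keeps plain Python str values (assumption of this sketch)
df = pd.DataFrame(
    {"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]},
    dtype=object,
)

# The answer's expression: both sides mapped to Path, joined element-wise.
both_mapped = df.base_dir.map(Path) / df.file_name.map(Path)

# Path / str is also defined, so mapping only the left column gives the same paths.
left_mapped = df.base_dir.map(Path) / df.file_name

# Back to strings if downstream code expects them (OS-specific separators).
as_strings = left_mapped.map(str)
print(as_strings.tolist())
```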

Or, simply:

df.base_dir.str.cat(df.file_name, sep="/")

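The latter hard-codes "/" as the separator, so it may not work on Windows (who cares, right? :), and it returns plain strings rather than Path objects, but it will probably run faster.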
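
A sketch of the string-based approach, plus a variant (an assumption, not part of the answer) that uses os.sep instead of a hard-coded "/". Both produce plain strings, and neither normalizes separators the way pathlib does, as the trailing-slash case shows.

```python
import os
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"base_dir": ["dir_A", "dir_B"], "file_name": ["file_0", "file_1"]})

# The answer's version: string concatenation with a hard-coded "/".
slash_joined = df.base_dir.str.cat(df.file_name, sep="/")

# Variant (assumption): use the separator of the current platform instead.
native_joined = df.base_dir.str.cat(df.file_name, sep=os.sep)

# Plain concatenation keeps a trailing separator, producing a doubled slash,
doubled = pd.Series(["dir_A/"]).str.cat(pd.Series(["file_0"]), sep="/")
# whereas pathlib drops the trailing slash when parsing "dir_A/".
normalized = Path("dir_A/") / "file_0"
print(slash_joined.tolist(), doubled.tolist(), normalized)
```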
