Regex return file name, remove path and file extension

Problem description

I have a data.frame that contains a text column of file names. I would like to return the file name without the path or the file extension. Typically, my file names have been numbered, but they don't have to be. For example:

df <- data.frame(data = c("a", "b"), fileNames = c("C:/a/bb/ccc/NAME1.ext", "C:/a/bb/ccc/d D2/name2.ext"))

I would like to return the equivalent of

df <- data.frame(data = c("a", "b"), fileNames = c("NAME", "name"))

but I cannot figure out the slick regular expression to do this with gsub. For example, I can get rid of the extension with (provided the file name ends with a number):

gsub('([0-9]).ext','',df[,"fileNames"])

Though I've been trying various patterns (by reading the regex help files and similar solutions on this site), I can't get a regex to return the text between the last "/" and the first ".". Any thoughts or forwards to similar questions are much appreciated!

The best I've got is:

 gsub('*[[:graph:]_]/|*[[:graph:]_].ext','',df[,"fileNames"])

But this 1) doesn't get rid of all the leading path characters and 2) is dependent on a specific file extension.

Recommended answer

Perhaps this will get you closer to your solution:

library(tools)
basename(file_path_sans_ext(df$fileNames))
# [1] "NAME1" "name2"

The file_path_sans_ext function is from the "tools" package (which I believe usually comes with R), and that will extract the path up to (but not including) the extension. The basename function will then get rid of your path information.
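As a minimal sketch of how the two functions combine, the snippet below runs them on the two paths from the question plus two extra paths that are hypothetical (a name with two dots and a name with no extension). It also shows one possible base R variant, `sub()` applied to `basename()`, for the stricter reading of "text between the last `/` and the first `.`"; that variant is an illustration, not part of the original answer.

```r
library(tools)

# The first two paths come from the question; the last two are
# hypothetical edge cases added for illustration.
paths <- c("C:/a/bb/ccc/NAME1.ext",
           "C:/a/bb/ccc/d D2/name2.ext",
           "C:/a/archive.tar.gz",
           "C:/a/README")

# Drop the last extension, then drop the directory part.
stems <- basename(file_path_sans_ext(paths))
print(stems)
# [1] "NAME1"       "name2"       "archive.tar" "README"

# Alternative: keep only the text before the FIRST dot of the base name.
first_dot <- sub("\\..*$", "", basename(paths))
print(first_dot)
# [1] "NAME1"   "name2"   "archive" "README"
```

Note that `file_path_sans_ext` removes only the final extension, so the two approaches differ for names such as `archive.tar.gz`. Which behaviour is wanted depends on how the files are named.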

Or, to take from file_path_sans_ext and modify it a bit, you can try:

sub("(.*\\/)([^.]+)(\\.[[:alnum:]]+$)", "\\2", df$fileNames)
# [1] "NAME1" "name2"

Here, I've "captured" all three parts of the "fileNames" variables, so if you wanted just the file paths, you would change "\\2" to "\\1", and if you wanted just the file extensions, you would change it to "\\3".
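To make the three capture groups concrete, here is a small sketch that applies the same pattern with each backreference in turn. The variable names (`pattern`, `dirs`, `stems`, `exts`) are only illustrative; the paths are the ones from the question.

```r
paths <- c("C:/a/bb/ccc/NAME1.ext", "C:/a/bb/ccc/d D2/name2.ext")

# Group 1: everything up to and including the last "/"
# Group 2: the file name (no dots)
# Group 3: the extension, including its leading dot
pattern <- "(.*\\/)([^.]+)(\\.[[:alnum:]]+$)"

dirs  <- sub(pattern, "\\1", paths)
stems <- sub(pattern, "\\2", paths)
exts  <- sub(pattern, "\\3", paths)

print(dirs)
# [1] "C:/a/bb/ccc/"      "C:/a/bb/ccc/d D2/"
print(stems)
# [1] "NAME1" "name2"
print(exts)
# [1] ".ext" ".ext"
```

One caveat: `sub()` returns the input unchanged when the pattern does not match, so a path with no extension or no `/` would come back whole rather than as an empty string.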