Regex return file name, remove path and file extension
Question
I have a data.frame that contains a text column of file names. I would like to return the file name without the path or the file extension. Typically, my file names have been numbered, but they don't have to be. For example:
df<-data.frame(data=c("a","b"),fileNames=c("C:/a/bb/ccc/NAME1.ext","C:/a/bb/ccc/d D2/name2.ext"))
I would like to return the equivalent of:
df<-data.frame(data=c("a","b"),fileNames=c("NAME","name"))
but I cannot figure out the slick regular expression to do this with gsub. For example, I can get rid of the extension with (provided the file name ends with a number):
gsub('([0-9]).ext','',df[,"fileNames"])
Though I've been trying various patterns (by reading the regex help files and similar solutions on this site), I can't get a regex to return the text between the last "/" and the first ".". Any thoughts or forwards to similar questions are much appreciated!
The best I've got is:
gsub('*[[:graph:]_]/|*[[:graph:]_].ext','',df[,"fileNames"])
But this 1) doesn't get rid of all the leading path characters and 2) is dependent on a specific file extension.
Recommended answer
Perhaps this will get you closer to your solution:
library(tools)
basename(file_path_sans_ext(df$fileNames))
# [1] "NAME1" "name2"
The file_path_sans_ext function is from the "tools" package (which ships with R), and it extracts the path up to (but not including) the extension. The basename function then removes the path information.
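As a minimal sketch of how this could be applied to the data frame from the question: the snippet below keeps the intermediate result and then, on the assumption that trailing digits are just numbering and should be dropped (as in the question's desired output), strips them with one more sub call. The names stems and baseName and the stringsAsFactors = FALSE argument are illustrative choices, not part of the original answer.

```r
library(tools)  # provides file_path_sans_ext()

df <- data.frame(
  data = c("a", "b"),
  fileNames = c("C:/a/bb/ccc/NAME1.ext", "C:/a/bb/ccc/d D2/name2.ext"),
  stringsAsFactors = FALSE  # keep fileNames as character, not factor
)

# Drop the extension first, then the directory part.
stems <- basename(file_path_sans_ext(df$fileNames))

# Assumption: trailing digits are numbering and should be removed too.
df$baseName <- sub("[0-9]+$", "", stems)
```

Here stems holds "NAME1" and "name2", and df$baseName holds "NAME" and "name". Digits elsewhere in the path (such as the "D2" directory) are not affected, because basename has already discarded the directories before the digits are stripped.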
Or, to take from file_path_sans_ext and modify it a bit, you can try:
sub("(.*\\/)([^.]+)(\\.[[:alnum:]]+$)", "\\2", df$fileNames)
# [1] "NAME1" "name2"
Here, I've "captured" all three parts of the "fileNames" variable, so if you wanted just the file paths, you would change "\\2" to "\\1", and if you wanted just the file extensions, you would change it to "\\3".
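To make the three capture groups concrete, here is a small sketch that pulls each part into its own column. The names paths, pattern, and parts are hypothetical and chosen only for this illustration; the pattern itself is the one from the answer.

```r
paths <- c("C:/a/bb/ccc/NAME1.ext", "C:/a/bb/ccc/d D2/name2.ext")

# Group 1: everything up to and including the last "/"
# Group 2: the file name (no dots allowed)
# Group 3: the final ".ext"-style extension
pattern <- "(.*\\/)([^.]+)(\\.[[:alnum:]]+$)"

parts <- data.frame(
  dir  = sub(pattern, "\\1", paths),
  stem = sub(pattern, "\\2", paths),
  ext  = sub(pattern, "\\3", paths),
  stringsAsFactors = FALSE
)

# A string without a "/" does not match, so sub() returns it unchanged.
noSlash <- sub(pattern, "\\2", "name3.ext")
```

One thing to keep in mind with this approach: sub leaves non-matching strings untouched, so a bare file name with no directory part (like "name3.ext" above) comes back with its extension still attached. The basename(file_path_sans_ext(...)) route does not have that limitation.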