Python regular expression for Windows file path

Problem description

The problem, which may not be easily solved with a regex, is that I want to extract a Windows file path from an arbitrary string. The closest I have been able to come (I've tried a bunch of others) is the following regex:

[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*\w*.*\w*

It picks up the start of the path and is designed to match, after the initial drive letter, a sequence of strings each followed by a backslash, ending with a file name, an optional dot, and an optional extension.

The difficulty is what happens next. Since the maximum path length is 260 characters, I only need to look at 260 characters beyond the start. But since spaces (and other characters) are allowed in file names, I would need to make sure that there are no additional backslashes, which would indicate that the preceding characters are a folder name and that what follows them is not the file name itself.
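The "260 characters beyond the start" idea can be sketched as below. This is only an illustration of the windowing step, not a solution to the backslash problem; the names MAX_PATH and candidate_windows are hypothetical, and the 260 limit is simply the figure quoted above.

```python
import re

MAX_PATH = 260  # limit quoted in the question (assumed, not checked here)


def candidate_windows(text):
    """Yield (start, window) for every drive-letter prefix such as C:\\ in text.

    Each window holds at most MAX_PATH characters, beginning at the drive letter.
    """
    for match in re.finditer(r"[a-zA-Z]:\\", text):
        start = match.start()
        yield start, text[start:start + MAX_PATH]


sample = "x " + r"C:\a\b.txt" + " y" * 300
windows = list(candidate_windows(sample))
```

Each window still has to be trimmed to the real end of the path, which is the hard part discussed above.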

I am pretty certain that there isn't a perfect solution (the perfect being the enemy of the good), but I wondered if anyone could suggest a "best possible" solution?

Recommended answer

Here's the expression I got, based on yours, that allows me to get the path on Windows: [a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).* . An example of it being used is available here: https://regex101.com/r/SXUlVX/1

First, I changed the capture group from ([a-zA-Z0-9() ]*\\)* to ((?:[a-zA-Z0-9() ]*\\)*).
Your original expression captures each XXX\ one after another (e.g. Users\, then the next XXX\), so the group never holds more than one segment.
Mine matches each segment with the non-capturing group (?:[a-zA-Z0-9() ]*\\)* and wraps the whole repetition in an outer capture group. This lets me capture the concatenation XXX\YYYY\ZZZ\ and so get the full path.
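The difference between the two groupings can be seen with a small sketch using Python's re module. The sample string and variable names are made up for illustration; only the two patterns come from the discussion above (without the trailing .* so that only the grouping differs).

```python
import re

# Hypothetical sample text containing a Windows path.
text = r"see C:\Users\Public\Documents\report.txt for details"

# Original grouping: a repeated capture group keeps only its last repetition.
original = re.compile(r"[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*")
# Revised grouping: an outer group wraps the repeated non-capturing group.
revised = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*)")

last_segment = original.search(text).group(1)
all_folders = revised.search(text).group(1)

print(last_segment)  # Documents\
print(all_folders)   # Users\Public\Documents\
```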

The second change I made relates to the file name: I simply match whatever follows the directory part with .* . Because the directory group is greedy, it consumes every XXX\ segment it can, so what is left normally contains no \. This lets me handle strange file names.
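A quick sketch of how the full expression behaves on a string with trailing text. The sample sentence is invented for illustration. Note that the final .* runs to the end of the line, so the overall match also includes anything that follows the file name.

```python
import re

pattern = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*")

# Hypothetical input: a path followed by one more word.
text = r"open C:\Program Files (x86)\Adobe\acrbd.exe now"

match = pattern.search(text)
folders = match.group(1)   # directory part captured by the outer group
whole = match.group(0)     # entire match, including the trailing .*

print(folders)  # Program Files (x86)\Adobe\
print(whole)    # C:\Program Files (x86)\Adobe\acrbd.exe now
```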

Another regex that would work is [a-zA-Z]:\\((?:.*?\\)*).* , as shown in this example: https://regex101.com/r/SXUlVX/2

This time, I used .*?\\ to match the XXX\ parts of the path.
.*? matches in a non-greedy way: thus, .*?\\ matches the bare minimum of text followed by a backslash.
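As a rough comparison, the sketch below runs both expressions on a path whose folder names contain characters outside [a-zA-Z0-9() ]. The path is a made-up example. The character-class version captures no folders at all, while the non-greedy version captures every segment up to the last backslash.

```python
import re

strict = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*")
lazy = re.compile(r"[a-zA-Z]:\\((?:.*?\\)*).*")

# Hypothetical path with '-', '.' and '_' in the folder names.
text = r"C:\my-folder.v2\sub_dir\file.txt"

strict_folders = strict.search(text).group(1)
lazy_folders = lazy.search(text).group(1)

print(repr(strict_folders))  # ''
print(lazy_folders)          # my-folder.v2\sub_dir\
```

The trade-off is that the non-greedy version accepts any character in a segment, so it will also run through unrelated text if a later backslash appears on the same line.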

Do not hesitate to ask if you have any questions about the expressions.
I'd also encourage you to check how well your expression works using https://regex101.com . It also lists the different tokens you can use in your regex.

Edit: As my previous answer did not work (though I'll need to spend some time finding out exactly why), I looked for another way to do what you want, and I managed it using string splitting and joining.
The command is "\\".join(TARGETSTRING.split("\\")[1:-1]).
How this works: I split the original string into a list of substrings on the backslash. I then drop the first and last parts ([1:-1] keeps everything from the second element to the one before the last) and join the resulting list back into a string.

This works whether the value given is a path or the full address of a file.
Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred is a file path
Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred\ is a directory path
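The split-and-join command can be tried out with a minimal sketch like the one below. The helper name inner_path and the C: drive prefix on the inputs are assumptions added for illustration; the expression inside the function is the one given above.

```python
def inner_path(target):
    """Drop the first and last backslash-separated parts and rejoin the rest."""
    return "\\".join(target.split("\\")[1:-1])


# Hypothetical inputs; the drive prefix is assumed.
file_like = "C:\\Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred"
dir_like = file_like + "\\"  # trailing backslash: treated as a directory

file_result = inner_path(file_like)
dir_result = inner_path(dir_like)

print(file_result)  # Program Files (x86)\Adobe\Acrobat Distiller
print(dir_result)   # Program Files (x86)\Adobe\Acrobat Distiller\acrbd.exe fred
```

With a trailing backslash the last element of the split is an empty string, so the final name survives; without it, the last element is taken to be a file name and is dropped.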
