Python regular expression for Windows file path
Problem description
The problem, and it may not be easily solved with a regex, is that I want to be able to extract a Windows file path from an arbitrary string. The closest that I have been able to come (I've tried a bunch of others) is using the following regex:
[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*\w*.*\w*
This picks up the start of the path and is designed to match (after the initial drive letter) runs of characters followed by a backslash, ending with a file name, an optional dot, and an optional extension.
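To make the behaviour of this expression concrete, here is a minimal sketch that runs it with Python's re module. The sample sentence is a hypothetical input, not one taken from the question; it only illustrates how the greedy .* keeps going past the file name.

```python
import re

# The regex from the question, written as a raw string.
ORIGINAL = re.compile(r"[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*\w*.*\w*")

# Hypothetical input: a path embedded in a longer sentence.
text = r"Open C:\Users\Public\report.txt and then close it"

match = ORIGINAL.search(text)
print(match.group(0))  # the greedy .* runs on to the end of the line
print(match.group(1))  # only the last repetition of the group is kept
```

On this input the match starts at the drive letter as intended, but it does not stop at the end of the file name, which is the difficulty described next.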
The difficulty is what happens next. Since the maximum path length is 260 characters, I only need to look at the 260 characters after the start. But since spaces (and other characters) are allowed in file names, I would need to make sure that there are no additional backslashes that could indicate that the preceding characters are the name of a folder and that what follows isn't the file name itself.
I am pretty certain that there isn't a perfect solution (the perfect being the enemy of the good), but I wondered if anyone could suggest a "best possible" solution?
Recommended answer
Here is the expression I arrived at, based on yours, which lets me get the path on Windows:
[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*
An example of it in use is available here: https://regex101.com/r/SXUlVX/1
First, I changed the capture group from
([a-zA-Z0-9() ]*\\)*
to
((?:[a-zA-Z0-9() ]*\\)*)
Your original expression captures each XXX\ segment one after another (e.g. Users\), so the group only ever holds a single segment. Mine matches the repetition with the non-capturing group (?:[a-zA-Z0-9() ]*\\)* and wraps it in one capture group, so that group holds the concatenation XXX\YYYY\ZZZ\. As such, it lets me get the full path.
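The difference between the two groupings can be checked with a short sketch. The sample path and the variable names are assumptions made for illustration; the two patterns are the ones discussed above.

```python
import re

path = r"C:\Users\Public\report.txt"  # hypothetical sample path

# Repeated capture group: each repetition overwrites the previous capture.
repeated = re.compile(r"[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*")
# Non-capturing repetition wrapped in a single capture group.
wrapped = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*")

last_segment = repeated.search(path).group(1)
full_dirs = wrapped.search(path).group(1)

print(last_segment)  # only the final folder segment
print(full_dirs)     # every folder segment, concatenated
```

In Python, a capture group inside a repetition keeps only its last iteration, which is why wrapping the whole repetition is needed to recover all the folders at once.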
The second change concerns the file name: I simply match whatever remains with .* (the quantifier being greedy). Once the preceding group has consumed every XXX\ segment it can, this is normally a run of characters containing no \, which lets me handle unusual file names.
Another regex that would work is:
[a-zA-Z]:\\((?:.*?\\)*).*
as shown in this example: https://regex101.com/r/SXUlVX/2
This time, I used .*?\\ to match the XXX\ parts of the path. The .*? part matches in a non-greedy way: thus, .*?\\ matches the bare minimum of text followed by a backslash.
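As a rough comparison of the two expressions, the sketch below runs both on a path whose folder names contain characters outside [a-zA-Z0-9() ]. The sample path and the names strict and lazy are hypothetical and chosen only for this illustration.

```python
import re

# Hypothetical path with '-', '_' and '.' in the folder names.
path = r"C:\my-folder_1\sub.dir\file.txt"

# Character-class version: folder names are limited to [a-zA-Z0-9() ].
strict = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*")
# Non-greedy version: any text up to the next backslash counts as a folder.
lazy = re.compile(r"[a-zA-Z]:\\((?:.*?\\)*).*")

strict_dirs = strict.search(path).group(1)
lazy_dirs = lazy.search(path).group(1)

print(repr(strict_dirs))  # empty: the first folder name is rejected
print(repr(lazy_dirs))    # both folder segments are captured
```

One trade-off to keep in mind: because the non-greedy version accepts any text before a backslash, it will also treat text up to a later, unrelated backslash on the same line as part of the path.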
Do not hesitate to ask if you have any questions about the expressions.
I'd also encourage you to check how well your expression works using https://regex101.com, which also lists the different tokens you can use in your regex.
Edit: As my previous answer did not work (though I'll need to spend some time finding out exactly why), I looked for another way to do what you want, and I managed to do so using string splitting and joining.
The command is:
"\\".join(TARGETSTRING.split("\\")[1:-1])
How it works: I split the original string into a list of substrings on the backslash. I then remove the first and last parts ([1:-1] keeps everything from the second element to the one before the last) and join the resulting list back into a string.
This works, whether the value given is a path or the full address of a file.
Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred
is a file path
Program Files (x86)\\Adobe\\Acrobat Distiller\\acrbd.exe fred\
is a directory path
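The split-and-join command can be tried on the two cases above with a small sketch. The helper name middle_of_path and the C: drive prefix are assumptions added for illustration; the example strings above show no drive letter.

```python
def middle_of_path(target: str) -> str:
    """Hypothetical helper: drop the first and last backslash-separated parts."""
    return "\\".join(target.split("\\")[1:-1])

# Assumed inputs: the examples above, prefixed with a drive letter.
file_path = r"C:\Program Files (x86)\Adobe\Acrobat Distiller\acrbd.exe fred"
dir_path = file_path + "\\"  # trailing backslash marks a directory

from_file = middle_of_path(file_path)
from_dir = middle_of_path(dir_path)

print(from_file)  # the last part (the file name) is dropped
print(from_dir)   # only the empty part after the final backslash is dropped
```

With a trailing backslash, split produces an empty final element, so the slice removes that empty string rather than the last folder name.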