Recursively "normalize" filenames
By "normalize" I mean getting rid of special characters in filenames, and so on.
I have made a script that can recursively rename files [http://pastebin.com/raw.php?i=kXeHbDQw]:
For example, before:
THIS i.s my file (1).txt
After running the script:
This-i-s-my-file-1.txt
OK. Here it is (see the link above).
But when I wanted to test it "fully", with filenames like these:
¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÂÃÄÅÆÇÈÊËÌÎÏÐÑÒÔÕ×ØÙUÛUÝÞßàâãäåæçèêëìîïðñòôõ÷øùûýþÿ.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?@[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£.txt
it fails [http://pastebin.com/raw.php?i=iu8Pwrnr]:
$ sh renamer.sh directorythathasthefiles
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†....and so on
$
So "mv" can't handle special characters... :\
I worked on it for many hours.
Does anyone have a working script, one that can also handle the characters in those two filenames?
mv handles special characters just fine. Your script doesn't.
In no particular order:
You are using find to find all directories, and then running ls on each directory separately.
Why use for DEPTH in... if you can do exactly the same with one command?
    find -maxdepth 100 -type d
Which makes the arbitrary depth limit unnecessary:
    find -type d
Don't ever parse the output of ls, especially if you can let find handle that, too:
    find -not -type d
Make sure it works in the worst possible case:
    find -not -type d -print0 | while read -r -d '' FILENAME; do
This stops read from eating certain escapes and choking on filenames with newline characters.
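The NUL-separated loop above can be sketched as a complete function. This is only an illustration, assuming bash (read -d is not available in plain sh); the name list_files is made up for this example. It also sets IFS= so that read does not trim leading or trailing whitespace from a name, which the one-liner above does not guard against.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every non-directory entry below "$1", one per line.
list_files() {
    # -print0 ends each path with a NUL byte, the one byte a path cannot contain.
    find "$1" -not -type d -print0 |
    while IFS= read -r -d '' filename; do
        # -r keeps backslashes literal; -d '' makes read stop at NUL, not newline.
        # Embedded newlines are shown as \n so that each file stays on one line.
        printf 'found: %s\n' "${filename//$'\n'/\\n}"
    done
}
```

In a real renamer, the printf line is where the rename step would go.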
You are repeating the entire ls | replace cycle for every single character. Don't - it kills performance. Loop over all files once, and just use multiple sed calls, or multiple replacements in one sed command:
    sed 's/á/a/g; s/í/i/g; ...'
(I was going to suggest sed 'y/áí/ai/', but unfortunately that doesn't seem to work with Unicode. Perhaps perl -CS -Mutf8 -pe 'y/áí/ai/' would.)
You're still thinking in ASCII: "other special chars - ASCII Codes 33.. ..255". Don't.
These days, most systems use Unicode in UTF-8 encoding, which has a much wider range of "special" characters - so big that listing them out one by one becomes pointless. (It is even multibyte - "e" is one byte, "ė" is three bytes.)
True ASCII has 128 characters. What you currently have in mind are the ISO 8859 character sets (sometimes called "ANSI") - in particular, ISO 8859-1. But they go all the way up to 8859-16, and only the "ASCII" part stays the same.
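Going back to the single sed call with several replacements, a minimal sketch could look like this. The function name strip_accents and the particular mapping are assumptions for illustration; the table covers only a handful of lowercase letters and is nowhere near complete.

```shell
# Hypothetical helper: transliterate a few accented letters in one sed pass.
# The mapping is an illustrative subset, not a full table.
strip_accents() {
    printf '%s\n' "$1" |
        sed 's/á/a/g; s/é/e/g; s/í/i/g; s/ó/o/g; s/ö/o/g; s/ő/o/g; s/ú/u/g; s/ü/u/g; s/ű/u/g'
}

strip_accents 'áíüűúöőóé.txt'   # prints: aiuuuoooe.txt
```

Each s/// works on whole multi-byte sequences, so it avoids the problem noted above with y/// and Unicode. It only matches the precomposed form of each letter, though. A name stored in decomposed form (a base letter followed by a combining accent) would pass through unchanged.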
echo -n $(command) is rather useless.
There are much easier ways to find the directory and basename given a path. For example, you can do
    directory=$(dirname "$path")
    oldname=$(basename "$path")
    # filter $oldname to produce $newname
    mv "$path" "$directory/$newname"
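As a quick check of what those two commands return, here is a small sketch with a made-up path. The parameter-expansion variant is an addition of mine, not part of the answer. It avoids two extra processes per file, but it only behaves like dirname/basename when the path contains a slash and does not end in one.

```shell
# Example path (hypothetical); find prints paths in this ./dir/name form.
path='./some dir/THIS i.s my file (1).txt'

directory=$(dirname "$path")    # ./some dir
oldname=$(basename "$path")     # THIS i.s my file (1).txt

# Same split using only shell parameter expansion (no external commands):
directory2=${path%/*}           # strip the last /component
oldname2=${path##*/}            # strip everything up to the last /

printf '%s\n' "$directory" "$oldname"
```

Note that command substitution drops trailing newlines, so a name that ends in a newline would come back shortened from the dirname/basename form.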
Do not use egrep to check for errors. Check the program's return code. (Like you already do with cd.)
And instead of filtering out other errors, do...
    if [[ -e $directory/$newname ]]; then
        echo "target already exists, skipping: $oldname -> $newname"
        continue
    else
        mv "$path" "$directory/$newname"
    fi
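The existence check and the return-code check can be combined in one small function. This is a sketch only: the name safe_rename and its return values are my own choices, and it uses [ ] instead of [[ ]] so that it also runs in plain sh.

```shell
# Hypothetical helper: rename "$1" to the new basename "$2" in the same directory.
# Returns 0 on success (or nothing to do), 1 if the target exists, 2 if mv failed.
safe_rename() {
    path=$1
    newname=$2
    directory=$(dirname "$path")
    oldname=$(basename "$path")
    target=$directory/$newname

    # Name is already normalized: nothing to rename.
    [ "$oldname" = "$newname" ] && return 0

    if [ -e "$target" ]; then
        echo "target already exists, skipping: $oldname -> $newname" >&2
        return 1
    fi

    # Rely on mv's exit status instead of grepping its error output.
    if ! mv -- "$path" "$target"; then
        echo "rename failed: $path" >&2
        return 2
    fi
}
```

Inside the find loop it could be called as safe_rename "$FILENAME" "$newname" || continue.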
The ton of sed 's/------------/-/g' calls can be changed to a single regexp:
    sed -r 's/-{2,}/-/g'
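A short sketch of that single regexp in use. The function name squeeze_dashes is made up for this example, and it uses -E, which both GNU and BSD sed accept for extended regular expressions (the -r spelling above is the GNU one).

```shell
# Hypothetical helper: collapse every run of two or more dashes into one dash.
squeeze_dashes() {
    printf '%s\n' "$1" | sed -E 's/-{2,}/-/g'
}

squeeze_dashes 'This---i-s--my------file.txt'   # prints: This-i-s-my-file.txt
```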
The [ ]s in tr [foo] [bar] are unnecessary. They just cause tr to replace [ with [, and ] with ].
Seriously?
    echo "$FOLDERNAME" | sed "s/$/\//g"
How about this instead?
    echo "$FOLDERNAME/"
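To illustrate the earlier point about tr, here is a minimal comparison on a made-up sample string. Both calls give the same result, because the brackets are just two more characters that happen to map to themselves.

```shell
sample='a[b]c'

# With brackets: [ -> [, a -> x, b -> y, ] -> ]
with_brackets=$(printf '%s\n' "$sample" | tr '[ab]' '[xy]')

# Without brackets: a -> x, b -> y
without_brackets=$(printf '%s\n' "$sample" | tr 'ab' 'xy')

printf '%s\n' "$with_brackets" "$without_brackets"   # prints x[y]c twice
```

Brackets only matter to tr as part of a class name such as '[:upper:]'.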
And finally, use detox.