recursively "normalize" filenames

Views: 215 | Published: 2016/8/3 11:34:14 | Tags: linux, bash, sh


Problem description

I mean getting rid of special characters in filenames, etc.

I have made a script that can recursively rename files [http://pastebin.com/raw.php?i=kXeHbDQw]:

e.g.: before:

THIS i.s my file (1).txt

after running the script:

This-i-s-my-file-1.txt

Ok. here it is:

But when I wanted to test it "fully", with filenames like these:

¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÂÃÄÅÆÇÈÊËÌÎÏÐÑÒÔÕ×ØÙUÛUÝÞßàâãäåæçèêëìîïðñòôõ÷øùûýþÿ.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?@[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£.txt

it fails [http://pastebin.com/raw.php?i=iu8Pwrnr]:

$ sh renamer.sh directorythathasthefiles
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’""•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†....and so on
$

so "mv" can't handle special chars.. :\

I worked on it for many hours.

Does anyone have a working one? [One that can handle the characters in those two filenames too?]

Solution

mv handles special characters just fine. Your script doesn't.

In no particular order:

You are using find to find all directories, and ls each directory separately.
1. Why use for DEPTH in... if you can do exactly the same with one command?
```
find -maxdepth 100 -type d
```
2. Which makes the arbitrary depth limit unnecessary
```
find -type d
```
3. Don't ever parse the output of ls, especially if you can let find handle that, too
```
find -not -type d
```
4. Make sure it works in the worst possible case:
```
# IFS= keeps leading and trailing whitespace in each name intact
find . -not -type d -print0 | while IFS= read -r -d '' FILENAME; do
```
  This stops read from eating certain escapes and choking on filenames with new-line characters.
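As a rough illustration of the NUL-delimited loop above, here is a minimal sketch assuming bash (because of `read -d ''`) and a `find` that supports `-not` and `-print0`. The function name `list_files` is hypothetical and not part of the original script; it only prints each path in shell-quoted form so that odd characters become visible.

```bash
#!/usr/bin/env bash
# Sketch: visit every non-directory under a root, one NUL-terminated path at a time.
# list_files is a hypothetical helper name used only for this illustration.
list_files() {
    local root=$1 path
    # -print0 ends each path with a NUL byte, the one byte a path cannot contain.
    # IFS= keeps surrounding whitespace; -r keeps backslashes literal.
    find "$root" -not -type d -print0 |
    while IFS= read -r -d '' path; do
        # %q prints the path shell-quoted, so a newline shows up as an escape.
        printf '%q\n' "$path"
    done
}
```

Calling `list_files directorythathasthefiles` should print one line per file, even for names that contain newlines or backslashes.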

You are repeating the entire ls | replace cycle for every single character. Don't - it kills performance. Loop over all files once, and just use multiple sed commands, or multiple replacements in one sed command.
```
sed 's/á/a/g; s/í/i/g; ...'
```
(I was going to suggest sed 'y/áí/ai/', but unfortunately that doesn't seem to work with Unicode. Perhaps perl -CS -Mutf8 -pe 'y/áí/ai/' would.)
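To make the single-pass idea concrete, here is a small sketch. The function name `strip_accents` is hypothetical. Only the á and í pairs come from the answer; the other pairs are illustrative assumptions. The file is assumed to be saved as UTF-8.

```bash
# Sketch: one sed process applies every replacement in a single pass.
# strip_accents is a hypothetical name; only the a/i pairs come from the answer,
# the remaining pairs are assumptions added for illustration.
strip_accents() {
    sed 's/á/a/g; s/í/i/g; s/é/e/g; s/ó/o/g; s/ú/u/g'
}

# Reads names on standard input, one per line.
strip_accents <<'EOF'
áíéóú.txt
EOF
# prints aieou.txt
```

Each name now costs one sed invocation instead of one per character.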

You're still thinking in ASCII: "other special chars - ASCII Codes 33.. ..255". Don't.
1. These days, most systems use Unicode in UTF-8 encoding, which has a much wider range of "special" characters - so big that listing them out one by one becomes pointless. (It is even multibyte: "e" is one byte, "ė" is two bytes, and "€" is three.)
2. True ASCII has 128 characters. What you currently have in mind are the ISO 8859 character sets (sometimes called "ANSI") - in particular, ISO 8859-1. But they go all the way up to 8859-16, and only the "ASCII" part stays the same.
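The multibyte point can be checked directly. This is a sketch assuming the script is saved as UTF-8. The helper name `byte_len` is hypothetical and the three sample characters are chosen only for illustration.

```bash
# Sketch: count how many bytes a string occupies in this file's encoding (assumed UTF-8).
# byte_len is a hypothetical helper name.
byte_len() {
    # wc -c counts bytes, not characters; tr removes padding some wc builds emit.
    printf '%s' "$1" | wc -c | tr -d ' '
}

for ch in 'a' 'á' '€'; do
    printf '%s: %s byte(s)\n' "$ch" "$(byte_len "$ch")"
done
```

A character-by-character list built around single-byte codes 33..255 therefore cannot describe names like these.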

echo -n $(command) is rather useless.

There are much easier ways to find the directory and basename given a path. For example, you can do

directory=$(dirname "$path")
oldname=$(basename "$path")
# filter $oldname to produce $newname
mv "$path" "$directory/$newname"
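Here is a short sketch of that split, using a path with spaces and parentheses. The function name `describe_path` is hypothetical. The `--` guard is an addition for paths that begin with a dash, and is accepted by the common GNU and BSD implementations.

```bash
# Sketch: split a path into its directory and base name.
# describe_path is a hypothetical helper name.
describe_path() {
    local path=$1 directory oldname
    # "--" stops option parsing in case the path starts with a dash.
    directory=$(dirname -- "$path")
    oldname=$(basename -- "$path")
    printf 'directory=%s\noldname=%s\n' "$directory" "$oldname"
}

describe_path './some dir/THIS i.s my file (1).txt'
```

Because every expansion is quoted, spaces and parentheses in the path need no special handling.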

Do not use egrep to check for errors. Check the program's return code. (Like you already do with cd.)

And instead of filtering out other errors, do...

if [[ -e $directory/$newname ]]; then
    echo "target already exists, skipping: $oldname -> $newname"
    continue
else
    mv "$path" "$directory/$newname"
fi
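The two points above (check the return code, and test for an existing target first) could be combined roughly as follows. This is a sketch, not the original script: `safe_rename` is a hypothetical name, and it uses `return` instead of `continue` because it is a function rather than a loop body.

```bash
# Sketch: rename one file, skip existing targets, and rely on mv's exit status.
# safe_rename is a hypothetical helper name.
safe_rename() {
    local path=$1 newname=$2 directory
    directory=$(dirname -- "$path")
    if [ -e "$directory/$newname" ]; then
        echo "target already exists, skipping: $path -> $newname" >&2
        return 0
    fi
    # mv reports success or failure through its exit status,
    # so there is no need to grep its error messages.
    if ! mv -- "$path" "$directory/$newname"; then
        echo "rename failed: $path" >&2
        return 1
    fi
}
```

A caller can then write `safe_rename "$path" "$newname" || exit 1` to stop on the first real failure.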

The ton of sed 's/------------/-/g' calls can be changed to a single regexp:
```
sed -r 's/-{2,}/-/g'
```
The [ ]s in tr [foo] [bar] are unnecessary. They just cause tr to replace [ to [, and ] to ].
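Both remarks can be tried in one small pipeline. This is a sketch: `tidy_name` is a hypothetical name, and lowercasing is only an assumed example of a `tr` mapping. It uses `sed -E`, which GNU sed treats the same as the `-r` shown above.

```bash
# Sketch: collapse runs of dashes with one regexp, then map characters with tr.
# tidy_name is a hypothetical helper name; lowercasing is an assumed example.
tidy_name() {
    # -E selects extended regular expressions (GNU sed also accepts -r).
    # tr takes plain character sets, so no surrounding brackets are needed.
    sed -E 's/-{2,}/-/g' | tr 'A-Z' 'a-z'
}

printf '%s\n' 'My---File--1.TXT' | tidy_name
# prints my-file-1.txt
```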

Seriously?

echo "$FOLDERNAME" | sed "s/$/\//g"

How about this instead?

echo "$FOLDERNAME/"

And finally, use detox.
