Writing a function to replace duplicate files with hardlinks


Problem Description

I need to write a bash script that iterates through the files of a specified directory and replaces duplicates of files with hardlinks. Right now, my entire function looks like this:

#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory

D=$1
cd $D           #go to directory specified as default input

fileNum=0       #loop counter

DIR=".*|*"
for f in $DIR           #for every file in the directory
do
    files[$fileNum]=$f      #save that file into the array
    fileNum=$((fileNum+1))      #increment the counter
done

for((j=0; j<$fileNum; j++))             #for every file
do
    if [ -f "$files[$j]" ]          #access that file in the array
    then
        for((k=0; k<$fileNum; k++))     #for every other file
        do
            if [ -f "$files[$k]" ]      #access other files in the array
            then
                test[cmp -s ${files[$j]} ${files[$k]}]      #compare if the files are identical
                [ln ${files[$j]} ${files[$k]}]          #change second file to a hard link
            fi
        done
    fi
done

Basically:

  • Iterate through every file at depth 1 of the specified directory

  • Save each file into an array

  • Compare each array item against every other item, replacing duplicates with hardlinks (see the sketch below)
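
For reference, here is a minimal sketch of how that pairwise logic could be written in valid bash (a sketch only, assuming filenames contain no newlines):

#! /bin/bash
# pairwise compare-and-link sketch

shopt -s dotglob nullglob       # also match hidden files; expand to nothing on no match
cd "$1" || exit 1               # go to directory specified as input

files=( * )                     # every depth-1 entry; non-files are skipped below

for ((j = 0; j < ${#files[@]}; j++)); do
    [ -f "${files[j]}" ] || continue                # skip directories, etc.
    for ((k = j + 1; k < ${#files[@]}; k++)); do    # only later files, no self-compare
        [ -f "${files[k]}" ] || continue
        if cmp -s "${files[j]}" "${files[k]}"       # identical contents?
        then
            ln -f "${files[j]}" "${files[k]}"       # replace the copy with a hardlink
        fi
    done
done

Note the fixes relative to the original script: array elements need ${files[j]}, not $files[$j]; cmp -s belongs inside an if, not inside test[...]; and ln runs as a plain command, not wrapped in [...].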

The test directory has four files: a, b, c, d

a and b are different, but c and d are duplicates (they are empty). After running the script, ls -l shows that all of the files still only have 1 hardlink, so the script appears to have basically done nothing.
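
That test directory can be recreated with something like the following (hypothetical contents; anything works as long as a and b differ while c and d match):

$ mkdir test && cd test
$ echo alpha > a            # a and b get different contents
$ echo beta > b
$ touch c d                 # c and d are both empty, hence duplicates
$ ls -l                     # every file starts with a link count of 1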

Where am I going wrong?

Recommended Answer

You may be making this process a bit harder on yourself than necessary. There is already a Linux command, fdupes, that scans a directory, conducting byte-by-byte, md5sum, and date & time comparisons to determine whether files are duplicates of one another. It can easily find and return groups of duplicate files; you are left with only using the results.
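
For example, with the --sameline option each group of duplicates is printed on a single line, which is easy to split in a script. Run against the question's test directory, it would report the one duplicate pair, along these lines (illustrative output):

$ fdupes --sameline .
./c ./d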

Below is a quick example of using this tool for the job. NOTE: this quick example works only for filenames that do not contain spaces; you will have to modify it if you are dealing with filenames containing spaces. It is intended to show an approach using a tool that already does what you want. Also note the actual ln command is commented out below; the program just prints what it would do. Once you are satisfied with the results after testing, you can uncomment the ln command.

#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory using fdupes

[ -d "$1" ] || {                  # test valid directory supplied
    printf "error: invalid directory '%s'.  usage: %s <dir>\n" "$1" "${0//\//}"
    exit 1
}

type fdupes &>/dev/null || {      # verify fdupes is available in path
    printf "error: 'fdupes' required. Program not found within your path\n"
    exit 1
}

pushd "$1" &>/dev/null            # go to directory specified as default input

declare -a files                  # declare files and dupes array
declare -a dupes

## read duplicate files into files array
IFS=$'\n' read -d '' -a files < <(fdupes --sameline .)

## for each list of duplicates
for ((i = 0; i < ${#files[@]}; i++)); do

    printf "\n duplicate files   %s\n\n" "${files[i]}"

    ## split into individual files (no internal 'spaces' allowed in filenames)
    dupes=( ${files[i]} )

    ## for the 1st duplicate on
    for ((j = 1; j < ${#dupes[@]}; j++)); do

        ## create hardlink to original (actual command commented)
        printf "   ln -f %s %s\n" "${dupes[0]}" "${dupes[j]}"
        # ln -f "${dupes[0]}" "${dupes[j]}"

    done

done

exit 0

Output/Example

$ bash rmdupes.sh dat

 duplicate files   ./output.dat ./tmptest ./env4.dat.out

   ln -f ./output.dat ./tmptest
   ln -f ./output.dat ./env4.dat.out

 duplicate files   ./vh.conf ./vhawk.conf

   ln -f ./vh.conf ./vhawk.conf

 duplicate files   ./outfile.txt ./newfile.txt

   ln -f ./outfile.txt ./newfile.txt

 duplicate files   ./z1 ./z1cpy

   ln -f ./z1 ./z1cpy
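
Once the actual ln -f command is uncommented, the same ls -l check from the question confirms the result: merged files share a link count of 2 (illustrative output for the test directory):

$ ls -l c d
-rw-r--r-- 2 user user 0 May  1 12:00 c
-rw-r--r-- 2 user user 0 May  1 12:00 d

The 2 in the second column is the hardlink count, showing that c and d now share a single inode.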

