How to wget the most recent file of a directory

Question

I would like to write a bash script that downloads and installs the latest daily build of a program (RStudio). Is it possible to make wget download only the most recent file in the directory http://www.rstudio.org/download/daily/desktop/ ?

Recommended answer

The files seem to be sorted by release date, and each new release is a new entry whose name reflects the version number change, so checking the timestamp of a particular file seems unnecessary.

Also, you have provided a link to a "directory", which is essentially a web page. AFAIK, there is no such thing as a directory in HTTP (a communication protocol that serves you data at a given address). What you see is a listing generated by the server that resembles Windows folders for ease of use, but it is still a web page.
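Because the listing is ordinary HTML, its links can be pulled out with standard text tools. The following is only a minimal sketch: the function name list_deb_links is made up for illustration, and it assumes each link appears as an href attribute in single or double quotes that ends in .deb, which may not match the real page.

```shell
#!/bin/bash

# Hypothetical helper: read HTML on stdin and print every quoted href
# value that ends in ".deb", one per line, in page order.
list_deb_links() {
    grep -o -E "href=[\"'][^\"']*\\.deb[\"']" |
        sed -E "s/^href=[\"']//; s/[\"']\$//"
}
```

Piping the saved listing through list_deb_links and then head -n 1 would give the first link on the page, under the same assumption that the first entry is the newest.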

That said, you can scrape that web page. The following code downloads the file in the first position of the listing (assuming the first one is the most recent):

#!/bin/bash

# Save the listing page, then take the first .deb link found in it.
wget -q -O tmp.html http://www.rstudio.org/download/daily/desktop/ubuntu64/
RELEASE_URL=$(grep -m 1 -o -E 'https[^<>]*amd64\.deb' tmp.html | head -n 1)
rm tmp.html

# TODO Check if the old package name is the same as in RELEASE_URL.

# If not, then get the new version.
wget -q "$RELEASE_URL"

Now you can check it against your local most-recent version, and install if necessary.

Here is an updated version, which does a simple version check and installs the package:

#!/bin/bash

MY_PATH=$(dirname "$0")
RES_DIR="$MY_PATH/res"

# Piping from stdout suggested by Chirlo.
RELEASE_URL=$(wget -q -O - http://www.rstudio.org/download/daily/desktop/ubuntu64/ | grep -m 1 -o "https[^\']*" | head -n 1)

if [ "$RELEASE_URL" == "" ]; then
    echo "Package index not found. Maybe the server is down?"
    exit 1
fi

mkdir -p "$RES_DIR"
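# Keep only the file name part of the URL.
NEW_PACKAGE=${RELEASE_URL##*/}
# res/ is expected to hold exactly one file: the previously downloaded package.
OLD_PACKAGE=$(ls "$RES_DIR")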

if [ "$OLD_PACKAGE" == "" ] || [ "$OLD_PACKAGE" != "$NEW_PACKAGE" ]; then

    # Stop here if the directory is not accessible, so rm never runs elsewhere.
    cd "$RES_DIR" || exit 1
    rm -f $OLD_PACKAGE

    echo "New version found. Downloading..."
    wget -q "$RELEASE_URL"

    if [ ! -e "$NEW_PACKAGE" ]; then
        echo "Package not found."
        exit 1
    fi

    echo "Installing..."
    sudo dpkg -i "$NEW_PACKAGE"

else
    echo "rstudio up to date."
fi

And some comments:

  • The script keeps a local res/ dir with the latest version (exactly one file) and compares its name with the newly scraped package name. This is dirty: having the file doesn't mean it was ever installed successfully. It would be better to parse the output of dpkg -l, but the name of the package might differ slightly from the scraped one.
  • You will still need to enter the password for sudo, so it won't be 100% automatic. There are a few ways around this, though without supervision you might encounter the previously stated problem.
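
The first comment could be addressed by asking dpkg what is actually installed instead of trusting the contents of res/. The sketch below is one possible approach, not a tested replacement for the script above. It assumes the package is registered under the name rstudio and that the file is named like rstudio-<version>-amd64.deb; both are guesses, and the helper names version_from_filename and needs_install are invented for illustration.

```shell
#!/bin/bash

# Hypothetical helper: derive a version string from a package file name or URL.
# Assumes names of the form rstudio-<version>-amd64.deb (an assumption).
version_from_filename() {
    local name="${1##*/}"    # drop any directory or URL prefix
    name="${name%.deb}"      # drop the extension
    name="${name#rstudio-}"  # drop the assumed package-name prefix
    name="${name%-amd64}"    # drop the assumed architecture suffix
    printf '%s\n' "$name"
}

# Hypothetical helper: succeed (exit 0) when package $1 is not installed
# or its installed version is older than candidate version $2.
needs_install() {
    local installed
    installed=$(dpkg-query -W -f='${Version}' "$1" 2>/dev/null) || return 0
    [ -z "$installed" ] && return 0
    dpkg --compare-versions "$installed" lt "$2"
}
```

In the script above, the name comparison could then become something like needs_install rstudio "$(version_from_filename "$RELEASE_URL")", with the download and dpkg -i steps run only when it succeeds. Whether the version string in the file name matches what dpkg reports would need checking against a real package.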
