Renaming HTML files using <title> tags

Problem description

I'm relatively new to programming. I have a folder, with subfolders, containing several thousand HTML files with generic names, e.g. 1006.htm, 1007.htm, that I would like to rename using the <title> tag from within each file.

For example, if file 1006.htm contains <title>Page Title</title>, I would like to rename it Page Title.htm. Ideally, spaces would be replaced with dashes (Page-Title.htm).

I've been trying a bash script in the shell, with no luck. How do I do this with either bash or Python?

This is what I have so far:

#!/usr/bin/env bash
FILES=/Users/Ben/unzipped/*
for f in $FILES
do
   if [ ${FILES: -4} == ".htm" ]
      then
    awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' $FILES
   fi
done
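Two bugs keep this first loop from printing anything useful: the `if` tests `${FILES: -4}`, the last four characters of the pattern string `/Users/Ben/unzipped/*` (that is, `ed/*`), which never equals `.htm`, and awk reads `$FILES` instead of the current file `"$f"`. A minimal sketch with just those two fixes, assuming bash; `print_titles` is a hypothetical name, and like the original it only looks at files directly inside the folder, not in its subfolders.

```bash
#!/bin/bash
# print_titles DIR: hypothetical name for the first attempt with its two bugs fixed.
# Still only looks at files directly inside DIR, not in its subfolders.
print_titles() {
  local f
  for f in "$1"/*; do
    # test each file's own suffix, not the $FILES pattern string
    if [ "${f: -4}" = ".htm" ]; then
      # read the current file "$f", not the whole glob in $FILES
      awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f"
    fi
  done
}
```

Run it as `print_titles /Users/Ben/unzipped`. Note that `IGNORECASE` only has an effect in GNU awk; other awks treat it as an ordinary variable, so there the tags must be lowercase.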

I also tried:

#!/usr/bin/env bash
for f in *.html;
   do
   title=$( grep -oP '(?<=<title>).*(?=<\/title>)' "$f" )
   mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html   
done

But I get an error from the terminal explaining how to use grep...
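The usage message most likely means the installed grep does not support `-P` (Perl-compatible regular expressions, which the `(?<=...)` lookbehind needs); the BSD grep that ships with macOS, suggested by the `/Users/Ben` path, is a likely culprit. A minimal sketch of a title extractor that needs neither `grep -P` nor the GNU-only `IGNORECASE`, using only POSIX awk functions (`tolower`, `index`, `substr`). `extract_title` is a hypothetical helper name, and it assumes a bare `<title>` tag without attributes.

```bash
#!/bin/bash
# extract_title FILE: print the text between <title> and </title>.
# Hypothetical helper; assumes a bare <title> tag (no attributes).
extract_title() {
  awk '
    { buf = buf $0 " " }                   # join lines so a title split over lines still matches
    END {
      low = tolower(buf)                   # match tags case-insensitively in any POSIX awk
      s = index(low, "<title>")
      e = index(low, "</title>")
      if (s > 0 && e > s) {
        t = substr(buf, s + 7, e - s - 7)  # 7 = length of the opening tag
        sub(/^[ \t]+/, "", t)              # trim leading whitespace
        sub(/[ \t]+$/, "", t)              # trim trailing whitespace
        print t
      }
    }' "$1"
}
```

For example, `title=$(extract_title "$f")` could stand in for the `grep -oP` line in the loop above; it prints nothing for a file without a title.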

Recommended answer

Use awk instead of grep in your bash script and it should work:

#!/bin/bash   
for f in *.html;
   do
   title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
   mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html   
done
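In `${title//[^a-zA-Z0-9\._\- ]}` the replacement part is empty, so bash deletes every character other than letters, digits, `.`, `_`, `-` and space; spaces survive, whereas the question asked for dashes. A minimal sketch that does both steps, assuming bash; `title_to_filename` is a hypothetical helper name, and returning an error when nothing usable is left is an added safeguard not present in the answer.

```bash
#!/bin/bash
# title_to_filename TITLE: hypothetical helper that builds "Page-Title.htm" from "Page Title".
title_to_filename() {
  local name=$1
  name=${name// /-}                 # spaces -> dashes, as the question asks
  name=${name//[^A-Za-z0-9._-]/}    # drop everything else, e.g. "/" or ":"
  [ -n "$name" ] || return 1        # nothing usable left: let the caller skip the file
  printf '%s.htm\n' "$name"
}
```

`title_to_filename "Page Title"` prints `Page-Title.htm`. Stripping `/` matters here because a title containing a slash would otherwise make `mv` look for a directory that does not exist.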

Don't forget to adjust the interpreter on the first (shebang) line to suit your system ;)

Edit: complete answer with all modifications:

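#!/bin/bash
# Paths arrive NUL-separated on fd 3, so names with spaces survive and
# "mv -i" can still prompt on the terminal
while IFS= read -r -d '' f <&3
   do
   title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
   [ -n "$title" ] || continue     # no <title>: leave the file alone
   # rename inside the file's own folder, spaces -> dashes, keep .htm/.html
   mv -i "$f" "$(dirname "$f")/${title//[ ]/-}.${f##*.}"
done 3< <(find . -type f \( -name '*.htm' -o -name '*.html' \) -print0)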
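Since `mv -i` renames thousands of files for real, a dry run first may be worthwhile. A minimal sketch, assuming bash 3.2 or later; `preview_renames` is a hypothetical name. It reads NUL-separated paths from `find -print0` so names with spaces survive, keeps each file in its own folder, and extracts the title with POSIX awk (`tolower`, `index`) because `IGNORECASE` is a GNU awk extension that other awks, including the one on macOS, silently ignore.

```bash
#!/bin/bash
# preview_renames DIR: hypothetical dry run; prints "old -> new" and renames nothing.
preview_renames() {
  local f title
  while IFS= read -r -d '' f; do
    # portable title extraction (no gawk-only IGNORECASE): join lines, match on a lowercased copy
    title=$(awk '{ b = b $0 " " }
      END { l = tolower(b); s = index(l, "<title>"); e = index(l, "</title>")
            if (s > 0 && e > s) print substr(b, s + 7, e - s - 7) }' "$f")
    [ -n "$title" ] || continue     # no <title>: nothing to rename
    printf '%s -> %s\n' "$f" "$(dirname "$f")/${title// /-}.htm"
  done < <(find "$1" -type f -name '*.htm' -print0)
}
```

Run it as `preview_renames /Users/Ben/unzipped` and check the list before renaming. Two files with the same title map to the same new name, which is exactly where `mv -i` would stop and ask before overwriting.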
