Grep command questions - Grep text from program output?


Question

I'm trying to extract information from the JSON file written by youtube-dl and write some of it to a .txt file.

Example output from youtube-dl when downloading a video:

[info] Writing video description to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).description
[info] Writing video description metadata as JSON to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).info.json

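My idea

  1. Grep the .json and .description file paths so they can be used in later grep commands.
  2. Run a working version of the script below and add the new text above the description text in the .description file.
  3. (Rename .description to .txt)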

I prefer this method because youtube-dl only needs to run once.

If other universal commands that work on both macOS and Linux (as grep does) would make this simpler, I see no problem using them instead of grep.

Questions

  • How do I grep the file paths and use them in the other commands shown in the script example below?
  • How do I run the script below so that all of that information is added above the current description text in the text file?
  • When it gets information from the JSON file, it also picks up the " before and after the value. So a video name becomes "VIDEO NAME", but I want VIDEO NAME only.
  • How do I grep the tags from the JSON file? Tags look like this in the .json file: "tags": ["music", "video", "classic"]. I want to get "music", "video", "classic".

Script example

    txtfile="$GREP_DESCRIPTION_FROM_YOUTUBE-DL_OUTPUT"
    jsonfile="$GREP_JSON_FROM_YOUTUBE-DL_OUTPUT"

    echo TITLE >> $txtfile
    grep -o '"title": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
    echo \ >> $txtfile
    
    echo CHANNEL >> $txtfile
    grep -o '"uploader": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
    echo \ >> $txtfile
    
    echo CHANNEL URL >> $txtfile
    grep -o '"uploader_url": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
    echo \ >> $txtfile
    
    echo UPLOAD DATE >> $txtfile
    grep -o '"upload_date": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
    echo \ >> $txtfile
    
    echo TAGS >> $txtfile
    grep -o '"tags": *"[^"]*"' $jsonfile | grep -o '"[^"]*"$' >> $txtfile
    echo \ >> $txtfile
    
    echo URL >> $txtfile
    echo $url >> $txtfile
    echo \ >> $txtfile
    
    echo DESCRIPTION >> $txtfile

Recommended answer

youtube-dl --help | grep "dump-json"
    -j, --dump-json                  Simulate, quiet but print JSON information.

With this option there's no need to download the video at all. Simply pipe the output of youtube-dl to a proper JSON parser. I would recommend xidel.

youtube-dl -j 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' | xidel - -se '
  $json/(
    "- TITLE -",
    title,"",
    "- CHANNEL -",
    uploader,"",
    "- CHANNEL URL -",
    uploader_url,"",
    "- UPLOAD DATE -",
    upload_date,"",
    "- URL -",
    webpage_url,"",
    "- TAGS -",
    substring-before(
      substring(serialize-json(tags),2),
      "]"
    ),"",
    "- DESCRIPTION -",
    description
  )
'
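
If installing a JSON parser is not an option, the quoting and tags problems from the question can be approximated with sed alone. This is only a rough sketch, not a real JSON parser: it assumes each value sits on a single line, contains no escaped quotes, and (for tags) that no tag contains a ]. The helper names json_string and json_list are hypothetical, and the sample JSON is made up from the question's example.

```shell
# Made-up sample based on the question's example; a real .info.json is
# much larger. In practice: json=$(cat "$jsonfile")
json='{"title": "VIDEO NAME", "uploader": "CHANNEL NAME", "tags": ["music", "video", "classic"]}'

# json_string KEY: print the string value of KEY without the surrounding quotes.
# (hypothetical helper; breaks on values containing escaped quotes)
json_string() {
  printf '%s\n' "$json" | sed -n 's/.*"'"$1"'": *"\([^"]*\)".*/\1/p'
}

# json_list KEY: print the inside of a flat array, keeping the item quotes.
# (hypothetical helper; breaks if an item contains a closing bracket)
json_list() {
  printf '%s\n' "$json" | sed -n 's/.*"'"$1"'": *\[\([^]]*\)].*/\1/p'
}

json_string title    # VIDEO NAME
json_list tags       # "music", "video", "classic"
```

Capturing the value in a sed group is what drops the surrounding quotes, which the two-step grep -o pipeline in the question cannot do. For anything beyond a quick script, a real JSON parser remains the safer choice.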

If you have already downloaded the video and the JSON (with --write-info-json, I presume), you can retrieve the filename with --get-filename:

youtube-dl --get-filename 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.mp4

jsonfile=$(youtube-dl --get-filename 'https://www.youtube.com/watch?v=dQw4w9WgXcQ')

xidel -s "${jsonfile/.mp4/.info}.json" -e '
  $json/(
    [...]
  )
' > "${jsonfile/.mp4/.info}.txt"

Output of the command, or the content of 'Rick Astley - Never Gonna Give You Up (Video)-dQw4w9WgXcQ.info.txt':

- TITLE -
Rick Astley - Never Gonna Give You Up (Video)

- CHANNEL -
RickAstleyVEVO

- CHANNEL URL -
http://www.youtube.com/user/RickAstleyVEVO

- UPLOAD DATE -
20091024

- URL -
https://www.youtube.com/watch?v=dQw4w9WgXcQ

- TAGS -
"the boys soundtrack", "the boys amazon prime", "Never gonna give you up the boys", "RickAstleyvevo", "vevo", "official", "Rick Roll", "video", "music video", "Rick Astley album", "rick astley official", "single", "album", "together forever", "Never Gonna Give You Up", "Whenever You Need Somebody", "pop", "rickrolled", "WRECK-IT RALPH 2", "Fortnite song Fortnite item shop Fortnite time shop today Fortnite montage", "Fortnite event", "Fortnite dance", "fortnite never gonna give you up"

- DESCRIPTION -
Rick Astley's official music video for "Never Gonna Give You Up" Listen to Rick Astley: https://RickAstley.lnk.to/_listenYD Subscribe to the official Rick As...
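
For the workflow described in the question (run youtube-dl once, then reuse the paths it logged and put new text above the existing description), a minimal sketch could look like the following. It assumes the log lines have exactly the "[info] ... to: PATH" shape quoted in the question; extract_path and prepend_text are hypothetical helper names, not part of youtube-dl.

```shell
# Sketch only: assumes youtube-dl logs paths in the "[info] ... to: PATH"
# form quoted in the question. In practice the log would be captured with
# something like:
#   log=$(youtube-dl --write-description --write-info-json "$url")
log='[info] Writing video description to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).description
[info] Writing video description metadata as JSON to: /Users/ACCOUNT/Downloads/Rick Astley - Never Gonna Give You Up (Video).info.json'

# extract_path LABEL: print the path that follows "[info] LABEL: ".
# (hypothetical helper name)
extract_path() {
  printf '%s\n' "$log" | sed -n "s/^\[info] $1: //p" | head -n 1
}

txtfile=$(extract_path 'Writing video description to')
jsonfile=$(extract_path 'Writing video description metadata as JSON to')

# prepend_text FILE: put stdin above the current content of FILE.
# ">>" can only append, so the combined text is built in a temporary file
# first and then written back. (hypothetical helper name)
prepend_text() {
  tmp=$(mktemp "${TMPDIR:-/tmp}/prepend.XXXXXX") || return 1
  cat - "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Usage (commented out because the sample paths do not exist here):
# printf 'TITLE\n%s\n\n' "$title" | prepend_text "$txtfile"
```

Quoting "$txtfile" and "$jsonfile" everywhere matters here, because the paths contain spaces and parentheses.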

Actually, there's no need for youtube-dl if this information is all you're after. Parsing the HTML source would suffice.

xidel -s 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' -e '
  "- TITLE -",
  //meta[@itemprop="name"]/@content,"",
  "- CHANNEL -",
  //span[@itemprop="author"]/link/@content,"",
  "- CHANNEL URL -",
  //span[@itemprop="author"]/link/@href,"",
  "- UPLOAD DATE -",
  //meta[@itemprop="datePublished"]/@content,"",
  "- URL -",
  //meta[@property="og:url"]/@content,"",
  "- TAGS -",
  join(
    //meta[@property="og:video:tag"]/outer-html() ! substring-before(
      substring-after(.,"content=")
      ,">"
    ),
    ", "
  ),"",
  "- DESCRIPTION -",
  //meta[@itemprop="description"]/@content
'

The HTML source also contains a large JSON object with all the information you need. It's a bit more difficult to extract, but it can be done. Unlike the other two solutions, this source doesn't truncate the video description:

xidel -s 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' -e '
  let $json:=json(
        //script/extract(.,"ytplayer.config = (.+?\});",1)[.]
      )/args,
      $a:=json($json/player_response)/videoDetails,
      $b:=json($json/player_response)/microformat
  return (
    "- TITLE -",
    $a/title,"",
    "- CHANNEL -",
    $a/author,"",
    "- CHANNEL URL -",
    $b//ownerProfileUrl,"",
    "- UPLOAD DATE -",
    $b//publishDate,"",
    "- URL -",
    $json/loaderUrl,"",
    "- TAGS -",
    substring-before(
      substring(serialize-json($a/keywords),2),
      "]"
    ),"",
    "- DESCRIPTION -",
    $a/shortDescription
  )
'
