Scraping information within HTML tags in unix with curl and cut

Published: 2021/5/9 20:49:45 Tags: bash unix awk sed cut

Question

I'd like to figure out the simplest way to grab content between HTML/XML tags from a remote resource in unix. Here's how I thought it would work.

curl https://www.google.com | grep "<title>" | cut -d '<title>' -f2 | cut -d '</title>' -f1

So first I curl the resource, grep out the line with the tag I want (which sometimes means the whole HTML, because many websites are minified these days).

Then I cut out everything after the opening <title> and finally I cut out anything after the closing </title>.

This causes an error though: curl: (23) Failed writing body (0 != 8192). The error seems to happen because cut does not support delimiters that are anything but a single character.
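A likely reading of the error: cut rejects a multi-character delimiter and exits right away, so the pipe closes and curl reports that it could not write the body. One minimal alternative, sketched here under the assumption that the opening and closing title tags sit on the same line and the title contains no nested markup, is to let sed do the matching. The function name extract_title and the sample input are hypothetical, not part of the original question.

```shell
# Hypothetical helper: print the text between <title> and </title> read from stdin.
# Assumes both tags are on one line and the title text contains no '<'.
extract_title() {
  sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p'
}

# Sample input standing in for a fetched page (no network access needed).
printf '<html><head><title>Example Domain</title></head></html>\n' | extract_title
```

With a real page, the sample printf could be swapped for curl -s followed by the URL, piped into extract_title. The -s flag only silences curl's progress meter.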

I liked how my approach was a very minimalist way to grab the data I wanted and am looking for another simple way to do it.

Additionally, even if it did work, my solution would only work for a single tag in the HTML. What about for a tag that appears many times?
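For a tag that appears many times, one possible sketch is to have grep print each match on its own line and then strip the tags. It assumes a grep that supports -o (GNU and BSD grep do), tags without attributes, and no nested markup inside the tag. The helper name extract_tag and the sample list are hypothetical.

```shell
# Hypothetical helper: print the text of every <TAG>...</TAG> pair on stdin, one per line.
# Assumes grep supports -o and that the tags carry no attributes or nested markup.
extract_tag() {
  tag=$1
  # grep -o emits each matching element separately, even when all sit on one minified line;
  # sed then deletes the opening and closing tags, leaving only the text.
  grep -o "<$tag>[^<]*</$tag>" | sed 's/<[^>]*>//g'
}

# Sample input standing in for a fetched page (no network access needed).
printf '<ul><li>one</li><li>two</li></ul>\n' | extract_tag li
```

This is regex matching rather than real HTML parsing, so it is only a fit for simple, predictable markup.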
