Get data from one file to another (Bash) - Web Scraping

Problem description

I am doing web scraping with bash. I have these URL query strings saved in a file called URL.txt:

?daypartId=1&catId=1
?daypartId=1&catId=11
?daypartId=1&catId=2

I want to read these entries into an array in another file, main.sh, and append each one to the end of the base URL https://www.mcdelivery.com.pk/pk/browse/menu.html**(append here)**, one at a time.

I have come up with code that reads the entries from URL.txt, but it fails to append them to the base URL one by one.

#!/bin/bash
ARRAY=()
while read -r LINE
do
    ARRAY+=("$LINE")
done < URL.txt

for LINE in "${ARRAY[@]}"
do    
    echo $LINE
    curl https://www.mcdelivery.com.pk/pk/browse/menu.html$LINE | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt 
done  

I just need help with the loop, so that each entry in URL.txt is appended to the end of the base URL in main.sh.

Recommended answer

I can't help with your grep | sed pipeline, because I don't know the expected output.

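This example demonstrates why the URL is sometimes passed to curl without the URI appended: blank lines in the input file become empty array elements, so on those iterations curl receives only the base URL.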

#!/bin/bash

# just for demo: start with an empty URI.txt
> URI.txt
URI='?daypartId=1&catId='
URL=https://www.mcdelivery.com.pk/pk/browse/menu.html

# just for demo: write each URI, followed by blank lines
for id in 1 11 2
do
    echo -e "${URI}${id}" | tee -a URI.txt
    # reason why it fails: blank lines end up in the file
    echo -e "\n\n\n" >> URI.txt
done

ARRAY=()
while read -r LINE || [[ -n $LINE ]]
do
    # how to prevent it: uncomment the next line to skip empty lines while reading
    #[ "$LINE" ] && \
    ARRAY+=("$LINE")
done < URI.txt

for LINE in "${ARRAY[@]}"
do
    # just for demo: show every element, including the empty ones
    echo -e "LINE='$LINE'"
    # skip empty lines so curl always gets the base URL plus a URI
    [ "$LINE" ] && curl "${URL}${LINE}" | grep -o '<span class="starting-price">.*</span>' | sed 's/<[^>]\+>//g' >> price.txt
done

exit 0
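
The same idea can be sketched as a small function that filters the input before anything reaches curl. This is only a sketch under stated assumptions: build_urls is a hypothetical helper name that does not appear in the answer above, and stripping a trailing carriage return is an extra guess, in case the file was saved with Windows line endings. It prints the full URLs and does not call curl itself.

```shell
# Hypothetical helper (not from the original answer): print one full URL
# per usable line of a file.
#   $1 = base URL, $2 = file with one query string per line
build_urls() {
    local base file line cr
    base=$1
    file=$2
    cr=$(printf '\r')
    # "|| [ -n ... ]" keeps a last line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}          # assumption: drop a trailing CR (CRLF files)
        case $line in
            *[![:space:]]*) ;;      # line has visible content: keep it
            *) continue ;;          # empty or whitespace-only: skip it
        esac
        printf '%s%s\n' "$base" "$line"
    done < "$file"
}
```

A call such as build_urls "$URL" URI.txt would print one complete URL per line, and each printed line could then be handed to curl in a loop like the one above.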
