How to fetch data inside </br> tag which is inside <li> using python scraping


Problem Description

<div class="row features_2 "><br />
        <ul>

                    <li><b>Área privada:</b><br />
                    70,00 m²
                    </li>

            <!--area-->

                    <li><b>Área Const.:</b><br />
                    70,00 m²
                    </li>

            <!--precio metro cuadrado-->

                <li><b>Precio m²:</b><br />
                3.142.857/m²
                </li>

            <!--Valor noche si es alquiler vacacional-->

            <!--precio de administracion -->

                    <li><b>Admón:</b><br />
                    $150,000</li>

            <!--Estrato si aplica-->

                <li><b>Estrato:</b> <br />
                3

            <!--Estado si aplica-->

                <li><b>Estado:</b> <br />
                    Excelente
                </li>

            <!--edad si aplica-->

                <li><b>Antigüedad:</b> <br />
                1 a 8 años</li>

            <!--piso #-->

            <!--Clima-->

            <!--tipo de apartamento si aplica-->

            <!--para parqueaderos-->
            <!--caracteristicas parqueadero-->

            <!--Sector (siempre va)-->
            <li><b>Sector:</b> <br />

                <a href="#pnlMap" style="font-weight: bold;">Ver Mapa</a>

            </li>  


        </ul>

From the above I'd like to get the values inside each <li> tag; however, I'm having trouble saving the values to individual lists.

I'd like to save the values based on the heading text inside the <b> tag itself.

For example, if the tag contains 'Área privada:', then I have to save the value '70,00 m²' to a list named area.

Else, if the tag contains 'Precio m²:', then I have to save the value '3.142.857/m²' into a list named precio.

I've tried the following code to get the elements, but I'm not sure how to write the condition to save the data into lists based on the above.

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.fincaraiz.com.co/oceana-52/barranquilla/proyecto-nuevo-det-1041165.aspx')
soup = BeautifulSoup(page.content, 'lxml')
box_3 = soup.find('div', 'row features_2 ')
box_3_1 = box_3.findAll('li')
for i in box_3_1:
    print(i)

Otherwise, is there any other option to save the data from the above tags into the respective lists?

Recommended Answer


Use the next_sibling property of <br>:

for li in box_3_1:
    print(str(li.br.next_sibling).strip())

Output:

71,00 a 185,00 m²
78,00 a 207,00 m²
5
Cálido
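Run against the <li> fragment posted in the question rather than the live page (a self-contained sketch; the html string below is excerpted from the question's markup, so the values differ from the answer's output, which came from the live listing), next_sibling picks out the text node that follows each <br>:

```python
from bs4 import BeautifulSoup

# Self-contained sketch: a fragment of the question's HTML stands in for
# the live page, so the snippet runs without a network request.
html = """
<div class="row features_2 ">
  <ul>
    <li><b>Área privada:</b><br />
    70,00 m²
    </li>
    <li><b>Precio m²:</b><br />
    3.142.857/m²
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
for li in soup.find("div", class_="features_2").find_all("li"):
    # The text node immediately after <br> is its next_sibling.
    print(str(li.br.next_sibling).strip())
```

This prints 70,00 m² followed by 3.142.857/m², the same pattern as the output above.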

OP wanted to store "Área" and "Precio" data in separate lists. Assuming those two strings never appear in the same <li> heading, here's a full solution:

area = []
precio = []
for li in box_3_1:
    heading_words = li.b.text.split()
    target_content = str(li.br.next_sibling).strip()
    if "Área" in heading_words:
        area.append(target_content)
    elif "Precio" in heading_words:
        precio.append(target_content)
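Applied to the fragment from the question (an offline sketch in which the html string replaces the requests call from the original code), the loop fills the two lists like this:

```python
from bs4 import BeautifulSoup

# Offline sketch: the <li> items are copied from the question's HTML so the
# heading-based dispatch can be run without fetching the live page.
html = """
<ul>
  <li><b>Área privada:</b><br />70,00 m²</li>
  <li><b>Área Const.:</b><br />70,00 m²</li>
  <li><b>Precio m²:</b><br />3.142.857/m²</li>
  <li><b>Admón:</b><br />$150,000</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
area, precio = [], []
for li in soup.find_all("li"):
    heading_words = li.b.text.split()          # e.g. ['Área', 'privada:']
    target_content = str(li.br.next_sibling).strip()
    if "Área" in heading_words:
        area.append(target_content)
    elif "Precio" in heading_words:
        precio.append(target_content)

print(area)    # ['70,00 m²', '70,00 m²']
print(precio)  # ['3.142.857/m²']
```

Note that 'Admón:' matches neither term, so its value is skipped, and both 'Área privada:' and 'Área Const.:' land in area.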

For a more general solution, consider making a list of key header terms, and then storing all output in a dict. For example:

import re

key_terms = ["Área", "Precio", "Estrato"]
data = {k:[] for k in key_terms}

for li in box_3_1:
    heading = li.b.text
    target_content = str(li.br.next_sibling).strip()
    for term in key_terms:
        # Headers like "Estrato:" do not match on split() due to end ":"; use re instead.
        if re.search(term, heading):
            data[term].append(target_content)          
data
{'Estrato': ['5'],
 'Precio': [],
 'Área': ['71,00 a 185,00 m²', '78,00 a 207,00 m²']}
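One caveat worth noting (my observation, not part of the original answer): re.search treats each key term as a regular expression, so a term containing regex metacharacters would need re.escape. Since these headers are matched literally, a plain substring test works just as well:

```python
# Sketch of the same dispatch with a plain substring test instead of
# re.search; the key terms are literal strings, so `in` is sufficient.
key_terms = ["Área", "Precio", "Estrato"]
data = {k: [] for k in key_terms}

# Hypothetical (heading, content) pairs standing in for the parsed <li> items.
samples = [("Estrato:", "3"), ("Área privada:", "70,00 m²")]
for heading, content in samples:
    for term in key_terms:
        if term in heading:
            data[term].append(content)

print(data)  # {'Área': ['70,00 m²'], 'Precio': [], 'Estrato': ['3']}
```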

