Scrapy: extract text with special characters


Problem description


I'm using Scrapy to extract text from some Spanish websites. The text is written in Spanish, so some words contain special characters like 'ñ' or 'í'. My problem is that when I run

scrapy crawl econoticia -o prueba.json

on the command line to get a file with the scraped data, some characters are not shown properly. For example, this is the original text:

"La exministra, procesada como partícipe a titulo lucrativo, intenta burlar a los fotógrafos"

and this is the scraped text:

"La exministra, procesada como part\u00edcipe a titulo lucrativo, intenta burlar a los fot\u00f3grafos"

I want the returned JSON to contain the special characters. I presume my spider code needs something extra to produce the JSON the right way. This is my spider code:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.selector import HtmlXPathSelector
from pais.items import PaisItem


class NoticiaSpider(scrapy.Spider):
    name = "noticia"
    allowed_domains = ["elpais.com"]
    start_urls = (
        ...
    )

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = PaisItem()
        # XPath selectors for the subtitle and headline text nodes
        item['subtitulo'] = hxs.select('//*[@id="merc"]/div[2]/div[4]/div[1]/div[1]/span/text()').extract()
        item['titular'] = hxs.select('//*[@id="merc"]/div[2]/div[4]/div[1]/div[3]/div[2]/div[1]/h1/a/text()').extract()
        return item

Answer


Maybe you should add .encode('utf8') after extract().
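
Applied to the parse method above, a minimal sketch of that suggestion might look like this (extract() returns a list of unicode strings, so each one is encoded individually):

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        item = PaisItem()
        # Per the suggestion above: encode each extracted
        # unicode string to UTF-8 bytes before storing it.
        item['subtitulo'] = [s.encode('utf8') for s in hxs.select('//*[@id="merc"]/div[2]/div[4]/div[1]/div[1]/span/text()').extract()]
        item['titular'] = [t.encode('utf8') for t in hxs.select('//*[@id="merc"]/div[2]/div[4]/div[1]/div[3]/div[2]/div[1]/h1/a/text()').extract()]
        return item

Alternatively, in Scrapy 1.2 and later you can set FEED_EXPORT_ENCODING = 'utf-8' in settings.py, which makes the JSON feed exporter write UTF-8 characters directly instead of ASCII \uXXXX escape sequences.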

