Beautiful Soup, prettified html to txt, get encoding error

Problem description

I'm trying to save a prettified print of an HTML file to a txt file, but I get this error message:

Traceback (most recent call last):
  File "prettyhtmlfiles.py", line 16, in <module>
    file.write(soup.prettify())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 8532: ordinal not in range(128)

How can I get around this problem?

The code I have:

import urllib2
import os
from bs4 import BeautifulSoup
import csv

url = "/home/sveisa/S141test/ayuki.html"
with open(url, 'r') as f:
    data = f.read()
    soup = BeautifulSoup(open('/home/sveisa/S141test/ayuki.html').read())

print(soup.prettify())


file = open("newfile.txt", "w")

file.write(soup.prettify())

Solution

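In Python 2, soup.prettify() returns a unicode string. Writing it to a file opened with a plain open() makes Python encode it implicitly with the 'ascii' codec, which fails on any non-ASCII character such as u'\xbb'. Encode the text as UTF-8 explicitly before writing it: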

print >> file, (soup.prettify().encode('utf-8'))
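
Another option, not part of the original answer, is to open the output file with an explicit encoding so that the file object does the encoding on write. The sketch below assumes UTF-8 is the desired output encoding. The helper name save_text and the sample string are hypothetical, and the sample stands in for the result of soup.prettify() so that the snippet runs without Beautiful Soup installed.

```python
import io
import os
import tempfile


def save_text(path, text):
    """Write a unicode string to `path`, encoded as UTF-8 (hypothetical helper)."""
    # io.open accepts an `encoding` argument on both Python 2 and Python 3,
    # so the text is encoded as UTF-8 instead of with the ASCII default.
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(text)


# Stand-in for soup.prettify(); u"\xbb" is the character named in the traceback.
sample = u"<p>\n \xbb next\n</p>\n"

# Write to a temporary directory rather than assuming any particular path.
out_path = os.path.join(tempfile.mkdtemp(), "newfile.txt")
save_text(out_path, sample)
```

With the question's code, the call would be save_text("newfile.txt", soup.prettify()). On Python 3 the built-in open() takes the same encoding argument, so io.open is only needed if the script must also run on Python 2.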
