Beautiful Soup, prettified html to txt, get encoding error
Problem description
I'm trying to save the prettified output of an HTML file to a txt file, but I get this error message:
Traceback (most recent call last):
  File "prettyhtmlfiles.py", line 16, in <module>
    file.write(soup.prettify())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 8532: ordinal not in range(128)
How can I get around this problem?
The code I have:
import urllib2
import os
from bs4 import BeautifulSoup
import csv
url = "/home/sveisa/S141test/ayuki.html"
with open(url, 'r') as f:
    data = f.read()
soup = BeautifulSoup(open('/home/sveisa/S141test/ayuki.html').read())
print(soup.prettify())
file = open("newfile.txt", "w")
file.write(soup.prettify())
Solution
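Try this. It should work: in Python 2, soup.prettify() returns a unicode string, and writing it to a file opened with open() triggers an implicit ASCII encode, which fails on u'\xbb'. Encoding to UTF-8 explicitly before writing avoids that.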
print >> file, (soup.prettify().encode('utf-8'))
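As a variation on the same idea, here is a minimal sketch that lets the file object do the encoding instead of calling .encode() by hand. It swaps the print >> idiom for io.open with an explicit encoding, which should behave the same on Python 2.6+ and Python 3. The helper name save_text_utf8, the sample string, and the temporary output path are all hypothetical stand-ins and not part of the original question; the sample string only plays the role of soup.prettify().

```python
import io
import os
import tempfile


def save_text_utf8(path, text):
    """Write a unicode string to `path`, encoding it as UTF-8 (hypothetical helper)."""
    # io.open takes an explicit encoding, so the text is never pushed
    # through the default ASCII codec on its way to disk.
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(text)


# Stand-in for soup.prettify(): a unicode string containing u'\xbb',
# the character named in the traceback.
sample = u"<p>\xbb next</p>\n"

# Assumed output location; the question writes to "newfile.txt" instead.
out_path = os.path.join(tempfile.mkdtemp(), "newfile.txt")
save_text_utf8(out_path, sample)

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, encoding="utf-8") as check:
    round_trip = check.read()
```

With the code from the question, the call would presumably be save_text_utf8("newfile.txt", soup.prettify()).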