parsing table with BeautifulSoup and write in text file
Question
I need the data from the table in a text file (output.txt) in this format: data1;data2;data3;data4;.....
Celkova podlahova plocha bytu;33m;Vytah;Ano;Nadzemne podlazie;Prizemne podlazie;.....;Forma vlastnictva;Osobne
All in "one line", separator is ";" (for a later export to a csv file).
I'm a beginner.. Help, thanks.
from BeautifulSoup import BeautifulSoup
import urllib2
import codecs
response = urllib2.urlopen('http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever')
html = response.read()
soup = BeautifulSoup(html)
tabulka = soup.find("table", {"class" : "detail-char"})
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    prvy = col[0].string.strip()
    druhy = col[1].string.strip()
    record = ([prvy], [druhy])
fl = codecs.open('output.txt', 'wb', 'utf8')
for rec in record:
    line = ''
    for val in rec:
        line += val + u';'
    fl.write(line + u'\r\n')
fl.close()
You are not keeping each record as you read it in. Try this, which stores the records in records:
from BeautifulSoup import BeautifulSoup
import urllib2
import codecs
response = urllib2.urlopen('http://www.reality.sk/zakazka/0747-003578/predaj/1-izb-byt/kosice-mestska-cast-sever-sladkovicova-kosice-sever/art-real-1-izb-byt-sladkovicova-ul-kosice-sever')
html = response.read()
soup = BeautifulSoup(html)
tabulka = soup.find("table", {"class" : "detail-char"})
records = [] # store all of the records in this list
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    prvy = col[0].string.strip()
    druhy = col[1].string.strip()
    record = '%s;%s' % (prvy, druhy) # store the record with a ';' between prvy and druhy
    records.append(record)
fl = codecs.open('output.txt', 'wb', 'utf8')
line = ';'.join(records)
fl.write(line + u'\r\n')
fl.close()
This could be cleaned up more, but I think it's what you are wanting.
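To see why the join step produces the single line the question asks for, here is a minimal sketch of just that step, with hypothetical (prvy, druhy) pairs standing in for the values BeautifulSoup would extract from the table rows:

```python
# Hypothetical parsed pairs; in the real script these come from the
# col[0] / col[1] cells of each table row.
pairs = [
    ('Celkova podlahova plocha bytu', '33m'),
    ('Vytah', 'Ano'),
    ('Forma vlastnictva', 'Osobne'),
]

# Format each (label, value) pair as 'label;value', then join all of
# them with ';' into one line, exactly as the answer's loop does.
line = ';'.join('%s;%s' % (prvy, druhy) for prvy, druhy in pairs)
print(line)
# Celkova podlahova plocha bytu;33m;Vytah;Ano;Forma vlastnictva;Osobne
```

For the csv export mentioned in the question, Python's built-in csv module with delimiter=';' would also handle quoting of values that themselves contain a semicolon.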