Only first character of unicode strings getting written to csv
Problem description
In a nutshell, my problem is that my script cannot write complete unicode strings (retrieved from a db) to a csv; instead, only the first character of each string is written to the file. E.g.:
U,1423.0,831,1,139
Where the output should be:
University of Washington Students,1423.0,831,1,139
Some background: I'm connecting to an MSSQL database using pyodbc. I have my odbc config file set up for unicode, and connect to the db as follows:
p.connect("DSN=myserver;UID=username;PWD=password;DATABASE=mydb;CHARSET=utf-8")
I can get data no problem, but the issue arises when I try to save query results to the csv file. I've tried using csv.writer, the UnicodeWriter solution in the official docs, and most recently, the unicodecsv module I found on github. Each method yields the same results.
The weird thing is I can print the strings in the python console no problem. Yet, if I take that same string and write it to csv, the problem emerges. See my test code & results below:
Code to highlight issue:
print "'Raw' string from database:"
print "\tencoding:\t" + whatisthis(report.data[1][0])
print "\tprint string:\t" + report.data[1][0]
print "\tstring len:\t" + str(len(report.data[1][0]))

f = StringIO()
w = unicodecsv.writer(f, encoding='utf-8')
w.writerows(report.data)
f.seek(0)
r = unicodecsv.reader(f)
row = r.next()
row = r.next()

print "Write/Read from csv file:"
print "\tencoding:\t" + whatisthis(row[0])
print "\tprint string:\t" + row[0]
print "\tstring len:\t" + str(len(row[0]))
Output from test:
'Raw' string from database:
    encoding:       unicode string
    print string:   University of Washington Students
    string len:     66
Write/Read from csv file:
    encoding:       unicode string
    print string:   U
    string len:     1
What could be the reason for this issue and how might I resolve it? Thanks!
EDIT: the whatisthis function is just to check the string format, taken from this post
def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"
Solution
import StringIO as sio
import unicodecsv as ucsv

class Report(object):
    def __init__(self, data):
        self.data = data

report = Report(
    [
        ["University of Washington Students", 1, 2, 3],
        ["UCLA", 5, 6, 7]
    ]
)

print report.data
print report.data[0][0]
print "*" * 20

f = sio.StringIO()
writer = ucsv.writer(f, encoding='utf-8')
writer.writerows(report.data)
print f.getvalue()
print "-" * 20

f.seek(0)
reader = ucsv.reader(f)
row = reader.next()
print row
print row[0]

--output:--
[['University of Washington Students', 1, 2, 3], ['UCLA', 5, 6, 7]]
University of Washington Students
********************
University of Washington Students,1,2,3
UCLA,5,6,7

--------------------
[u'University of Washington Students', u'1', u'2', u'3']
University of Washington Students
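For anyone reading this on Python 3, the same round trip can probably be sketched with just the standard library, since str is already unicode there and unicodecsv should not be needed. This is only a rough equivalent of the test above, not the original code; the helper name round_trip is made up for illustration.

```python
import csv
import io


def round_trip(rows):
    """Write rows to an in-memory CSV buffer, then read them back (hypothetical helper)."""
    # newline="" leaves the csv module in charge of line endings
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


data = [
    ["University of Washington Students", 1, 2, 3],
    ["UCLA", 5, 6, 7],
]

rows = round_trip(data)
print(rows[0])          # every field comes back as a string
print(len(rows[0][0]))  # length of the first field after the round trip
```

As in the Python 2 run, the numbers come back as strings, and the first field should survive with its full length.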
Who knows what mischief your whatisthis() function is up to.
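One thing that might be worth checking: the test output reports a length of 66 for "University of Washington Students", which is only 33 characters long. One possible explanation (a guess, nothing in the question confirms it) is that the driver hands back UTF-16 data that gets treated as one character per byte, so every other character is a NUL. The Python 2 csv documentation does mention issues with ASCII NUL characters, so a writer that stops at the first NUL would leave just "U". The sketch below is Python 3 and only reconstructs such a string to show how to spot and undo it; the variable names are made up.

```python
text = "University of Washington Students"

# Hypothetical reconstruction of the suspect value: UTF-16-LE bytes
# decoded one byte per character, leaving a NUL after every letter.
suspect = text.encode("utf-16-le").decode("latin-1")

print(len(text), len(suspect))  # the suspect string is twice as long
print(repr(suspect[:6]))        # repr() makes the embedded NULs visible

# What code that stops at the first NUL would keep
first = suspect.split("\x00", 1)[0]
print(first)

# Undo the mis-decoding: back to bytes, then decode as UTF-16-LE
repaired = suspect.encode("latin-1").decode("utf-16-le")
print(repaired)
```

If repr() of the value from the database shows \x00 between the letters, the real fix is more likely in the ODBC/driver encoding settings than in the csv code.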