Broken CSV, how can I fix it?


Question

I'm trying to parse a CSV file. I'd like to load it into a database or just parse it with JavaScript, but both approaches fail because the file's syntax is broken. The entire CSV file is here:

https://gist.github.com/1023560

As you can see, parsing breaks wherever double quotes appear inside a double-quoted field, and inserting into MySQL fails for the same reason. The first break is at line 13 of the file. Instead of returning the full value:

 <a href="http://www.facebook.com/pages/Portland-Community-Gardens/139244076118027?v=wall" target="_blank"><img src="/shared/cfm/image.cfm?id=348340" alt="Facebook" width="100" height="31" /></a>

it returns only:

<a href="

For JavaScript I was going to use CSVToArray() by Ben Nadel:

http://www.bennadel.com/blog/1504-Ask-Ben-Parsing-CSV-Strings-With-Javascript-Exec-Regular-Expression-Command.htm

My ultimate goal, though, is to load the data into MySQL so I can echo back a JSON feed with PHP's json_encode().

One thing I noticed that could be problematic: double quotes can appear inside HTML tags as attribute delimiters, as above, but also in the text nodes of HTML elements, for example "<span class="text">"Example"</span>".

Recommended answer

You may be able to trick it by using a regex to look for:

"(.*?)"(?=,|$)

But that's kind of hack-ish: it only accepts a closing quote when it is immediately followed by a comma or the end of the line. The same logic would apply to a find-and-replace. Again, this all assumes that a "stray" quote never looks like a standard CSV delimiter, i.e. it never has a comma or a line beginning/end directly before or after it.
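As a rough sketch of how that pattern behaves in JavaScript, the snippet below runs it over a single made-up row. The sample line and the helper name `extractQuotedFields` are illustrative assumptions, not taken from the gist.

```javascript
// Hypothetical sample row: the middle field contains unescaped inner quotes.
const line = '1,"<a href="http://example.com">link</a>","Portland"';

// The pattern from above, with the global flag so every quoted field is found.
// A closing quote is only accepted when a comma or the end of the input follows.
const quotedField = /"(.*?)"(?=,|$)/g;

// Collect the captured contents of each quoted field (illustrative helper).
function extractQuotedFields(text) {
  return Array.from(text.matchAll(quotedField), (m) => m[1]);
}

console.log(extractQuotedFields(line));
// [ '<a href="http://example.com">link</a>', 'Portland' ]
```

Note that the unquoted first field (`1`) is not captured, and a field whose inner quote happens to be followed by a comma would still be cut short, which is the limitation described above.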

I assume you have no control over the original data and have to work with what you have?

EDIT

Though I've only tried it on a small sample of your data, this pattern appears to find the "stray" quotes, which you can then replace with "":

(?<!^|"|,)"(?!"|,|$)
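A minimal sketch of that replacement in JavaScript might look like the following. The function name `escapeStrayQuotes` and the sample row are assumptions for illustration; the `m` flag is added so that `^` and `$` apply to each line of a multi-line file, and lookbehind needs an engine with ES2018 regex support.

```javascript
// The pattern from above: a quote that is not at a field boundary.
// "g" replaces every hit; "m" makes ^ and $ match at each line's start and end.
const strayQuote = /(?<!^|"|,)"(?!"|,|$)/gm;

// Double every stray quote so the result follows standard CSV escaping
// (illustrative helper, not part of any library).
function escapeStrayQuotes(csvText) {
  return csvText.replace(strayQuote, '""');
}

// Hypothetical sample row with unescaped quotes inside a quoted field.
const broken = '1,"<a href="http://example.com">link</a>","Portland"';

console.log(escapeStrayQuotes(broken));
// 1,"<a href=""http://example.com"">link</a>","Portland"
```

Quotes that are already doubled are left alone, since each half of a `""` pair has a quote next to it. The repaired text can then be handed to an ordinary CSV parser, subject to the same caveat as before: a stray quote sitting directly next to a comma or a line boundary will not be detected.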
