Data processing error solved by a copy-paste?


Question

I am running into a very strange problem while processing data under Linux 16.04 with Python 2.7. I create a .csv file using this function:

from ast import literal_eval

# Parse each line of the file as a Python tuple literal
with open('logs.csv') as f:
    data = [literal_eval(line) for line in f]

The file is created properly and looks like this:

('2017-04-01 12:05:00','0.01770001','0.0177887','0.01780275','0.01770001')
('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')
('2017-04-01 12:15:00','0.01773','0.01780092','0.01780092','0.01773')
('2017-04-01 12:20:00','0.0178','0.01781212','0.01784922','0.01774015')
('2017-04-01 12:25:00','0.01781212','0.01774528','0.01782994','0.01774528')
('2017-04-01 12:30:00','0.01774529','0.0178732','0.01788145','0.01774509')
('2017-04-01 12:35:00','0.01788145','0.01793318','0.01793318','0.01788145')
('2017-04-01 12:40:00','0.01794','0.01780093','0.01799984','0.01780092')
('2017-04-01 12:45:00','0.01785694','0.01806699','0.01807519','0.01785694')
('2017-04-01 12:50:00','0.01807999','0.01819687','0.01827573','0.018027')
('2017-04-01 12:55:00','0.01819687','0.01825402','0.0184','0.01800011')
('2017-04-01 13:00:00','0.01822416','0.01830994','0.01835554','0.0181777')
('2017-04-01 13:05:00','0.01825415','0.01810171','0.01830986','0.01810008')
('2017-04-01 13:10:00','0.01810174','0.01818991','0.01818991','0.01810173')
('2017-04-01 13:15:00','0.01818991','0.01818002','0.01819687','0.01818001')
('2017-04-01 13:20:00','0.01818002','0.01821999','0.01822','0.01818001')

I then pass it through this code to draw a graph:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import dates, ticker
import matplotlib as mpl
from mpl_finance import candlestick_ohlc
from ast import literal_eval

mpl.style.use('default')


data = []
ohlc_data = [] 

with open('logsXMR.csv') as f:
    data = [literal_eval(line) for line in f]


for line in data:
        #ohlc_data.append((np.float64(line[0]), np.float64(line[1]), np.float64(line[2]), np.float64(line[3]), np.float64(line[4])))
        ohlc_data.append((dates.datestr2num(line[0]), np.float64(line[1]), np.float64(line[2]), np.float64(line[3]), np.float64(line[4])))

fig, ax1 = plt.subplots()
candlestick_ohlc(ax1, ohlc_data, width = 0.5/((24*60)/5), colorup = 'g', colordown = 'r', alpha = 0.8)

#ax1.xaxis.set_major_formatter(dates.DateFormatter('%d/%m/%Y %H:%M'))
ax1.xaxis.set_major_locator(ticker.MaxNLocator(10))

plt.xticks(rotation = 30)
plt.grid()
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Historical Data XMRUSD')
plt.tight_layout()
plt.show()

But each time I get this error:

Traceback (most recent call last):
  File "CSVing.py", line 15, in <module>
    data = [literal_eval(line) for line in f]
  File "/usr/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 2
    ('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')
    ^

I don't understand why I get this error, because if I simply copy and paste my data into another file, everything works fine and I can draw my graph flawlessly. I just don't get it, because the two data files look identical, with no added spaces or anything else.

What can cause this error, and how can I use my generated data file directly, without having to copy and paste the data into another file?
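One thing worth checking, offered as an assumption rather than a confirmed diagnosis: the traceback reports the error on line 2 of the string handed to `literal_eval`, which suggests that a single "line" read from the file contained more than one record. That could happen if the generated file uses line endings that the default text mode does not split on (for example bare `\r`), and copy-pasting through an editor would normalize them. The helper name `describe_line_endings` below is hypothetical; it only inspects the raw bytes so that two files that look identical can be compared.

```python
def describe_line_endings(path):
    """Count line-ending styles and detect a UTF-8 BOM in a file's raw bytes."""
    # Binary mode: the bytes are read exactly as stored, with no newline translation
    with open(path, 'rb') as f:
        raw = f.read()
    crlf = raw.count(b'\r\n')
    return {
        'bom': raw.startswith(b'\xef\xbb\xbf'),  # UTF-8 byte order mark at the start
        'crlf': crlf,                            # Windows-style endings
        'lone_cr': raw.count(b'\r') - crlf,      # bare carriage returns
        'lone_lf': raw.count(b'\n') - crlf,      # Unix-style endings
    }

# Example usage (file names taken from the question, second one hypothetical):
# print(describe_line_endings('logsXMR.csv'))
# print(describe_line_endings('logsXMR_pasted.csv'))
```

If the two files report different counts, the difference is in the invisible characters rather than in the data. Should bare `\r` show up, opening the file with universal newlines (`open(path, 'rU')` on Python 2.7) would be one thing to try.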

Thanks in advance,

Pixel

Recommended answer

I would recommend rethinking the data format you have. I don't know where the data comes from, but it would be reasonable to store it in a way that does not contain parentheses, quotes and the like.
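As a minimal sketch of what such a format could look like, the rows can be written as plain comma-separated values with the standard `csv` module and read back without `literal_eval`. This assumes Python 3 (on Python 2.7 the `csv` module expects byte streams), and the helper names `write_rows` and `read_rows` are made up for illustration.

```python
import csv
import io

# Two sample rows copied from the question's data
rows = [
    ('2017-04-01 12:05:00', '0.01770001', '0.0177887', '0.01780275', '0.01770001'),
    ('2017-04-01 12:10:00', '0.0177887', '0.01771308', '0.01785263', '0.01771039'),
]

def write_rows(fileobj, rows):
    """Write rows as plain CSV: no parentheses, no quotes around simple fields."""
    csv.writer(fileobj, lineterminator='\n').writerows(rows)

def read_rows(fileobj):
    """Read rows back, keeping the timestamp as text and converting prices to float."""
    return [(r[0],) + tuple(float(x) for x in r[1:]) for r in csv.reader(fileobj)]

# An in-memory buffer stands in for a real file here
buf = io.StringIO()
write_rows(buf, rows)
buf.seek(0)
parsed = read_rows(buf)
print(buf.getvalue())
print(parsed[0])
```

With a real file, `open('logs.csv', 'w', newline='')` and `open('logs.csv', newline='')` would take the place of the buffer.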

If you really need to work with this data format, you can still use e.g. pandas and sanitize the input by removing the characters that aren't useful.

u = """('2017-04-01 12:05:00','0.01770001','0.0177887','0.01780275','0.01770001')
('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')
('2017-04-01 12:15:00','0.01773','0.01780092','0.01780092','0.01773')
('2017-04-01 12:20:00','0.0178','0.01781212','0.01784922','0.01774015')
('2017-04-01 12:25:00','0.01781212','0.01774528','0.01782994','0.01774528')
('2017-04-01 12:30:00','0.01774529','0.0178732','0.01788145','0.01774509')
('2017-04-01 12:35:00','0.01788145','0.01793318','0.01793318','0.01788145')
('2017-04-01 12:40:00','0.01794','0.01780093','0.01799984','0.01780092')
('2017-04-01 12:45:00','0.01785694','0.01806699','0.01807519','0.01785694')
('2017-04-01 12:50:00','0.01807999','0.01819687','0.01827573','0.018027')
('2017-04-01 12:55:00','0.01819687','0.01825402','0.0184','0.01800011')
('2017-04-01 13:00:00','0.01822416','0.01830994','0.01835554','0.0181777')
('2017-04-01 13:05:00','0.01825415','0.01810171','0.01830986','0.01810008')
('2017-04-01 13:10:00','0.01810174','0.01818991','0.01818991','0.01810173')
('2017-04-01 13:15:00','0.01818991','0.01818002','0.01819687','0.01818001')
('2017-04-01 13:20:00','0.01818002','0.01821999','0.01822','0.01818001')"""

import io
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
from mpl_finance import candlestick_ohlc

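# regex patterns to strip: parentheses and single quotes (raw strings avoid invalid-escape warnings)
replace = {r"\(" : "", r"\)" : "", "'" : ""}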
df = pd.read_csv(io.StringIO(u), sep=",",  header=None).replace(replace, regex=True)
# use pd.read_csv("myfilename.txt", ...)  here for your real file

df[0] = dates.datestr2num(df[0])
df.iloc[:,1:] = df.iloc[:,1:].astype(float)

fig, ax1 = plt.subplots()
candlestick_ohlc(ax1, df.values, width = 0.5/((24*60)/5), 
                 colorup = 'g', colordown = 'r', alpha = 0.8)

ax1.xaxis.set_major_formatter(dates.DateFormatter('%d/%m/%Y %H:%M'))
ax1.xaxis.set_major_locator(dates.MinuteLocator((0,15,30,45)))

plt.xticks(rotation = 30)
plt.grid()
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Historical Data XMRUSD')
plt.tight_layout()
plt.show()

Note that the data also does not seem to be in OHLC (open, high, low, close) order, hence the strange-looking graph. Since nothing is known about the data, you need to find out the correct column order yourself.
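One way to investigate the column order is to check which price column is always the largest value in its row and which is always the smallest. The sketch below is only a heuristic, and `guess_high_low_columns` is a hypothetical helper name; the sample rows are the first four price rows from the question.

```python
def guess_high_low_columns(rows):
    """Return (high_idx, low_idx): the first column that always holds the row
    maximum and the first that always holds the row minimum (None if absent)."""
    n = len(rows[0])
    high = [i for i in range(n) if all(r[i] == max(r) for r in rows)]
    low = [i for i in range(n) if all(r[i] == min(r) for r in rows)]
    return (high[0] if high else None, low[0] if low else None)

# Price columns only (timestamp dropped), first four rows of the question's data
prices = [
    (0.01770001, 0.0177887, 0.01780275, 0.01770001),
    (0.0177887, 0.01771308, 0.01785263, 0.01771039),
    (0.01773, 0.01780092, 0.01780092, 0.01773),
    (0.0178, 0.01781212, 0.01784922, 0.01774015),
]

print(guess_high_low_columns(prices))  # prints (2, 3)
```

For these rows the third price column is always the maximum and the fourth always the minimum, which would be consistent with an open, close, high, low layout, though it does not prove it. If that turns out to be the case, `candlestick_ochl` from `mpl_finance` accepts that order, or the columns can be rearranged before calling `candlestick_ohlc`.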
