数据集选择性挑选和转换 [英] Dataset selective picking and transformation

查看:46
本文介绍了数据集选择性挑选和转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 .xlsx 中有一个包含数十万行的数据集,如下所示:

I have a dataset in .xlsx with hundreds of thousands of rows as follow:

slug    symbol  name    date    ranknow open    high    low close   volume  market  close_ratio spread
companyA    AAA companyA    28/04/2013  1   135,3   135,98  132,1   134,21  0   1500520000  0,5438  3,88
companyA    AAA companyA    29/04/2013  1   134,44  147,49  134 144,54  0   1491160000  0,7813  13,49
companyA    AAA companyA    30/04/2013  1   144 146,93  134,05  139 0   1597780000  0,3843  12,88
....
companyA    AAA companyA    17/04/2018  1   8071,66 8285,96 7881,72 7902,09 6900880000  1,3707E+11  0,0504  404,24
....
lancer  LA  Lancer  09/01/2018  731 0,347111    0,422736    0,345451    0,422736    3536710 0   1   0,08
lancer  LA  Lancer  10/01/2018  731 0,435794    0,512958    0,331123    0,487106    2586980 0   0,8578  0,18
lancer  LA  Lancer  11/01/2018  731 0,479738    0,499482    0,309485    0,331977    950410  0   0,1184  0,19
....
lancer  LA  Lancer  17/04/2018  731 0,027279    0,041106    0,02558 0,031017    9936    1927680 0,3502  0,02
....
yocomin YC  Yocomin 21/01/2016  732 0,008135    0,010833    0,002853    0,002876    63  139008  0,0029  0,01
yocomin YC  Yocomin 22/01/2016  732 0,002872    0,008174    0,001192    0,005737    69  49086   0,651   0,01
yocomin YC  Yocomin 23/01/2016  732 0,005737    0,005918    0,001357    0,00136     67  98050   0,0007  0
....
yocomin YC  Yocomin 17/04/2018  732 0,020425    0,021194    0,017635    0,01764 12862   2291610 0,0014  0
....

假设我有一个 .txt 文件,其中包含我想要提取的时间序列的符号列表.例如:

Let's say I have a .txt file with a list of symbol of that time series I want to extract. For example:

AAA
LA
YC

我想得到一个如下所示的数据集:

I would like to get a dataset that would look as follow:

date          AAA        LA        YC
28/04/2013    134,21     NaN       NaN
29/04/2013    144,54     NaN       NaN
30/04/2013    139        NaN       NaN
....
....
....
17/04/2018    7902,09    0,031017  0,01764

在股票名称(如 AAA 等)下,我得到收盘"价格.我对 Python 和 R 都持开放态度.任何帮助都将不胜感激!

where under the stock name (like AAA, etc) i get the "close" price. I'm open to both Python and R. Any help would be grate!

推荐答案

在使用 Pandas 的 python 中,这应该可行.

In python using pandas, this should work.

import pandas as pd

df = pd.read_excel("/path/to/file/Book1.xlsx")
df = df.loc[:, ['symbol', 'name', 'date', 'close']]
df = df.set_index(['symbol', 'name', 'date'])
df = df.unstack(level=[0,1])
df = df['close']

读取符号文件文件,然后过滤掉不在数据框中的符号:

to read the symbols file file and then filter out symbols not in the dataframe:

symbols = pd.read_csv('/path/to/file/symbols.txt', sep=" ", header=None)
symbols = symbols[0].tolist()
symbols = pd.Index(symbols).unique()
symbols = symbols.intersection(df.columns.get_level_values(0))

输出将如下所示:

print(df[symbols])


symbol                   AAA        LA        YC
name                companyA    Lancer   Yocomin
date                                            
2018-09-01 00:00:00     None  0,422736      None
2018-10-01 00:00:00     None  0,487106      None
2018-11-01 00:00:00     None  0,331977      None

这篇关于数据集选择性挑选和转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆