Turn list of company names into tickers


Question

I have a list of company names that I would like to turn into tickers. Here is reproducible code that creates the list of names:

companynames=structure(list(V1 = structure(1:41, .Label = c("AETNA INC", "ANTHEM INC", 
"APPLE INC", "ASPEN INSURANCE HOLDINGS LTD", "BARRICK GOLD CORP", 
"BEST BUY CO INC", "CAREFUSION CORP", "CBS CORP-CLASS B NON VOTING", 
"CIGNA CORP", "COMPUTER SCIENCES CORP", "COMPUWARE CORP", "COVENTRY HEALTH CARE INC", 
"DELPHI AUTOMOTIVE PLC", "DST SYSTEMS INC", "EINSTEIN NOAH RESTAURANT GRO", 
"ENSCO PLC-CL A", "EXPEDIA INC", "FIFTH STREET FINANCE CORP", 
"GENERAL MOTORS CO", "GENWORTH FINANCIAL INC-CL A", "GREEN BRICK PARTNERS INC", 
"HESS CORP", "HUMANA INC", "HUNTINGTON INGALLS INDUSTRIE", "LEGG MASON INC", 
"MARKET VECTORS GOLD MINERS", "MARVELL TECHNOLOGY GROUP LTD", 
"MICROSOFT CORP", "NCR CORPORATION", "NVR INC", "OAKTREE CAPITAL GROUP LLC", 
"REPUBLIC AIRWAYS HOLDINGS IN", "SEAGATE TECHNOLOGY", "SPRINT COMMUNICATIONS INC", 
"STARZ - A", "STATE BANK FINANCIAL CORP", "SYMMETRICOM INC", 
"TESSERA TECHNOLOGIES INC", "UNITEDHEALTH GROUP INC", "VIRGIN MEDIA INC/OLD", 
"XEROX CORP"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-41L))

This gives me something along the lines of:

head(companynames)
                            V1
1                    AETNA INC
2                   ANTHEM INC
3                    APPLE INC
4 ASPEN INSURANCE HOLDINGS LTD
5            BARRICK GOLD CORP
6              BEST BUY CO INC

I would like another column that outputs the ticker of each of these companies. So for the first row I should get AET, for the second row ANTM, for the third row AAPL, and so on. My example is in R, but any solution in Python or R would be very helpful. I am not sure whether a function that does this already exists, or what the best approach to writing one would be if it does not.

Recommended answer

You can use Joshua Ulrich's TTR package to get a mapping of company names to tickers and perform lookups against your companynames object. Ideally, your list of names would be accurate and properly formatted, but since it is not, you will have to do a bit of extra legwork to get some of the symbols. For example,

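# Download the symbol/name table from TTR (requires internet access)
stock.symbols <- TTR::stockSymbols()
# Quick adjustments: uppercase the names and strip periods and commas
stock.symbols$adj_name <- gsub("[.,]", "", toupper(stock.symbols$Name))

# For each company name, take the symbol of the first adjusted name
# that contains it (NA when nothing matches)
companynames$Symbol <- sapply(as.character(companynames[, 1]), function(x) {
  stock.symbols$Symbol[grep(x, stock.symbols$adj_name, fixed = TRUE)[1]]
}, USE.NAMES = FALSE)

# Show only the rows that were matched
na.omit(companynames)
#                             V1 Symbol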
#1                     AETNA INC    AET
#2                    ANTHEM INC   ANTM
#3                     APPLE INC   AAPL
#5             BARRICK GOLD CORP    ABX
#6               BEST BUY CO INC    BBY
#9                    CIGNA CORP     CI
#10       COMPUTER SCIENCES CORP    CSC
#13        DELPHI AUTOMOTIVE PLC   DLPH
#14              DST SYSTEMS INC    DST
#17                  EXPEDIA INC   EXPE
#18    FIFTH STREET FINANCE CORP    FSC
#19            GENERAL MOTORS CO     GM
#21     GREEN BRICK PARTNERS INC   GRBK
#22                    HESS CORP    HES
#23                   HUMANA INC    HUM
#24 HUNTINGTON INGALLS INDUSTRIE    HII
#25               LEGG MASON INC     LM
#27 MARVELL TECHNOLOGY GROUP LTD   MRVL
#28               MICROSOFT CORP   MSFT
#29              NCR CORPORATION    NCR
#30                      NVR INC    NVR
#31    OAKTREE CAPITAL GROUP LLC    OAK
#32 REPUBLIC AIRWAYS HOLDINGS IN   RJET
#33           SEAGATE TECHNOLOGY    STX
#36    STATE BANK FINANCIAL CORP   STBZ
#38     TESSERA TECHNOLOGIES INC   TSRA
#39       UNITEDHEALTH GROUP INC    UNH
#41                   XEROX CORP    XRX


So with just a few basic transformations (converting the Name column to uppercase and removing periods and commas), you can match 28 of the 41 inputs. Most of the remaining non-matching cases could probably be solved by simple substitutions in either your input names or the adj_name column of stock.symbols, e.g. CORP vs. CORPORATION. And, as pointed out in the comments, if some of your company names are not traded on the NASDAQ, AMEX, or NYSE exchanges, you will have to pull in additional external data.
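The substitution idea can be sketched roughly as follows. This is only an illustration under stated assumptions: normalize_name is a hypothetical helper, the suffix list is a guess at what would be useful, and toy.symbols is a made-up stand-in for the table returned by TTR::stockSymbols() so that the snippet runs without a download. Unlike the grep lookup above, it uses exact matching on the normalised names.

```r
# Hypothetical helper: bring a company name into a common form before matching.
normalize_name <- function(x) {
  x <- toupper(as.character(x))
  x <- gsub("[.,]", "", x)                                    # drop periods and commas
  x <- gsub("\\bCORPORATION\\b", "CORP", x, perl = TRUE)      # CORPORATION -> CORP
  x <- gsub("\\bINCORPORATED\\b", "INC", x, perl = TRUE)      # INCORPORATED -> INC
  x <- gsub("\\bLIMITED\\b", "LTD", x, perl = TRUE)           # LIMITED -> LTD
  x <- gsub("\\s+", " ", x, perl = TRUE)                      # collapse repeated spaces
  trimws(x)
}

# Made-up stand-in for TTR::stockSymbols(); only Symbol and Name are used here.
toy.symbols <- data.frame(
  Symbol = c("NCR", "XRX", "MSFT"),
  Name   = c("NCR Corporation", "Xerox Corporation", "Microsoft Corporation"),
  stringsAsFactors = FALSE
)
toy.symbols$adj_name <- normalize_name(toy.symbols$Name)

# Normalise both sides, then look up by exact match (NA when nothing matches).
inputs  <- c("NCR CORPORATION", "XEROX CORP", "MICROSOFT CORP", "STARZ - A")
matched <- toy.symbols$Symbol[match(normalize_name(inputs), toy.symbols$adj_name)]
data.frame(V1 = inputs, Symbol = matched)
```

With the real table, the same helper could be applied to both companynames$V1 and stock.symbols$Name before the lookup. Names that are truncated in the input (for example "HUNTINGTON INGALLS INDUSTRIE") would still need partial matching such as grep.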
