How to create histogram in R with CSV time data?


Problem description



I have CSV data of a log for 24 hours that looks like this:

svr01,07:17:14,'u1@user.de','8.3.1.35'
svr03,07:17:21,'u2@sr.de','82.15.1.35'
svr02,07:17:30,'u3@fr.de','2.15.1.35'
svr04,07:17:40,'u2@for.de','2.1.1.35'

I read the data with tbl <- read.csv("logs.csv")

How can I plot this data in a histogram to see the number of hits per hour? Ideally, I would get 4 bars for each hour, representing the hits for svr01, svr02, svr03 and svr04.

Thank you for helping me here!

Solution

An example dataset:

dat = data.frame(server = paste("svr", round(runif(1000, 1, 10)), sep = ""),
                 time = Sys.time() + sort(round(runif(1000, 1, 36000))))

The trick I use is to create a new variable which only specifies in which hour the hit was recorded:

dat$hr = strftime(dat$time, "%H")
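The same hour-extraction step can be tried on rows in the question's own format. This is a minimal base R sketch, not the answerer's code: the column names `server`, `time`, `user` and `ip` are assumptions (the log has no header row), and the sample rows are read from a string here, where a real run would pass "logs.csv" instead.

```r
# Sample rows copied from the question; a real run would read "logs.csv" instead.
log_text <- "svr01,07:17:14,'u1@user.de','8.3.1.35'
svr03,07:17:21,'u2@sr.de','82.15.1.35'
svr02,07:17:30,'u3@fr.de','2.15.1.35'
svr04,07:17:40,'u2@for.de','2.1.1.35'"

# The log has no header row and wraps some fields in single quotes.
# The column names below are assumed for illustration.
tbl <- read.csv(text = log_text, header = FALSE, quote = "'",
                col.names = c("server", "time", "user", "ip"),
                stringsAsFactors = FALSE, strip.white = TRUE)

# Parse the HH:MM:SS strings and keep only the hour as a two-digit string.
# tz = "UTC" avoids surprises from daylight-saving gaps in the local zone.
tbl$hr <- format(strptime(tbl$time, format = "%H:%M:%S", tz = "UTC"), "%H")
```

Without `header = FALSE`, `read.csv` would treat the first log line as column names and silently drop one hit.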

Now we can use some plyr magic:

library(plyr)  # provides count()
hits_hour = count(dat, vars = c("server","hr"))
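If plyr is not installed, a similar count table can probably be built with base R alone. The sketch below is an alternative to the answer's call, not a copy of it; the random data and the name `hits_hour_base` are made up for illustration. One difference to be aware of: `table()` keeps server/hour combinations with zero hits, whereas `count()` lists only combinations that occur.

```r
# Hypothetical stand-in data: 200 hits spread over 4 servers and 3 hours.
set.seed(1)
dat <- data.frame(server = paste0("svr", sample(1:4, 200, replace = TRUE)),
                  hr = sprintf("%02d", sample(7:9, 200, replace = TRUE)),
                  stringsAsFactors = FALSE)

# Cross-tabulate server by hour, then flatten to one row per combination.
# responseName = "freq" matches the column name that plyr's count() uses.
hits_hour_base <- as.data.frame(table(server = dat$server, hr = dat$hr),
                                responseName = "freq",
                                stringsAsFactors = FALSE)
```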

And create the plot:

library(ggplot2)
ggplot(data = hits_hour) + geom_bar(aes(x = hr, y = freq, fill = server), stat = "identity", position = "dodge")

This gives a dodged bar chart with one bar per server within each hour.

I don't really like this plot; I'd be more in favor of:

ggplot(data = hits_hour) + geom_line(aes(x = as.numeric(hr), y = freq)) + facet_wrap(~ server, nrow = 1)

This gives one line panel per server, side by side in a single row.

Putting all the facets in one row allows easy comparison of the number of hits between the servers. This will look even better when using real data instead of my random data.
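For readers who want the grouped bars without ggplot2, the following is a rough base graphics sketch of the same idea. It is not part of the original answer, and the counts in this small `hits_hour` table are invented purely to have something to draw.

```r
# Hypothetical counts: two servers, three hours.
hits_hour <- data.frame(
  server = rep(c("svr01", "svr02"), times = 3),
  hr     = rep(c("07", "08", "09"), each = 2),
  freq   = c(5, 3, 8, 2, 4, 6)
)

# Reshape to a server-by-hour matrix of counts.
hits_mat <- xtabs(freq ~ server + hr, data = hits_hour)

# beside = TRUE draws the servers side by side within each hour;
# barplot() returns the bar midpoints, one row per server.
bar_pos <- barplot(hits_mat, beside = TRUE, legend.text = TRUE,
                   xlab = "Hour of day", ylab = "Hits")
```

Each column of `hits_mat` becomes one group of bars, which mirrors what `position = "dodge"` does in the ggplot2 version.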
