Simple, fast SQL queries for flat files

Question

Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB since the input data is typically thrown out almost immediately after the query is run.

Consider the data file, "animals.txt":

dog 15
cat 20
dog 10
cat 30
dog 5
cat 40

Suppose I want to extract the highest value for each unique animal. I would like to write something like:

cat animals.txt | foo 'select $1, max(convert($2 using decimal)) group by $1'

I can get nearly the same result using sort:

cat animals.txt | sort -t " " -k1,1 -k2,2nr

And I can always drop into awk from there, but this all feels a bit awkward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.
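For comparison, the awk route mentioned above might look like the following. This is only a sketch under assumptions: the scratch directory and the `result` variable are illustrative names, not part of the original question, and the data is the sample file recreated inline.

```shell
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/animals.txt" <<'EOF'
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
EOF

# Keep the largest second-column value seen for each first-column key.
# "$2 + 0" forces a numeric comparison. The final sort is needed because
# awk's "for (k in max)" iteration order is unspecified.
result=$(awk '
  !($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }
  END { for (k in max) print k, max[k] }
' "$workdir/animals.txt" | sort)

printf '%s\n' "$result"
rm -rf "$workdir"
```

This does the grouping in a single pass without pre-sorting the input, at the cost of holding one entry per distinct key in memory.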

I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?
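The SQLite idea can be approximated without writing a wrapper program, assuming the `sqlite3` command-line shell is installed. The sketch below is not an established tool: the table name `animals` and the column names `name` and `value` are made up for illustration, and the schema is written by hand rather than inferred from the input.

```shell
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/animals.txt" <<'EOF'
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
EOF

if command -v sqlite3 >/dev/null 2>&1; then
  # Load the file into an in-memory database, run the query, and discard
  # everything on exit. Table and column names are illustrative assumptions.
  result=$(cd "$workdir" && sqlite3 :memory: <<'EOF'
CREATE TABLE animals(name TEXT, value INTEGER);
.separator " "
.import animals.txt animals
SELECT name, MAX(value) FROM animals GROUP BY name ORDER BY name;
EOF
)
  printf '%s\n' "$result"
else
  echo "sqlite3 is not installed; skipping" >&2
fi

rm -rf "$workdir"
```

Because the database lives in `:memory:`, nothing persists after the query runs, which matches the throwaway nature of the input data. The load step is still paid on every invocation.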

Recommended answer

I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq's "-f" option, which I had been unaware of:

cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1

The awk portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.
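To illustrate that point, here is a sketch that assumes a hypothetical file, "values.txt", holding the same data with the value column first. The file name and the `result` variable are illustrative only.

```shell
# Hypothetical input: same data as animals.txt, but with the value first.
workdir=$(mktemp -d)
cat > "$workdir/values.txt" <<'EOF'
15 dog
20 cat
10 dog
30 cat
5 dog
40 cat
EOF

# Sort by animal (field 2), then by value (field 1) in descending numeric
# order, so the first line of each group carries the maximum.
# uniq -f 1 ignores the first field when comparing adjacent lines,
# so only the first line per animal survives.
result=$(sort -t " " -k2,2 -k1,1nr "$workdir/values.txt" | uniq -f 1)

printf '%s\n' "$result"
rm -rf "$workdir"
```

The output keeps the value-first column order of the input, so a final reordering step would still be needed if the animal name must come first.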

I'm still holding out hope for a SQL-like tool, though.
