将简短的编码数据存储在varchar列中 [英] Storing snappy encoded data in varchar column

查看:105
本文介绍了将简短的编码数据存储在varchar列中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Snappy-java编码JSON数据,并且要将结果存储在varchar列中的数据库中。

该数据库是具有ISO-8859-1编码的oracle数据库。 >
插入数据时遇到编码问题。 Oracle似乎无法识别某些字符。

I am using Snappy-java to encode JSON data and I want to store the result in database in a varchar column.
The database is an oracle database with ISO-8859-1 encoding.
I am facing an encoding problem when inserting the data. It would seem that some characters are not recognised by Oracle.

我找到了一种解决方法,即在插入压缩数据之前对压缩数据使用Base64编码。然后我就可以很好地检索它了:)

I've found a workaround by using Base64 encoding on the compressed data before inserting it. I can then retrieve it just fine :)

问题在于,Base64编码增加了我要存储的数据的长度,从而减少了通过Snappy ...

The problem with that is that Base64 encoding increases the length of the data that I am then storing, hereby reducing the savings gained with Snappy...

所以我的问题是:如何在不对Base64进行编码的情况下存储该数据?
我想使用varchar的原因是因为我希望能够使用oracle索引访问表而无需访问表(性能绝对是一个问题)。

So my question is: How can I store that data without encoding it in Base64? The reason I want to use a varchar is because I want to be able to access the table using an oracle index without ever accessing the table (performance is definitely an issue).

我也尝试了其他压缩算法,但是它们似乎都存在相同的问题。
我也看过yEnc,但是找不到任何Java编码器。此外,我不确定我是否了解yEnc列出的所有问题,因此我不太愿意使用它。

I have tried other compression algorithms as well, but they all seem to have the same problem. I have also looked at yEnc but I cannot find any java encoder. Moreover I am not sure that I understands all the problems listed with yEnc, so I am bit reluctant using it.

非常感谢您的帮助!

推荐答案

谢谢大家的帮助!

我终于找到了解决方法。
因为我存储字节而不是char,所以我将使用BLOB来存储数据。
BLOB的问题是无法对其进行索引。
替代方法是使用RAW类型列。它存储字节并且是可索引的。不幸的是,它太小(2000字节)。
因此,我的答案是将数据存储在BLOB中,并通过两种RAW类型的索引对其进行访问,因为数据永远不会大于4000字节。

I finally found a workaround. Since I am storing bytes and not chars, I am going to use a BLOB to store the data. The problem with the BLOB is that it cannot be indexed. The alternative is using a RAW type column. It stores bytes and is indexable. Unfortunately it is too small (2000 bytes). So, the answer in my case consist in storing the data in BLOB, and access it through an index on two RAW types since the data is never bigger than 4000 bytes.

索引看起来像这样:

CREATE INDEX blob_to_raw_prd_ix 
ON product (product_id, 
            substr_dt(blob_summary,2000,1), 
            substr_dt(blob_summary,2000,2001));

其中


  • blob_summary是我将数据存储在的BLOB列

  • substr_dt是用户定义的确定性函数(此后定义)

  • blob_summary is the BLOB column i store the data in
  • substr_dt is a user defined deterministic function (defined hereafter)

创建或替换功能substr_dt(str BLOB,buffer_size int,offset int)返回原始
确定性是
开始
返回dbms_lob.substr(str,buffer_size,offset);
END;

CREATE OR REPLACE FUNCTION substr_dt(str BLOB, buffer_size int, offset int) RETURN RAW DETERMINISTIC IS BEGIN RETURN dbms_lob.substr(str,buffer_size,offset); END;

要访问数据,我只需要使用别名查询product_id和字段,例如

To access the data, I just need to query the product_id and the fields using aliases, e.g.

SELECT     /*+ index(blob_to_raw_prd_ix) */ product_id, 
                                            substr_dt(blob_summary, 2000, 1) AS summary1, 
                                            substr_dt(blob_summary, 2000, 2001) AS summary2
FROM       product
WHERE      (product_id = ?);

在这种情况下,summary_1表示Blob的前2000个字节,摘要2表示后2000个字节。
在两个数组summary1和summary2上使用串联,我得到了blob的内容。

In this case, summary_1 represents the first 2000 bytes of the blob, and summary 2 the last 2000 bytes. Using concatenation on the two arrays summary1 and summary2 I get the content of the blob.

这与Jdbc兼容,但我无法使其与Hibernate兼容(然而)。
这不是最好的解决方案,因为在解释数据之前需要重新处理数据。但是,它解决了数据访问问题,而没有编码空间开销。

That works with Jdbc but I could not make it work with Hibernate (yet). It is not the best solution ever as data needs reprocessing before being interpreted. However, it solves the data access problem without the encoding space overhead.

这篇关于将简短的编码数据存储在varchar列中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆