SQLite Optimization for Millions of Entries?

Question

I'm trying to tackle a problem by using a SQLite database and Perl modules. In the end, there will be tens of millions of entries I need to log. The only unique identifier for each item is a text string for the URL. I'm thinking of doing this in two ways:

Way #1: Have a Good table, a Bad table, and an Unsorted table. (I need to check the HTML and decide whether I want it.) Say we have 1 billion pages total, about 333 million URLs in each table. When I have a new URL to add, I need to check whether it's in any of the three tables, and add it to Unsorted if it is not. Also, I would be moving a lot of rows around with this option.

Way #2: I have 2 tables, Master and Good. Master has all 1 billion page URLs, and Good has the 333 million that I want. For a new URL I need to do the same check, except this time I am only querying one table, and I would never delete a row from Master, only add the data to Good.

So basically, I need to know the best setup to quickly query a huge SQLite database to see whether a text string of ~20 characters is already present, and add it if it isn't.
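
For illustration only, the "check, then add if missing" step of Way #2 could be sketched as follows. This is a rough sketch under stated assumptions, not a tuned setup: it assumes the DBI and DBD::SQLite modules are installed, uses an in-memory database, and the table name master, the helper add_url, and the example URLs are all hypothetical. It relies on a PRIMARY KEY on the URL column so SQLite can look the string up through an index, and on INSERT OR IGNORE so the check and the insert happen in one statement.

```perl
use strict;
use warnings;
use DBI;

# Assumption: in-memory database for the sketch; a real run would use a file path.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: PRIMARY KEY gives SQLite a unique index on url,
# so a duplicate check is an index lookup instead of a table scan.
$dbh->do("CREATE TABLE IF NOT EXISTS master (url TEXT PRIMARY KEY)");

my $insert = $dbh->prepare("INSERT OR IGNORE INTO master (url) VALUES (?)");

# Hypothetical helper: returns 1 if the URL was new and got inserted,
# 0 if it was already present (the insert is silently ignored).
sub add_url {
    my ($url) = @_;
    my $rows = $insert->execute($url);
    return $rows == 1 ? 1 : 0;
}

# Group many inserts into one transaction; on a file-backed database,
# committing after every row is usually far slower.
$dbh->begin_work;
my @results = map { add_url($_) }
    qw(http://a.example/1 http://b.example/2 http://a.example/1);
$dbh->commit;

print join(",", @results), "\n";
```

The third URL repeats the first, so it should be reported as already present. Whether one indexed table scales to a billion rows on a given machine is something only measurement can answer.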

Edit: I'm now trying to get Berkeley DB to work using the Perl module, but no dice. Here's what I have:

use BerkeleyDB;

$dbFolder = 'C:\somedirectory';
my $env = BerkeleyDB::Env->new ( -Home => $dbFolder );

my $db  = BerkeleyDB::Hash->new (
-Filename => "fred.db", 
-Env => $env );
my $status = $db->db_put("apple", "red");

And when I run this, I get the following:

Can't call method "db_put" on an undefined value at C:\Directory\perlfile.pl line 42, <STDIN> line 1.

Solution

If $db is undefined, opening the database failed; inspect $! and $BerkeleyDB::Error to see why.

Have you created the database already? If not, you need -Flags => DB_CREATE.

Working example:

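use strict;
use warnings;
use BerkeleyDB;

# Directory that will hold the database file; it must already exist.
my $dbFolder = '/home/ysth/bdbtmp';

# DB_CREATE creates the file if it is missing; new() returns undef on failure.
my $db = BerkeleyDB::Hash->new(
    -Filename => "$dbFolder/fred.db",
    -Flags    => DB_CREATE,
) or die "couldn't create: $!, $BerkeleyDB::Error.\n";

# db_put returns 0 on success.
my $status = $db->db_put("apple", "red");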

I couldn't get BerkeleyDB::Env to do anything useful, though; whatever I tried, the constructor returned undef.
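
One possible explanation, offered as an assumption rather than a confirmed diagnosis of the case above: a BerkeleyDB::Env opened without flags tries to join an existing environment, so it fails when none exists yet. The sketch below passes DB_CREATE together with DB_INIT_MPOOL to the environment and checks each constructor for undef. The temporary home directory and the file name fred.db are placeholders, and the home directory has to exist before the environment is opened.

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Placeholder home directory, created for the sketch and removed at exit.
my $home = tempdir(CLEANUP => 1);

# Assumption: DB_CREATE plus the memory-pool subsystem is enough for simple use.
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_MPOOL,
) or die "cannot open environment: $! $BerkeleyDB::Error\n";

# With -Env set, a relative -Filename is resolved inside the environment home.
my $db = BerkeleyDB::Hash->new(
    -Filename => "fred.db",
    -Env      => $env,
    -Flags    => DB_CREATE,
) or die "cannot open database: $! $BerkeleyDB::Error\n";

# db_put and db_get return 0 on success.
$db->db_put("apple", "red") == 0
    or die "db_put failed: $BerkeleyDB::Error\n";

my $value;
$db->db_get("apple", $value) == 0
    or die "db_get failed: $BerkeleyDB::Error\n";

print "$value\n";
```

If the environment constructor still returns undef, the message in $BerkeleyDB::Error is the first thing to read.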
