SQLite Optimization for Millions of Entries?
Question
I'm trying to tackle a problem by using a SQLite database and Perl modules. In the end, there will be tens of millions of entries I need to log. The only unique identifier for each item is a text string for the URL. I'm thinking of doing this in two ways:
Way #1: Have a good table, bad table, unsorted table. (I need to check the html and decide whether I want it.) Say we have 1 billion pages total, 333 million URLs in each table. I have a new URL to add, and I need to check and see if it's in any of the tables, and add it to the Unsorted if it is unique. Also, I would be moving a lot of rows around with this option.
Way #2: I have 2 tables, Master and Good. Master has all 1 billion page URLs, and Good has the 333 million that I want. New URL, need to do the same thing, except this time I am only querying one table, and I would never delete a row from Master, only add the data to Good.
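A minimal sketch of the Way #2 layout follows. It assumes Perl's DBI with DBD::SQLite, and it uses an in-memory database so it runs standalone. The example URLs and the %is_wanted classification are hypothetical stand-ins for the HTML check. Because master is only ever appended to, "moving" a URL into good is one extra insert, never a delete:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; ':memory:' keeps the demo self-contained.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Way #2: every URL ever seen lives in master; good holds only the keepers.
# WITHOUT ROWID stores each URL once, inside the primary-key B-tree itself.
$dbh->do('CREATE TABLE master (url TEXT PRIMARY KEY) WITHOUT ROWID');
# REFERENCES documents the relationship; SQLite enforces it only with PRAGMA foreign_keys = ON.
$dbh->do('CREATE TABLE good (url TEXT PRIMARY KEY REFERENCES master(url)) WITHOUT ROWID');

my $add_master = $dbh->prepare('INSERT OR IGNORE INTO master (url) VALUES (?)');
my $mark_good  = $dbh->prepare('INSERT OR IGNORE INTO good (url) VALUES (?)');

# Hypothetical input: the outcome of checking each page's HTML.
my %is_wanted = (
    'http://a.example/' => 1,
    'http://b.example/' => 0,
    'http://c.example/' => 1,
);

for my $url (sort keys %is_wanted) {
    $add_master->execute($url);                     # rows are never deleted from master
    $mark_good->execute($url) if $is_wanted{$url};  # good is only ever appended to
}

my ($n_master) = $dbh->selectrow_array('SELECT COUNT(*) FROM master');
my ($n_good)   = $dbh->selectrow_array('SELECT COUNT(*) FROM good');
print "master=$n_master good=$n_good\n";
```

WITHOUT ROWID requires SQLite 3.8.2 or later. With a plain rowid table, a TEXT PRIMARY KEY is kept both in the table and in a separate automatic unique index, which roughly doubles the space spent on URLs at hundreds of millions of rows.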
So basically, I need to know the best setup to quickly query a huge SQLite database to see whether a text string of ~20 characters is unique, and then add it if it is.
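For the core step (is this URL new, and if so add it), one option is to let a unique index do the check and use INSERT OR IGNORE, committing in batches. This is a sketch under the same assumptions (DBI plus DBD::SQLite, an in-memory database, hypothetical URLs), and add_urls is a hypothetical helper name:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; a real run would use a file, not ':memory:'.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# The PRIMARY KEY gives a unique B-tree, so "is it already there?" is an
# index lookup rather than a table scan.
$dbh->do('CREATE TABLE master (url TEXT PRIMARY KEY) WITHOUT ROWID');

# INSERT OR IGNORE folds "check if unique" and "add it" into one statement:
# execute() returns 1 when the URL was new and "0E0" (zero) for a duplicate.
my $ins = $dbh->prepare('INSERT OR IGNORE INTO master (url) VALUES (?)');

# Hypothetical helper: inserts a batch and returns only the URLs that were new.
sub add_urls {
    my @urls = @_;
    my @new;
    # One transaction per batch instead of one commit (and journal sync) per row.
    $dbh->begin_work;
    for my $url (@urls) {
        push @new, $url if $ins->execute($url) == 1;
    }
    $dbh->commit;
    return @new;
}

my @first  = add_urls('http://a.example/', 'http://b.example/', 'http://a.example/');
my @second = add_urls('http://b.example/', 'http://c.example/');
print "new in batch 1: @first\n";
print "new in batch 2: @second\n";
```

For bulk loading, PRAGMA journal_mode = WAL and PRAGMA synchronous = NORMAL are common further settings; they trade some durability on power loss for speed.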
Edit: I'm now trying to get Berkeley DB to work using the Perl module, but no dice. Here's what I have:
use BerkeleyDB;
$dbFolder = 'C:\somedirectory';
my $env = BerkeleyDB::Env->new ( -Home => $dbFolder );
my $db = BerkeleyDB::Hash->new (
    -Filename => "fred.db",
    -Env => $env );
my $status = $db->db_put("apple", "red");
And when I run this, I get the following:
Can't call method "db_put" on an undefined value at C:\Directory\perlfile.pl line 42, <STDIN> line 1.
Answer
If $db is undefined, opening the database is failing, and you should inspect $! and $BerkeleyDB::Error to see why.
Have you created the database already? If not, you need -Flags => DB_CREATE.
Working example:
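use strict;
use warnings;
use BerkeleyDB;

my $dbFolder = '/home/ysth/bdbtmp';
# DB_CREATE creates fred.db if it does not exist yet
my $db = BerkeleyDB::Hash->new (
    -Filename => "$dbFolder/fred.db",
    -Flags    => DB_CREATE,
) or die "couldn't create: $!, $BerkeleyDB::Error.\n";
# db_put returns 0 on success; in string context the status is the error message
my $status = $db->db_put("apple", "red");
die "db_put failed: $status\n" if $status != 0;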
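Tying this back to the question's "add it only if it is unique": with a BerkeleyDB::Hash the check and the insert can be one call by passing DB_NOOVERWRITE to db_put, which then returns DB_KEYEXIST instead of replacing an existing key. A sketch assuming the BerkeleyDB module is installed; the temporary directory, urls.db, and add_if_new are illustrative names, not part of the answer:

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Illustrative: a throwaway directory stands in for a real database folder.
my $dbFolder = tempdir(CLEANUP => 1);
my $db = BerkeleyDB::Hash->new(
    -Filename => "$dbFolder/urls.db",
    -Flags    => DB_CREATE,
) or die "couldn't create: $!, $BerkeleyDB::Error.\n";

# Hypothetical helper: returns 1 if the URL was new (and is now stored),
# 0 if it was already present. DB_NOOVERWRITE stops db_put from replacing it.
sub add_if_new {
    my ($url) = @_;
    my $status = $db->db_put($url, 1, DB_NOOVERWRITE);
    return 1 if $status == 0;
    return 0 if $status == DB_KEYEXIST;
    die "db_put failed: $status\n";
}

my @results = map { add_if_new($_) } qw(http://a.example/ http://b.example/ http://a.example/);
print "@results\n";
```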
I couldn't get BerkeleyDB::Env to do anything useful, though; whatever I tried, the constructor returned undef.
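A likely reason BerkeleyDB::Env->new returns undef is that no -Flags are passed: the environment needs DB_CREATE to create its files and, as far as I know, at least one subsystem such as DB_INIT_MPOOL. The sketch below assumes that diagnosis (it is not verified against the setup above) and uses a temporary home directory, on the assumption that Berkeley DB expects -Home to exist already:

```perl
use strict;
use warnings;
use BerkeleyDB;
use File::Temp qw(tempdir);

# Assumption: the environment home must already exist, so create one first.
my $home = tempdir(CLEANUP => 1);

# DB_CREATE lets the environment create its region files; DB_INIT_MPOOL
# initialises the memory pool that databases opened in it use.
my $env = BerkeleyDB::Env->new(
    -Home  => $home,
    -Flags => DB_CREATE | DB_INIT_MPOOL,
) or die "cannot open environment: $BerkeleyDB::Error\n";

# Inside an environment, a relative -Filename is resolved against -Home.
my $db = BerkeleyDB::Hash->new(
    -Filename => 'fred.db',
    -Env      => $env,
    -Flags    => DB_CREATE,
) or die "cannot open database: $BerkeleyDB::Error\n";

my $status = $db->db_put('apple', 'red');
die "db_put failed: $status\n" if $status != 0;

# db_get writes the stored value into its second argument.
my $colour;
$db->db_get('apple', $colour) == 0 or die "db_get failed\n";
print "$colour\n";
```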