优化我的代码以模拟数据库 [英] Optimizating my code simulating a database

查看:45
本文介绍了优化我的代码以模拟数据库的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在研究一个程序,模拟一个可以查询的小型数据库,在编写代码之后,我已经执行了它,但是性能却很差.它真的很慢.我已经尝试改进它,但是几个月前我自己开始使用C ++,所以我的知识仍然很低.因此,我想找到一种提高性能的解决方案.

I have been working on a program, simulating a small database where I could make queries, and after writing the code, I have executed it, but the performance is quite bad. It works really slow. I have tried to improve it, but I started with C++ on my own a few months ago, so my knowledge is still very low. So I would like to find a solution to improve the performance.

让我解释一下我的代码如何工作.在这里,我讲了一个有关我的代码如何工作的摘要示例.

Let me explain how my code works. Here I have atached a summarized example of how my code works.

首先,我有一个.txt文件,用于模拟一个数据库表,其中的随机字符串用"|"分隔.这里有一个表格示例(5行5列).

First of all I have a .txt file simulating a database table with random strings separated with "|". Here you have an example of table (with 5 rows and 5 columns).

Table.txt

Table.txt

0|42sKuG^uM|24465\lHXP|2996fQo\kN|293cvByiV
1|14772cjZ`SN|28704HxDYjzC|6869xXj\nIe|27530EymcTU
2|9041ByZM]I|24371fZKbNk|24085cLKeIW|16945TuuU\Nc
3|16542M[Uz\|13978qMdbyF|6271ait^h|13291_rBZS
4|4032aFqa|13967r^\\`T|27754k]dOTdh|24947]v_uzg

.txt文件中的此信息由我的程序读取并存储在计算机内存中.然后,在进行查询时,我将访问存储在计算机内存中的此信息.将数据加载到计算机内存中可能是一个缓慢的过程,但是以后访问数据会更快,这对我真的很重要.

This information in a .txt file is read by my program and stored in the computer memory. Then, when making queries, I will access to this information stored in the computer memory. Loading the data in the computer memory can be a slow process, but accessing to the data later will be faster, what really matters me.

这里有一部分代码可以从文件中读取此信息并存储在计算机中.

Here you have the part of the code that read this information from a file and store in the computer.

从Table.txt文件中读取数据并将其存储在计算机内存中的代码

Code that reads data from the Table.txt file and store it in the computer memory

string ruta_base("C:\\a\\Table.txt"); // Folder where my "Table.txt" is found

string temp; // Variable where every row from the Table.txt file will be firstly stored
vector<string> buffer; // Variable where every different row will be stored after separating the different elements by tokens.
vector<ElementSet> RowsCols; // Variable with a class that I have created, that simulated a vector and every vector element is a row of my table

ifstream ifs(ruta_base.c_str());

while(getline( ifs, temp )) // We will read and store line per line until the end of the ".txt" file. 
{
    size_t tokenPosition = temp.find("|"); // When we find the simbol "|" we will identify different element. So we separate the string temp into tokens that will be stored in vector<string> buffer

    while (tokenPosition != string::npos)
    {    
        string element;
        tokenPosition = temp.find("|");      

        element = temp.substr(0, tokenPosition);
        buffer.push_back(element);
        temp.erase(0, tokenPosition+1);
    }

    ElementSet ss(0,buffer); 
    buffer.clear();
    RowsCols.push_back(ss); // We store all the elements of every row (stores as vector<string> buffer) in a different position in "RowsCols" 
}

vector<Table> TablesDescriptor;

Table TablesStorage(RowsCols);
TablesDescriptor.push_back(TablesStorage);

DataBase database(1, TablesDescriptor);

此后,出现重要部分.假设我要查询,并要求输入.假设我的查询是行"n",还有连续的元组"numTuples",以及列"y".(我们必须说,列数是由十进制数"y"定义的,它将转换为二进制数,并向我们显示要查询的列,例如,如果我要查询第54列(二进制为00110110),将要求输入第2、3、5和6列).然后,我访问计算机内存以获取所需的信息,并将其存储在显示的向量中.在这里,我向您展示此代码的一部分.

After this, comes the IMPORTANT PART. Let's suppose that I want to make a query, and I ask for input. Let's say that my query is row "n", and also the consecutive tuples "numTuples", and the columns "y". (We must say that the number of columns is defined by a decimal number "y", that will be transformed into binary and will show us the columns to be queried, for example, if I ask for columns 54 (00110110 in binary) I will ask for columns 2, 3, 5 and 6). Then I access to the computer memory to the required information and store it in a vector shownVector. Here I show you the part of this code.

根据我的输入访问所需信息的代码

Code that access to the required information upon my input

int n, numTuples; 
unsigned long long int y;
clock_t t1, t2;

cout<< "Write the ID of the row you want to get more information: " ;
cin>>n; // We get the row to be represented -> "n"

cout<< "Write the number of followed tuples to be queried: " ;
cin>>numTuples; // We get the number of followed tuples to be queried-> "numTuples"

cout<<"Write the ID of the 'columns' you want to get more information: ";
cin>>y; // We get the "columns" to be represented ' "y"

unsigned int r; // Auxiliar variable for the columns path
int t=0; // Auxiliar variable for the tuples path
int idTable;

vector<int> columnsToBeQueried; // Here we will store the columns to be queried get from the bitset<500> binarynumber, after comparing with a mask
vector<string> shownVector; // Vector to store the final information from the query
bitset<500> mask;
mask=0x1;

t1=clock(); // Start of the query time

bitset<500> binaryNumber = Utilities().getDecToBin(y); // We get the columns -> change number from decimal to binary. Max number of columns: 5000

// We see which columns will be queried
for(r=0;r<binaryNumber.size();r++) //
{               
    if(binaryNumber.test(r) & mask.test(r))  // if both of them are bit "1"
    {
        columnsToBeQueried.push_back(r);
    }
    mask=mask<<1;   
}

do
{
    for(int z=0;z<columnsToBeQueried.size();z++)
    {
        int i;
        i=columnsToBeQueried.at(z);

        vector<int> colTab;
        colTab.push_back(1); // Don't really worry about this

        //idTable = colTab.at(i);   // We identify in which table (with the id) is column_i
        // In this simple example we only have one table, so don't worry about this

        const Table& selectedTable = database.getPointer().at(0); // It simmulates a vector with pointers to different tables that compose the database, but our example database only have one table, so don't worry            ElementSet selectedElementSet;

        ElementSet selectedElementSet;

        selectedElementSet=selectedTable.getRowsCols().at(n);
        shownVector.push_back(selectedElementSet.getElements().at(i)); // We save in the vector shownVector the element "i" of the row "n"

    }   
    n=n+1;
    t++;            

}while(t<numTuples);

t2=clock(); // End of the query time

float diff ((float)t2-(float)t1);
float microseconds = diff / CLOCKS_PER_SEC*1000000;

cout<<"The query time is: "<<microseconds<<" microseconds."<<endl;

类定义

在这里,我附加了一些类定义,以便您可以编译代码并更好地了解其工作原理:

Here I attached some of the class definitions so that you can compile the code, and understand better how it works:

class ElementSet
{
private:
    int id;
    vector<string> elements; 

public:
    ElementSet(); 
    ElementSet(int, vector<string>); 

    const int& getId();
    void setId(int);

    const vector<string>& getElements();
    void setElements(vector<string>);

};

class Table
{
private:
    vector<ElementSet> RowsCols; 

public:
    Table(); 
    Table(vector<ElementSet>); 

    const vector<ElementSet>& getRowsCols();
    void setRowsCols(vector<ElementSet>);
};


class DataBase
{
     private:
        int id;
        vector<Table> pointer; 

     public:
        DataBase(); 
        DataBase(int, vector<Table>); 

    const int& getId();
    void setId(int);

    const vector<Table>& getPointer();
    void setPointer(vector<Table>);

    };

class Utilities
{
        public:
        Utilities();
        static bitset<500> getDecToBin(unsigned long long int);
};

所以我得到的问题是,我的查询时间根据表的大小而有很大的不同(它与100行100列的表以及10000行1000列的表无关).这使得我的代码性能对于大型表而言非常低,这对我真正重要...您知道我如何优化我的代码吗???

So the problem that I get is that my query time is very different depending on the table size (it has nothing to do a table with 100 rows and 100 columns, and a table with 10000 rows and 1000 columns). This makes that my code performance is very low for big tables, what really matters me... Do you have any idea how could I optimizate my code????

非常感谢您的帮助!!!:)

Thank you very much for all your help!!! :)

推荐答案

一个明显的问题是您的get函数按值返回向量.您是否需要每次都有一份新的副本?可能不是.

One obvious issue is that your get-functions return vectors by value. Do you need to have a fresh copy each time? Probably not.

如果您尝试返回一个const引用,则可以避免很多副本:

If you try to return a const reference instead, you can avoid a lot of copies:

const vector< Table>&getPointer();

和嵌套get的类似.

这篇关于优化我的代码以模拟数据库的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆