How can I avoid too many executed Gremlin queries in my C# code?


Question

I have this database:

Clients => Incident => File => Filename

Clients have an ID. Incidents have an ID and a reportedON property. Files have an ID and fileSize, mimeType, and malware properties. Filenames have an ID. A client has an outgoing edge to its incidents (reported), an incident has an outgoing edge to its files (containsFile), and a file has an outgoing edge to its filename (hasName).

Here is some sample data:

g.addV('client').property('id','1').as('1').
  addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
  addV('file').property('id','100').property('fileSize', '432534').as('100').
  addV('fileName').property('id','file.pdf').as('file.pdf').
  addE('reported').from('1').to('11').
  addE('containsFile').from('11').to('100').
  addE('hasName').from('100').to('file.pdf').iterate()

In the C# code below, I check every fileName in the database for certain file extensions. For each fileName that has one of these extensions, I then run a second query inside the foreach loop to fetch its values along with the surrounding vertices and their values:

// Extensions to look for; built once instead of on every loop iteration.
string[] fileExtensions = { ".ace", ".arj", ".iso", ".rar", ".gz", ".acrj", ".lnk", ".z", ".tar", ".xz" };
HashSet<string> hSet = new HashSet<string>(fileExtensions);

// First query: fetch every fileName vertex.
var resultSet = await SubmitQueryAsync("g.V().hasLabel('fileName')");
if (resultSet.Length > 0)
{
    foreach (var result in resultSet)
    {
        JObject jsonData = result;
        string fileId = jsonData["Id"].Value<string>();
        string fileExtension = "";

        // Everything from the last '.' onwards counts as the extension.
        if (fileId.Contains("."))
        {
            fileExtension = fileId.Substring(fileId.LastIndexOf('.'));
        }

        if (hSet.Contains(fileExtension))
        {
            // Second query: one extra round trip for every matching fileName.
            var resultSet2 = await SubmitQueryAsync("g.V().has(id, '" + fileId + "').as('FILENAME').in('hasName').as('FILE').in('containsFile').as('INCIDENT').select('FILE').valueMap().as('FILEVALUES').select('INCIDENT').valueMap().as('INCIDENTVALUES').select('FILE', 'FILEVALUES', 'FILENAME', 'INCIDENTVALUES')");
            list = FillList(list, resultSet2);
        }
    }
}

So for every fileName that has one of these extensions, I execute one query inside the foreach loop. The problem is that this is too many queries for the database. How can I make this more efficient?

Recommended answer

The first thing you probably need to do is change your data model and include an "ext" (i.e. "fileExtension") property on "fileName" so that you can search on it easily (I don't think CosmosDB supports TextP or similar options for text searches yet), thus:

g.addV('client').property('id','1').as('1').
  addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
  addV('file').property('id','100').property('fileSize', '432534').as('100').
  addV('fileName').property('id','file.pdf').property('ext','.pdf').as('file.pdf').
  addE('reported').from('1').to('11').
  addE('containsFile').from('11').to('100').
  addE('hasName').from('100').to('file.pdf').iterate() 
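
On the C# side, the "ext" value could be derived when the fileName vertex is created. The following is a minimal sketch, not part of the original answer: the class and method names are hypothetical, the extension rule simply mirrors the `Substring(LastIndexOf('.'))` logic from the question, and the quote escaping assumes your Gremlin server accepts backslash-escaped single quotes.

```csharp
using System;

// Hypothetical helper: derives the 'ext' property and builds the addV step
// for one fileName vertex.
public static class FileNameVertex
{
    // Returns everything from the last '.' onwards, or "" when there is no '.'.
    public static string GetExt(string fileName)
    {
        int dot = fileName.LastIndexOf('.');
        return dot >= 0 ? fileName.Substring(dot) : "";
    }

    // Assumption: backslash escaping is enough to keep a quote inside the
    // string literal; verify this against your server before relying on it.
    private static string Escape(string value) =>
        value.Replace("\\", "\\\\").Replace("'", "\\'");

    // Builds the Gremlin text that adds the vertex together with its 'ext' property.
    public static string BuildAddVertexQuery(string fileName) =>
        "g.addV('fileName')" +
        ".property('id','" + Escape(fileName) + "')" +
        ".property('ext','" + Escape(GetExt(fileName)) + "')";
}
```

Computing the extension once at write time means every later lookup can be a plain property match instead of string handling in client code.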

Then it's pretty simple to roll all of that C# into a single Gremlin traversal:

gremlin> g.V().has('fileName','ext',within(".ace", ".arj", ".iso", ".rar", ".gz", ".acrj", ".lnk", ".z", ".tar", ".xz", ".pdf")).as('FILENAME').
......1>   in('hasName').as('FILE').
......2>   in('containsFile').as('INCIDENT').
......3>   select('FILE').valueMap().as('FILEVALUES').
......4>   select('INCIDENT').valueMap().as('INCIDENTVALUES').
......5>   select('FILE', 'FILEVALUES', 'FILENAME', 'INCIDENTVALUES')
==>[FILE:v[5],FILEVALUES:[fileSize:[432534],id:[100]],FILENAME:v[8],INCIDENTVALUES:[reportedON:[2/15/2019 8:01:19 AM],id:[11]]]
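
In the C# code, the foreach loop and its per-file query could then be replaced by one query string that is built from the extension list. This is only a sketch under assumptions: `SuspiciousFileQuery` and `Build` are hypothetical names, and the traversal text is the one shown above with the extensions passed to within().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds the single traversal from a list of extensions.
public static class SuspiciousFileQuery
{
    // The extension list from the question.
    public static readonly string[] Extensions =
        { ".ace", ".arj", ".iso", ".rar", ".gz", ".acrj", ".lnk", ".z", ".tar", ".xz" };

    public static string Build(IEnumerable<string> extensions)
    {
        // Quote each extension and join them into the argument list of within().
        string within = string.Join(", ",
            extensions.Select(e => "'" + e.Replace("'", "\\'") + "'"));

        return "g.V().has('fileName','ext',within(" + within + ")).as('FILENAME')." +
               "in('hasName').as('FILE')." +
               "in('containsFile').as('INCIDENT')." +
               "select('FILE').valueMap().as('FILEVALUES')." +
               "select('INCIDENT').valueMap().as('INCIDENTVALUES')." +
               "select('FILE', 'FILEVALUES', 'FILENAME', 'INCIDENTVALUES')";
    }
}
```

With the helpers from the question, the whole loop would shrink to a single call along the lines of `list = FillList(list, await SubmitQueryAsync(SuspiciousFileQuery.Build(SuspiciousFileQuery.Extensions)));`, so the database sees one query instead of one per matching fileName.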

Note that I added ".pdf" to your list of extensions so that the query returns a result for your sample data. Aside from that, I think your query is more complex than it needs to be. Let's try to simplify it, because all the step labeling makes it hard to follow. I'd prefer some use of project():

gremlin> g.V().has('fileName','ext',within(".ace", ".arj", ".iso", ".rar", ".gz", ".acrj", ".lnk", ".z", ".tar", ".xz", ".pdf")).
......1>   project('FILE','FILEVALUES','FILENAME','INCIDENTVALUES').
......2>     by(__.in('hasName')).
......3>     by(__.in('hasName').valueMap()).
......4>     by().
......5>     by(__.in('hasName').in('containsFile').valueMap())
==>[FILE:v[5],FILEVALUES:[fileSize:[432534],id:[100]],FILENAME:v[8],INCIDENTVALUES:[reportedON:[2/15/2019 8:01:19 AM],id:[11]]]

That makes me realize that "FILE" and "FILEVALUES" are basically the same thing and can be combined:

gremlin> g.V().has('fileName','ext',within(".ace", ".arj", ".iso", ".rar", ".gz", ".acrj", ".lnk", ".z", ".tar", ".xz", ".pdf")).
......1>   project('FILEVALUES','FILENAME','INCIDENTVALUES').
......2>     by(__.in('hasName').valueMap(true)).
......3>     by().
......4>     by(__.in('hasName').in('containsFile').valueMap())
==>[FILEVALUES:[id:5,fileSize:[432534],id:[100],label:file],FILENAME:v[8],INCIDENTVALUES:[reportedON:[2/15/2019 8:01:19 AM],id:[11]]]

I don't like that we traverse in('hasName') twice, so:

gremlin> g.V().has('fileName','ext',within(".ace", ".arj", ".iso", ".rar", ".gz", ".acrj", ".lnk", ".z", ".tar", ".xz", ".pdf")).
......1>   project('FILEVALUES','FILENAME').
......2>     by(__.in('hasName').
......3>        project('FILE','INCIDENT').
......4>          by(valueMap(true)).
......5>          by(__.in('containsFile').valueMap())).
......6>     by()
==>[FILEVALUES:[FILE:[id:5,fileSize:[432534],id:[100],label:file],INCIDENT:[reportedON:[2/15/2019 8:01:19 AM],id:[11]]],FILENAME:v[8]]

That changes the structure of the returned result a little, though. I suppose it could be flattened back to what you had with more transformations, but I'm not sure you're worried about that. At this point I'm just trying to make the query more readable.
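
If the flatter shape is needed, one alternative to extra Gremlin transformations is to reshape each row on the client. The sketch below does that in C# rather than in the traversal, and it assumes each row has already been deserialized into nested dictionaries keyed by the project() labels; the actual shape depends on your driver and serializer, and `ResultShape.Flatten` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns the nested row from the last traversal
// back into the flatter FILEVALUES / FILENAME / INCIDENTVALUES layout.
public static class ResultShape
{
    public static Dictionary<string, object> Flatten(IDictionary<string, object> row)
    {
        // In the nested result, FILEVALUES holds both the FILE and INCIDENT maps.
        var nested = (IDictionary<string, object>)row["FILEVALUES"];

        return new Dictionary<string, object>
        {
            ["FILEVALUES"] = nested["FILE"],
            ["FILENAME"] = row["FILENAME"],
            ["INCIDENTVALUES"] = nested["INCIDENT"],
        };
    }
}
```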
