Couchbase Get operation slows down when the number of incoming threads increases
Question

Summary:

Major performance issue with Spring-Boot 2.0.4 and Couchbase Server 5.5.1.

We are experiencing a rapid decline in DB response time performance when the number of threads is increasing. Here is another report about the issue.
Details:

Spring Boot is running with 500 threads:
server:
  tomcat:
    max-threads: 500
    max-connections: 500
We are using the following dependency:
<dependency>
  <groupId>org.springframework.data</groupId>
  <artifactId>spring-data-couchbase</artifactId>
  <version>3.0.9.RELEASE</version>
</dependency>
Our "select" from the DB is performed with a Spring-Data repository:
Cat findFirstByOwnerIdAndNameAndColor(String ownerId, String name, String color);
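For context, a minimal sketch of how such a derived query method typically sits in a Spring-Data Couchbase repository. The interface name `CatRepository` is an assumption; only the `Cat` entity and the method signature come from the question. It assumes spring-data-couchbase 3.x on the classpath, so it is shown as a fragment rather than a runnable program:

```
// Hypothetical repository interface wrapping the method from the question.
import org.springframework.data.couchbase.repository.CouchbaseRepository;

public interface CatRepository extends CouchbaseRepository<Cat, String> {

    // Spring Data derives a N1QL query from the method name, roughly:
    //   SELECT ... WHERE ownerId = $1 AND name = $2 AND color = $3 LIMIT 1
    Cat findFirstByOwnerIdAndNameAndColor(String ownerId, String name, String color);
}
```

Each such call blocks the Tomcat worker thread until the query returns, which matters later in the answer.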
We have an index especially for this query:
CREATE INDEX `cat_by_ownerId_name_and_color_idx` ON `pets`(`ownerId`,`name`,`color`) WHERE (`_class` = "com.example.Cat")
As the number of requests increases, we can see a quick degradation in the time it takes the DB to answer the query.
For example, when running 300 requests per second, the 99th percentile of response time is about 10 seconds(!) and the 50th percentile is about 5 seconds.
The average size of the returned document is about 300 bytes, meaning that we are trying to extract only about 90 kilobytes per second. A relatively low amount.
I'm adding here the result of running the same query in the Couchbase UI (where the query takes 1.75 ms to complete):
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"index": "cats_by_ownerId_name_and_color_idx",
"index_id": "c061141c2d373067",
"index_projection": {
"primary_key": true
},
"keyspace": "pets",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": ""bf23fa4c-22c3-42ac-b141-39cdc76bb2x5"",
"inclusion": 3,
"low": ""bf23fa4c-22c3-42ac-b141-39cdc76bb2x5""
},
{
"high": ""Oscar"",
"inclusion": 3,
"low": ""Oscar""
},
{
"high": ""red"",
"inclusion": 3,
"low": ""red""
}
]
}
],
"using": "gsi"
},
{
"#operator": "Fetch",
"keyspace": "pets",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "(((((`pets`.`_class`) = "com.example.Cat") and ((`pets`.`ownerId`) = "bf23fa4c-22c3-42ac-b141-39cdc76bb2x5")) and ((`pets`.`name`) = "Oscar")) and ((`pets`.`color`) = "red"))"
},
{
"#operator": "InitialProject",
"result_terms": [
{
"expr": "self",
"star": true
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "select * from pets where _class="com.example.Cat" and projectId="bf23fa4c-22c3-42ac-b141-39cdc76bb2x5" and name="Oscar" and color="red""
}
Edit 2

We also tried to explicitly write the N1QL query ourselves, but the outcome is the same. As before, we get many TimeoutExceptions:
Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.dao.QueryTimeoutException: java.util.concurrent.TimeoutException: {"b":"pets","s":"n1ql","t":7500000,"i":"f8cdf670-d32a-4d74-858c-f9dd9789d264"}; nested exception is java.lang.RuntimeException: java.util.concurrent.TimeoutException: {"b":"pets","s":"n1ql","t":7500000,"i":"f8cdf670-d32a-4d74-858c-f9dd9789d264"}] with root cause
java.util.concurrent.TimeoutException: {"b":"pets","s":"n1ql","t":7500000,"i":"f8cdf670-d32a-4d74-858c-f9dd9789d264"}
at com.couchbase.client.java.bucket.api.Utils$1.call(Utils.java:131) ~[java-client-2.7.0.jar:na]
at com.couchbase.client.java.bucket.api.Utils$1.call(Utils.java:127) ~[java-client-2.7.0.jar:na]
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140) ~[rxjava-1.3.8.jar:1.3.8]
at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber.onTimeout(OnSubscribeTimeoutTimedWithFallback.java:166) ~[rxjava-1.3.8.jar:1.3.8]
at rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber$TimeoutTask.call(OnSubscribeTimeoutTimedWithFallback.java:191) ~[rxjava-1.3.8.jar:1.3.8]
at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55) ~[rxjava-1.3.8.jar:1.3.8]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_161]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_161]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_161]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
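The `"t":7500000` in the timeout payload is the 2.x Java SDK's default N1QL query timeout of 7.5 s (expressed in microseconds). As a stopgap only (it masks saturation rather than fixing it), the timeout can be raised when building the SDK environment. A hedged sketch assuming java-client 2.7.x, shown as a fragment since it needs a running cluster:

```
// Sketch only: raising the N1QL query timeout in the 2.x Java SDK.
CouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
    .queryTimeout(15000)   // milliseconds; the SDK default is 7500
    .build();
Cluster cluster = CouchbaseCluster.create(env, "localhost");
```

With 500 blocked Tomcat threads queuing on the query service, a longer timeout just delays the same failures, which is consistent with the non-blocking fix described in the answer below.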
Is there a way to fix this, or do we need a different DB?
Answer

So after further investigation, the problem was found in the Spring-Data component.
To overcome it, we had to move to a non-blocking mechanism.
We did two things:

- All the calls from the controller layer down to the service & repository layers were changed to return CompletableFuture<Cat>.
- To bypass the Spring-Data connection to Couchbase, we created a repository class of our own, with implementation code that looks something like this:
Statement statement = select("*")
    .from(i(bucket.name()))
    .where(x("name").eq(s(name))
        .and(x("ownerId").eq(s(ownerId)))
        .and(x("color").eq(s(color)))
        .and(x("_class").eq(s("com.example.Cat"))));

CompletableFuture<Cat> completableFuture = new CompletableFuture<>();
bucket.async().query(statement)
    ...
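The callback-to-future bridge that makes this non-blocking can be sketched in plain Java, with the Couchbase call replaced by a hypothetical `asyncQuery` stand-in (no SDK dependency); this shows only the pattern, not the real SDK API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class AsyncRepoSketch {

    // Stand-in for the SDK's async query API: delivers the result
    // via a callback on another thread instead of blocking the caller.
    static void asyncQuery(ExecutorService pool, String key, Consumer<String> onResult) {
        pool.submit(() -> onResult.accept("cat:" + key));
    }

    // Bridge the callback style into a CompletableFuture, as the answer describes:
    // the calling (e.g. Tomcat) thread is released immediately.
    static CompletableFuture<String> findCat(ExecutorService pool, String key) {
        CompletableFuture<String> future = new CompletableFuture<>();
        asyncQuery(pool, key, future::complete);
        return future;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        String cat = findCat(pool, "Oscar").join(); // prints "cat:Oscar"
        System.out.println(cat);
        pool.shutdown();
    }
}
```

In the real code the callback comes from subscribing to the SDK's async query result and completing the future from the subscription.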
After we did that, the latency problem disappeared and query performance is about 2 milliseconds, even during a few hundred concurrent requests.