Npgsql with Pgbouncer on Kubernetes - pooling & keepalives


Problem description

I'm looking for more detailed guidance / other people's experience of using Npgsql in production with Pgbouncer.

Basically we have the following setup using GKE and Google Cloud SQL:

Right now - I've got npgsql configured as if pgbouncer wasn't in place, using a local connection pool. I've added pgbouncer as a deployment in my GKE cluster as Google SQL has very low max connection limits - and to be able to scale my application horizontally inside of Kubernetes I need to protect against overwhelming it.

My problem is one of reliability when one of the pgbouncer pods dies (due to a node failure or as I'm scaling up/down).

When that happens (1) all of the existing open connections from the client side connection pools in the application pods don't immediately close (2) - and basically result in exceptions to my application as it tries to execute commands. Not ideal!

As I see it (and looking at the advice at https://www.npgsql.org/doc/compatibility.html) I have three options.

  1. Live with it, and handle retries of SQL commands within my application. Possible, but seems like a lot of effort and creates lots of possible bugs if I get it wrong.

  2. Turn on keepalives and let Npgsql itself 'fail out' the bad connections relatively quickly when they break. I'm not even sure if this will work or if it will cause further problems.

  3. Turn off client side connection pooling entirely. This seems to be the official advice, but I am loath to do this for performance reasons. It seems very wasteful for Npgsql to have to open a connection to pgbouncer for each session, and it runs counter to all of my experience with other RDBMS like SQL Server.

Am I on the right track with one of those options? Or am I missing something?

Solution

You are generally on the right track and your analysis seems accurate. Some comments:

Option 2 (turning on keepalives) will help remove idle connections in Npgsql's pool which have been broken. As you've written, your application will still see some failures (as some bad idle connections may not be removed in time). There is no particular reason to think this would cause further problems - this should be pretty safe to turn on.
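As a rough illustration of option 2, keepalives are switched on through the Npgsql connection string. In the fragment below the host, port, database name, credentials and the 30-second interval are all made-up placeholders; only the `Keepalive` keyword (seconds of inactivity before Npgsql sends a keepalive) is the setting being discussed, and the right interval depends on how quickly you need dead connections to be noticed.

```text
Host=pgbouncer.example.internal;Port=6432;Database=appdb;Username=app;Password=<secret>;Keepalive=30
```

A shorter interval detects a dead pgbouncer pod sooner at the cost of a little extra traffic per pooled connection.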

Option 3 is indeed problematic for perf, as a TCP connection to pgbouncer would have to be established every single time a database connection is needed. It will also not provide a 100% fail-proof mechanism, since pgbouncer may still drop out while a connection is in use.

At the end of the day, you're asking about resiliency in the face of arbitrary network/server failure, which isn't an easy thing to achieve. The only 100% reliable way to deal with this is in your application, via a dedicated layer which retries operations when a transient exception occurs. You may want to look at Polly, and note that Npgsql helps out a bit by exposing an IsTransient property on NpgsqlException (http://www.npgsql.org/doc/api/Npgsql.NpgsqlException.html#Npgsql_NpgsqlException_IsTransient), which can be used as a trigger to retry (Entity Framework Core also includes a similar "retry strategy"). If you do go down this path, note that transactions are particularly difficult to handle correctly.
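To make the idea of a dedicated retry layer concrete, here is a minimal, language-neutral sketch written in Python rather than C#. It is not Polly and not Npgsql code: `TransientError` is a hypothetical stand-in for an exception your data layer considers transient (in .NET, the check would be something like `IsTransient` being true), and `run_with_retry`, `max_attempts` and `base_delay` are names invented for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a database exception flagged as transient."""


def run_with_retry(operation: Callable[[], T], max_attempts: int = 3,
                   base_delay: float = 0.1) -> T:
    """Run `operation`, retrying on TransientError with a linearly growing delay."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait a little longer after each failed attempt before retrying.
            time.sleep(base_delay * attempt)
    # Only reached when max_attempts < 1, i.e. the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

In a sketch like this, `operation` should be a whole unit of work: it opens its own connection and, if it uses a transaction, runs that transaction from start to finish. Retrying a single statement in the middle of a transaction whose connection has already died cannot work, which is part of why transactions are the hard case.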
