From 123c4fd56a357fdf480ea4f3a14929fd13e67fbb Mon Sep 17 00:00:00 2001
From: Wade Barnes
Date: Tue, 3 Jul 2018 13:59:58 -0700
Subject: [PATCH] Update documentation

- Add documentation on how to fix a Postgres crash loop caused by a `tuple concurrently updated` error
---
 openshift/README.md | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/openshift/README.md b/openshift/README.md
index 1a97a86f..2fff7d4f 100644
--- a/openshift/README.md
+++ b/openshift/README.md
@@ -376,4 +376,45 @@ kill 38 39 40 41
 ```
 Wait a 30 seconds type ps -x. Ensuring that new PIDs have been created.
 
-Now you can see the yellow Django debug screen!!!
\ No newline at end of file
+Now you can see the yellow Django debug screen!!!
+
+## Fixing a Postgres crash loop caused by a `tuple concurrently updated` error
+
+If a Postgres database pod is terminated unexpectedly, it can end up in a crash loop with the following log signature:
+
+```
+pg_ctl: another server might be running; trying to start server anyway
+waiting for server to start....LOG: redirecting log output to logging collector process
+HINT: Future log output will appear in directory "pg_log".
+..... done
+server started
+=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
+ERROR: tuple concurrently updated
+```
+
+To fix the issue:
+- Find the name of the Postgres pod that is in the crash loop.
+- Start an `oc debug` session with the pod.
+- Scale the associated Postgres deployment to zero pods.
+- From the command line of the debug session:
+  - Run `run-postgresql`. This is the `CMD` for the Docker image. As part of its start-up process the script creates files that otherwise won't exist in the debug pod, namely `/var/lib/pgsql/openshift-custom-postgresql.conf` and `/var/lib/pgsql/passwd`; without them none of the `pg_ctl` commands below will work. You should see the same error output listed above.
+  - Run `pg_ctl stop -D /var/lib/pgsql/data/userdata` to cleanly shut down Postgres. You should see:
+    ```
+    waiting for server to shut down.... done
+    server stopped
+    ```
+  - Run `pg_ctl start -D /var/lib/pgsql/data/userdata` to start Postgres. You should see the following output with no errors. Because the log lines are printed after the shell prompt, the session appears to wait indefinitely:
+    ```
+    server starting
+    sh-4.2$ LOG: redirecting log output to logging collector process
+    HINT: Future log output will appear in directory "pg_log".
+    ```
+  - Press `Enter` a couple of times to get back to the command prompt.
+  - Run `pg_ctl stop -D /var/lib/pgsql/data/userdata` and wait for Postgres to stop. This ensures a clean shutdown:
+    ```
+    waiting for server to shut down.... done
+    server stopped
+    ```
+  - Exit the debug session.
+  - Scale the deployment to 1 pod. Postgres should now start normally.
+
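The recovery steps in this patch are manual. The following is a minimal, untested sketch of how they might be scripted. It is not part of the original procedure and rests on several assumptions: the Postgres deployment is an OpenShift DeploymentConfig whose name is passed as the only argument, the data directory is `/var/lib/pgsql/data/userdata` as in the steps above, and `run-postgresql` exits with the error instead of continuing to run. The helper names `run` and `recover_postgres` are hypothetical.

The sketch changes the manual procedure in two ways. First, it scales the deployment to zero before debugging. It then starts the debug pod from the DeploymentConfig template (`oc debug dc/NAME -- ...`) rather than from the crash-looping pod, because that pod no longer exists once the deployment is scaled down. Second, it uses `pg_ctl start -w` so the script waits for start-up to finish, which replaces pressing `Enter`.

```shell
# Hypothetical helpers sketching the recovery steps. Assumes a DeploymentConfig
# and the data directory used in the README; not an official tool.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: recover_postgres <deploymentconfig-name>
recover_postgres() {
  [ -n "${1:-}" ] || { echo "usage: recover_postgres <dc-name>" >&2; return 1; }
  local dc="$1"
  local pgdata=/var/lib/pgsql/data/userdata

  # Stop the crash-looping pod so only the debug pod uses the data volume.
  run oc scale "dc/$dc" --replicas=0 || return 1

  # In a debug pod built from the DeploymentConfig template:
  # - run-postgresql is expected to fail with the same error (hence || true),
  #   but it creates the config and passwd files that pg_ctl needs;
  # - stop, start (-w waits for start-up to finish), then stop again cleanly.
  run oc debug "dc/$dc" -- bash -c \
    "run-postgresql || true; pg_ctl stop -D $pgdata; pg_ctl start -w -D $pgdata && pg_ctl stop -D $pgdata" \
    || return 1

  # Bring the deployment back up; Postgres should now start normally.
  run oc scale "dc/$dc" --replicas=1
}
```

For example, `DRY_RUN=1 recover_postgres postgresql` prints the three `oc` commands without running them. Here `postgresql` is a placeholder for your own DeploymentConfig name. If `run-postgresql` does not fail in your environment, the debug command will not get past that step, and the manual procedure above is the safer route.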