From 123c4fd56a357fdf480ea4f3a14929fd13e67fbb Mon Sep 17 00:00:00 2001
From: Wade Barnes <wade.barnes@shaw.ca>
Date: Tue, 3 Jul 2018 13:59:58 -0700
Subject: [PATCH] Update documentation

- Add documentation on how to fix a Postgres crash loop caused by a `tuple concurrently updated` error
---
 openshift/README.md | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/openshift/README.md b/openshift/README.md
index 1a97a86f..2fff7d4f 100644
--- a/openshift/README.md
+++ b/openshift/README.md
@@ -376,4 +376,45 @@ kill 38 39 40 41
 ```
 
 Wait a 30 seconds type ps -x.  Ensuring that new PIDs have been created.  
-Now you can see the yellow Django debug screen!!!
\ No newline at end of file
+Now you can see the yellow Django debug screen!!!
+
+## Fixing a Postgres crash loop caused by a `tuple concurrently updated` error
+
+If a Postgres database pod is terminated unexpectedly, it can enter a crash loop with the following log signature:
+
+```
+pg_ctl: another server might be running; trying to start server anyway
+waiting for server to start....LOG:  redirecting log output to logging collector process
+HINT:  Future log output will appear in directory "pg_log".
+..... done
+server started
+=> sourcing /usr/share/container-scripts/postgresql/start/set_passwords.sh ...
+ERROR:  tuple concurrently updated
+```
+
+To fix the issue:
+- Find the name of the Postgres pod that is in the crash loop.
+- Start an `oc debug` session with the pod.
+- Scale the associated Postgres deployment to zero pods.
+- From the command line of the debug session:
+  - Run `run-postgresql`.  This is the `CMD` for the Docker image.  As part of its start-up process, the script creates a number of files that would not otherwise exist in the debug pod, namely `/var/lib/pgsql/openshift-custom-postgresql.conf` and `/var/lib/pgsql/passwd`.  Without these files, none of the `pg_ctl` commands will run.  When you run the command, you should see the same error output listed above.
+  - Run `pg_ctl stop -D /var/lib/pgsql/data/userdata` to cleanly shut down Postgres.  You should see:
+    ```
+    waiting for server to shut down.... done
+    server stopped
+    ```
+  - Run `pg_ctl start -D /var/lib/pgsql/data/userdata` to start Postgres.  You should see the following output, after which the session should wait indefinitely without reporting any errors:
+    ```
+    server starting
+    sh-4.2$ LOG:  redirecting log output to logging collector process
+    HINT:  Future log output will appear in directory "pg_log".
+    ```
+  - Press `Enter` a couple of times to get back to the command prompt.
+  - Run `pg_ctl stop -D /var/lib/pgsql/data/userdata` again, and wait for Postgres to stop.  This ensures a clean shutdown.
+    ```
+    waiting for server to shut down.... done
+    server stopped
+    ```
+  - Exit the debug session.
+  - Scale the deployment to 1 pod.  Postgres should start normally now.
+