[Pgcluster-general] Question about recovery

David Schlenk david.schlenk at spanlink.com
Fri Aug 29 16:18:53 UTC 2008




On 8/29/08 10:55 AM, "Sean Brown" <sbrown at eaglepress.com> wrote:

> I am by no means an expert on this, but I believe that by default with
> only one node left, the cluster goes into read-only mode and won't
> accept writes, so in theory, with a two-node cluster you would just have
> to bring back the failed node. You may, however, want to bring the
> failed node up with -R just to be on the safe side, since, well, who
> knows whether you had file corruption when the node went south.
> 
> The relevant cluster.conf setting is:
> <When_Stand_Alone> read_only </When_Stand_Alone>
> 
> If you've changed that, then yes, you absolutely have to bring back a
> failed node with -R, as the replicator does not keep track of
> transactions that have occurred.
> 

My understanding is that the read_only setting only takes effect when no
*replicator* is available. From the config file doc:

    When all replication servers fail, you can set up two kinds of
    permission: "read_only" or "read_write".

(I corrected a couple of spelling mistakes, but otherwise this is copied
and pasted from the stock config file.)
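To make the distinction concrete, here is a minimal Python sketch of the
rule as I read that config comment. It is a hypothetical model, not
pgcluster code: the function name and its inputs are my own, and the
point is simply that the number of surviving DB nodes is not an input
at all; only the number of reachable replicators matters.

```python
def effective_permission(replicators_up: int, when_stand_alone: str) -> str:
    """Hypothetical model of the When_Stand_Alone rule from cluster.conf.

    Assumption: the setting only applies once *all* replication servers
    have failed; how many DB nodes are still running is irrelevant.
    """
    if when_stand_alone not in ("read_only", "read_write"):
        raise ValueError(f"unknown When_Stand_Alone value: {when_stand_alone!r}")
    if replicators_up > 0:
        # A replicator is still reachable, so writes are accepted even
        # if only one DB node is left.
        return "read_write"
    # No replicator left: fall back to the configured stand-alone mode.
    return when_stand_alone


print(effective_permission(1, "read_only"))  # one replicator up
print(effective_permission(0, "read_only"))  # all replicators down
```

Under this reading, a two-node cluster with a healthy replicator keeps
taking writes on the lone surviving node, which is what sets up the
problem below.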

I'm talking about the situation where the replicator is fine, but you
are down to one DB node. An insert happens, then the remaining node goes
down. The DB that went down first comes back up (not in recovery mode,
since there's no remaining DB to recover from), and then the second
failed DB comes up in recovery mode and copies from that stale node. The
result is lost data.
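The sequence can be simulated in a few lines of Python. This is a toy
model under stated assumptions, not pgcluster's actual recovery code:
the Node class, write() and start() are hypothetical, "recovery" is
modelled as copying the full row set from any running peer, and a node
started without recovery simply keeps whatever data it had.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    rows: set = field(default_factory=set)


def write(nodes, row):
    # Hypothetical replicator: apply the write to every node that is up.
    for n in nodes:
        if n.up:
            n.rows.add(row)


def start(nodes, node, recovery):
    # Hypothetical startup: with recovery (-R) the node copies its data
    # from a running peer; without it, the node keeps its old data.
    if recovery:
        peers = [n for n in nodes if n.up and n is not node]
        if not peers:
            raise RuntimeError("no running node to recover from")
        node.rows = set(peers[0].rows)
    node.up = True


db1, db2 = Node("db1"), Node("db2")
nodes = [db1, db2]
write(nodes, "row1")
db1.up = False                     # db1 fails first
write(nodes, "row2")               # insert lands only on db2
db2.up = False                     # db2 fails too
start(nodes, db1, recovery=False)  # db1 comes back first, stale
start(nodes, db2, recovery=True)   # db2 recovers *from* db1
print(sorted(db2.rows))
```

The final print shows only row1: row2 existed solely on db2 and was
overwritten when db2 resynchronized from the stale db1.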

The reason I'm curious is that I want to automate the recovery of a
failed DB node. I'd like to always attempt to start with -R and fall
back to normal mode if that isn't possible (i.e. if no other DB nodes
are up). If I do this, though, there's a risk of data loss whenever the
DB node that comes up first wasn't the last one to go down. Even setting
the automation goal aside, I'm not sure I'll always know with certainty
which DB node was the last to be taken offline. I would hope my
monitoring solution will give me that information, but it would be nice
to rule out the possibility of data loss altogether.
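One way to make the fallback safer is to start in normal mode only when
this node is known to have been the last one offline, and otherwise
wait for a human. Below is a minimal Python sketch of that decision. It
assumes a hypothetical `last_down` record (node name to the time
monitoring saw it go offline); neither that record nor the helper name
comes from pgcluster, and the result is only as trustworthy as the
monitoring data behind it.

```python
from datetime import datetime


def startup_mode(me: str, peers_up: set, last_down: dict) -> str:
    """Decide how a failed DB node should come back (hypothetical helper).

    last_down maps node name -> datetime it was seen going offline,
    assumed to come from external monitoring. Returns "recover" (start
    with -R), "normal", or "wait" (needs a human decision).
    """
    if peers_up - {me}:
        # Another DB node is running: always resync from it.
        return "recover"
    if me not in last_down:
        # No record for this node, so we can't prove its data is newest.
        return "wait"
    newest = max(last_down.values())
    latest = [n for n, t in last_down.items() if t == newest]
    if latest == [me]:
        # This node went down last, so it should hold the newest data.
        return "normal"
    # Another node went down later (or there is a tie): starting now
    # could discard writes that only that node has.
    return "wait"


t = datetime(2008, 8, 29, 10, 0)
last = {"db1": t, "db2": t.replace(hour=11)}
print(startup_mode("db1", set(), last))    # db1 went down first
print(startup_mode("db2", set(), last))    # db2 went down last
print(startup_mode("db1", {"db2"}, last))  # db2 is already running
```

Ties are treated conservatively as "wait", since equal timestamps say
nothing about which node saw the last write.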
-- 
David Schlenk
Software Engineer
Product Engineering
Spanlink Communications, Inc.
(763) 971.2030
david.schlenk at spanlink.com


