[Pgcluster-general] Recovery questions
Filip Rembiałkowski
plk.zuber at gmail.com
Tue Feb 26 17:49:45 UTC 2008
2008/2/25, At.Mitani <mitani at sraw.co.jp>:
> Hi Filip,
>
>
> > 5) now I try to bring cluster_1 back, imagine that someone just
> > plugged in the power cable:
> > cluster_1 $ pg_ctl start
> > at this moment cluster_1 is not synchronised - so the load balancer is
> > still using only cluster_2.
>
>
> Yes, for that reason, load balancer uses only cluster_2.
> In order to back cluster_1 to replication table, it needs restart with -R option as you did.
>
>
> > I tried to start cluster_1 with "-R" option, but after this the load
> > balancer still does not use it.
>
>
> please check cluster db's status in the "pglb.sts".
So I did. All 4 entries in pglb.sts are as follows:
Tue Feb 26 16:34:40 2008 port(5829) host:cluster_1 initialize
Tue Feb 26 16:34:40 2008 port(5829) host:cluster_2 initialize
Tue Feb 26 16:35:21 2008 port(5829) host:cluster_1 start use
Tue Feb 26 16:35:23 2008 port(5829) host:cluster_2 start use
> If status of cluster_1 is neither 'initialize' nor 'start use',
> recovery might be failed.
It is 'start use' - so in theory everything should be ok, but...
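
For anyone repeating this check, here is a minimal Python sketch that reads pglb.sts entries and reports the last status seen for each host. The line format is only inferred from the four entries quoted above, and the names STATUS_LINE, latest_status and suspect_hosts are made up for this illustration; they are not part of PGCluster.

```python
import re

# Assumed line format, inferred from the pglb.sts entries quoted above:
#   <timestamp> port(<port>) host:<host> <status>
STATUS_LINE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"port\((?P<port>\d+)\) host:(?P<host>\S+) (?P<status>.+)$"
)

# The two statuses named in the thread as the expected ones.
HEALTHY = {"initialize", "start use"}


def latest_status(lines):
    """Return {host: last status seen}, assuming entries are in time order."""
    latest = {}
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            latest[match.group("host")] = match.group("status").strip()
    return latest


def suspect_hosts(lines):
    """Hosts whose last status is neither 'initialize' nor 'start use'."""
    return sorted(
        host for host, status in latest_status(lines).items()
        if status not in HEALTHY
    )


sample = [
    "Tue Feb 26 16:34:40 2008 port(5829) host:cluster_1 initialize",
    "Tue Feb 26 16:34:40 2008 port(5829) host:cluster_2 initialize",
    "Tue Feb 26 16:35:21 2008 port(5829) host:cluster_1 start use",
    "Tue Feb 26 16:35:23 2008 port(5829) host:cluster_2 start use",
]
print(latest_status(sample))
print(suspect_hosts(sample))
```

With the four entries above, both hosts end in 'start use', so no host is flagged. That matches the observation that pglb.sts alone does not reveal the problem.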
This is how I tested:
By connecting to pglb, I created a database during the failure of
cluster_1, and created a table in it. Then, after recovery, I
connected to pglb again and created another table. I expected that
this second table would be present in all cluster DBs.
This is what I get after I recover cluster_1 with
pg_ctl start "-i -R"
psql -U pgcluster -d postgres -h cluster_2 -p 5829 -d during_failure -c \dt
                     List of relations
 Schema |             Name             | Type  |   Owner
--------+------------------------------+-------+-----------
 public | table_created_after_failure  | table | pgcluster
 public | table_created_during_failure | table | pgcluster
(2 rows)
psql -U pgcluster -d postgres -h cluster_1 -p 5829 -d during_failure -c \dt
                     List of relations
 Schema |             Name             | Type  |   Owner
--------+------------------------------+-------+-----------
 public | table_created_during_failure | table | pgcluster
(1 row)
As you can see, cluster_1 is not synchronised with the rest of the cluster:
table_created_after_failure exists on cluster_2 but is missing on cluster_1.
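
The comparison can also be done mechanically. The following is a rough Python sketch that extracts table names from two saved \dt listings and prints what the second node lacks. It assumes the plain-text psql layout shown above; tables_from_dt and missing_tables are hypothetical helper names, not existing tools.

```python
# A row is recognised by having four "|"-separated cells with "table" in the
# Type column, as in the text output of \dt quoted above (an assumption about
# the layout, not a general psql parser).
def tables_from_dt(output):
    names = set()
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == 4 and cells[2] == "table":
            names.add(cells[1])
    return names


def missing_tables(reference_output, candidate_output):
    """Tables present in the reference listing but absent from the candidate."""
    return sorted(tables_from_dt(reference_output) - tables_from_dt(candidate_output))


cluster_2_dt = """List of relations
Schema | Name | Type | Owner
--------+------------------------------+-------+-----------
public | table_created_after_failure | table | pgcluster
public | table_created_during_failure | table | pgcluster
(2 rows)
"""

cluster_1_dt = """List of relations
Schema | Name | Type | Owner
--------+------------------------------+-------+-----------
public | table_created_during_failure | table | pgcluster
(1 row)
"""

# Tables that exist on cluster_2 but not on cluster_1.
print(missing_tables(cluster_2_dt, cluster_1_dt))
```

Running the same comparison in the other direction would show tables that exist only on the recovered node, if there were any.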
> If so, please check the debug log of replication server.
log tail from pgreplicate.log at cluster_1:
=======================
Tue Feb 26 17:33:12 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'could not connect to server: Connection refused
Is the server running on host "192.168.60.115" and accepting
TCP/IP connections on port 5829?
'
Tue Feb 26 17:33:12 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'could not connect to server: Connection refused
Is the server running on host "192.168.60.115" and accepting
TCP/IP connections on port 5829?
'
Tue Feb 26 17:33:12 2008 replicate_packet_send_internal():setTransactionTbl failed
Tue Feb 26 17:33:12 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:41 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:43 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:55 2008 read_packet():unexpected EOF
Tue Feb 26 18:04:55 2008 pgrecovery_loop():unknown packet. abort to parse
Tue Feb 26 18:04:55 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:55 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:05:03 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008 replicate_packet_send_internal():setTransactionTbl failed
Tue Feb 26 18:05:03 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
Log tail from pgreplicate.log on cluster_2:
=====================
Tue Feb 26 17:34:20 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'could not connect to server: Connection refused
Is the server running on host "192.168.60.115" and accepting
TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'could not connect to server: Connection refused
Is the server running on host "192.168.60.115" and accepting
TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'could not connect to server: Connection refused
Is the server running on host "192.168.60.115" and accepting
TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008 PGRcreateConn():Retry. h_errno is 1,reason is 'could not connect to server: Connection refused
Is the server running on host "192.168.60.115" and accepting
TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008 send_packet():host[lb_3] port[6001]PGR_Create_Socket_Connect failed
> Probably, you can find something.
I'm not sure if I found something :)
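
To make log tails like the ones above easier to scan, here is a small Python sketch that groups the lines into entries and counts them per reporting function. It assumes every entry starts with a timestamp of the form seen above and that lines without one are continuations of the previous entry; entries and count_by_function are illustrative names only.

```python
import re
from collections import Counter

# Timestamp prefix as it appears in the pgreplicate.log excerpts above.
TIMESTAMP = re.compile(r"^\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}\s*")


def entries(lines):
    """Yield one string per log entry, joining wrapped continuation lines."""
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line):
            if current is not None:
                yield current
            # Drop the timestamp, keep the rest of the line (may be empty).
            current = TIMESTAMP.sub("", line, count=1)
        elif current is not None:
            current = (current + " " + line).strip()
    if current is not None:
        yield current


def count_by_function(lines):
    """Count entries by the function name that precedes '()' in each entry."""
    counts = Counter()
    for entry in entries(lines):
        match = re.match(r"(\w+)\(\)", entry)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


sample_log = [
    "Tue Feb 26 18:04:55 2008 read_packet():unexpected EOF",
    "Tue Feb 26 18:04:55 2008 pgrecovery_loop():unknown packet. abort to parse",
    "Tue Feb 26 18:04:55 2008 send_packet():host[lb_3]",
    "port[6001]PGR_Create_Socket_Connect failed",
    "Tue Feb 26 18:05:03 2008 PGRcreateConn():Retry. h_errno is 1,reason",
    "is 'fe_sendauth: no password supplied",
    "'",
    "Tue Feb 26 18:05:03 2008",
    "replicate_packet_send_internal():setTransactionTbl failed",
]

for name, count in count_by_function(sample_log).most_common():
    print(name, count)
```

Counting per function does not explain the failure by itself, but it separates the connection retries (PGRcreateConn) from the load balancer notifications (send_packet) and the recovery errors.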
Regards,
Filip