[Pgcluster-general] Recovery questions

Filip Rembiałkowski plk.zuber at gmail.com
Tue Feb 26 17:49:45 UTC 2008


2008/2/25, At.Mitani <mitani at sraw.co.jp>:
> Hi Filip,
>
>
>  > 5) now I try to bring cluster_1 back, imagine that someone just
>  > plugged in the power cable:
>  >     cluster_1 $ pg_ctl start
>  > at this moment cluster_1 is not synchronised - so the load balancer is
>  > still using only cluster_2.
>
>
> Yes, for that reason, load balancer uses only cluster_2.
>  In order to bring cluster_1 back into the replication table, it needs to be restarted with the -R option, as you did.
>
>
>  > I tried to start cluster_1 with "-R" option, but after this the load
>  > balancer still does not use it.
>
>
> please check cluster db's status in the "pglb.sts".

So I did.


All 4 entries in pglb.sts are as follows:

Tue Feb 26 16:34:40 2008  port(5829) host:cluster_1 initialize
Tue Feb 26 16:34:40 2008  port(5829) host:cluster_2 initialize
Tue Feb 26 16:35:21 2008  port(5829) host:cluster_1 start use
Tue Feb 26 16:35:23 2008  port(5829) host:cluster_2 start use

>  If the status of cluster_1 is neither 'initialize' nor 'start use',
>  recovery might have failed.

It is 'start use' - so in theory everything should be ok, but...
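
For anyone repeating this check, the last state recorded for each host can be pulled out of pglb.sts with a few lines of Python. This is only a sketch: it assumes every entry has the layout shown above (timestamp, port, host, status text) and that the file is written in time order. The names STATUS_LINE and latest_status are made up for illustration and are not part of pgcluster.

```python
import re

# Assumed entry layout, copied from the pglb.sts lines quoted above:
#   <timestamp>  port(<port>) host:<name> <status text>
STATUS_LINE = re.compile(
    r"^(?P<when>.+?)\s+port\((?P<port>\d+)\)\s+host:(?P<host>\S+)\s+(?P<status>.+)$"
)


def latest_status(lines):
    """Return {host: last status seen}, assuming entries are in time order."""
    latest = {}
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            # Later entries overwrite earlier ones for the same host.
            latest[match.group("host")] = match.group("status").strip()
    return latest


sample = """\
Tue Feb 26 16:34:40 2008  port(5829) host:cluster_1 initialize
Tue Feb 26 16:34:40 2008  port(5829) host:cluster_2 initialize
Tue Feb 26 16:35:21 2008  port(5829) host:cluster_1 start use
Tue Feb 26 16:35:23 2008  port(5829) host:cluster_2 start use
""".splitlines()

for host, status in sorted(latest_status(sample).items()):
    print(host, "->", status)
```

With the four entries above, both hosts end up as 'start use', which matches what I read by eye.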

This is how I tested:

By connecting to pglb, I created a database during the failure of
cluster_1, and created a table in it. Then, after recovery, I
connected to pglb again and created another table. I expected that
this second table would be present in all cluster DBs.

This is what I get after I recover cluster_1 with
pg_ctl start "-i -R":


psql -U pgcluster -d postgres -h cluster_2 -p 5829 -d during_failure -c \dt
                     List of relations
 Schema |             Name             | Type  |   Owner
--------+------------------------------+-------+-----------
 public | table_created_after_failure  | table | pgcluster
 public | table_created_during_failure | table | pgcluster
(2 rows)


psql -U pgcluster -d postgres -h cluster_1 -p 5829 -d during_failure -c \dt
                     List of relations
 Schema |             Name             | Type  |   Owner
--------+------------------------------+-------+-----------
 public | table_created_during_failure | table | pgcluster
(1 row)

as you can see, cluster_1 is not synchronised with the rest of the company.
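
The same comparison can be done mechanically instead of by eye. Below is a rough sketch that takes the two listings exactly as pasted above and reports which tables are missing on cluster_1. It assumes psql's default aligned output with four columns. The function name table_names and the two variables holding the pasted output are hypothetical.

```python
def table_names(dt_output):
    """Extract the Name column from psql's aligned table listing."""
    names = set()
    for line in dt_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have four cells (Schema, Name, Type, Owner);
        # the header row is skipped by its "Name" label.
        if len(cells) == 4 and cells[1] != "Name":
            names.add(cells[1])
    return names


# Output pasted from the two psql runs above.
cluster_2_output = """\
                     List of relations
 Schema |             Name             | Type  |   Owner
--------+------------------------------+-------+-----------
 public | table_created_after_failure  | table | pgcluster
 public | table_created_during_failure | table | pgcluster
(2 rows)
"""

cluster_1_output = """\
                     List of relations
 Schema |             Name             | Type  |   Owner
--------+------------------------------+-------+-----------
 public | table_created_during_failure | table | pgcluster
(1 row)
"""

missing_on_cluster_1 = table_names(cluster_2_output) - table_names(cluster_1_output)
print(sorted(missing_on_cluster_1))  # prints ['table_created_after_failure']
```

A set difference is enough here because only table names are compared; it says nothing about whether the rows inside the tables match.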


>  If so, please check the debug log of replication server.

log tail from pgreplicate.log at cluster_1:
=======================

Tue Feb 26 17:33:12 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'could not connect to server: Connection refused
        Is the server running on host "192.168.60.115" and accepting
        TCP/IP connections on port 5829?
'
Tue Feb 26 17:33:12 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'could not connect to server: Connection refused
        Is the server running on host "192.168.60.115" and accepting
        TCP/IP connections on port 5829?
'
Tue Feb 26 17:33:12 2008
replicate_packet_send_internal():setTransactionTbl failed
Tue Feb 26 17:33:12 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:41 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:43 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:55 2008  read_packet():unexpected EOF
Tue Feb 26 18:04:55 2008  pgrecovery_loop():unknown packet. abort to parse
Tue Feb 26 18:04:55 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:04:55 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed
Tue Feb 26 18:05:03 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'fe_sendauth: no password supplied
'
Tue Feb 26 18:05:03 2008
replicate_packet_send_internal():setTransactionTbl failed
Tue Feb 26 18:05:03 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed




log tail from pgreplicate.log on cluster_2:
=====================
Tue Feb 26 17:34:20 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'could not connect to server: Connection refused
        Is the server running on host "192.168.60.115" and accepting
        TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'could not connect to server: Connection refused
        Is the server running on host "192.168.60.115" and accepting
        TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'could not connect to server: Connection refused
        Is the server running on host "192.168.60.115" and accepting
        TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008  PGRcreateConn():Retry. h_errno is 1,reason
is 'could not connect to server: Connection refused
        Is the server running on host "192.168.60.115" and accepting
        TCP/IP connections on port 5829?
'
Tue Feb 26 17:34:20 2008  send_packet():host[lb_3]
port[6001]PGR_Create_Socket_Connect failed



>  Probably, you can find something.

I'm not sure if I found something :)


Regards,
Filip

