[Pgcluster-general] PGcluster, load balancer problem

Lia Domide lia.domide at codemart.ro
Wed Jan 16 14:43:14 UTC 2008


Hi everybody,

I am trying to set up a highly available database solution using
PostgreSQL (8.2.5) and PGCluster (1.7.7rc7).

I use two Ubuntu 7.04 (32-bit) machines, currently running as virtual machines.

 

Pg1 (192.168.123.31)
    Rep1
    Lb1

Pg2 (193.168.123.29)
    Rep2
    Lb2

 

-  I managed to get replication working;

-  I checked the /etc/hosts file;

-  When one node is recovering from a failure (-R), any operation executed on
the other node is replicated correctly.
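
The /etc/hosts check above can also be scripted. A minimal sketch, assuming the nodes are referred to as pg1 and pg2 (the names that appear in the pglb logs) and that the addresses are the ones listed above, copied as written. check_hosts and EXPECTED are illustrative names, not part of PGCluster:

```python
import socket

# Expected name -> address mapping, copied as written in the node list above.
# Adjust it to whatever your /etc/hosts is supposed to contain.
EXPECTED = {
    "pg1": "192.168.123.31",
    "pg2": "193.168.123.29",
}


def check_hosts(expected, resolve=socket.gethostbyname):
    """Return (name, expected_ip, actual_ip_or_error) for every mismatch."""
    problems = []
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            got = "unresolved: %s" % exc
        if got != want:
            problems.append((name, want, got))
    return problems
```

Running check_hosts(EXPECTED) on each machine and comparing the results may show whether the two hosts files disagree; an empty list means every name resolved to the expected address.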

 

But the load balancers do not behave the way I expect them to:

-  First, all DB nodes are initialized, as the pglb.sts file shows.

-  Immediately afterwards, the X in "PGRscan_cluster:X ClusterDB can be used"
decreases by one (X - 1).

-  I tried a cluster with 3 DB nodes and saw the same problem: at the
beginning "3 ClusterDB nodes can be used", and immediately after that
"2 ClusterDB..", even though all three DB nodes are running.

-  In the 3-node scenario, when only the last DB node is up, the cluster is
unreachable, but when either of the first two DB nodes is alive, the
cluster is running.

-  In the 2-node scenario, when the first DB node is down, the cluster is
unreachable, even if the second DB node is alive.

 

Does anyone know why the X in "PGRscan_cluster:X ClusterDB can be used"
decreases, and when X is updated?

Must a spare node always be kept for safety reasons (e.g. in a 3-node
cluster only 2 may be used)?

 

Below are some logs from the load balancers, in the 2-node scenario.

On PG2, LB2 log (PG1 DB node stopped):

2008-01-16 15:30:30 [13087] DEBUG:PGRset_status_on_cluster_tbl():host:pg1
port:5432 max:32 use:0 status1
2008-01-16 15:30:30 [13087] DEBUG:PGRset_status_on_cluster_tbl():host:pg2
port:5432 max:32 use:0 status1
2008-01-16 15:30:30 [13087] DEBUG:init_pglb():Child_Tbl size is[49536]
2008-01-16 15:31:07 [13087] DEBUG:PGRscan_cluster:2 ClusterDB can be used
2008-01-16 15:31:07 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->1
max->32 use_num->0

2008-01-16 15:31:07 [13087] DEBUG:PGRset_status_on_cluster_tbl():host:pg1
port:5432 max:32 use:1 status2
2008-01-16 15:31:07 [13116] DEBUG:PGRdo_child():I am 13116
2008-01-16 15:31:07 [13116] DEBUG:do_accept():I am 13116 accept fd 6
2008-01-16 15:31:07 [13116] DEBUG:read_startup_packet():Protocol Major: 3
Minor: 0 database: TEST user: postgres
2008-01-16 15:31:07 [13116] ERROR:connect_inet_domain_socket(): connect()
failed: Connection refused
2008-01-16 15:31:07 [13116] DEBUG:PGRset_status_on_cluster_tbl():host:pg1
port:5432 max:32 use:2 status98
2008-01-16 15:31:09 [13087] DEBUG:PGRscan_cluster:1 ClusterDB can be used
2008-01-16 15:31:09 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->98
max->32 use_num->1

2008-01-16 15:31:09 [13087] DEBUG:PGRscan_cluster:pg2 [5432],useFlag->1
max->32 use_num->0

2008-01-16 15:31:09 [13087] DEBUG:PGRset_status_on_cluster_tbl():host:pg2
port:5432 max:32 use:1 status2
2008-01-16 15:31:09 [13117] DEBUG:PGRdo_child():I am 13117
2008-01-16 15:31:09 [13117] DEBUG:do_accept():I am 13117 accept fd 6
2008-01-16 15:31:09 [13117] DEBUG:read_startup_packet():Protocol Major: 3
Minor: 0 database: TEST user: postgres
2008-01-16 15:31:09 [13117] DEBUG:create_cp():[pg2] [pg2] is same
2008-01-16 15:31:09 [13117] DEBUG:connect_unix_domain_socket():postmaster
Unix domain socket: /tmp/.s.PGSQL.5432
2008-01-16 15:31:09 [13117] DEBUG:connect_unix_domain_socket():connected to
postmaster Unix domain socket: /tmp/.s.PGSQL.5432 fd: 7
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;   ### Here I created a JDBC connection from another host
2008-01-16 15:31:09 [13117] DEBUG:ReadyForQuery(): message length: 5
2008-01-16 15:31:09 [13117] DEBUG:ReadyForQuery(): transaction state: I
2008-01-16 15:31:09 [13117] DEBUG:ProcessFrontendResponse():read kind from
frontend X(58)
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] ERROR:PGRload_balance():no cluster available
2008-01-16 15:33:32 [13087] ERROR:load_balance_main():load balance process
failed
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] ERROR:PGRload_balance():no cluster available
2008-01-16 15:33:32 [13087] ERROR:load_balance_main():load balance process
failed
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1

2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1
...................
2008-01-16 15:33:32 [13087] ERROR:PGRload_balance():no cluster available
2008-01-16 15:33:32 [13087] ERROR:load_balance_main():load balance process
failed
2008-01-16 15:33:32 [13087] ERROR:load_balance_main():no cluster available
2008-01-16 15:33:32 [13087] DEBUG:do_accept():I am 13087 accept fd 6
2008-01-16 15:33:32 [13087] DEBUG:read_startup_packet():Protocol Major: 3
Minor: 0 database: TEST user: postgres
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:0 ClusterDB can be used
2008-01-16 15:33:32 [13087] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->99
max->32 use_num->1
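
When pglb logs "connect() failed: Connection refused", a plain TCP probe run from the load balancer host may help separate a postmaster that is not listening from a name-resolution or routing problem. A minimal sketch using only the Python standard library; port_state is a hypothetical helper, and 5432 is the port shown in the logs:

```python
import socket


def port_state(host, port, timeout=2.0):
    """Classify a TCP connect attempt as 'open', 'refused', or 'unreachable'."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Timeouts, DNS failures, "no route to host", and similar errors.
        return "unreachable"
```

For example, the results of port_state("pg1", 5432) and port_state("pg2", 5432) can be compared on both machines while each DB node is up or stopped.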



 

On PG1, LB1 log (both the PG1 and PG2 DB services had previously been started
on port 5432, as the postgres user):

2008-01-16 16:13:27 [29688] DEBUG:PGRset_status_on_cluster_tbl():host:pg1
port:5432 max:42 use:0 status1
2008-01-16 16:13:27 [29688] DEBUG:PGRset_status_on_cluster_tbl():host:pg2
port:5432 max:42 use:0 status1
2008-01-16 16:13:27 [29688] DEBUG:init_pglb():Child_Tbl size is[65016]
2008-01-16 16:13:28 [29688] DEBUG:PGRscan_cluster:2 ClusterDB can be used
2008-01-16 16:13:28 [29688] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->1
max->42 use_num->0

2008-01-16 16:13:28 [29688] DEBUG:PGRset_status_on_cluster_tbl():host:pg1
port:5432 max:42 use:1 status2
2008-01-16 16:13:28 [29695] DEBUG:PGRdo_child():I am 29695
2008-01-16 16:13:28 [29695] DEBUG:do_accept():I am 29695 accept fd 6
2008-01-16 16:13:28 [29695] ERROR:pool_read: read failed (Connection reset
by peer)
2008-01-16 16:13:30 [29688] DEBUG:PGRscan_cluster:1 ClusterDB can be used
2008-01-16 16:13:30 [29688] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->2
max->42 use_num->0

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

2008-01-16 16:24:58 [31526] DEBUG:PGRdo_child():I am 31526
2008-01-16 16:24:58 [31526] DEBUG:do_accept():I am 31526 accept fd 6
2008-01-16 16:24:58 [31526] ERROR:pool_read: read failed (Connection reset
by peer)
2008-01-16 16:25:00 [29688] DEBUG:PGRscan_cluster:1 ClusterDB can be used
2008-01-16 16:25:00 [29688] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->2
max->42 use_num->0

2008-01-16 16:25:00 [31531] DEBUG:PGRdo_child():I am 31531
2008-01-16 16:25:00 [31531] DEBUG:do_accept():I am 31531 accept fd 6
2008-01-16 16:25:00 [31531] ERROR:pool_read: read failed (Connection reset
by peer)
2008-01-16 16:25:02 [29688] DEBUG:PGRscan_cluster:1 ClusterDB can be used
2008-01-16 16:25:02 [29688] DEBUG:PGRscan_cluster:pg1 [5432],useFlag->2
max->42 use_num->0
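
To see exactly when the usable-node count and the per-host useFlag change, the DEBUG output can be reduced to its transitions. A minimal sketch, assuming the line formats are exactly those in the excerpts above (the patterns are inferred from these logs, not from the pglb source). It does not interpret the useFlag values; it only reports when they change. transitions and EVENT are illustrative names:

```python
import re

# Patterns inferred from the log excerpts in this post.
# \s+ tolerates the line wrapping introduced by the mail client.
EVENT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[\d+\] DEBUG:PGRscan_cluster:"
    r"(?:(?P<count>\d+) ClusterDB can be used"
    r"|(?P<host>\S+) \[(?P<port>\d+)\],useFlag->(?P<flag>\d+)"
    r"\s+max->(?P<max>\d+) use_num->(?P<use>\d+))"
)


def transitions(log_text):
    """Yield (timestamp, what, old, new) whenever the usable-node count
    or a host's useFlag differs from the previously logged value."""
    last = {}
    for m in EVENT.finditer(log_text):
        if m.group("count") is not None:
            key, value = "usable", int(m.group("count"))
        else:
            key, value = m.group("host"), int(m.group("flag"))
        old = last.get(key)
        if old != value:
            last[key] = value
            yield (m.group("ts"), key, old, value)
```

Feeding it a whole pglb log, for instance with transitions(open(path).read()) where path is wherever the log is written, gives a short list of the moments at which the count or a flag changed, which is easier to line up against node start and stop times than the raw output.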

 

 

Thanks in advance,

Lia Domide.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pglb.conf
Type: application/octet-stream
Size: 3064 bytes
Desc: not available
Url : http://pgfoundry.org/pipermail/pgcluster-general/attachments/20080116/60e24129/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgreplicate.conf
Type: application/octet-stream
Size: 4823 bytes
Desc: not available
Url : http://pgfoundry.org/pipermail/pgcluster-general/attachments/20080116/60e24129/attachment-0003.obj 

