[Pgcluster-general] PGCluster 1.7 seems to be stuck in deadlocks

Hari G hari_g at hotmail.com
Thu Dec 27 21:58:50 UTC 2007


Hello All -

(This is cross-posted to the pgcluster-general and geoserver-users lists.)

For the last few days I have been trying to implement a PGCluster solution. The clients of this cluster are Java applications (mainly GeoServer) that also use PostGIS. This setup works fine with a single PostgreSQL server, but when I use the cluster everything goes wrong. Many database connections hang, and sometimes I see a lot of "idle in transaction" connections. I have two physical servers, each running a replicator, a load balancer, and a cluster DB.

I am not doing any updates at this time, but when I run "select * from pg_locks;" I see 160 rows:
 relation      |    18577 |    19058 |      |       |               |         |       |          |      654004 | 12119 | AccessShareLock | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      653938 | 12001 | AccessShareLock | t
 relation      |    18577 |    18588 |      |       |               |         |       |          |      653958 | 12096 | AccessShareLock | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      653994 | 12113 | AccessShareLock | t
 relation      |    18577 |    18654 |      |       |               |         |       |          |      653980 | 12106 | AccessShareLock | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      654191 | 12481 | AccessShareLock | t
 relation      |    18577 |    19058 |      |       |               |         |       |          |      654180 | 12504 | AccessShareLock | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      654375 | 12900 | AccessShareLock | t
 relation      |    18577 |    18588 |      |       |               |         |       |          |      654191 | 12481 | AccessShareLock | t
 relation      |    18577 |    18668 |      |       |               |         |       |          |      654540 | 13330 | AccessShareLock | t
 relation      |    18577 |    18588 |      |       |               |         |       |          |      654375 | 12900 | AccessShareLock | t
 relation      |    18577 |    18588 |      |       |               |         |       |          |      653938 | 12001 | AccessShareLock | t
 transactionid |          |          |      |       |        654367 |         |       |          |      654367 | 12885 | ExclusiveLock   | t
 relation      |    18577 |    18588 |      |       |               |         |       |          |      653994 | 12113 | AccessShareLock | t
 transactionid |          |          |      |       |        654375 |         |       |          |      654375 | 12900 | ExclusiveLock   | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      653958 | 12096 | AccessShareLock | t
 relation      |    18577 |    19059 |      |       |               |         |       |          |      653980 | 12106 | AccessShareLock | t
 relation      |    18577 |    19062 |      |       |               |         |       |          |      654540 | 13330 | AccessShareLock | t
 transactionid |          |          |      |       |        654362 |         |       |          |      654362 | 12877 | ExclusiveLock   | t
 transactionid |          |          |      |       |        654189 |         |       |          |      654189 | 12477 | ExclusiveLock   | t
 relation      |    18577 |    19059 |      |       |               |         |       |          |      654543 | 13298 | AccessShareLock | t
 relation      |    18577 |    18588 |      |       |               |         |       |          |      654190 | 12479 | AccessShareLock | t
 relation      |    18577 |    19058 |      |       |               |         |       |          |      654002 | 12118 | AccessShareLock | t
 transactionid |          |          |      |       |        653976 |         |       |          |      653976 | 12104 | ExclusiveLock   | t
 transactionid |          |          |      |       |        654004 |         |       |          |      654004 | 12119 | ExclusiveLock   | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      653982 | 12107 | AccessShareLock | t
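
One way to read output like this: a backend that is blocked on a lock shows up in pg_locks as a row whose last column (granted) is "f", and every row in the excerpt above ends in "t". The following is a minimal Python sketch, not part of my setup, that counts granted and waiting locks per backend pid from pasted psql output. It assumes the 13-column pg_locks layout listed in COLUMNS, which matches the field count of the rows above; the function names are made up for illustration. It only looks at one node, so it cannot say anything about waits inside the replicator or load balancer.

```python
from collections import defaultdict

# Assumption: 13-column pg_locks layout, matching the field count of the rows above.
COLUMNS = ["locktype", "database", "relation", "page", "tuple", "transactionid",
           "classid", "objid", "objsubid", "transaction", "pid", "mode", "granted"]


def parse_rows(psql_output):
    """Turn '|'-separated psql rows into dicts; skip anything that is not a data row."""
    rows = []
    for line in psql_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == len(COLUMNS) and fields[-1] in ("t", "f"):
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def summarize(rows):
    """Count granted and waiting (granted = f) locks per backend pid."""
    summary = defaultdict(lambda: {"granted": 0, "waiting": 0})
    for row in rows:
        key = "granted" if row["granted"] == "t" else "waiting"
        summary[row["pid"]][key] += 1
    return dict(summary)


# Three rows copied from the excerpt above.
SAMPLE = """\
 relation      |    18577 |    19058 |      |       |               |         |       |          |      654004 | 12119 | AccessShareLock | t
 transactionid |          |          |      |       |        654004 |         |       |          |      654004 | 12119 | ExclusiveLock   | t
 relation      |    18577 |    19020 |      |       |               |         |       |          |      653938 | 12001 | AccessShareLock | t
"""

if __name__ == "__main__":
    for pid, counts in sorted(summarize(parse_rows(SAMPLE)).items()):
        print(pid, counts)
```

If no pid reports a non-zero "waiting" count, nothing on that node is blocked on a lock at that moment, and the hang is more likely to be somewhere else (for example a client holding a transaction open).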

Does this look like a deadlock to you? I am also copying my conf files:
#------------------------------------------------------------
# file: cluster.conf (cluster_2)
#------------------------------------------------------------
# set Replication Server information
#------------------------------------------------------------
<Replicate_Server_Info>
        <Host_Name> rep_1 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Replicate_Server_Info>
        <Host_Name> rep_2 </Host_Name>
        <Port> 8001 </Port>
        <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
#-------------------------------------------------------------
# set Cluster DB Server information
#-------------------------------------------------------------
<Recovery_Port> 7001 </Recovery_Port>
<Rsync_Path> /usr/bin/rsync </Rsync_Path>
<Rsync_Option>ssh</Rsync_Option>
<Rsync_Compress> yes </Rsync_Compress>
<Pg_Dump_Path>  /usr/local/pgsql/bin/pg_dump </Pg_Dump_Path>

#-------------------------------------------------------------
# file: pglb.conf
#-------------------------------------------------------------
# set cluster DB server information
#--------------------------------------------------------------------
<Cluster_Server_Info>
    <Host_Name> cluster_1 </Host_Name>
    <Port> 5433 </Port>
    <Max_Connect>100</Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name> cluster_2 </Host_Name>
    <Port> 5433 </Port>
    <Max_Connect>100</Max_Connect>
</Cluster_Server_Info>
#-------------------------------------------------------------
# set Load Balance server information
#-------------------------------------------------------------
<Backend_Socket_Dir>    /tmp    </Backend_Socket_Dir>
<Receive_Port>          5432    </Receive_Port>
<Recovery_Port>         6101    </Recovery_Port>
<Max_Cluster_Num>        128    </Max_Cluster_Num>
<Use_Connection_Pooling>  no    </Use_Connection_Pooling>
<LifeCheck_Timeout>     3s      </LifeCheck_Timeout>
<LifeCheck_Interval>    15s     </LifeCheck_Interval>
#-------------------------------------------------------------
# A setup of a log files
#-------------------------------------------------------------
<Log_File_Info>
        <File_Name> /var/log/postgresql/pglb.log </File_Name>
        <File_Size> 1M </File_Size>
        <Rotate> 3 </Rotate>
</Log_File_Info>

#-------------------------------------------------------------
# file: pgreplicate.conf
#-------------------------------------------------------------
# A setup of Cluster DB(s)
#-------------------------------------------------------------
<Cluster_Server_Info>
    <Host_Name> cluster_1 </Host_Name>
    <Port> 5433 </Port>
    <Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name> cluster_2 </Host_Name>
    <Port> 5433 </Port>
    <Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
#------------------------------------------------------------
# A setup of a replication server
#-------------------------------------------------------------
<Replication_Port>8001</Replication_Port>
<Recovery_Port>8101</Recovery_Port>
<RLOG_Port>8301</RLOG_Port>
<Response_Mode>normal</Response_Mode>
<Use_Replication_Log>no</Use_Replication_Log>
<Replication_Timeout>1min</Replication_Timeout>
<LifeCheck_Timeout>3s</LifeCheck_Timeout>
<LifeCheck_Interval>15s</LifeCheck_Interval>
<Status_Log_File> /tmp/pgreplicate.sts </Status_Log_File>
<Error_Log_File> /tmp/pgreplicate.log </Error_Log_File>
#-------------------------------------------------------------
# A setup of a log files
#-------------------------------------------------------------
<Log_File_Info>
        <File_Name> /var/log/postgresql/pgreplicate.log </File_Name>
        <File_Size> 1M </File_Size>
        <Rotate> 3 </Rotate>
</Log_File_Info>

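Since the same cluster DB hosts and ports appear in both pglb.conf and pgreplicate.conf, a quick consistency check can rule out a simple typo between the files. The Python sketch below is only an illustration: it assumes the two files are meant to list the same Cluster_Server_Info hosts and ports (my reading of the configs above, not something taken from the PGCluster documentation), and the helper names are hypothetical.

```python
import re


def parse_blocks(text, block_tag):
    """Return one dict per <block_tag>...</block_tag> section, mapping inner tag to value."""
    pattern = r"<%s>(.*?)</%s>" % (block_tag, block_tag)
    blocks = []
    for body in re.findall(pattern, text, re.S):
        # Inner entries look like "<Port> 5433 </Port>"; whitespace around values is dropped.
        pairs = re.findall(r"<(\w+)>\s*(.*?)\s*</\1>", body, re.S)
        blocks.append(dict(pairs))
    return blocks


def db_nodes(text):
    """Map host name to port for every Cluster_Server_Info block."""
    return {b["Host_Name"]: b["Port"] for b in parse_blocks(text, "Cluster_Server_Info")}


def mismatches(nodes_a, nodes_b):
    """Hosts that are missing from one side or listed with different ports."""
    hosts = set(nodes_a) | set(nodes_b)
    return sorted(h for h in hosts if nodes_a.get(h) != nodes_b.get(h))


# Fragments copied from the pglb.conf and pgreplicate.conf shown above.
PGLB = """
<Cluster_Server_Info>
    <Host_Name> cluster_1 </Host_Name>
    <Port> 5433 </Port>
    <Max_Connect>100</Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name> cluster_2 </Host_Name>
    <Port> 5433 </Port>
    <Max_Connect>100</Max_Connect>
</Cluster_Server_Info>
"""

PGREPLICATE = """
<Cluster_Server_Info>
    <Host_Name> cluster_1 </Host_Name>
    <Port> 5433 </Port>
    <Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name> cluster_2 </Host_Name>
    <Port> 5433 </Port>
    <Recovery_Port> 7001 </Recovery_Port>
</Cluster_Server_Info>
"""

if __name__ == "__main__":
    print(mismatches(db_nodes(PGLB), db_nodes(PGREPLICATE)))
```

For the fragments above the two node lists agree, so a host or port typo between these two files does not look like the cause.
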
I also tried the following:
1. Removing the load balancer and using the cluster directly.
2. Starting the LB and replication servers in debug mode.

In debug mode both the LB and replication servers print a lot of messages, but I saw nothing alarming.

I am very new to PGCluster, so any help or pointers would be greatly appreciated.

-- Hari Gangadharan

