[Pgcluster-general] Instability replicating over wan
Diogo Biazus
diogob at gmail.com
Wed May 23 17:25:16 UTC 2007
Sorry for posting it again, but is there anyone who could give me some advice?
On 5/18/07, Diogo Biazus <diogob at gmail.com> wrote:
> Hi there,
> I just started using pgcluster and I must say I'm impressed with the
> project. Although it lacks some docs (I'll try to write something in
> Portuguese later), it was pretty easy to set up and get replicating.
>
> My network layout:
> I have 2 servers communicating over a VPN: cluster_poa and
> cluster_rio. The machines are in different cities, but the average
> ping time is good: 53.504 ms
> I'm not using the load balancer. People in Rio de Janeiro access only
> cluster_rio, people in Porto Alegre access only cluster_poa, and
> there is a single replicator, running on cluster_poa.
> I'm using version 1.7.rc7 on both machines. My config files for both
> are below:
>
> >>>>>>>>>>>>>> cluster_poa <<<<<<<<<<<<<<<<<<
> * /etc/hosts
> 192.168.2.8 cluster_rio
> 192.168.2.8 rep_rio
> 192.168.200.100 cluster_poa
> 192.168.200.100 rep_poa
>
> * cluster.conf
> <Replicate_Server_Info>
> <Host_Name> rep_poa </Host_Name>
> <Port> 8001 </Port>
> <Recovery_Port> 8101 </Recovery_Port>
> </Replicate_Server_Info>
> <Host_Name> cluster_poa </Host_Name>
> <Recovery_Port> 7001 </Recovery_Port>
> <Rsync_Path> /usr/bin/rsync </Rsync_Path>
> <Rsync_Option> ssh </Rsync_Option>
> <Rsync_Compress> yes </Rsync_Compress>
> <Pg_Dump_Path> /usr/local/pgsql/bin/pg_dump </Pg_Dump_Path>
> <When_Stand_Alone> read_only </When_Stand_Alone>
> <Replication_Timeout> 1 min </Replication_Timeout>
> <LifeCheck_Timeout> 20s </LifeCheck_Timeout>
> <LifeCheck_Interval> 21s </LifeCheck_Interval>
>
> * pgreplicate.conf
> <Cluster_Server_Info>
> <Host_Name> cluster_poa </Host_Name>
> <Port> 5432 </Port>
> <Recovery_Port> 7001 </Recovery_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
> <Host_Name> cluster_rio </Host_Name>
> <Port> 5432 </Port>
> <Recovery_Port> 7001 </Recovery_Port>
> </Cluster_Server_Info>
> <Host_Name> rep_poa </Host_Name>
> <Replication_Port> 8001 </Replication_Port>
> <Recovery_Port> 8101 </Recovery_Port>
> <RLOG_Port> 8301 </RLOG_Port>
> <Response_Mode> normal </Response_Mode>
> <Use_Replication_Log> no </Use_Replication_Log>
> <Replication_Timeout> 1min </Replication_Timeout>
> <LifeCheck_Timeout> 10s </LifeCheck_Timeout>
> <LifeCheck_Interval> 15s </LifeCheck_Interval>
> <Log_File_Info>
> <File_Name> /tmp/pgreplicate.log </File_Name>
> <File_Size> 1M </File_Size>
> <Rotate> 3 </Rotate>
> </Log_File_Info>
>
>
>
> >>>>>>>>>>>>>> cluster_rio <<<<<<<<<<<<<<<<<<
> * /etc/hosts
> 192.168.2.8 cluster_rio
> 192.168.2.8 rep_rio
> 192.168.200.100 cluster_poa
> 192.168.200.100 rep_poa
>
> * cluster.conf
> <Replicate_Server_Info>
> <Host_Name> rep_poa </Host_Name>
> <Port> 8001 </Port>
> <Recovery_Port> 8101 </Recovery_Port>
> </Replicate_Server_Info>
> <Host_Name> cluster_rio </Host_Name>
> <Recovery_Port> 7001 </Recovery_Port>
> <Rsync_Path> /usr/bin/rsync </Rsync_Path>
> <Rsync_Option> ssh </Rsync_Option>
> <Rsync_Compress> yes </Rsync_Compress>
> <Pg_Dump_Path> /usr/local/pgsql/bin/pg_dump </Pg_Dump_Path>
> <When_Stand_Alone> read_only </When_Stand_Alone>
> <Replication_Timeout> 2 min </Replication_Timeout>
> <LifeCheck_Timeout> 20s </LifeCheck_Timeout>
> <LifeCheck_Interval> 21s </LifeCheck_Interval>
>
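To compare the timing settings in the two cluster.conf files above, a small script can make any difference explicit. This is only a sketch and not part of pgcluster: the helper names are hypothetical, it assumes settings are written as `<Tag> value </Tag>`, and it only understands the `s` and `min` spellings that appear in this post. How pgcluster itself parses these values is not verified here.

```python
import re

# One "<Tag> value </Tag>" setting; the closing tag may sit on the next line.
SETTING = re.compile(r"<(\w+)>\s*([^<>]*?)\s*</\1>")
# Assumption: only the duration spellings seen in this post ("20s", "1min", "1 min").
DURATION = re.compile(r"(\d+)\s*(s|min)")
UNIT_SECONDS = {"s": 1, "min": 60}


def timing_settings(text):
    """Return {tag: seconds} for every *_Timeout / *_Interval setting in text."""
    result = {}
    for tag, value in SETTING.findall(text):
        match = DURATION.fullmatch(value)
        if match and tag.endswith(("_Timeout", "_Interval")):
            result[tag] = int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return result


def differing(a, b):
    """Return {tag: (value_in_a, value_in_b)} for settings that differ."""
    return {
        tag: (a.get(tag), b.get(tag))
        for tag in sorted(a.keys() | b.keys())
        if a.get(tag) != b.get(tag)
    }


# Timing values copied from the two cluster.conf files in this post.
POA = """
<Replication_Timeout> 1 min
</Replication_Timeout>
<LifeCheck_Timeout> 20s
</LifeCheck_Timeout>
<LifeCheck_Interval> 21s
</LifeCheck_Interval>
"""
RIO = POA.replace("1 min", "2 min")

print(differing(timing_settings(POA), timing_settings(RIO)))
# → {'Replication_Timeout': (60, 120)}
```

With the values from this post, the only timing difference between the two cluster.conf files is Replication_Timeout (1 min on cluster_poa, 2 min on cluster_rio).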
> My problem:
> I'm having some stability problems. It works very well and then
> sometimes (I couldn't find any pattern, it seems random to me)
> cluster_rio stops seeing the replicator and the whole cluster restarts,
> writing this to the log:
>
> LOG: server process (PID 11017) was terminated by signal 11
> LOG: terminating any other active server processes
> LOG: all server processes terminated; reinitializing
>
> Once this starts happening it won't stop until I restart the whole
> cluster and the replicator.
> If I restart only the cluster_rio postmaster, it does not communicate
> with the replicator and gives lots of:
> ERROR: This query is not permitted when all replication servers fell down
>
> But the communication between the machines seems perfect every time
> I test it. So I wonder whether this is caused by some network
> instability, combined with a problem in the replicator reconnecting
> to the remote cluster.
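One way to test the network-instability hypothesis is to leave a TCP probe running on cluster_rio against the replicator port and line up its failures with the timestamps of the crashes. The sketch below is not a pgcluster tool; `probe` and `watch` are hypothetical names, and the host and port in the usage comment are the `rep_poa` / 8001 values from the configs above. A successful connect only shows that the port is reachable over the VPN, not that the replication protocol itself is healthy.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections, timeouts and DNS failures
        return None


def watch(host, port, interval=10.0, rounds=6):
    """Probe repeatedly, print one timestamped line per attempt, return results."""
    results = []
    for i in range(rounds):
        elapsed = probe(host, port)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "FAIL" if elapsed is None else f"ok {elapsed * 1000:.1f} ms"
        print(f"{stamp} {host}:{port} {status}")
        results.append(elapsed)
        if i < rounds - 1:
            time.sleep(interval)
    return results


# Example, run on cluster_rio (values taken from the configs in this post):
#   watch("rep_poa", 8001, interval=10.0, rounds=360)   # about one hour
```

If the FAIL lines cluster around the times the postmaster reports signal 11, that would point at the link; if the probe stays clean while the crashes continue, the network is a less likely cause.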
>
> Any ideas on this?
>
> Thanks in advance,
> --
> Diogo Biazus - diogob at gmail.com
> Móvel Consultoria
> http://www.movelinfo.com.br
> http://www.postgresql.org.br
>
--
Diogo Biazus - diogob at gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br