[Pgcluster-general] Instability replicating over wan

Diogo Biazus diogob at gmail.com
Wed May 23 17:25:16 UTC 2007


Sorry for posting this again, but could anyone give me some advice?

On 5/18/07, Diogo Biazus <diogob at gmail.com> wrote:
> Hi there,
> I just started using pgcluster and I must say I'm impressed with the
> project. Although it lacks some docs (I'll try to write something in
> Portuguese later), it was pretty easy to set up and get replicating.
>
> My network layout:
> I have 2 servers communicating over a VPN: cluster_poa and
> cluster_rio. The machines are in different cities, but the average
> ping time is good: 53.504 ms.
> I'm not using the load balancer: people in Rio de Janeiro access only
> cluster_rio, people in Porto Alegre access only cluster_poa, and I'm
> running a single replicator, on cluster_poa.
> I'm using version 1.7.rc7 on both machines. My config files for both
> are below:
>
> >>>>>>>>>>>>>> cluster_poa <<<<<<<<<<<<<<<<<<
> * /etc/hosts
> 192.168.2.8     cluster_rio
> 192.168.2.8     rep_rio
> 192.168.200.100 cluster_poa
> 192.168.200.100 rep_poa
>
> * cluster.conf
> <Replicate_Server_Info>
>         <Host_Name>             rep_poa </Host_Name>
>         <Port>                  8001                           </Port>
>         <Recovery_Port>         8101                            </Recovery_Port>
> </Replicate_Server_Info>
> <Host_Name>                     cluster_poa             </Host_Name>
> <Recovery_Port>         7001                            </Recovery_Port>
> <Rsync_Path>                    /usr/bin/rsync                  </Rsync_Path>
> <Rsync_Option>                  ssh                             </Rsync_Option>
> <Rsync_Compress>                yes                             </Rsync_Compress>
> <Pg_Dump_Path>                  /usr/local/pgsql/bin/pg_dump    </Pg_Dump_Path>
> <When_Stand_Alone>              read_only                       </When_Stand_Alone>
> <Replication_Timeout>           1 min                           </Replication_Timeout>
> <LifeCheck_Timeout>             20s                             </LifeCheck_Timeout>
> <LifeCheck_Interval>            21s                             </LifeCheck_Interval>
>
> * pgreplicate.conf
> <Cluster_Server_Info>
>     <Host_Name>                 cluster_poa     </Host_Name>
>     <Port>                      5432                            </Port>
>     <Recovery_Port>             7001                            </Recovery_Port>
> </Cluster_Server_Info>
> <Cluster_Server_Info>
>     <Host_Name>                 cluster_rio     </Host_Name>
>     <Port>                      5432                            </Port>
>     <Recovery_Port>             7001                            </Recovery_Port>
> </Cluster_Server_Info>
> <Host_Name>                     rep_poa         </Host_Name>
> <Replication_Port>              8001                            </Replication_Port>
> <Recovery_Port>         8101                            </Recovery_Port>
> <RLOG_Port>                     8301                            </RLOG_Port>
> <Response_Mode>         normal                          </Response_Mode>
> <Use_Replication_Log>           no                              </Use_Replication_Log>
> <Replication_Timeout>           1min                            </Replication_Timeout>
> <LifeCheck_Timeout>             10s                             </LifeCheck_Timeout>
> <LifeCheck_Interval>            15s                             </LifeCheck_Interval>
> <Log_File_Info>
>         <File_Name>             /tmp/pgreplicate.log    </File_Name>
>         <File_Size>             1M                      </File_Size>
>         <Rotate>                3                       </Rotate>
> </Log_File_Info>
>
>
>
> >>>>>>>>>>>>>> cluster_rio <<<<<<<<<<<<<<<<<<
> * /etc/hosts
> 192.168.2.8     cluster_rio
> 192.168.2.8     rep_rio
> 192.168.200.100 cluster_poa
> 192.168.200.100 rep_poa
>
> * cluster.conf
> <Replicate_Server_Info>
>         <Host_Name>             rep_poa </Host_Name>
>         <Port>                  8001                            </Port>
>         <Recovery_Port>         8101                            </Recovery_Port>
> </Replicate_Server_Info>
> <Host_Name>                     cluster_rio             </Host_Name>
> <Recovery_Port>         7001                            </Recovery_Port>
> <Rsync_Path>                    /usr/bin/rsync                  </Rsync_Path>
> <Rsync_Option>                  ssh                             </Rsync_Option>
> <Rsync_Compress>                yes                             </Rsync_Compress>
> <Pg_Dump_Path>                  /usr/local/pgsql/bin/pg_dump    </Pg_Dump_Path>
> <When_Stand_Alone>              read_only                       </When_Stand_Alone>
> <Replication_Timeout>           2 min                           </Replication_Timeout>
> <LifeCheck_Timeout>             20s                             </LifeCheck_Timeout>
> <LifeCheck_Interval>            21s                             </LifeCheck_Interval>
>
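
A quick way to compare the two cluster.conf files is to flatten each
into tag/value pairs and list the tags whose values differ. This is a
rough sketch, not pgcluster's own parser: parse_conf() and diff_conf()
are hypothetical helpers, the regexes only handle the simple one-level
nesting seen in the files above, and POA and RIO are trimmed copies of
those files. Repeated sections, such as the two Cluster_Server_Info
blocks in pgreplicate.conf, would overwrite each other here.

```python
import re

# <Tag> value </Tag> with no nested tags inside the value.
LEAF = re.compile(r"<(\w+)>\s*([^<>]*?)\s*</\1>", re.DOTALL)
# <Section> followed by one or more leaf tags, then </Section>.
SECTION = re.compile(r"<(\w+)>((?:\s*<\w+>[^<>]*</\w+>\s*)+)</\1>", re.DOTALL)


def parse_conf(text):
    """Flatten a config into a dict; nested tags become 'Section.Tag'."""
    settings = {}

    def take_section(match):
        name, body = match.group(1), match.group(2)
        for tag, value in LEAF.findall(body):
            settings[f"{name}.{tag}"] = value
        return ""  # remove the section so its leaves are not read twice

    top_level = SECTION.sub(take_section, text)
    for tag, value in LEAF.findall(top_level):
        settings[tag] = value
    return settings


def diff_conf(a, b):
    """Return {tag: (value_in_a, value_in_b)} for tags whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Trimmed copies of the two cluster.conf files quoted above.
POA = """
<Replicate_Server_Info>
    <Host_Name> rep_poa </Host_Name>
    <Port> 8001 </Port>
    <Recovery_Port> 8101 </Recovery_Port>
</Replicate_Server_Info>
<Host_Name> cluster_poa </Host_Name>
<Recovery_Port> 7001 </Recovery_Port>
<Replication_Timeout> 1 min </Replication_Timeout>
<LifeCheck_Timeout> 20s </LifeCheck_Timeout>
<LifeCheck_Interval> 21s </LifeCheck_Interval>
"""
RIO = POA.replace("cluster_poa", "cluster_rio").replace("1 min", "2 min")
```

With these inputs, diff_conf(parse_conf(POA), parse_conf(RIO)) lists
Host_Name, which is expected, and Replication_Timeout (1 min on
cluster_poa, 2 min on cluster_rio). Whether that difference has
anything to do with the crashes is not known.
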
> My problem:
> I'm having some stability problems. It works very well, and then
> sometimes (I couldn't find any pattern, it seems random to me)
> cluster_rio stops seeing the replicator, the whole cluster restarts,
> and this shows up in the log:
>
> LOG:  server process (PID 11017) was terminated by signal 11
> LOG:  terminating any other active server processes
> LOG:  all server processes terminated; reinitializing
>
> Once this starts happening it won't stop until I restart the whole
> cluster and the replicator.
> If I restart only the cluster_rio postmaster, it does not communicate
> with the replicator and gives lots of:
> ERROR:  This query is not permitted when all replication servers fell down
>
> But communication between the machines seems perfect every time I
> test it. So I was wondering whether this is caused by some network
> instability, combined with a problem in how the replicator reconnects
> to the remote cluster.
>
> Any ideas on this?
>
> Thanks in advance,
> --
> Diogo Biazus - diogob at gmail.com
> Móvel Consultoria
> http://www.movelinfo.com.br
> http://www.postgresql.org.br
>
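
One way to check the network theory would be to leave a small probe
running on cluster_rio that tries a TCP connect to the replicator at a
fixed interval and records each result with a timestamp, so that any
gaps can be lined up against the "signal 11" entries in the PostgreSQL
log. The code below is only a sketch under assumptions: rep_poa and
port 8001 come from the configs above, but probe() and watch() are
names made up here, and how pgreplicate reacts to a connection that
opens and closes without sending anything is unknown, so it may be
safer to point it at another port on the same host first.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (True, seconds) or (False, error text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as exc:  # covers refused, unreachable, and timed out
        return False, str(exc)


def watch(host, port, interval=15.0, attempts=None, timeout=5.0,
          sleep=time.sleep, log=print):
    """Probe repeatedly, log one line per attempt, return the results.

    attempts=None keeps probing until interrupted.
    """
    results = []
    count = 0
    while attempts is None or count < attempts:
        ok, detail = probe(host, port, timeout)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if ok:
            log(f"{stamp} {host}:{port} ok {detail * 1000:.1f} ms")
        else:
            log(f"{stamp} {host}:{port} FAILED {detail}")
        results.append((stamp, ok, detail))
        count += 1
        if attempts is None or count < attempts:
            sleep(interval)
    return results
```

Something like watch("rep_poa", 8001) run from cluster_rio, with its
output redirected to a file, would give a timeline to compare with the
postmaster log. A clean timeline around a crash would point away from
the VPN and toward the replicator or the backend itself.
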


-- 
Diogo Biazus - diogob at gmail.com
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br

