[Pgcluster-general] Problems after a data node loses connectivity

Pshem Kowalczyk pshem.k at gmail.com
Thu Jun 21 05:27:29 UTC 2007


Hi,

I have 3 data nodes, 1 replicator and 1 loadbalancer, running PostgreSQL 8.2.4
with the corresponding patch.
The whole setup works nicely when all nodes are online, but when I
ifdown the network interface on one of the data nodes, things go really bad.

Scenario:
- I start a simple perl script on the loadbalancer that inserts one row
per second and prints the time of each insert
- I shut down the network interface on data3 - the inserts stop
- I bring the network interface back up - the inserts resume
- the database on data1 doesn't have all the rows:

testdb1=> select count(*) from t1;
 count
-------
    13
(1 row)

but it's not in read-only mode:

testdb1=> insert into t1 values (24, 'row 24');
INSERT 0 1

and the changes replicate to other nodes

- database on data2 has all the rows (and is in read-write mode):

testdb1=> select count(*) from t1;
 count
-------
    23
(1 row)

- database on data3 doesn't have all the rows (and is in read-only mode)

testdb1=# select count(*) from t1;
 count
-------
    13
(1 row)

but it accepts changes from other nodes.

Obviously the behaviour of data1 is unacceptable - even if it got
de-synced by accident, it should have been switched into read-only mode.
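
To make the inconsistency explicit, here is a minimal Python sketch that flags any node which holds fewer rows than the fullest node but still accepts writes. The `NodeState` class and `writable_but_behind` function are hypothetical helpers, not part of PGCluster; the values are just the row counts and modes reported by the queries above.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    name: str
    row_count: int   # result of "select count(*) from t1" on that node
    read_only: bool  # whether the node refused writes when tested


def writable_but_behind(nodes):
    """Return names of nodes that lag behind the fullest node yet still accept writes."""
    if not nodes:
        return []
    most_rows = max(n.row_count for n in nodes)
    return [n.name for n in nodes if n.row_count < most_rows and not n.read_only]


# State observed after the interface on data3 was brought back up.
OBSERVED = [
    NodeState("data1", 13, read_only=False),
    NodeState("data2", 23, read_only=False),
    NodeState("data3", 13, read_only=True),
]

if __name__ == "__main__":
    print(writable_but_behind(OBSERVED))
```

A node that is behind and read-only (data3 here) is not flagged, since that is the state When_Stand_Alone is set to produce.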

Log from the inserting script:
# ./insert.pl
1182401599 row 1
1182401600 row 2
1182401601 row 3
1182401602 row 4
1182401603 row 5
1182401604 row 6
1182401605 row 7
1182401606 row 8
1182401607 row 9
1182401608 row 10
1182401609 row 11
1182401610 row 12
1182401625 row 13 <==== please notice the 15 sec gap
1182401626 row 14
1182401627 row 15
1182401628 row 16
1182401629 row 17
1182401630 row 18
1182401631 row 19
1182401632 row 20
1182401633 row 21
1182401634 row 22
1182401635 row 23
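
The pause can also be found mechanically. The following is a small Python sketch, assuming the log keeps the "<epoch seconds> row <n>" layout shown above; `find_gaps` is a hypothetical helper written for this post, not part of insert.pl.

```python
def find_gaps(log_text, expected_interval=1):
    """Return (prev_row, next_row, gap_seconds) for each pause longer than expected."""
    entries = []
    for line in log_text.splitlines():
        parts = line.split()
        # Keep only lines shaped like "<epoch seconds> row <n> ..."; skip the rest.
        if len(parts) >= 3 and parts[0].isdigit() and parts[1] == "row" and parts[2].isdigit():
            entries.append((int(parts[0]), int(parts[2])))
    gaps = []
    # Compare each entry with its successor.
    for (t_prev, row_prev), (t_next, row_next) in zip(entries, entries[1:]):
        delta = t_next - t_prev
        if delta > expected_interval:
            gaps.append((row_prev, row_next, delta))
    return gaps


# A few lines copied from the log above.
SAMPLE = """\
# ./insert.pl
1182401609 row 11
1182401610 row 12
1182401625 row 13 <==== please notice the 15 sec gap
1182401626 row 14
"""

if __name__ == "__main__":
    for row_prev, row_next, delta in find_gaps(SAMPLE):
        print(f"{delta}s pause between row {row_prev} and row {row_next}")
```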


If I shut down data3 gracefully, it gets removed properly and
everything seems to work OK.



My configuration is below:
/etc/hosts
127.0.0.1       localhost

10.23.254.115   loadbalancer
10.23.254.116   replicator
10.23.254.117   data1
10.23.254.118   data2
10.23.254.119   data3


# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


pglb.conf
<Cluster_Server_Info>
    <Host_Name>                 data1                   </Host_Name>
    <Port>                      5432                    </Port>
    <Max_Connect>               32                      </Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>                 data2                   </Host_Name>
    <Port>                      5432                    </Port>
    <Max_Connect>               32                      </Max_Connect>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>                 data3                   </Host_Name>
    <Port>                      5432                    </Port>
    <Max_Connect>               32                      </Max_Connect>
</Cluster_Server_Info>
<Host_Name>                     loadbalancer                    </Host_Name>
<Backend_Socket_Dir>            /var/run/postgresql/            </Backend_Socket_Dir>
<Receive_Port>                  5432                            </Receive_Port>
<Recovery_Port>                 6001                            </Recovery_Port>
<Max_Cluster_Num>               128                             </Max_Cluster_Num>
<Use_Connection_Pooling>        no                              </Use_Connection_Pooling>
<LifeCheck_Timeout>             1s                              </LifeCheck_Timeout>
<LifeCheck_Interval>            2s                              </LifeCheck_Interval>
<Log_File_Info>
        <File_Name>             /tmp/pglb.log   </File_Name>
        <File_Size>             1M              </File_Size>
        <Rotate>                3               </Rotate>
</Log_File_Info>

pgreplicate.conf
<Cluster_Server_Info>
    <Host_Name>                 data1           </Host_Name>
    <Port>                      5432                    </Port>
    <Max_Connect>               32                      </Max_Connect>
    <Recovery_Port>             7001            </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>                 data2           </Host_Name>
    <Port>                      5432                    </Port>
    <Max_Connect>               32                      </Max_Connect>
    <Recovery_Port>             7001            </Recovery_Port>
</Cluster_Server_Info>
<Cluster_Server_Info>
    <Host_Name>                 data3           </Host_Name>
    <Port>                      5432                    </Port>
    <Max_Connect>               32                      </Max_Connect>
    <Recovery_Port>             7001            </Recovery_Port>
</Cluster_Server_Info>
<LoadBalance_Server_Info>
        <Host_Name>             loadbalancer                    </Host_Name>
        <Recovery_Port>         6001                            </Recovery_Port>
</LoadBalance_Server_Info>
<Host_Name>                     replicator              </Host_Name>
<Replication_Port>              8001                    </Replication_Port>
<Recovery_Port>                 8101                    </Recovery_Port>
<RLOG_Port>                     8301                    </RLOG_Port>
<Response_Mode>                 normal                  </Response_Mode>
<Use_Replication_Log>           no                      </Use_Replication_Log>
<Replication_Timeout>           10s                     </Replication_Timeout>
<LifeCheck_Timeout>             2s                      </LifeCheck_Timeout>
<LifeCheck_Interval>            3s                      </LifeCheck_Interval>
<Log_File_Info>
        <File_Name>             /tmp/pgreplicate.log    </File_Name>
        <File_Size>             1M                      </File_Size>
        <Rotate>                3                       </Rotate>
</Log_File_Info>


data1:
<Replicate_Server_Info>
        <Host_Name>             replicator                      </Host_Name>
        <Port>                  8001                            </Port>
        <Recovery_Port>         8101                            </Recovery_Port>
</Replicate_Server_Info>
<Host_Name>                     data1                           </Host_Name>
<Recovery_Port>                 7001                            </Recovery_Port>
<Rsync_Path>                    /usr/bin/rsync                  </Rsync_Path>
<Rsync_Option>                  ssh                             </Rsync_Option>
<Rsync_Compress>                yes                             </Rsync_Compress>
<Pg_Dump_Path>                  /usr/bin/pg_dump                </Pg_Dump_Path>
<When_Stand_Alone>              read_only                       </When_Stand_Alone>
<Replication_Timeout>           10s                     </Replication_Timeout>
<LifeCheck_Timeout>             2s                      </LifeCheck_Timeout>
<LifeCheck_Interval>            3s                      </LifeCheck_Interval>

data2:
<Replicate_Server_Info>
        <Host_Name>             replicator                      </Host_Name>
        <Port>                  8001                            </Port>
        <Recovery_Port>         8101                            </Recovery_Port>
</Replicate_Server_Info>
<Host_Name>                     data2                           </Host_Name>
<Recovery_Port>                 7001                            </Recovery_Port>
<Rsync_Path>                    /usr/bin/rsync                  </Rsync_Path>
<Rsync_Option>                  ssh                             </Rsync_Option>
<Rsync_Compress>                yes                             </Rsync_Compress>
<Pg_Dump_Path>                  /usr/bin/pg_dump                </Pg_Dump_Path>
<When_Stand_Alone>              read_only                       </When_Stand_Alone>
<Replication_Timeout>           10s                     </Replication_Timeout>
<LifeCheck_Timeout>             2s                      </LifeCheck_Timeout>
<LifeCheck_Interval>            3s                      </LifeCheck_Interval>

data3:
<Replicate_Server_Info>
        <Host_Name>             replicator                      </Host_Name>
        <Port>                  8001                            </Port>
        <Recovery_Port>         8101                            </Recovery_Port>
</Replicate_Server_Info>
<Host_Name>                     data3                           </Host_Name>
<Recovery_Port>                 7001                            </Recovery_Port>
<Rsync_Path>                    /usr/bin/rsync                  </Rsync_Path>
<Rsync_Option>                  ssh                             </Rsync_Option>
<Rsync_Compress>                yes                             </Rsync_Compress>
<Pg_Dump_Path>                  /usr/bin/pg_dump                </Pg_Dump_Path>
<When_Stand_Alone>              read_only                       </When_Stand_Alone>
<Replication_Timeout>           10s                     </Replication_Timeout>
<LifeCheck_Timeout>             2s                      </LifeCheck_Timeout>
<LifeCheck_Interval>            3s                      </LifeCheck_Interval>
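
For anyone who wants to compare these settings across nodes, here is a rough Python sketch that reads such a fragment into a dictionary. It assumes the files can be treated as XML once wrapped in a single root element; PGCluster's own parser may behave differently, and `parse_fragment` is a hypothetical helper, not something shipped with PGCluster. The sample is an excerpt of the data1 configuration above.

```python
import xml.etree.ElementTree as ET


def parse_fragment(text):
    """Parse a rootless, one-tag-per-setting fragment into a dict of stripped strings."""
    # The config files have no single root element, so wrap them in one.
    root = ET.fromstring(f"<root>{text}</root>")
    settings = {}
    for child in root:
        if len(child):
            # Nested section (e.g. Replicate_Server_Info); may occur more than once.
            section = {sub.tag: (sub.text or "").strip() for sub in child}
            settings.setdefault(child.tag, []).append(section)
        else:
            settings[child.tag] = (child.text or "").strip()
    return settings


DATA1_FRAGMENT = """
<Replicate_Server_Info>
        <Host_Name>             replicator                      </Host_Name>
        <Port>                  8001                            </Port>
        <Recovery_Port>         8101                            </Recovery_Port>
</Replicate_Server_Info>
<Host_Name>                     data1                           </Host_Name>
<When_Stand_Alone>              read_only
</When_Stand_Alone>
<LifeCheck_Timeout>             2s                      </LifeCheck_Timeout>
<LifeCheck_Interval>            3s                      </LifeCheck_Interval>
"""

if __name__ == "__main__":
    for key, value in parse_fragment(DATA1_FRAGMENT).items():
        print(key, "=", value)
```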


Log files of pglb and pgreplicator are attached.

kind regards
Pshem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pgrepliacate-failed.log
Type: application/octet-stream
Size: 160525 bytes
Desc: not available
Url : http://pgfoundry.org/pipermail/pgcluster-general/attachments/20070621/2f420d8a/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pglb-failed.log
Type: application/octet-stream
Size: 35015 bytes
Desc: not available
Url : http://pgfoundry.org/pipermail/pgcluster-general/attachments/20070621/2f420d8a/attachment-0003.obj 

