[Pgcluster-general] pglb stops working when at least one postmaster dies, why?

Gábriel Ákos akos.gabriel at i-logic.hu
Thu Oct 25 00:32:30 UTC 2007


On Thu, 25 Oct 2007 02:46:34 +0200 (CEST)
a.mitani at sra-europe.com wrote:

Hi At,

More and more signs suggest that pgcluster has problems on
x86_64. Most likely, pointer arithmetic differences and/or SMP
locking problems are showing up. As before, I can offer you a test
environment on one of our x86_64 SMP servers. You are welcome to use it.
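
To illustrate the kind of pointer problem I mean, here is a minimal
sketch. It is written in Python only to keep it short and runnable
(pgcluster itself is C), it is not taken from the pgcluster sources,
and the function name and addresses are made up for illustration. It
mimics C code that stores a pointer in a 32-bit int: harmless on
x86_32, lossy on x86_64.

```python
def store_in_int32(address: int) -> int:
    """Mimic a C assignment of a pointer value to a signed 32-bit int."""
    low = address & 0xFFFFFFFF  # only the low 32 bits survive
    # Reinterpret the low 32 bits as a two's-complement signed value.
    return low - (1 << 32) if low >= (1 << 31) else low


# Hypothetical addresses, chosen only for illustration.
ADDR_X86_32 = 0x0804A010          # fits in 32 bits
ADDR_X86_64 = 0x00007F3A1C04A010  # needs more than 32 bits

if __name__ == "__main__":
    for name, addr in (("x86_32", ADDR_X86_32), ("x86_64", ADDR_X86_64)):
        stored = store_in_int32(addr)
        status = "intact" if stored == addr else "truncated"
        print(f"{name}: {addr:#x} -> {stored:#x} ({status})")
```

If something like this is the cause, the compiler warnings about casts
between pointers and integers of different size would be a reasonable
first place to look when building on x86_64.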

Best regards,
Akos Gabriel

> Hi Holger,
> 
> Thank you for the interesting test and the report.
> 
> Currently, I don't have an x86_64 environment.
> Therefore, I tested on an x86_32 environment, following your setup.
> However, the problem did not occur there.
> 
> Could you send us the configuration files of every server
> and the debug messages of the replication server and pglb?
> 
> Regards,
> ------------------
> At.Mitani
> 
> > I have evaluated the current pgcluster release (1.7.0rc7) under
> > Linux (x86_64).
> > The setup looks like this:
> > machine 1: pgreplicate & pglb
> > machine 2: postmaster (configured as the first machine in pglb and
> > pgreplicate)
> > machine 3: postmaster
> >
> > When I start a pgcbench -i repeatedly in a loop (20 times)
> > everything works fine. I can see connections to machine 2 and 3, so
> > the pglb process really does loadbalance the requests.
> >
> > Now, if I shut down the postmaster on machine 2 (via pg_ctl stop)
> > during (!) a pgcbench run, something weird (bad) happens:
> > 1. the current pgcbench run is interrupted (as expected); no
> >    transfer to machine 3 happens (which might work eventually)
> > 2. the next pgcbench run runs against machine 3 (ok, fine)
> > 3. all subsequent pgcbench runs fail, with pglb telling me that the
> >    cluster is down (which is not true, since only one postmaster
> >    "died")
> > 4. pglb never recovers from this state; the only way to recover
> >    is to stop pglb, comment out the "dead" postmaster machine and
> >    restart pglb
> >
> > Since pglb is supposed to be derived from pgpool (version 1), I
> > simply added an installation of pgpool to machine 1 and configured
> > it to load-balance accordingly. Here are the astonishing results
> > when the postmaster on machine 2 is shut down during a pgcbench run:
> > 1. the current pgcbench run is interrupted (as expected); no
> >    transfer to machine 3 happens (which might work eventually)
> > 2. the next pgcbench run runs against machine 3 (ok, fine)
> > 3. all subsequent pgcbench runs run against machine 3 (fantastic)
> > 4. pgpool does exactly what I expect from a load balancer
> >
> > Furthermore, pgcluster is described as follows (see
> > http://pgcluster.projects.postgresql.org/feature.html for details):
> > """
> > PGCluster has two functions.
> > - A load sharing function
> >   - The session load of a reference request is distributed. It is
> >     effective at the Web application with which a reference request
> >     pours in.
> >   - A replication object can be specified per table. When the tables
> >     which receive an updating request and a reference request differ,
> >     the PGCluster can distribute the table which receives an updating
> >     request and can reproduce only the table which receives a
> >     reference request.
> > - A high availability function
> >   - When failure occurs in Cluster DB, a load balancer and a
> >     replication server separate Failure DB from a system, and continue
> >     service using the remaining DB. Since separation of Failure DB and
> >     continuation of service are performed simultaneously, most service
> >     stop time is made to 0.
> >   - The Cluster DB which repair finished can
> >     be dynamically restored to a system, without stopping service.
> >   - Data is automatically copied to DB restored or added from other
> >     DB. The query which received during restoration is executed from
> >     the replication server after restoration.
> > """
> > Is the latter, "- When failure occurs in Cluster DB, a load
> > balancer and a replication server separate Failure DB from a
> > system, and continue service using the remaining DB. Since
> > separation of Failure DB and continuation of service are performed
> > simultaneously, most service stop time is made to 0.", no longer
> > true, or is it simply a "bug" in the current implementation?
> >
> > Has anyone attempted this feat and succeeded? With which versions?
> >
> > I am not afraid to recompile the source code with patches or
> > bugfixes :-) Thanks a lot in advance :-)
> >
> > Regards,
> > Holger
> >
> > PS: Please ignore the automatically attached footer ...
> >
> > --
> > This e-mail and any attachments is confidential and solely intended
> > for the indicated addressee. If you are not the intended recipient
> > or an authorized person, please note, that any form of notice,
> > disclosure, reproduction or circulation of the contents of this
> > mail is prohibited. In this case, please immediately inform the
> > sender of the e-mail an destroy this e-mail. We use updated
> > antivirus protection software. We do not accept any responsibility
> > for damages caused anyhow by viruses.
> >
> > catWorkX GmbH: Sitz der Gesellschaft in Hamburg, HRB: 71494,
> > USt-IdNr.: DE201625856, Geschaeftsfuehrung: Dipl. Kfm. Andreas
> > Girnuweit, Dipl.-Ing. Oliver
> > Groht, Dr. Wolfgang Tank
> > _______________________________________________
> > Pgcluster-general mailing list
> > Pgcluster-general at pgfoundry.org
> > http://pgfoundry.org/mailman/listinfo/pgcluster-general
> >
> 
> 


-- 
Best regards,
Gábriel Ákos
-=E-Mail :akos.gabriel at i-logic.hu|Web:  http://www.i-logic.hu =-
-=Tel/fax:+3612391618            |Mobil:+36209278894          =-

