[Pgcluster-general] pglb stops working when at least one postmaster dies, why?
Holger Lehmann
Holger.Lehmann at catworkx.de
Fri Oct 26 07:22:31 UTC 2007
Hi At,
here are the requested log files and config files from all three machines
involved.
I apologize for the size, but I could not avoid it. I have compressed the
files using zip. If anyone has problems receiving such a large email or
wants to download the attachment by different means, please feel free to
contact me directly and I will provide a download link.
BTW: I could reproduce this scenario every time. Whether I shut down
cluster db 1 or cluster db 2, the system stops working.
(In the logs you will see a shutdown of cluster db 2 first, then another
attempt using pgcbench, and finally me shutting down all software and
logging out :-) )
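
The repeated-run test can be sketched roughly as follows. This is only a minimal sketch in Python, not the script behind the attached logs: `run_repeatedly` and `first_failure` are hypothetical helper names, and the benchmark command is passed in as an argument list instead of being hard-coded, because the exact pgcbench options and the pglb host and port depend on the setup.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=20):
    """Run cmd `runs` times in sequence and return the list of exit codes.

    cmd is an argument list such as the pgcbench call pointed at pglb
    (the concrete options are an assumption left to the caller).
    """
    codes = []
    for i in range(1, runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        codes.append(result.returncode)
        # Progress goes to stderr so that stdout stays free for other use.
        print(f"run {i:2d}: exit code {result.returncode}", file=sys.stderr)
    return codes


def first_failure(codes):
    """Return the 1-based number of the first failing run, or None."""
    for i, code in enumerate(codes, start=1):
        if code != 0:
            return i
    return None
```

While the loop is running, one postmaster is stopped by hand with pg_ctl stop on its machine. With a working failover one would expect at most the interrupted run to fail; in the scenario reported here every run after the second one fails.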
On Thu 25.10.2007 02:46, a.mitani at sra-europe.com wrote:
>Hi Holger,
>
>Thank you for the interesting test and the report.
>
>Currently, I don't have an x86_64 environment.
>Therefore, I tested on an x86_32 environment with the same setup as yours.
>However, the problem did not occur.
>
>Would you let us know the configuration files of every server and
>the debug messages of the replication server and pglb?
>
>Regards,
>------------------
>At.Mitani
>
>>I have evaluated the current pgcluster release (1.7.0rc7) under Linux
>>(x86_64).
>>The setup looks like this:
>>machine 1: pgreplicate & pglb
>>machine 2: postmaster (configured as the first machine in pglb and
>>pgreplicate)
>>machine 3: postmaster
>>
>>When I start a pgcbench -i repeatedly in a loop (20 times) everything
>>works fine. I can see connections to machine 2 and 3, so the pglb
>>process really does loadbalance the requests.
>>
>>Now, if I shut down the postmaster on machine 2 (via pg_ctl stop) during
>>(!) a pgcbench run, something weird (bad) happens:
>>1. the current pgcbench run is interrupted (as expected), no transfer
>>happens to machine 3 (which might work eventually)
>>2. the next pgcbench run runs against machine 3 (ok, fine)
>>3. all subsequent pgcbench runs fail, with pglb telling me that the
>>cluster is down (which is not true, since only one postmaster "died")
>>4. pglb never recovers from this state; the only way to recover is to
>>stop pglb, comment out the "dead" postmaster machine and restart pglb
>>
>>Since pglb is supposed to be a derived work from pgpool (version 1), I
>>simply added an installation of pgpool to machine 1 and configured it to
>>load-balance accordingly. Here are the astonishing results when the
>>postmaster on machine 2 is shut down during a pgcbench run:
>>1. the current pgcbench run is interrupted (as expected), no transfer
>>happens to machine 3 (which might work eventually)
>>2. the next pgcbench run runs against machine 3 (ok, fine)
>>3. all subsequent pgcbench runs run against machine 3 (fantastic)
>>4. pgpool does exactly what I expected from a load balancer
>>
>>Furthermore, pgcluster is described as follows (see
>>http://pgcluster.projects.postgresql.org/feature.html for details):
>>"""
>>PGCluster has two functions.
>>- A load sharing function
>>  - The session load of a reference request is distributed. It is
>>effective at the Web application with which a reference request pours
>>in.
>>  - A replication object can be specified per table. When the tables
>>which receive an updating request and a reference request differ, the
>>PGCluster can distribute the table which receives an updating request
>>and can reproduce only the table which receives a reference request.
>>- A high availability function
>>  - When failure occurs in Cluster DB, a load balancer and a replication
>>server separate Failure DB from a system, and continue service using the
>>remaining DB. Since separation of Failure DB and continuation of service
>>are performed simultaneously, most service stop time is made to 0.
>>  - The Cluster DB which repair finished can be dynamically restored to a
>>system, without stopping service.
>>  - Data is automatically copied to DB restored or added from other DB.
>>The query which received during restoration is executed from the
>>replication server after restoration.
>>"""
>>Is the latter, "- When failure occurs in Cluster DB, a load balancer and
>>a replication server separate Failure DB from a system, and continue
>>service using the remaining DB. Since separation of Failure DB and
>>continuation of service are performed simultaneously, most service stop
>>time is made to 0.", not true anymore, or is it simply a "bug" in the
>>current implementation?
>>
>>Has anyone attempted this feat and succeeded? With which versions?
>>
>>I am not afraid to recompile the source code with patches or bugfixes
>>:-)
>>Thanks a lot in advance :-)
>>
>>Regards,
>>Holger
>>
>>PS: Please ignore the automatically attached footer ...
>>
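
The failover behaviour expected in the quoted report (a failed backend is detached, the remaining one keeps serving, and "cluster down" is reported only when no backend is left) can be sketched as a small model. This is a minimal sketch in Python under stated assumptions, not the pglb or pgpool implementation: the names `Backend`, `Balancer`, `ClusterDown`, `dispatch` and `reattach` are hypothetical, the `alive` flag stands in for a real connection attempt, and the sketch retries a request on the next backend, which is a simplification compared with the interrupted pgcbench run described above.

```python
class ClusterDown(Exception):
    """Raised only when every backend has been detached."""


class Backend:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive      # stands in for "the postmaster is reachable"
        self.detached = False   # set by the balancer after a failed attempt


class Balancer:
    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0          # round-robin position

    def dispatch(self, query):
        """Send query to the next usable backend in round-robin order.

        A backend that fails is detached and the next one is tried, so a
        single dead backend never makes the whole cluster unavailable.
        """
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend.detached:
                continue
            if backend.alive:
                return f"{backend.name}: {query}"
            backend.detached = True  # separate the failed DB from the system
        raise ClusterDown("no backend left")

    def reattach(self, name):
        """Restore a repaired backend without restarting the balancer."""
        for backend in self.backends:
            if backend.name == name:
                backend.alive = True
                backend.detached = False
```

In this model, stopping machine 2 leads to all later requests going to machine 3, which matches the pgpool result in the report. The pglb result corresponds to `ClusterDown` being raised although one backend is still alive.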
-------snipp-------
--
This e-mail and any attachments is confidential and solely intended for
the indicated addressee. If you are not the intended recipient or an
authorized person, please note, that any form of notice, disclosure,
reproduction or circulation of the contents of this mail is prohibited.
In this case, please immediately inform the sender of the e-mail an
destroy this e-mail. We use updated antivirus protection software. We do
not accept any responsibility for damages caused anyhow by viruses.
-
catWorkX GmbH: registered office in Hamburg, HRB: 71494, VAT ID:
DE201625856, managing directors: Dipl. Kfm. Andreas Girnuweit, Dipl.-Ing. Oliver
Groht, Dr. Wolfgang Tank
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cwx_cluster.zip
Type: application/zip
Size: 901362 bytes
Desc: not available
Url : http://pgfoundry.org/pipermail/pgcluster-general/attachments/20071026/90f26abc/attachment-0001.zip