[Pgcluster-general] Lifecheck in pgreplicate issues

Anatoly Savchenkov Anatoly.Savchenkov at arc.com
Wed Feb 20 09:49:47 UTC 2008


Hi,

Thank you for your reply.

I have not tried to adjust keepalive settings and in general I think
it's not a good way (at least for me) to play with system wide
settings just because there are a lot of other services running on
that same machines with pgcluster and all of them will be affected by
these modifications. But as a temporary measure I'll try your
recommendation.

Wednesday, February 20, 2008, 5:12:33 AM, you wrote:

> Hi Anatoly,

> Basically, I agree with your suggestion.

> In addition to it, change the value of
> "net.ipv4.tcp_keepalive_time" is also one of the solutions
> I think.

> Have you tried it?

> ----- Original Message -----
> From: "Anatoly Savchenkov" <Anatoly.Savchenkov at arc.com>
> To: pgcluster-general at pgfoundry.org
> Sent: 2008?2?20? 0?36?19? (GMT+0900) Asia/Tokyo
> Subject: Re: [Pgcluster-general] Lifecheck in pgreplicate issues

> Hi,

> Looks like issue #2 has been addressed in pgcluster-1.7.0rc9. Thank
> you! However I've not tried it yet.

> I've got more details regarding issue #3 from my list. The problem
> occurs because there is no timeout for PQexec operations in
> send_replicate_packet_to_server() routine of pgreplicate daemon. If
> connection is already established to a cluster DB it will be used
> until error is detected. But in case of unexpected shutdown or network
> connectivity issue (when a peer stops responding to all network
> packets), it could take a lot of time (15-30 minutes or even more
> depending on net.ipv4.tcp_retries2 stack parameter) to detect network
> error.

> While execution is inside of send_replicate_packet_to_server() routine
> which runs in the context of thread_send_cluster() thread, SemID
> semaphore is locked preventing other replication attempts. So, this
> looks like a hang of entire cluster node for 15-30 minutes.

> The issue could be easily reproduced with prepared queries (like
> queries initiated with DBD::Pg with placeholders): just prepare a
> query while cluster in working, execute it, sleep for some time,
> (disconnect one node at this point) and try to execute again (or just
> delete prepared query like in my case).

> Here is a part of pgreplicate log with my comments which illustrates
> the problem (lifechecking is eliminated from the log for simplicity):

> At this point all nodes (10.0.0.113 and 10.0.0.118) are up. Prepared
> query is processed by 10.0.0.113 node:

> 2008-02-14 17:39:38 [4959] DEBUG:PGRdo_replicate():query ::
> 2008-02-14 17:39:38 [4959] DEBUG:cmdSts=P
> 2008-02-14 17:39:38 [4959] DEBUG:cmdType=E
> 2008-02-14 17:39:38 [4959] DEBUG:rlog=0
> 2008-02-14 17:39:38 [4959] DEBUG:port=5584
> 2008-02-14 17:39:38 [4959] DEBUG:pid=4958
> 2008-02-14 17:39:38 [4959] DEBUG:from_host=10.0.0.113
> 2008-02-14 17:39:38 [4959] DEBUG:dbName=postgres
> 2008-02-14 17:39:38 [4959] DEBUG:userName=postgres
> 2008-02-14 17:39:38 [4959] DEBUG:recieve sec=1203028778
> 2008-02-14 17:39:38 [4959] DEBUG:recieve usec=34744
> 2008-02-14 17:39:38 [4959] DEBUG:query_size=6
> 2008-02-14 17:39:38 [4959] DEBUG:request_id=5
> 2008-02-14 17:39:38 [4959] DEBUG:replicate_id=0
> 2008-02-14 17:39:38 [4959] DEBUG:recovery_status=0
> 2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] req
> 2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] got it
> 2008-02-14 17:39:38 [4959] DEBUG:source host
> 2008-02-14 17:39:38 [4919] DEBUG:start thread_send_cluster()
> 2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster(): going to send replicate packet to server
> 2008-02-14 17:39:38 [4919]
> DEBUG:send_replicate_packet_to_server(): prepare replication
> 2008-02-14 17:39:38 [4919]
> DEBUG:send_replicate_packet_to_server(): before cmd type switch
> 2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster():return
> value from send_replicate_packet_to_server() is 0
> 2008-02-14 17:39:38 [4919]
> DEBUG:thread_send_cluster():pthread_exit[1]
> 2008-02-14 17:39:38 [4959] DEBUG:end thread_send_source()
> 2008-02-14 17:39:38 [4959] DEBUG:sem_unlock[1]
> 2008-02-14 17:39:38 [4959] DEBUG:PGRdo_replicate():query :: 
> 2008-02-14 17:39:38 [4959] DEBUG:cmdSts=P
> 2008-02-14 17:39:38 [4959] DEBUG:cmdType=S
> 2008-02-14 17:39:38 [4959] DEBUG:rlog=0
> 2008-02-14 17:39:38 [4959] DEBUG:port=5584
> 2008-02-14 17:39:38 [4959] DEBUG:pid=4958
> 2008-02-14 17:39:38 [4959] DEBUG:from_host=10.0.0.113
> 2008-02-14 17:39:38 [4959] DEBUG:dbName=postgres
> 2008-02-14 17:39:38 [4959] DEBUG:userName=postgres
> 2008-02-14 17:39:38 [4959] DEBUG:recieve sec=1203028778
> 2008-02-14 17:39:38 [4959] DEBUG:recieve usec=36996
> 2008-02-14 17:39:38 [4959] DEBUG:query_size=1
> 2008-02-14 17:39:38 [4959] DEBUG:request_id=6
> 2008-02-14 17:39:38 [4959] DEBUG:replicate_id=0
> 2008-02-14 17:39:38 [4959] DEBUG:recovery_status=0
> 2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] req
> 2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] got it
> 2008-02-14 17:39:38 [4959] DEBUG:source host
> 2008-02-14 17:39:38 [4919] DEBUG:start thread_send_cluster()
> 2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster(): going to send replicate packet to server
> 2008-02-14 17:39:38 [4919]
> DEBUG:send_replicate_packet_to_server(): prepare replication
> 2008-02-14 17:39:38 [4919]
> DEBUG:send_replicate_packet_to_server(): before cmd type switch
> 2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster():return
> value from send_replicate_packet_to_server() is 0
> 2008-02-14 17:39:38 [4919]
> DEBUG:thread_send_cluster():pthread_exit[1]
> 2008-02-14 17:39:38 [4959] DEBUG:end thread_send_source()
> 2008-02-14 17:39:38 [4959] DEBUG:sem_unlock[1]

> Somewhere between 17:39:40 and 17:39:50 I've disconnected 10.0.0.118
> node from network by unplugging its network cable.

> 2008-02-14 17:39:40 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:39:40 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:39:40 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:39:40 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:39:40 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:39:42 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:39:42 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:39:42 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:39:42 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:39:42 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:39:43 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:39:43 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:39:43 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:39:43 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:39:43 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:39:47 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:39:47 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:39:47 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:39:47 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:39:47 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:39:53 [4959] DEBUG:PGRdo_replicate():query :: DEALLOCATE dbdpg_1
> 2008-02-14 17:39:53 [4959] DEBUG:cmdSts=Q
> 2008-02-14 17:39:53 [4959] DEBUG:cmdType=O
> 2008-02-14 17:39:53 [4959] DEBUG:rlog=0
> 2008-02-14 17:39:53 [4959] DEBUG:port=5584
> 2008-02-14 17:39:53 [4959] DEBUG:pid=4958
> 2008-02-14 17:39:53 [4959] DEBUG:from_host=10.0.0.113
> 2008-02-14 17:39:53 [4959] DEBUG:dbName=postgres
> 2008-02-14 17:39:53 [4959] DEBUG:userName=postgres
> 2008-02-14 17:39:53 [4959] DEBUG:recieve sec=1203028793
> 2008-02-14 17:39:53 [4959] DEBUG:recieve usec=43951
> 2008-02-14 17:39:53 [4959] DEBUG:query_size=18
> 2008-02-14 17:39:53 [4959] DEBUG:request_id=7
> 2008-02-14 17:39:53 [4959] DEBUG:replicate_id=0
> 2008-02-14 17:39:53 [4959] DEBUG:recovery_status=0
> 2008-02-14 17:39:53 [4959] DEBUG:query=DEALLOCATE dbdpg_1
> 2008-02-14 17:39:53 [4959] DEBUG:sem_lock [1] req

> Semaphore is free, acquire it.

> 2008-02-14 17:39:53 [4959] DEBUG:sem_lock [1] got it
> 2008-02-14 17:39:53 [4959] DEBUG:source host
> 2008-02-14 17:39:53 [4919] DEBUG:start thread_send_cluster()
> 2008-02-14 17:39:53 [4919] DEBUG:thread_send_cluster(): going to send replicate packet to server
> 2008-02-14 17:39:53 [4919]
> DEBUG:send_replicate_packet_to_server(): command is SELECT
> PGR_SYSTEM_COMMAND_FUNCTION(8,8,0,1)
> 2008-02-14 17:40:20 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:40:20 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:40:20 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:40:20 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:40:20 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:40:41 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:40:41 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:40:41 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:40:41 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:40:41 [4920] DEBUG:do_rlog():process result done:cmdSys='S'

> Replication timeout in cluster.conf is 60 seconds, so after 60 seconds
> from initial replication attempt postgres issues another one. Here it is.

> 2008-02-14 17:40:53 [5147] DEBUG:PGRdo_replicate():query :: DEALLOCATE dbdpg_1
> 2008-02-14 17:40:53 [4920] DEBUG:do_rlog():got result:cmdSys='R'
> 2008-02-14 17:40:53 [4920] DEBUG:PGRwrite_rlog():delete_query_log
> 2008-02-14 17:40:53 [5147] DEBUG:cmdSts=Q
> 2008-02-14 17:40:53 [4920] DEBUG:do_rlog():process result done:cmdSys='R'
> 2008-02-14 17:40:53 [5147] DEBUG:cmdType=O
> 2008-02-14 17:40:53 [5147] DEBUG:rlog=3
> 2008-02-14 17:40:53 [5147] DEBUG:port=5584
> 2008-02-14 17:40:53 [5147] DEBUG:pid=4958
> 2008-02-14 17:40:53 [5147] DEBUG:from_host=10.0.0.113
> 2008-02-14 17:40:53 [5147] DEBUG:dbName=postgres
> 2008-02-14 17:40:53 [5147] DEBUG:userName=postgres
> 2008-02-14 17:40:53 [5147] DEBUG:recieve sec=1203028853
> 2008-02-14 17:40:53 [5147] DEBUG:recieve usec=316539
> 2008-02-14 17:40:53 [5147] DEBUG:query_size=0
> 2008-02-14 17:40:53 [5147] DEBUG:request_id=7
> 2008-02-14 17:40:53 [5147] DEBUG:replicate_id=0
> 2008-02-14 17:40:53 [5147] DEBUG:recovery_status=0
> 2008-02-14 17:40:53 [5147] DEBUG:query=DEALLOCATE dbdpg_1

> Trying to acquire the semaphore but it is still owned by 4959 thread,
> so current request hangs.

> 2008-02-14 17:40:53 [5147] DEBUG:sem_lock [1] req
> 2008-02-14 17:41:02 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:41:02 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:41:02 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:41:02 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:41:02 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:41:23 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:41:23 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:41:23 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:41:23 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:41:23 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
> 2008-02-14 17:41:44 [4920] DEBUG:do_rlog():got result:cmdSys='S'
> 2008-02-14 17:41:44 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
> 2008-02-14 17:41:44 [4920]
> DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
> 2008-02-14 17:41:44 [4920] DEBUG:do_rlog(): SYS_CALL process done
> 2008-02-14 17:41:44 [4920] DEBUG:do_rlog():process result done:cmdSys='S'

> Another replication attempt after another minute.

> 2008-02-14 17:41:53 [5196] DEBUG:PGRdo_replicate():query :: DEALLOCATE dbdpg_1
> 2008-02-14 17:41:53 [4920] DEBUG:do_rlog():got result:cmdSys='R'
> 2008-02-14 17:41:53 [4920] DEBUG:PGRwrite_rlog():delete_query_log
> 2008-02-14 17:41:53 [5196] DEBUG:cmdSts=Q
> 2008-02-14 17:41:53 [4920] DEBUG:do_rlog():process result done:cmdSys='R'
> 2008-02-14 17:41:53 [5196] DEBUG:cmdType=O
> 2008-02-14 17:41:53 [5196] DEBUG:rlog=3
> 2008-02-14 17:41:53 [5196] DEBUG:port=5584
> 2008-02-14 17:41:53 [5196] DEBUG:pid=4958
> 2008-02-14 17:41:53 [5196] DEBUG:from_host=10.0.0.113
> 2008-02-14 17:41:53 [5196] DEBUG:dbName=postgres
> 2008-02-14 17:41:53 [5196] DEBUG:userName=postgres
> 2008-02-14 17:41:53 [5196] DEBUG:recieve sec=1203028913
> 2008-02-14 17:41:53 [5196] DEBUG:recieve usec=653033
> 2008-02-14 17:41:53 [5196] DEBUG:query_size=0
> 2008-02-14 17:41:53 [5196] DEBUG:request_id=7
> 2008-02-14 17:41:53 [5196] DEBUG:replicate_id=0
> 2008-02-14 17:41:53 [5196] DEBUG:recovery_status=0
> 2008-02-14 17:41:53 [5196] DEBUG:query=DEALLOCATE dbdpg_1
> 2008-02-14 17:41:53 [5196] DEBUG:sem_lock [1] req

> A lot of lines skipped. Finally after 15 minutes of waiting PQexec failed
> with network error.

> 2008-02-14 17:55:21 [4919]
> DEBUG:send_replicate_packet_to_server(): after checking the result
> 2008-02-14 17:55:21 [4919]
> DEBUG:send_replicate_packet_to_server():PQexec send :DEALLOCATE
> dbdpg_1
> 2008-02-14 17:55:21 [4919] DEBUG:thread_send_cluster():return
> value from send_replicate_packet_to_server() is -1
> 2008-02-14 17:55:21 [4919]
> DEBUG:thread_send_cluster():pthread_exit[1]
> 2008-02-14 17:55:21 [4959] DEBUG:end thread_send_source()

> Only at this point semaphore is unlocked allowing other replications
> to continue.

> 2008-02-14 17:55:21 [4959] DEBUG:sem_unlock[1]
> 2008-02-14 17:55:21 [4959] DEBUG:PGRdo_replicate():query :: PGR_CLOSE_CONNECTION
> 2008-02-14 17:55:21 [4959] DEBUG:cmdSts=O
> 2008-02-14 17:55:21 [4959] DEBUG:cmdType=x
> 2008-02-14 17:55:21 [4959] DEBUG:rlog=0
> 2008-02-14 17:55:21 [4959] DEBUG:port=5584
> 2008-02-14 17:55:21 [4959] DEBUG:pid=4958
> 2008-02-14 17:55:21 [4959] DEBUG:from_host=10.0.0.113
> 2008-02-14 17:55:21 [4959] DEBUG:dbName=postgres
> 2008-02-14 17:55:21 [4959] DEBUG:userName=postgres
> 2008-02-14 17:55:21 [4959] DEBUG:recieve sec=1203029721
> 2008-02-14 17:55:21 [4959] DEBUG:recieve usec=797436
> 2008-02-14 17:55:21 [4959] DEBUG:query_size=21
> 2008-02-14 17:55:21 [4959] DEBUG:request_id=1
> 2008-02-14 17:55:21 [4959] DEBUG:replicate_id=0
> 2008-02-14 17:55:21 [4959] DEBUG:recovery_status=0
> 2008-02-14 17:55:21 [4959] DEBUG:query=PGR_CLOSE_CONNECTION
> 2008-02-14 17:55:21 [4959] DEBUG:sem_lock [1] req
> 2008-02-14 17:55:21 [5147] DEBUG:sem_lock [1] got it
> 2008-02-14 17:55:21 [5147] DEBUG:source host
> 2008-02-14 17:55:21 [5147] DEBUG:end thread_send_source()
> 2008-02-14 17:55:21 [5147] DEBUG:sem_unlock[1]


> I'm not sure what should be the proper behavior in similar cases.
> Actually I do not understand why it is necessary to protect with
> semaphores so large amount of code. Perhaps it is possible to use
> setsockopt with SO_SNDTIMEO or switch to async functions from libpq or
> utilize setjmp/longjmp with alarm(). In any way I think it is
> useful to limit replication time to a value of replication timeout
> from pgreplicate.conf file.

> Thursday, February 14, 2008, 12:21:56 PM, you wrote:

>> Hi,

>> I'm testing pgcluster 1.7rc8 and in normal conditions it works fine.
>> But when I test cluster behavior in case of node unavailability
>> (caused by either abrupt shutdown or network connectivity problems) I
>> encounter several issues:

>> 1. If some node is excluded from the list of available nodes by life
>> checking in pgreplicate it never comes back again until pgreplicate is
>> restarted. I detect node exclusion by 'port(XXXX) host:XXXX error'
>> line in pgreplicate.sts file. I reviewed the source code
>> (src/pgcluster/pgrp/lifecheck.c) and found that once dead node is
>> never involved in life check, so it will never be put into 'use'
>> state. From one point of view this looks correct because dead node
>> needs to be synchronized before including it back to cluster. But this
>> makes crash recovery more complex, pgreplicate should be restarted
>> manually on all cluster nodes.

>> The patch for this is trivial: just remove 'if
>> (host_ptr->useFlag != DB_TBL_USE){...}'
>> condition from while loop inside of lifecheck_loop() routine. But I'm
>> in doubt whether this cause some side effect.

>> 2. LifeCheck_Timeout parameter of pgreplicate.conf does not work as
>> expected. I expect to detect node failures in 'LifeCheck_Timeout'
>> seconds (or X * LifeCheck_Timeout, where X is known number of
>> retries). Life checking is performed just by issuing simple query on a
>> remote node and PGRcreateConn() is used to establish DB connection.
>> But implementation of PGRcreateConn() relies on blocking
>> PQsetdbLogin() which can hang for up to minutes depending on TCP stack
>> configuration of host OS. I suggest to rewrite PGRcreateConn() to use
>> PQconnectdb() with connect_timeout option set to LifeCheck_Timeout.
>> This will eliminate use of alarm() and will give more predictable
>> timings. Let me know if this is reasonable and will not introduce any
>> issues and I'll provide a patch. 

>> 3. Query issued during a time interval between some node failure and
>> pgreplicate detected this failure causes cluster node which receives
>> the query to hang. Sometimes the node hangs for less than a minute,
>> sometimes it may take more that 10 minutes, sometimes it is forever
>> (or at least for more than an hour). It is important to note that I
>> use DBD::Pg with prepared queries and placeholders. psql queries works
>> fine. Behavior does not depend on type of query being issued. It is
>> observed even on selects.

>> This looks very similar to previously discussed problem with
>> deadlocking when using DBD::Pg with placeholders and simultaneous DB
>> write (insert/update/delete) operations for the same table into two or
>> more nodes. That deadlock could be eliminated by either setting
>> 'auto_lock_table' to false or switching to cybercluster. But actually
>> this is the same solution because according to the source code
>> cybercluster does not support 'auto_lock_table' (source code is simply
>> excluded by #ifdef 0). However 'auto_lock_table' is very useful
>> feature because it prevents cluster desync in case of parallel inserts
>> with SERIAL columns (or perhaps in other cases as well).

>> I'm not experienced with postgres internals. Do someone have any ideas
>> which module could be responsible for hangs in case of DBD::Pg?

>> Thank you.





-- 
Best regards,
 Anatoly                            mailto:Anatoly.Savchenkov at arc.com 
Unless otherwise expressly stated, this message does not create or vary any contractual relationship between you and ARC International. The contents of this e-mail may be confidential and if you have received it in error, please delete it from your system, destroy any hard copies and telephone the above number. Incoming emails to ARC may be subject to monitoring other than by the addressee. SJ2  
 
 
 
 
 
 


More information about the Pgcluster-general mailing list