[Pgcluster-general] Lifecheck in pgreplicate issues
A. Mitani
a.mitani at sra-europe.com
Wed Feb 20 02:12:33 UTC 2008
Hi Anatoly,
Basically, I agree with your suggestion.
In addition to it, change the value of "net.ipv4.tcp_keepalive_time" is also one of the solutions
I think.
Have you tried it?
----- Original Message -----
From: "Anatoly Savchenkov" <Anatoly.Savchenkov at arc.com>
To: pgcluster-general at pgfoundry.org
Sent: 2008年2月20日 0時36分19秒 (GMT+0900) Asia/Tokyo
Subject: Re: [Pgcluster-general] Lifecheck in pgreplicate issues
Hi,
Looks like issue #2 has been addressed in pgcluster-1.7.0rc9. Thank
you! However I've not tried it yet.
I've got more details regarding issue #3 from my list. The problem
occurs because there is no timeout for PQexec operations in
send_replicate_packet_to_server() routine of pgreplicate daemon. If
connection is already established to a cluster DB it will be used
until error is detected. But in case of unexpected shutdown or network
connectivity issue (when a peer stops responding to all network
packets), it could take a lot of time (15-30 minutes or even more
depending on net.ipv4.tcp_retries2 stack parameter) to detect network
error.
While execution is inside of send_replicate_packet_to_server() routine
which runs in the context of thread_send_cluster() thread, SemID
semaphore is locked preventing other replication attempts. So, this
looks like a hang of entire cluster node for 15-30 minutes.
The issue could be easily reproduced with prepared queries (like
queries initiated with DBD::Pg with placeholders): just prepare a
query while cluster in working, execute it, sleep for some time,
(disconnect one node at this point) and try to execute again (or just
delete prepared query like in my case).
Here is a part of pgreplicate log with my comments which illustrates
the problem (lifechecking is eliminated from the log for simplicity):
At this point all nodes (10.0.0.113 and 10.0.0.118) are up. Prepared
query is processed by 10.0.0.113 node:
2008-02-14 17:39:38 [4959] DEBUG:PGRdo_replicate():query ::
2008-02-14 17:39:38 [4959] DEBUG:cmdSts=P
2008-02-14 17:39:38 [4959] DEBUG:cmdType=E
2008-02-14 17:39:38 [4959] DEBUG:rlog=0
2008-02-14 17:39:38 [4959] DEBUG:port=5584
2008-02-14 17:39:38 [4959] DEBUG:pid=4958
2008-02-14 17:39:38 [4959] DEBUG:from_host=10.0.0.113
2008-02-14 17:39:38 [4959] DEBUG:dbName=postgres
2008-02-14 17:39:38 [4959] DEBUG:userName=postgres
2008-02-14 17:39:38 [4959] DEBUG:recieve sec=1203028778
2008-02-14 17:39:38 [4959] DEBUG:recieve usec=34744
2008-02-14 17:39:38 [4959] DEBUG:query_size=6
2008-02-14 17:39:38 [4959] DEBUG:request_id=5
2008-02-14 17:39:38 [4959] DEBUG:replicate_id=0
2008-02-14 17:39:38 [4959] DEBUG:recovery_status=0
2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] req
2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] got it
2008-02-14 17:39:38 [4959] DEBUG:source host
2008-02-14 17:39:38 [4919] DEBUG:start thread_send_cluster()
2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster(): going to send replicate packet to server
2008-02-14 17:39:38 [4919] DEBUG:send_replicate_packet_to_server(): prepare replication
2008-02-14 17:39:38 [4919] DEBUG:send_replicate_packet_to_server(): before cmd type switch
2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster():return value from send_replicate_packet_to_server() is 0
2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster():pthread_exit[1]
2008-02-14 17:39:38 [4959] DEBUG:end thread_send_source()
2008-02-14 17:39:38 [4959] DEBUG:sem_unlock[1]
2008-02-14 17:39:38 [4959] DEBUG:PGRdo_replicate():query ::
2008-02-14 17:39:38 [4959] DEBUG:cmdSts=P
2008-02-14 17:39:38 [4959] DEBUG:cmdType=S
2008-02-14 17:39:38 [4959] DEBUG:rlog=0
2008-02-14 17:39:38 [4959] DEBUG:port=5584
2008-02-14 17:39:38 [4959] DEBUG:pid=4958
2008-02-14 17:39:38 [4959] DEBUG:from_host=10.0.0.113
2008-02-14 17:39:38 [4959] DEBUG:dbName=postgres
2008-02-14 17:39:38 [4959] DEBUG:userName=postgres
2008-02-14 17:39:38 [4959] DEBUG:recieve sec=1203028778
2008-02-14 17:39:38 [4959] DEBUG:recieve usec=36996
2008-02-14 17:39:38 [4959] DEBUG:query_size=1
2008-02-14 17:39:38 [4959] DEBUG:request_id=6
2008-02-14 17:39:38 [4959] DEBUG:replicate_id=0
2008-02-14 17:39:38 [4959] DEBUG:recovery_status=0
2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] req
2008-02-14 17:39:38 [4959] DEBUG:sem_lock [1] got it
2008-02-14 17:39:38 [4959] DEBUG:source host
2008-02-14 17:39:38 [4919] DEBUG:start thread_send_cluster()
2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster(): going to send replicate packet to server
2008-02-14 17:39:38 [4919] DEBUG:send_replicate_packet_to_server(): prepare replication
2008-02-14 17:39:38 [4919] DEBUG:send_replicate_packet_to_server(): before cmd type switch
2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster():return value from send_replicate_packet_to_server() is 0
2008-02-14 17:39:38 [4919] DEBUG:thread_send_cluster():pthread_exit[1]
2008-02-14 17:39:38 [4959] DEBUG:end thread_send_source()
2008-02-14 17:39:38 [4959] DEBUG:sem_unlock[1]
Somewhere between 17:39:40 and 17:39:50 I've disconnected 10.0.0.118
node from network by unplugging its network cable.
2008-02-14 17:39:40 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:39:40 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:39:40 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:39:40 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:39:40 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:39:42 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:39:42 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:39:42 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:39:42 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:39:42 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:39:43 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:39:43 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:39:43 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:39:43 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:39:43 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:39:47 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:39:47 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:39:47 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:39:47 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:39:47 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:39:53 [4959] DEBUG:PGRdo_replicate():query :: DEALLOCATE dbdpg_1
2008-02-14 17:39:53 [4959] DEBUG:cmdSts=Q
2008-02-14 17:39:53 [4959] DEBUG:cmdType=O
2008-02-14 17:39:53 [4959] DEBUG:rlog=0
2008-02-14 17:39:53 [4959] DEBUG:port=5584
2008-02-14 17:39:53 [4959] DEBUG:pid=4958
2008-02-14 17:39:53 [4959] DEBUG:from_host=10.0.0.113
2008-02-14 17:39:53 [4959] DEBUG:dbName=postgres
2008-02-14 17:39:53 [4959] DEBUG:userName=postgres
2008-02-14 17:39:53 [4959] DEBUG:recieve sec=1203028793
2008-02-14 17:39:53 [4959] DEBUG:recieve usec=43951
2008-02-14 17:39:53 [4959] DEBUG:query_size=18
2008-02-14 17:39:53 [4959] DEBUG:request_id=7
2008-02-14 17:39:53 [4959] DEBUG:replicate_id=0
2008-02-14 17:39:53 [4959] DEBUG:recovery_status=0
2008-02-14 17:39:53 [4959] DEBUG:query=DEALLOCATE dbdpg_1
2008-02-14 17:39:53 [4959] DEBUG:sem_lock [1] req
Semaphore is free, acquire it.
2008-02-14 17:39:53 [4959] DEBUG:sem_lock [1] got it
2008-02-14 17:39:53 [4959] DEBUG:source host
2008-02-14 17:39:53 [4919] DEBUG:start thread_send_cluster()
2008-02-14 17:39:53 [4919] DEBUG:thread_send_cluster(): going to send replicate packet to server
2008-02-14 17:39:53 [4919] DEBUG:send_replicate_packet_to_server(): command is SELECT PGR_SYSTEM_COMMAND_FUNCTION(8,8,0,1)
2008-02-14 17:40:20 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:40:20 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:40:20 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:40:20 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:40:20 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:40:41 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:40:41 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:40:41 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:40:41 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:40:41 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
Replication timeout in cluster.conf is 60 seconds, so after 60 seconds
from initial replication attempt postgres issues another one. Here it is.
2008-02-14 17:40:53 [5147] DEBUG:PGRdo_replicate():query :: DEALLOCATE dbdpg_1
2008-02-14 17:40:53 [4920] DEBUG:do_rlog():got result:cmdSys='R'
2008-02-14 17:40:53 [4920] DEBUG:PGRwrite_rlog():delete_query_log
2008-02-14 17:40:53 [5147] DEBUG:cmdSts=Q
2008-02-14 17:40:53 [4920] DEBUG:do_rlog():process result done:cmdSys='R'
2008-02-14 17:40:53 [5147] DEBUG:cmdType=O
2008-02-14 17:40:53 [5147] DEBUG:rlog=3
2008-02-14 17:40:53 [5147] DEBUG:port=5584
2008-02-14 17:40:53 [5147] DEBUG:pid=4958
2008-02-14 17:40:53 [5147] DEBUG:from_host=10.0.0.113
2008-02-14 17:40:53 [5147] DEBUG:dbName=postgres
2008-02-14 17:40:53 [5147] DEBUG:userName=postgres
2008-02-14 17:40:53 [5147] DEBUG:recieve sec=1203028853
2008-02-14 17:40:53 [5147] DEBUG:recieve usec=316539
2008-02-14 17:40:53 [5147] DEBUG:query_size=0
2008-02-14 17:40:53 [5147] DEBUG:request_id=7
2008-02-14 17:40:53 [5147] DEBUG:replicate_id=0
2008-02-14 17:40:53 [5147] DEBUG:recovery_status=0
2008-02-14 17:40:53 [5147] DEBUG:query=DEALLOCATE dbdpg_1
Trying to acquire the semaphore but it is still owned by 4959 thread,
so current request hangs.
2008-02-14 17:40:53 [5147] DEBUG:sem_lock [1] req
2008-02-14 17:41:02 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:41:02 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:41:02 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:41:02 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:41:02 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:41:23 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:41:23 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:41:23 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:41:23 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:41:23 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
2008-02-14 17:41:44 [4920] DEBUG:do_rlog():got result:cmdSys='S'
2008-02-14 17:41:44 [4920] DEBUG:do_rlog(): CMD_STS_TRANSACTION_ABORT
2008-02-14 17:41:44 [4920] DEBUG:reconfirm_commit():PGR_NOT_YET_COMMIT
2008-02-14 17:41:44 [4920] DEBUG:do_rlog(): SYS_CALL process done
2008-02-14 17:41:44 [4920] DEBUG:do_rlog():process result done:cmdSys='S'
Another replication attempt after another minute.
2008-02-14 17:41:53 [5196] DEBUG:PGRdo_replicate():query :: DEALLOCATE dbdpg_1
2008-02-14 17:41:53 [4920] DEBUG:do_rlog():got result:cmdSys='R'
2008-02-14 17:41:53 [4920] DEBUG:PGRwrite_rlog():delete_query_log
2008-02-14 17:41:53 [5196] DEBUG:cmdSts=Q
2008-02-14 17:41:53 [4920] DEBUG:do_rlog():process result done:cmdSys='R'
2008-02-14 17:41:53 [5196] DEBUG:cmdType=O
2008-02-14 17:41:53 [5196] DEBUG:rlog=3
2008-02-14 17:41:53 [5196] DEBUG:port=5584
2008-02-14 17:41:53 [5196] DEBUG:pid=4958
2008-02-14 17:41:53 [5196] DEBUG:from_host=10.0.0.113
2008-02-14 17:41:53 [5196] DEBUG:dbName=postgres
2008-02-14 17:41:53 [5196] DEBUG:userName=postgres
2008-02-14 17:41:53 [5196] DEBUG:recieve sec=1203028913
2008-02-14 17:41:53 [5196] DEBUG:recieve usec=653033
2008-02-14 17:41:53 [5196] DEBUG:query_size=0
2008-02-14 17:41:53 [5196] DEBUG:request_id=7
2008-02-14 17:41:53 [5196] DEBUG:replicate_id=0
2008-02-14 17:41:53 [5196] DEBUG:recovery_status=0
2008-02-14 17:41:53 [5196] DEBUG:query=DEALLOCATE dbdpg_1
2008-02-14 17:41:53 [5196] DEBUG:sem_lock [1] req
A lot of lines skipped. Finally after 15 minutes of waiting PQexec failed
with network error.
2008-02-14 17:55:21 [4919] DEBUG:send_replicate_packet_to_server(): after checking the result
2008-02-14 17:55:21 [4919] DEBUG:send_replicate_packet_to_server():PQexec send :DEALLOCATE dbdpg_1
2008-02-14 17:55:21 [4919] DEBUG:thread_send_cluster():return value from send_replicate_packet_to_server() is -1
2008-02-14 17:55:21 [4919] DEBUG:thread_send_cluster():pthread_exit[1]
2008-02-14 17:55:21 [4959] DEBUG:end thread_send_source()
Only at this point semaphore is unlocked allowing other replications
to continue.
2008-02-14 17:55:21 [4959] DEBUG:sem_unlock[1]
2008-02-14 17:55:21 [4959] DEBUG:PGRdo_replicate():query :: PGR_CLOSE_CONNECTION
2008-02-14 17:55:21 [4959] DEBUG:cmdSts=O
2008-02-14 17:55:21 [4959] DEBUG:cmdType=x
2008-02-14 17:55:21 [4959] DEBUG:rlog=0
2008-02-14 17:55:21 [4959] DEBUG:port=5584
2008-02-14 17:55:21 [4959] DEBUG:pid=4958
2008-02-14 17:55:21 [4959] DEBUG:from_host=10.0.0.113
2008-02-14 17:55:21 [4959] DEBUG:dbName=postgres
2008-02-14 17:55:21 [4959] DEBUG:userName=postgres
2008-02-14 17:55:21 [4959] DEBUG:recieve sec=1203029721
2008-02-14 17:55:21 [4959] DEBUG:recieve usec=797436
2008-02-14 17:55:21 [4959] DEBUG:query_size=21
2008-02-14 17:55:21 [4959] DEBUG:request_id=1
2008-02-14 17:55:21 [4959] DEBUG:replicate_id=0
2008-02-14 17:55:21 [4959] DEBUG:recovery_status=0
2008-02-14 17:55:21 [4959] DEBUG:query=PGR_CLOSE_CONNECTION
2008-02-14 17:55:21 [4959] DEBUG:sem_lock [1] req
2008-02-14 17:55:21 [5147] DEBUG:sem_lock [1] got it
2008-02-14 17:55:21 [5147] DEBUG:source host
2008-02-14 17:55:21 [5147] DEBUG:end thread_send_source()
2008-02-14 17:55:21 [5147] DEBUG:sem_unlock[1]
I'm not sure what should be the proper behavior in similar cases.
Actually I do not understand why it is necessary to protect with
semaphores so large amount of code. Perhaps it is possible to use
setsockopt with SO_SNDTIMEO or switch to async functions from libpq or
utilize setjmp/longjmp with alarm(). In any way I think it is
useful to limit replication time to a value of replication timeout
from pgreplicate.conf file.
Thursday, February 14, 2008, 12:21:56 PM, you wrote:
> Hi,
> I'm testing pgcluster 1.7rc8 and in normal conditions it works fine.
> But when I test cluster behavior in case of node unavailability
> (caused by either abrupt shutdown or network connectivity problems) I
> encounter several issues:
> 1. If some node is excluded from the list of available nodes by life
> checking in pgreplicate it never comes back again until pgreplicate is
> restarted. I detect node exclusion by 'port(XXXX) host:XXXX error'
> line in pgreplicate.sts file. I reviewed the source code
> (src/pgcluster/pgrp/lifecheck.c) and found that once dead node is
> never involved in life check, so it will never be put into 'use'
> state. From one point of view this looks correct because dead node
> needs to be synchronized before including it back to cluster. But this
> makes crash recovery more complex, pgreplicate should be restarted
> manually on all cluster nodes.
> The patch for this is trivial: just remove 'if (host_ptr->useFlag != DB_TBL_USE){...}'
> condition from while loop inside of lifecheck_loop() routine. But I'm
> in doubt whether this cause some side effect.
> 2. LifeCheck_Timeout parameter of pgreplicate.conf does not work as
> expected. I expect to detect node failures in 'LifeCheck_Timeout'
> seconds (or X * LifeCheck_Timeout, where X is known number of
> retries). Life checking is performed just by issuing simple query on a
> remote node and PGRcreateConn() is used to establish DB connection.
> But implementation of PGRcreateConn() relies on blocking
> PQsetdbLogin() which can hang for up to minutes depending on TCP stack
> configuration of host OS. I suggest to rewrite PGRcreateConn() to use
> PQconnectdb() with connect_timeout option set to LifeCheck_Timeout.
> This will eliminate use of alarm() and will give more predictable
> timings. Let me know if this is reasonable and will not introduce any
> issues and I'll provide a patch.
> 3. Query issued during a time interval between some node failure and
> pgreplicate detected this failure causes cluster node which receives
> the query to hang. Sometimes the node hangs for less than a minute,
> sometimes it may take more that 10 minutes, sometimes it is forever
> (or at least for more than an hour). It is important to note that I
> use DBD::Pg with prepared queries and placeholders. psql queries works
> fine. Behavior does not depend on type of query being issued. It is
> observed even on selects.
> This looks very similar to previously discussed problem with
> deadlocking when using DBD::Pg with placeholders and simultaneous DB
> write (insert/update/delete) operations for the same table into two or
> more nodes. That deadlock could be eliminated by either setting
> 'auto_lock_table' to false or switching to cybercluster. But actually
> this is the same solution because according to the source code
> cybercluster does not support 'auto_lock_table' (source code is simply
> excluded by #ifdef 0). However 'auto_lock_table' is very useful
> feature because it prevents cluster desync in case of parallel inserts
> with SERIAL columns (or perhaps in other cases as well).
> I'm not experienced with postgres internals. Do someone have any ideas
> which module could be responsible for hangs in case of DBD::Pg?
> Thank you.
--
Best regards,
Anatoly mailto:Anatoly.Savchenkov at arc.com
Unless otherwise expressly stated, this message does not create or vary any contractual relationship between you and ARC International. The contents of this e-mail may be confidential and if you have received it in error, please delete it from your system, destroy any hard copies and telephone the above number. Incoming emails to ARC may be subject to monitoring other than by the addressee. SJ2
_______________________________________________
Pgcluster-general mailing list
Pgcluster-general at pgfoundry.org
http://pgfoundry.org/mailman/listinfo/pgcluster-general
More information about the Pgcluster-general
mailing list