[Pgcluster-general] Replication while interface is brought down

K, Niranjan (NSN - IN/Bangalore) niranjan.k at nsn.com
Mon Jun 30 07:19:15 UTC 2008


Hi,
 
Further to the logs attached to my previous mails, I have been debugging PgCluster and found the following behaviour behind the test app hang.
PQexec() calls poll() with a timeout of -1 (infinite), so it blocks until there is something to read on the socket file descriptor or the system call is interrupted. This happens on the client side, mainly in the library (../interfaces/libpq-fe). After the network interface is brought down and PQexec(dbConn, "BEGIN") is called, poll() stays blocked for about 15 minutes and then returns 1. My questions are: who writes to this file descriptor, and what is written? Why does it take 15 minutes?
Is this a known problem, and is there any workaround for it? Please let me know.
 
On the server side, connection requests for the DSN 'template1' keep coming in.
 
gdb backtrace:
#0  0x00a5b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00b33dbd in poll () from /lib/tls/libc.so.6
#2  0x00134be6 in pqSocketPoll (sock=6, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:1037
#3  0x00134ab6 in pqSocketCheck (conn=0x9eb1008, forRead=1, forWrite=0, end_time=-1) at fe-misc.c:979
#4  0x001349c7 in pqWaitTimed (forRead=1, forWrite=0, conn=0x9eb1008, finish_time=-1) at fe-misc.c:911
#5  0x0013499b in pqWait (forRead=1, forWrite=0, conn=0x9eb1008) at fe-misc.c:894
#6  0x001315ee in PQgetResult (conn=0x9eb1008) at fe-exec.c:1223
#7  0x00131a75 in PQexecFinish (conn=0x9eb1008) at fe-exec.c:1452
#8  0x001317c9 in PQexec (conn=0x9eb1008, query=0x8048c36 "BEGIN") at fe-exec.c:1293
#9  0x0804881b in main (argc=1, argv=0xbff2f794 "\220«ù¿") at pg_test_app.cpp:35
 
I have attached the log file again.
Please check from line 75 onwards in the attached log.
 
Please let me know what the problem could be, as I have to provide input for the evaluation.
 
Environment:
PgCluster - 1.9.0rc5
2 ClusterDB - One in each server
1 Replicator on Server 1 in active mode
1 Replicator on Server 2 in cold standby mode
 
regards, 
Niranjan 

________________________________

From: K, Niranjan (NSN - IN/Bangalore) 
Sent: Thursday, June 26, 2008 8:17 PM
To: mitani_nl at yahoo.co.jp
Subject: RE: [Pgcluster-general] Replication while interface is brought down


Hi,
 
I have attached the pgreplicate and postgres logs. I can see there is some problem with the lock 'ShareUpdateExclusiveLock'.
Do you have any idea why this happens, and is there any solution or workaround for this problem?
 
regards, 
Niranjan 


________________________________

From: pgcluster-general-bounces at pgfoundry.org [mailto:pgcluster-general-bounces at pgfoundry.org] On Behalf Of ext K, Niranjan (NSN - IN/Bangalore)
Sent: Wednesday, June 25, 2008 10:54 AM
To: mitani_nl at yahoo.co.jp; pgcluster-general at pgfoundry.org
Subject: [Pgcluster-general] Replication while interface is brought down



Hi, 

I was testing synchronous replication scenarios. I have a test application that reads a counter (column COUNTER) from a table (COUNTER_TABLE), increments it, and updates the table, in a loop. I have attached the test app for your reference.

<<test_app.cpp>> 
While test_app is running the loop, I bring down the standby node's interface (ifconfig eth0 down). The test_app on the active node then hangs at the SELECT statement for about 15 minutes, after which the updates resume. I have configured 'Replication_timeout' as 50 seconds.

Following on from this, I brought the interface on the standby node back up, but replication did not take place.

Are these known issues? And are there any workarounds to deal with these problems? 

regards, 
Niranjan 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: postgresql-2008-06-29_230642.csv
Type: application/octet-stream
Size: 178757 bytes
Desc: postgresql-2008-06-29_230642.csv
Url : http://pgfoundry.org/pipermail/pgcluster-general/attachments/20080630/920571ac/attachment-0001.obj 

