[Pgcluster-general] Remote replication problem.

Lia Domide lia.domide at codemart.ro
Tue Feb 19 16:13:00 UTC 2008


 

         Hello everybody,

 

  I am testing cybercluster and I encounter a new problem, at the
replication level.

  My current configuration has 2 nodes (gn1 and gn2), on each of them a
replication service, a database and a load balancer are installed. In
/etc/hosts I defined:On gn1: 0.0.0.0 lb1 cyborg1 gn1 On gn2: 0.0.0.0 lb2
cyborg2 gn2

  Gn1 and gn2 are the DB services, cyborg1 and cyborg2 are the replication
services and lb1 and lb2 are the load balancers. The replication level is
"reliable" (not "normal").

  I used some manually added log messages in cybercluster's code, and I
found the following hanging scenario:

       LB (lb1 or lb2) sends an INSERT query toward gn1, when this DB uses
the remote replications service, cyborg2, it hangs. Gn1 hangs in method
"PGR_recv_replicate_result" as it waits infinitely for a response from the
remote replication service (cyborg2). Cyborg2 itself hangs in method
"send_replicate_packet_to_server", in switch (header->cmdType), for an S
(sync message). 

      In most of the cases, the insert query goes on a valid path: lb (1 or
2) --> gn1 (DB) --> cyborg1 (local replication) --> gn2 (for the "reliable"
scenario, waits until this DB returns OK also). Or, the similar GOOD
scenario: lb (1 or 2) --> gn2 --> cyborg2 (local replication) --> gn1
(remote DB). BUT sometimes it used the BAD scenarios: lb --> gn1 --> cyborg2
, or lb -->gn2 -->cyborg1 , when the services hang, waiting for something.

 

Is there something wrong that I am doing? I've also tried to use in
etc/hosts file the local IP, instead of 0.0.0.0, but the same hanging
situations appeared. When using IP in /etc/hosts, also some latency appeared
when services responded for usual, not replicated queries (e.g. select).

 

 

Some logs: from gn2 (db service):

LOG:  PGR_Send_Replicate_Command after PGR_recv_replicate_result() result:
128

LOG:  PGR_Send_Replicate_Command after while (send_replicate_packet()) 2

LOG:  PGR_Send_Replicate_Command 16

LOG:  PGR_Send_Replicate_Command LEAVE OK result: 0x8425440 '8,57,0,1'

LOG:  PGR_Send_Input_Message result 8,57,0,1

LOG:  PGR_Send_Input_Message ENTER command:} of type: S

LOG:  PGR_Send_Input_Message before PGR_Send_Replicate_Command

LOG:  PGR_Send_Replicate_Command ENTER

LOG:  PGR_Send_Replicate_Command label retry_send_prereplicate_packet:

LOG:  PGR_Send_Replicate_Command left while() loop

LOG:  PGR_Send_Replicate_Command before PGR_recv_replicate_result()

LOG:  PGR_Send_Replicate_Command after PGR_recv_replicate_result() result:
128

LOG:  PGR_Send_Replicate_Command after while (send_replicate_packet()) 2

LOG:  PGR_Send_Input_Message ENTER command:}R of type: P

LOG:  PGR_Send_Input_Message ENTERed replicated query

LOG:  PGR_Send_Input_Message ENTER command:}n of type: B

LOG:  PGR_Send_Input_Message ENTERed replicated query

LOG:  PGR_Send_Input_Message ENTER command:} of type: D

LOG:  PGR_Send_Input_Message ENTERed replicated query

LOG:  PGR_Send_Input_Message ENTER command:} of type: E

LOG:  PGR_Send_Input_Message ENTERed replicated query

 

>From cyborg1 (remote replication service):

 

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() from_host: 0.0.0.0 from_port: 5432

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() actual host: 0.0.0.0 actual port: 5432

2008-02-19 17:36:38 [24286] DEBUG:source host

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal() is same
host...0.0.0.0

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() from_host: 0.0.0.0 from_port: 5432

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() actual host: 192.168.123.132 actual port: 5432

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal() NOT same
host...0.0.0.0

2008-02-19 17:36:38 [24286] DEBUG:getTransactionTbl(): thread_id=-1216660592

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal():
thread_send_cluster thread created: -1218815088

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): waiting
for thread -1218815088 to finish

2008-02-19 17:36:38 [24286] DEBUG:start thread_send_cluster()

2008-02-19 17:36:38 [24286] DEBUG:thread_send_cluster(): before
send_replicate_packet_to_server

2008-02-19 17:36:38 [24286] DEBUG:send_replicate_packet_to_server() to host
gn2, num= 2

2008-02-19 17:36:38 [24286] DEBUG:send_replicate_packet_to_server() header->
cmdType = E

2008-02-19 17:36:38 [24286] DEBUG:thread_send_cluster():return value from
send_replicate_packet_to_server() is 0

2008-02-19 17:36:38 [24286] DEBUG:thread_send_cluster():pthread_exit[1]

2008-02-19 17:36:38 [24286] DEBUG:thread_send_source() session: 0x807d028

2008-02-19 17:36:38 [24286] DEBUG:thread_send_source()
session->PGR_Response_Inf: 0x8073740

2008-02-19 17:36:38 [24286] DEBUG:end thread_send_source()

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal returns 0

2008-02-19 17:36:38 [24286] DEBUG:replicate_loop(): before PGRread_packet()

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): before select

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): after select(11 + 1) rtn
= 1

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): before recv

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): after recv r = 564

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): before PGRread_query

2008-02-19 17:36:38 [24286] DEBUG:PGRread_query(): before recv

2008-02-19 17:36:38 [24286] DEBUG:PGRread_query(): after recv r = 1

2008-02-19 17:36:38 [24286] DEBUG:replicate_loop(): after PGRread_packet()
query: ''

2008-02-19 17:36:38 [24286] DEBUG:replicate_loop(): before PGRread_packet()

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): before select

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): after select(11 + 1) rtn
= 1

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): before recv

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): after recv r = 564

2008-02-19 17:36:38 [24286] DEBUG:PGRread_packet(): before PGRread_query

2008-02-19 17:36:38 [24286] DEBUG:PGRread_query(): before recv

2008-02-19 17:36:38 [24286] DEBUG:PGRread_query(): after recv r = 1

2008-02-19 17:36:38 [24286] DEBUG:replicate_loop(): after PGRread_packet()
query: ''

2008-02-19 17:36:38 [24286] DEBUG:PGRdo_replicate():query ::

2008-02-19 17:36:38 [24286] DEBUG:cmdSts=P

2008-02-19 17:36:38 [24286] DEBUG:cmdType=S

2008-02-19 17:36:38 [24286] DEBUG:rlog=0

2008-02-19 17:36:38 [24286] DEBUG:port=5432

2008-02-19 17:36:38 [24286] DEBUG:pid=16993

2008-02-19 17:36:38 [24286] DEBUG:from_host=0.0.0.0

2008-02-19 17:36:38 [24286] DEBUG:dbName=TEST

2008-02-19 17:36:38 [24286] DEBUG:userName=postgres

2008-02-19 17:36:38 [24286] DEBUG:recieve sec=1203435398

2008-02-19 17:36:38 [24286] DEBUG:recieve usec=113867

2008-02-19 17:36:38 [24286] DEBUG:query_size=1

2008-02-19 17:36:38 [24286] DEBUG:request_id=149

2008-02-19 17:36:38 [24286] DEBUG:replicate_id=0

2008-02-19 17:36:38 [24286] DEBUG:recovery_status=0

2008-02-19 17:36:38 [24286] DEBUG:sem_lock [SemID] req

2008-02-19 17:36:38 [24286] DEBUG:sem_lock [SemID] got it

2008-02-19 17:36:38 [24286] DEBUG:sem_unlock[SemID]

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() from_host: 0.0.0.0 from_port: 5432

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() actual host: 0.0.0.0 actual port: 5432

2008-02-19 17:36:38 [24286] DEBUG:source host

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal() is same
host...0.0.0.0

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() from_host: 0.0.0.0 from_port: 5432

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): before
PGRis_same_host_packet_tbl() actual host: 192.168.123.132 actual port: 5432

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal() NOT same
host...0.0.0.0

2008-02-19 17:36:38 [24286] DEBUG:getTransactionTbl(): thread_id=-1216660592

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal():
thread_send_cluster thread created: -1218815088

2008-02-19 17:36:38 [24286] DEBUG:replicate_packet_send_internal(): waiting
for thread -1218815088 to finish

2008-02-19 17:36:38 [24286] DEBUG:start thread_send_cluster()

2008-02-19 17:36:38 [24286] DEBUG:thread_send_cluster(): before
send_replicate_packet_to_server

2008-02-19 17:36:38 [24286] DEBUG:send_replicate_packet_to_server() to host
gn2, num= 2

2008-02-19 17:36:38 [24286] DEBUG:send_replicate_packet_to_server() header->
cmdType = S

2008-02-19 17:36:38 [24286] DEBUG:send_p_sync begin

2008-02-19 17:36:38 [24286] DEBUG:send_p_sync after if 1

2008-02-19 17:36:38 [24286] DEBUG:send_p_sync after if 2

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://pgfoundry.org/pipermail/pgcluster-general/attachments/20080219/ae201df8/attachment-0001.html 


More information about the Pgcluster-general mailing list