Caucho Forums  

  #1  
10-18-2012, 08:45 AM
agarcial
Junior Member
Join Date: Oct 2012
Posts: 3

Problem configuring session sharing in cluster

Hi there,

I'm using Resin 4.0.23 and I'm having trouble getting shared sessions in a cluster to behave the way I want.

The symptoms are:
  1. Two servers form the cluster and share sessions with each other.
  2. The application works well when I kill the server that is handling the requests for a given session id. The other server picks up correctly and no data is lost; the change is transparent to the client.
  3. The problem comes when I restart the broken node: the user gets a connection error. This happens because the node takes some time to load the webapp's Spring context, and until then the application is unable to respond, so the client gets a connection error. If I wait long enough (about 2 minutes) for the application to be completely up, the session is correctly retrieved from the server that stayed up and the application behaves correctly again. Unfortunately, having the application completely unresponsive during those 2 minutes is not acceptable in terms of quality of service.

What I would like to achieve is that all requests are sent to the server that is up until the other one has COMPLETELY finished its loading process.

Two solutions come to my mind:
  1. I can disable sticky sessions so that there is no real master for the stored jsessionid. The idea is to prevent the load balancer from sending requests to the starting server until it has finished loading everything. Here I would have liked to use the <session-sticky-disable> tag, but there seems to be a problem in the XML reference (http://www.caucho.com/resin-4.0/reference.xtp#cluster), because if I put it under <cluster>, Resin crashes with a syntax error at startup (see the snippet after this list). I also wonder whether it's the right solution, as I'm unable to find any example of this tag's usage on the net.
  2. Introducing a latency somewhere (I don't know which parameter I should use) for the starting server to "announce" itself as available, along the lines of "sleep for 5 minutes until the context has been loaded". It would be even better if the Resin instance could wait until the context was up before announcing itself as available to the cluster. I have looked at the load-balance-socket-timeout and load-balance-recover-time parameters, but I'm not sure this is the right way to go.
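
For reference, here is roughly where I tried to put the tag (the cluster id is just a placeholder for this snippet); this is the placement that triggers the startup error:

HTML Code:
<cluster id="app-tier">
    <!-- attempted placement, following the reference doc; Resin 4.0.23
         rejects this with a syntax error at startup -->
    <session-sticky-disable/>

    <server id="node1" address="127.0.0.1" port="6805"/>
    <server id="node2" address="127.0.0.1" port="6806"/>
</cluster>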

My configuration for the cluster part is the following:
HTML Code:
<server id="node2" address="127.0.0.1" port="6806">
        <watchdog-port>6601</watchdog-port>
</server>
<server id="node1" address="127.0.0.1" port="6805">
        <watchdog-port>6600</watchdog-port>
</server>

<persistent-store type="cluster">
    <init path="cluster"/>
</persistent-store>

<web-app-default>
  <session-config use-persistent-store="true">
      <session-timeout>60</session-timeout>
      <session-max>4096</session-max>
      <always-save-session>true</always-save-session>
      <save-mode>before-headers</save-mode>
  </session-config>
</web-app-default>

Any help would be greatly appreciated.

Thanks,

Alex.
  #2  
10-19-2012, 05:12 PM
ferg
Administrator
Join Date: Aug 2009
Posts: 190

I've filed a bug report.

But to clarify: this isn't a session sharing issue. The problem is the load balancer dispatching to a server that hasn't finished initializing. Is that correct?
  #3  
10-22-2012, 08:36 AM
agarcial
Junior Member
Join Date: Oct 2012
Posts: 3

Hi,

Yes, sorry if the title I chose was misleading. The session sharing feature works well. It's the load balancer that behaves incorrectly by switching to a server that is still loading.

In the meantime I have also tried the <load-balance-warmup-time> parameter, which the reference presents as the way to go in this kind of situation. Once the server is back up, the cluster should wait <load-balance-warmup-time> seconds before sending requests to it. This time is easy to estimate, since we know how long the context takes to load on each machine. Unfortunately, I have tried it and it has no effect whatsoever.
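
Here is roughly how I set it. I am assuming, from the reference, that the tag belongs in a <server-default> block of the cluster; the 120s value is just my estimate of the context load time:

HTML Code:
<server-default>
    <!-- expected: a freshly restarted node receives no (or reduced)
         traffic for 120 seconds; in my tests this had no visible effect -->
    <load-balance-warmup-time>120s</load-balance-warmup-time>
</server-default>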

Until a bug fix or some other solution is provided, I have partially settled the issue using the <load-balance-recover-time> parameter, which specifies how long the cluster waits before checking whether a failed node is back alive. With it, I can tell the cluster not to check the status of the fallen server until it has had time to become fully available again.
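
Concretely, this is the sketch I am using (same assumption about <server-default> as above; the 10-minute value is only a rough estimate of detection plus restart plus context load):

HTML Code:
<server-default>
    <!-- do not probe a fallen node again for 10 minutes, by which time
         the Spring context should be fully loaded -->
    <load-balance-recover-time>600s</load-balance-recover-time>
</server-default>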

I say partially solved because this time can only be roughly estimated. Between the moment a server dies and the moment it is actually up again, a lot of things happen: the monitoring alerts on the issue, the operations team restarts the Resin server (if the machine itself is not dead) or replaces the machine and then restarts the Resin server, the server restarts...

Thanks in advance for your insight,

Alex.
  #4  
10-22-2012, 10:43 PM
ferg
Administrator
Join Date: Aug 2009
Posts: 190

That sounds accurate.

The <bind-ports-after-startup/> option is a bit tricky. In 4.0 it applies only to the HTTP ports, because the cluster port needs to be open earlier for cluster tasks like session replication. Since that same port is also used for load balancing, you get the behavior you're seeing.
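
For illustration, a minimal sketch of what that means in the configuration (the placement directly under <cluster> is an assumption here, and the cluster id is a placeholder):

HTML Code:
<cluster id="app-tier">
    <!-- assumed placement: delay binding ports until startup completes.
         In 4.0 this affects only the HTTP ports; the cluster port
         (6805/6806 in the config above) still opens early for session
         replication, and the load balancer dispatches over that same
         port, so a still-starting node can receive requests. -->
    <bind-ports-after-startup/>

    <server id="node1" address="127.0.0.1" port="6805"/>
    <server id="node2" address="127.0.0.1" port="6806"/>
</cluster>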

Tags
clustering, session sharing, sticky sessions
