Simple Clustering 'Fundamental' Question


jdamnation
06-17-2010, 05:45 PM
I wonder if someone could answer a simple (but seriously fundamental!) question I have about creating a simple cluster of two servers with Resin. I have trawled the forums and the documentation, but can't seem to find the answer.

I have a single Windows server set up for testing, but I would like to emulate a two-server setup as closely as possible. By two servers I mean physically separate servers. On my single test server, rather than use different IPs, I use different TCP cluster ports.

I have put two completely separate copies of Resin on the server:

C:\ResinA

In this copy - I use the supplied default config file as a base, and specify my cluster by adding a second server like this:

<server id="app-a" address="127.0.0.1" port="6801"/>
<server id="app-b" address="127.0.0.1" port="6802"/>

I then go to C:\ResinA\Lib and go

java -jar resin.jar -server app-a console

After doing that - my 'app-a' server is now up and running. I log in to the GUI and see that 'app-a' is up, but 'app-b' is down. This is what I expect - it's fine.

Now the docs suggest that in order to run a cluster - all servers in it need to run from the same resin.xml file.

So my first question is - to do this - do I need to use clustered shared storage? Or is a network file share OK? Or - do the 2nd (and additional) servers somehow locate the master server and load the config file from that? It's not 100% clear in the docs.

Anyway - to get round this bit - I simply copy the resin.xml from C:\ResinA\conf to C:\ResinB\Conf. In doing this - both servers have the 'same' conf file.

Now I have that sorted - next I go to C:\ResinB\Lib and go

java -jar resin.jar -server app-b console

My ResinB instance also fires up fine.

HOWEVER - PROBLEM.

When I log in to ResinA - it tells me that ResinB is down.
When I log in to ResinB - it tells me that ResinA is down.

So it seems like what I have created here is not actually a single cluster of two servers - but two clusters of one server!

Could someone please put me out of my misery and give me some pointers on how I turn this setup into a single cluster of two servers? Remember I want to keep my Resin installations separate as I want to pretend they exist on two physically separate servers.

Cheers,

JD

emil
06-17-2010, 06:00 PM
Hi JD,

Your setup and procedure sound correct. It may just be a connectivity problem. Do you see anything in the logs about either server not being able to connect to the other? If not, try bumping up the logging to fine, finer, or finest.
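For reference, a minimal sketch of what raising the log level might look like in resin.xml. It assumes the stock Resin 4 configuration, where a <logger> entry for com.caucho sits directly under <resin>; the surrounding elements are omitted and your file may be laid out differently.

```xml
<!-- Assumed location: top level of resin.xml, next to the existing <log-handler>. -->
<!-- The stock file is assumed to use level="info"; finer and finest are much noisier. -->
<logger name="com.caucho" level="finer"/>
```

If you try this, make the same change in both copies of resin.xml so the two files stay identical.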

Thanks,
Emil

ferg
06-17-2010, 06:03 PM
"same resin.xml" means the same contents. It doesn't have to be the actual same file.

The reason for the same resin.xml is because all the servers need a consistent view of the cluster. It would be confusing if server A thought there were 3 servers but server B thought there were 2.

Since you're using 127.0.0.1, are all the instances able to connect to each other? In other words, you're not using a virtual machine or something similar?

emil
06-17-2010, 06:04 PM
Hi JD,

Forgot to answer your other question: The resin.xml just needs to have the same text in it when the two servers read it. Basically Resin just grabs the network locations of the other servers from the resin.xml to talk to them. The Resin instances don't transfer configuration between each other (yet). So your copying technique works great. You could also go the networked file system route, but that can cause problems if you have dependency checking enabled. Specifically, all the Resin instances might see that the configuration changed and reboot simultaneously, which is bad for availability of course. :o
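As a rough illustration of the dependency-checking point: if the instances do read one resin.xml from a network file system, a longer check interval means they poll the file for changes less often. This sketch assumes the <dependency-check-interval> element as it appears in the stock resin.xml; the 600s value is only an example, and a longer interval does not by itself stagger the restarts.

```xml
<!-- Assumed location: top level of resin.xml, directly under <resin>. -->
<!-- The stock file is assumed to ship with a short interval intended for development. -->
<dependency-check-interval>600s</dependency-check-interval>
```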

Thanks,
Emil

jdamnation
06-17-2010, 08:03 PM
OK here is something confusing.

If I have exactly the same resin.xml and the default server settings have the "*" for the IP and "8080" and "8443" for the ports - if I am running both servers locally and using different ports - how can I specify these A and B servers so that they don't use the defaults?

At the moment - when I start server B it now fails with a binding error - port in use (by server A).

I have tried to set explicit HTTP / HTTPS ports per server in the resin.xml file - but the following syntax does not work:

<server id="app-a" http address="192.168.0.58" port="8090"/>
<server id="app-b" http address="192.168.0.58" port="8091"/>

Any ideas?

JD

emil
06-17-2010, 08:53 PM
Hi JD,

Just remove the <http> port from the <server-default> and put it into each <server>, e.g.:

<server id="app-a" address="127.0.0.1" port="6800">
<http address="*" port="8080"/>
</server>

<server id="app-b" address="127.0.0.1" port="6801">
<http address="*" port="8081"/>
</server>


The <http> tags are additive, so if you have one in the <server-default>, it gets added to any <http> ports specified in the <server> tags. In other words, you can't override the <http> port in the <server>, you just have to remove it from the <server-default>.
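Put together, the layout described here might look roughly like the following. This is a sketch, not a complete resin.xml: the cluster id app-tier is assumed from the stock configuration, and the comment inside <server-default> stands in for whatever shared settings your file already has.

```xml
<cluster id="app-tier">
  <!-- Shared defaults: no <http> here, otherwise every server would also try to bind it. -->
  <server-default>
    <!-- other shared settings from the stock file stay here -->
  </server-default>

  <!-- address/port on <server> is the cluster port, not the HTTP port -->
  <server id="app-a" address="127.0.0.1" port="6800">
    <http address="*" port="8080"/>
  </server>

  <server id="app-b" address="127.0.0.1" port="6801">
    <http address="*" port="8081"/>
  </server>
</cluster>
```

Both copies of resin.xml would carry this same block; each instance picks its own <server> entry from the -server argument given on the command line.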

Take care,
Emil

jdamnation
06-17-2010, 10:15 PM
Good stuff - thanks for the help on this one!

So now I have my identical resin.xml and both servers start OK now.

However - I'm back to my original problem of neither server in the cluster being able to 'see' each other.

Both servers have the same IP and are on the same copy of Windows.

They just have different 'resin' directories.

When starting with 'console' both servers give me almost constant TcpSocketLink - Failed keepalive errors.

Any ideas why I might be getting these?

Can't see anything interesting in Resin/Logs - how do I up the logging level again?

Cheers,

JD

jdamnation
06-17-2010, 10:19 PM
Ah - I got it - 'finer'.

Will take a look...

JD

jdamnation
06-17-2010, 10:46 PM
Hmm - well, with 'finest' I can see chatter between the servers - looks like this is where it starts going awry....

Apologies if this is formatted all wrong - looks like the 'preview' feature is broken...

[10-06-17 23:34:29.729] {null-2} Hmux[app-a:2] start request
[10-06-17 23:34:29.729] {null-2} Hmux[app-a:2] 7-r switch-to-hmtp
[10-06-17 23:34:29.729] {null-2} HmtpReader[server-app-a:2] querySet AuthQuery[admin.resin,SelfEncryptedCredentials] {id:1, to:null, from:baa.app-tier.admin.resin}
[10-06-17 23:34:29.729] {null-2} HempMemoryQueue[server-2-hmtp] admin.resin@aaa.app-tier.admin.resin/AAAASlIC/Mj created
[10-06-17 23:34:29.729] {server-2-hmtp-17} HmtpWriter[server-2-hmtp] queryResult AuthResult[admin.resin@aaa.app-tier.admin.resin/AAAASlIC/Mj] {id: 1, to:baa.app-tier.admin.resin, from:null}
[10-06-17 23:34:29.729] {null-2} HmtpReader[server-app-a:2] message AddSampleMetadata[-644360406864698699,01|Resin|Request|Http Request Time Max] {to:stat@aaa.app-tier.admin.resin, from:stat@baa.app-tier.admin.resin}
[10-06-17 23:34:29.729] {null-2} HmtpReader[server-app-a:2] message AddSampleMetadata[4498272321136081893,01|Resin|Request|Http Request Active] {to:stat@aaa.app-tier.admin.resin, from:stat@baa.app-tier.admin.resin}
[10-06-17 23:34:29.729] {null-2} HmtpReader[server-app-a:2] message AddSampleMetadata[5653924528076350795,01|Resin|Request|Http Request Count] {to:stat@aaa.app-tier.admin.resin, from:stat@baa.app-tier.admin.resin}
[10-06-17 23:34:29.729] {stat@aaa.app-tier.admin.resin-18} Database[/D:/Resin/Resina/resin-data]: select id from stat_name_app_a where name=?
[10-06-17 23:34:29.729] {stat@aaa.app-tier.admin.resin-18} Database[/D:/Resin/Resina/resin-data]: select id from stat_name_app_a where name=?
[10-06-17 23:34:29.729] {stat@aaa.app-tier.admin.resin-18} Database[/D:/Resin/Resina/resin-data]: select id from stat_name_app_a where name=?
[10-06-17 23:34:30.744] {null-2} TcpSocketLink[id=2,app-a] failed keepalive (select)
[10-06-17 23:34:30.744] {null-2} HmtpWriter[server-2-hmtp] close
[10-06-17 23:34:30.744] {null-2} TcpSocketLink[id=2,app-a] closing connection TcpSocketLink[id=null-2,null,CLOSED], total=4
[10-06-17 23:34:51.024] {hmtp-aaa-to-baa-19} HmtpWriter[[app-a->app-b:1]] message RequestStartupUpdates[jid=cluster-cache@aaa.app-tier.admin.resin,delta=-3062134] {to:cluster-cache@baa.app-tier.admin.resin, from:cluster-cache@aaa.app-tier.admin.resin}
[10-06-17 23:34:52.457] {resin-38} javax.management.AttributeNotFoundException: No such attribute: OpenFileDescriptorCount
at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:63)
at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at javax.management.StandardMBean.getAttribute(StandardMBean.java:358)
at com.caucho.jmx.MBeanWrapper.getAttribute(MBeanWrapper.java:146)
at com.caucho.jmx.AbstractMBeanServer.getAttribute(AbstractMBeanServer.java:600)
at com.caucho.server.admin.JmxStatAttribute.sample(JmxStatAttribute.java:56)
at com.caucho.server.admin.StatServiceImpl$Sample.sample(StatServiceImpl.java:855)
at com.caucho.server.admin.StatServiceImpl.sampleData(StatServiceImpl.java:575)
at com.caucho.server.admin.StatServiceImpl.sample(StatServiceImpl.java:551)
at com.caucho.server.admin.StatServiceImpl.handleAlarm(StatServiceImpl.java:660)
at com.caucho.util.Alarm.handleAlarm(Alarm.java:453)
at com.caucho.util.Alarm.run(Alarm.java:425)
at com.caucho.util.ThreadPool$PoolThread.runTasks(ThreadPool.java:901)
at com.caucho.util.ThreadPool$PoolThread.run(ThreadPool.java:866)

emil
06-17-2010, 11:39 PM
Hi JD,

Interesting... It might be a Windows thing or possibly your JDK. What version of Windows and Java do you have installed? I can try looking at this tomorrow.

Thanks,
Emil

jdamnation
06-18-2010, 09:38 AM
Thanks, - was sort of worried you might ask that!

This is a Windows 7 VM running on VirtualBox, on a Mac. I had previously tried to get Resin to compile on the Mac but ran into some issues. Also, for production we're going to be using Windows servers, at least in the short term.

For now, I just wanted to get a basic dev setup on my local Windoze install, just so I could test the clustering out, plus the compatibility of some of the PHP stuff.

I'm on the 1.6.20 JDK - but do you think that Windows 7 is a non-starter?

I guess I should probably have installed a Windows 2003 VM - but I've had Resin working fine before in single server mode, so I assumed it wouldn't be a problem and that I had just screwed something up in the config.

So perhaps I should forget clustering on this setup and move on to testing the PHP stuff?

JD

emil
06-18-2010, 05:58 PM
Hi JD,

We definitely support Windows. :) Looking at the error you posted, it's possible that the newest JDK changed some of the non-standard APIs we depend on. I'm looking at it today. It might be reasonable to try out the clustering on your Mac OS or Linux/Unix box/image until we address the other issue, at least for experimentation or learning purposes. If you want to deploy on Windows eventually though, that's certainly something we want to allow you to do.

Thanks,
Emil

jdamnation
06-18-2010, 06:10 PM
Thanks again Emil,

Could you let me know the version of jdk that is known to work? I could try using that.

Getting Resin compiled on my Mac was a real pain and I think would only generate more annoying forum posts. :)

Cheers,

JD

emil
06-21-2010, 05:06 PM
Hi JD,

I was able to get Resin to run fine with JDK 1.6.20 on Vista. Could you check your Task Manager and make sure that all the old Resin processes are dead? Especially the Watchdog. It's possible that if you changed the configuration while the watchdog was running, the Resin processes wouldn't talk to each other, so you might need to shut down everything and restart.

Emil