Friday, March 9, 2012

Is SQL 2000 clustering cr*p??

Howdy,

I have had questions from management about whether SQL 2000 clustering on Windows 2000 is any good.

We are looking at spending *quite* a bit of money to implement it, but I want a "from the trenches" opinion of what it's like from the people who actually use it and look after it.

e.g.

DOES IT WORK LIKE IT SHOULD????
Is it reliable?
Is it resource hungry?
Can I trust it to do what it's supposed to do?

All responses very welcome. No response too small.

Thanks for your help.

Cheers,

SG.|||SELECT LEN('What are you going to use it for?')|||I support a SQL 2000 cluster, and I'm quite happy with it.

Before spending a bunch of money on a cluster, the first question to ask yourself is what types of disaster you expect it to protect you from.

Clustering is primarily protection from hardware and operating system failure. And with RAID disks, you're probably already protected from most disk failures. Depending on your server, you may also be protected from single failures of network cards and processors (though with some performance degradation until you resolve the problem).

Opinions on the stability of Windows are all over the map. Your mileage may vary.

Whether or not clustering will protect your application is a murkier question. Since there will only be one copy of your databases, you will *not* be protected from DB corruption unless you implement further protections beyond basic clustering. Whether or not any subsidiary services you write will be protected depends on their implementation.

If you have any specific questions, shoot.|||I have been watching 3 SQL 2000 clusters for around 2 years now. The only advice I have for you is to put the quorum on its own physical device. I have had one of my clusters fail over when the transaction log got hit hard one day. After a day on the phone with MS, they pointed me to one sentence in one article in one section of TechNet, and promptly washed their hands. I am not sure if you can move a quorum after it has been created.

Oh, and one other thing: may you never face a quorum corruption problem. Quorum disks can still go bad.|||One thing they don't tell you (at least not that I could ever find) is to be sure to turn off write caching on your quorum drive. We had many uncommanded failovers until we did this. Although I can't claim to know for a fact that this is the cause, it seems reasonable that the node owning the quorum writes some data to it, then signals the other node via the heartbeat network that the other node can acquire the quorum. The only problem is that the controller hasn't really put the data on disk yet, so the other node either can't read the quorum quickly enough (the controller won't let it have it) or doesn't see what it's expecting to see. Either way, it then forces a failover. Whether or not this scenario is what is really going on, as soon as we turned off write caching on the quorum drive, our uncommanded failovers ceased.|||Howdy,

Thanks everyone so far for your thoughts.

Brett - it's mainly for supporting multiple databases (direct access from desktop) and some mission-critical apps (access via web system).

We wanted to split the cluster (and mirror the quorum drive using the SAN) across 2 computer rooms, such that should one computer room die (as has happened in the past) the cluster will keep running.

What physical architecture do you run, and what problems (if any) have you had with the cluster, and why? Is it worth the money?

I'm interested in whether we are making a rod for our own backs, but we need resilience for our systems.
Log shipping is out of the question, by the way.

Thanks,

SG.|||Keep in mind there is only ONE copy of each DB. You won't get a copy of each DB in each computer room. The databases go on shared DASD, which is pretty much going to be in one room or the other. In our case, our DASD (a SAN-attached IBM FAStT 500 storage controller) has redundant power supplies on separate power circuits; you might be able to power your own DASD similarly from each room. But clustering does not give you two mirrored copies of your DB, a la one on each node.
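
To make the point concrete, here is a toy Python sketch. It is not SQL Server or cluster-service code: the `Cluster` class, the room names "A" and "B", and the `survives_loss_of` method are all invented for illustration. It assumes the service stays up only if at least one node and at least one copy of the shared disks both survive the loss of a room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Toy model of a failover cluster (all names are hypothetical)."""
    node_rooms: frozenset      # rooms that hold a cluster node
    storage_rooms: frozenset   # rooms that hold a full copy of the shared disks

    def survives_loss_of(self, room: str) -> bool:
        # The service needs at least one surviving node AND at least one
        # surviving copy of the data to fail over onto.
        nodes_left = self.node_rooms - {room}
        storage_left = self.storage_rooms - {room}
        return bool(nodes_left) and bool(storage_left)


# Plain clustering: a node in each room, but the single copy of the
# databases lives on shared storage in room A (an assumed layout).
plain = Cluster(node_rooms=frozenset({"A", "B"}), storage_rooms=frozenset({"A"}))

for room in ("A", "B"):
    print(f"lose room {room}: service survives = {plain.survives_loss_of(room)}")
```

In this model, losing room B is survivable because the node in room A still has the disks, but losing room A takes the only copy of the data with it, no matter how healthy the node in room B is. Only a layout with a full copy of the storage in both rooms would pass both checks, and that is a storage question, not something clustering itself provides.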

|||I'd keep a warm standby...how long can you be out...or can't you?

Are you dealing with trades?|||Howdy,

We are running 24x7 apps that need to be up.

I thought (possibly naively) that we could have a server cluster but also physically split the SAN so half of it is in one computer room and the other half in the other, with each part of the SAN an exact mirror of the other (using hardware mirroring).

Then, if we lost one computer room, the cluster would just flick over to the part of the SAN and a server in the other room.

Sounds simple in theory.....is it possible??


Thanks for your help so far

SG.|||The trick is going to be finding a disk controller that will allow you to do this. You'll need a pair of such controllers, one in each room, that will either communicate with each other in lock-step, or jointly manage a set of RAID arrays that are built in RAID 10 with one set in one room and the mirror pair in the other.

Some hardware genius is going to have to locate that for you. I'm not sure it exists in the Wintel world, but it might.

What's the failure mode whereby you lose an entire computer room?

|||Howdy,

Well, we seem to push boundaries on most things so...*sigh*

Q. What is the failure mode called whereby we lose an entire computer room?

A. Rapidly locate the nearest place that serves alcohol. Stay there AT LEAST 2 days or until they find you. Deny all knowledge...

I guess the concept is born from the fact that our computer room has been COMPLETELY taken off the air in the past.

For 3 hours.

Ouch......
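
For a sense of scale, here is a rough Python sketch of what a 3-hour outage means against common availability targets. The targets (99.9% and 99.99%) and the assumption of one such outage per year are illustrative only; nothing above says how often it happens or what target the business has set.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_hours / period_hours


def downtime_budget_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed per period at a given availability target."""
    return (1.0 - target) * period_hours


outage = 3.0  # hours, the outage described above (assumed: once per year)
print(f"availability with one 3 h outage per year: {availability(outage):.4%}")
for target in (0.999, 0.9999):  # illustrative targets, not from the thread
    budget = downtime_budget_hours(target)
    print(f"{target:.2%} target allows {budget:.2f} h/year; "
          f"3 h outage fits: {outage <= budget}")
```

Under those assumptions, a single 3-hour outage still fits inside a 99.9% yearly budget (about 8.76 hours) but blows well past a 99.99% budget (under an hour), which is why "how long can you be out?" matters before picking the hardware.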

Cheers,

SG.|||If you are making plans for losing a computer room you may as well make plans for losing the building... the extra cost involved probably wouldn't be very significant considering the value that it would add.

I work for a few banks and they all have offsite failover systems.|||Howdy

Sadly, no offsite capability (hey, nuts I know, but I'm just the hired help...) so we have to assume one computer room gets nailed and the other one will take over.....

Like I mentioned, as usual...pushing the boundaries......

Have you seen any SANs where we could use hardware mirroring to replicate all data (including the quorum)?

Cheers,

SG.|||I know the banks that I have worked for have hot-hot swap-over, so if one room is lost they don't lose anything; the applications don't even pause as far as the users are concerned...

I have no idea how it is done though, I just write the code and let the network guys and the system admins take care of all that. ;)|||Howdy

Any chance of finding out how they do it please?

Cheers,

SG
