Recently I became the administrator of a virtual network via VMware. I've only been working with it for about a year, but things were going pretty well. The installation of the virtual network was a turn-key job. Dell was contracted to set it up and I was given the reins.
Things were going great; I was able to spin up as many servers as needed. Both my boss and I were happy. Well, there was a power outage last December, and the virtual network went down. When power was restored, it turned out the switches that connect the hosts to the storage area network (basically the computers to their hard drives) had lost their configuration. I couldn't power on any of my virtual servers. So no Active Directory, DHCP, file hosting, web hosting, etc. Basically we were dead in the water.
Cue an 18-hour session that ended in an emergency remote reconfiguration of our switches to get the storage area network talking to the VMware hosts. Eventually, everything came back, and all were happy. That is, until a few months down the line. At the time we didn't need to spin up any new servers, so we were just sitting on the virtual network letting things coast. When we DID decide to spin up some new servers, we discovered a unique problem:
Doing anything in VMware that would temporarily tax the system would bring every single server offline for 10 to 15 minutes. Migrations of datastores, vMotion of virtual machines, cloning to template, deploying FROM template, you name it. There's a chance that anything could cause every server to just drop off the network for a minimum of ten minutes.
Now, having these servers down for ten minutes is a pretty big inconvenience... but nothing more than users upset that they can't get to their files and whatnot. Where this becomes a REAL problem is that the IT department here is expected to host the control system for the actual plant production software on VMware. That system absolutely cannot tolerate random 10-minute dropouts. The software will be controlling valve pressure, levels, and all sorts of things that require a constant, stable connection.
So eventually, I turned to Dell, the people who originally set up this system. At first, they were super helpful. They worked to figure out what my problem actually might be, and tried a few things specifically geared toward my issue, like checking the Delayed ACK and LRO settings on the hosts. But that didn't turn up anything. It seems at that point, Dell lost all interest in this case. For the next FIVE WEEKS they sent me on pointless errands of collecting data from the system. Log collections out the ass. Multiple PuTTY logs from countless diagnostics on my switches. Readouts of settings from the hosts. Third-party programs like SAN Headquarters that needed to run for a week straight collecting data, so that they would have stuff to look at. And I understand that they need time to sift through and check all this data. I really do. But they spent over a month making me get it for them.

And you know what the big payoff was? They sent me a PDF document. It contained what was clearly a template with values from those logs plugged in, describing how to set things in my environment to best practice. Not a damn line in that document actually addressed the original issue at hand. We had a meeting with Dell after that, and my boss and I tried our best to get them focused on the actual problem. I'd be glad to get my system up to best practice once the real issue was resolved, not before. And there's no reason it shouldn't meet best practice anyway; Dell was the one who originally configured it!
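For anyone wanting to check the same Delayed ACK and LRO settings on their own hosts, here's a rough sketch of what that looks like from the ESXi shell. This is an assumption on my part about what Dell ran, not their actual procedure, and it assumes ESXi 5.x with a software iSCSI initiator:

```shell
# Hedged sketch -- assumes ESXi 5.x shell access; not Dell's exact commands.

# LRO setting for the default TCP/IP stack (1 = enabled):
esxcli system settings advanced list -o /Net/TcpipDefLROEnabled

# Delayed ACK status for the software iSCSI initiator
# (dump the initiator database and look for the DelayedAck entries):
vmkiscsid --dump-db | grep Delayed
```

These are read-only checks; actually disabling Delayed ACK is done per-adapter in the vSphere Client and needs a host reboot to take effect.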
So we've been going back and forth with Dell for, now, eight weeks. Two months, and they've just been passing the buck from one group to another. And I have to start over from square one every time they do, explaining everything I've typed here all over again. The entire time, I've pushed to get someone from Dell on site to stay here until it's fixed, but they've refused. This Saturday, however, they decided to have a remote session to tap into the system and try to knock this out. Funny thing was, they passed the buck off to the guy who's going to be doing it... without telling him what he's even doing. I just got off the phone with him an hour ago. He thought he was doing an upgrade. They just threw it in this guy's lap and walked off. So once again, I had to explain everything. He was blown away by how long this has taken and that he wasn't told what the actual issue was. I just about broke my phone when I asked him if he even knew what my problem was, and he said no.
Luckily, after explaining everything above this sentence to him, he started asking questions that were actually targeted at my problem. This is the first time in weeks someone has done so. So I'll be coming into work Saturday (since it's not a business day and most people won't be on site [because this will require some server downtime]) and he'll be trying to knock out the issue. If this doesn't work, I got one of the original Dell representatives assigned to this case to promise to have someone here, on site, staying until it is fixed. I have to have VMware in a state where I can tell my boss, and my boss' boss, that this system can reliably handle the new servers for the plant control software by the end of August. And I don't know if I can. Dell's apathy and insensitivity to this issue has utterly surprised me. I honestly have no idea how they've been able to get away with doing nothing other than give me the runaround for two months straight. This is Comcast-level support.
TL;DR
Dell sucks.
_________________
"Belief extremely stately towards great accomplishment." -eruperade