Archive for the 'Information Technology' Category



No Slam Dunks in IT Update

As you may recall, one of my first entries on this blog, way back on January 31, was about change management. It was there that I introduced the “It’s Been X Days Since Our Last Self-Inflicted Outage” ticker. At the time, we were sitting at 19 days without an incident, and from there we were able to extend that counter for another 15, almost 16, days. Unfortunately, around February 15 we had to reset that counter because of missteps in executing a series of SAN migrations and in communicating the impact and subsequent resolution to the affected parties. In short, we failed to follow through on two of the five change management steps established at the beginning of the year: we did not execute flawlessly, and we did not provide timely communication after the change.

For those of you who missed that early entry, or who just have a hard time remembering things you have read in the past 90 days, here are those five simple steps again:

Plan – make sure each change action/project we undertake is well thought out, its steps are documented, and its risks are assessed. If a disruption in service is expected, schedule the change to limit the impact of the disruption.

Communicate – communicate each change action/project to the potentially impacted parties before executing the change.

Execute – flawlessly execute the change according to the plan developed.

Test – test to make sure that the change produced the expected results and had no unintended consequences.

Communicate – communicate to the potentially impacted parties that the change has been completed and tested.


It seems so simple: just five little things to do for each change. While we still have a ways to go, we are doing a much better job of managing and executing changes. I have seen and heard much more discussion within the IT teams about the potential impacts of proposed changes, and much more thought about the best time to execute a change and whom to tell about it. In addition, our execution of recent changes has been flawless. Since February 15th, we have been on a pretty good roll and are going on 45 days without one of those dreaded self-inflicted outages and the painful clock reset that goes along with it. I am not sure how much longer we will go without an incident, but I am happy with the improvement I have seen in how everyone on the team approaches changes. The planning, the communication and, most importantly, the execution are the best I have seen in years.

As part of our continued focus in this area, I wanted to do something fun to reinforce the idea of “No Slam Dunks in IT” that I reintroduced to the team in January. Since I am a basketball junkie, and it is that time of year when many of us go “mad” for college hoops, I figured nothing would be better than sending each person in IT a basketball emblazoned with our company logo AND our “No Slam Dunks in IT” logo. So last week, IT staff around the globe received a little gift to help drive home the focus on change management.

Now, I hadn’t budgeted for buying and shipping basketballs to 100 or so people, so I might get a question or two from our CFO, who also happens to be my boss. But if it helps drive home the ideas we have been pushing for the past 90 days, and it helps us avoid even one self-inflicted outage, then the payback on this little “fun expense” will be huge.

Oh and since I am a hoops junkie, I will leave you with my prediction for tonight’s national title game: Kansas 74 – Kentucky 71. What can I say? I am a Big XII-II homer. Rock Chalk Jayhawk!!!!

Post game update: so much for my hoops prediction. But I think I am still right on about change management.

Slam Dunks: No Such Thing in IT

There are No Slam Dunks in IT.

That’s a saying I have thrown around for close to 10 years now, but one that I think too many people in technology fail to remember on a daily basis. They get caught up in the urgency of the moment, cut corners on change management procedures, and fail to think about the downstream impact of what they see as a minor, isolated change. All too often, the mindset of “the easy change,” “the lay-up,” or “the routine lazy fly ball” ends in an unexpected outage. That breakaway slam dunk clanks off the rim and bounces out of bounds. Those easy two points turn into a turnover.

As we kicked off 2012, a network engineer who was relatively new to the company noticed that a top-of-rack server switch had two fiber uplinks but only one was active. Anxious to make a good impression, he wanted to resolve that issue. It was an admirable thing to do; he was taking the initiative to make things better. So one night during the first week of the fresh new year, he executed a change to bring up the second uplink. Things did not go well. The change (and I will not go into the gory technical details) brought down the entire data center network. It was after standard business hours, whatever that means in today’s 24×7 business world, but the impact of that 10-minute outage was significant. It was a classic case of a self-inflicted wound from not following good change management procedures.

It was actually a frustrating incident for me because, as we put together the 2012 Business Plan for Corporate Technology Services, we were asked to list the keys to success for our operations and the actions we needed to take to achieve success.

THE #1 key to success we listed was: Avoid self-inflicted outages and issues that take away cycles from the planned efforts and cause unplanned unavailability of our client-facing solutions.

So just 30 days earlier, I had told our CEO, CFO and the rest of the executive management team that our #1 key to success in IT was to avoid such things, yet here I was, four days into the new year, staring at the carnage of a self-inflicted outage.

Outages are close to a given in the world of technology. Servers will crash, switches will randomly reboot, hard drives will fail, applications will act weird, redundancy will fail, and there will be maintenance efforts that we know will cause outages. Given that, every IT organization must take steps to avoid causing even more outages itself. Business leaders know that there will be some level of downtime with technology. Have you ever seen a 100% SLA? Rarely. It is usually some 99.xx% number. But outages caused by the very people charged with keeping things running drive them nuts, and rightfully so.

The morning after that self-inflicted wound, I sent the following message to every member of the IT organization:

We need to strive to make sure that we are not the cause of any unexpected outages. We must exercise good change management and follow the five actions listed below. As our solutions and the underlying infrastructure become increasingly intertwined, we must make an extra effort to assess the potential unintended downstream (or upstream) impact as we plan each change.

When making a change, we must always follow these steps:

Plan – make sure each change action/project we undertake is well thought out, its steps are documented, and its risks are assessed. If a disruption in service is expected, schedule the change to limit the impact of the disruption.

Communicate – communicate each change action/project to the potentially impacted parties before executing the change.

Execute – flawlessly execute the change according to the plan developed.

Test – test to make sure that the change produced the expected results and had no unintended consequences.

Communicate – communicate to the potentially impacted parties that the change has been completed and tested.

To keep this goal of avoiding self-inflicted outages top of mind, we implemented an “It’s Been X Days Since Our Last Self-Inflicted Outage” counter, basically taking a page out of the factory accident-prevention playbook.

We had to reset it once after we implemented it, but we are now at 19 days and counting. Let’s hope the next reset is a long way off.


