Blog

  • Understanding Incident Management

    Recently, I’ve been involved with Incident Management for large-scale services, and I feel it’s a part of the tech industry that is still largely unexplored and could do with improvement. The next few posts will focus on this topic – starting from API-level monitoring and going up to the processes for incident management. You would be surprised where the challenges lie. Let’s start with our basic flow:

    1. Discover Error
    2. Run Recovery
    3. Did Recovery succeed?
    4. Any other recoveries?
    5. All/Partial recovery fail
    6. Escalate to Engineer
    7. Resolve Error
    8. [Engineer] Update Recovery
    9. Post Mortem

    Again, this is our basic flow, and there are many variations of it. Because incident management generalizes across many industries, there are also standards around it that I’ll go over in later posts. Today – let’s focus on steps 5 – 7.
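
    To make steps 2 – 6 concrete, here is a minimal sketch of that loop in Python. Every name in it (the Recovery type, handle_incident, the print-based “paging”) is hypothetical scaffolding invented for illustration, not part of any real tooling – the point is only how the steps chain together.

    ```python
    # Hypothetical sketch of the basic incident flow above.
    # All names are made up for illustration; a real system would wire these
    # steps into monitoring, runbooks, and paging tools.
    from dataclasses import dataclass
    from typing import Callable, List


    @dataclass
    class Recovery:
        name: str
        run: Callable[[], bool]  # returns True if the error was resolved


    def handle_incident(error: str, recoveries: List[Recovery]) -> None:
        # Steps 2-4: try each known recovery until one succeeds.
        for recovery in recoveries:
            if recovery.run():
                print(f"{error}: resolved by recovery '{recovery.name}'")
                return

        # Steps 5-6: all (or some) recoveries failed -> escalate to an engineer.
        print(f"{error}: recoveries failed, escalating to the on-call engineer")
        # Steps 7-9 (resolve, update recoveries, post-mortem) happen with the
        # engineer, outside this automated loop.
    ```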

    Most times, if we know a certain set of actions will resolve the problem, or reduce noise without impacting performance, we will add it as a recovery step for the error – i.e. if A failed, run B to fix A. This is the simplest of solutions and of course begs the theoretical question of “A should never fail, focus should be on fixing A”. I’m not going to get into that; I’m going to focus on other, more important questions – Did running B change how A runs? Did anyone notice B run? How did A look when B was run? How long did it run for? The list goes on.

    Now, partial recovery of a component is very common, and the problem with that is we don’t have a well-defined success criterion. Components need to have well-defined Red-Yellow-Green statuses. That is how most components operate: Red means total stoppage/failure, Yellow means degraded or partial failure, and Green means flowing/operational. That’s step 1 – identify how my component works. Always define success criteria. This is your minimum bar; any performance or behavior out of bounds of it is either a partial or total failure.
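
    As a rough illustration of what a well-defined success criterion might look like, here is a hypothetical Red-Yellow-Green check. The metrics (success rate, p99 latency) and the thresholds are invented for this example; the point is that Green, Yellow, and Red are decided by explicit bounds, not by gut feel.

    ```python
    from enum import Enum


    class Status(Enum):
        GREEN = "flowing / operational"
        YELLOW = "degraded / partial failure"
        RED = "total stoppage / failure"


    def component_status(success_rate: float, p99_latency_ms: float) -> Status:
        # Hypothetical success criteria - the minimum bar for Green.
        # Anything out of these bounds is a partial or total failure.
        if success_rate >= 0.999 and p99_latency_ms <= 500:
            return Status.GREEN
        if success_rate >= 0.95:
            return Status.YELLOW
        return Status.RED


    print(component_status(success_rate=0.97, p99_latency_ms=800))  # Status.YELLOW
    ```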

    Escalating to the Engineer. This is the easy part – my assumption is that everyone knows who owns which area.
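
    For completeness, a tiny and entirely hypothetical ownership lookup – the areas and on-call aliases are made up, but this is the shape of “knowing who owns which area”:

    ```python
    # Hypothetical map of component/area -> owning on-call rotation.
    OWNERS = {
        "api-gateway": "platform-oncall",
        "billing": "payments-oncall",
    }


    def escalate(area: str) -> str:
        # Fall back to a catch-all rotation if the area has no explicit owner.
        return OWNERS.get(area, "incident-manager-oncall")
    ```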

    Last but not least – resolving the error. This is a time of great relief, and one can only hope there was minimal customer – or worse, SLA – impact. After the work done by the Engineer, it will boil down to either a configuration issue or a code bug. If it’s a configuration issue, SOPs must be put in place to prevent another failure, and if it’s a code bug, the scenario should be appropriately addressed. Actions must be taken even after a partial failure.
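
    A small sketch of that classification, again purely illustrative – the two root-cause categories and the follow-up actions are assumptions about how one might record the outcome, not a prescribed format:

    ```python
    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class Resolution:
        root_cause: str  # assumed categories: "configuration" or "code_bug"
        follow_up_actions: List[str] = field(default_factory=list)


    def record_follow_up(resolution: Resolution) -> None:
        # Even a partial failure must produce follow-up actions.
        if resolution.root_cause == "configuration":
            resolution.follow_up_actions.append("write or update the SOP to prevent recurrence")
        elif resolution.root_cause == "code_bug":
            resolution.follow_up_actions.append("fix the bug and add the scenario to the recovery steps")
    ```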

  • Sometimes taking a break from the norm is the right thing

    So the last couple of weeks have seen some changes in my daily routine compared to, say, a year ago. These have been positive changes, and there are too many people to thank for that, but I’m sure they know who they are. Starting from the running, to the cycling, to the moving, and to the relaxing 🙂

    And I’ve been more in flux lately, so it’s the old anchors that were put in place, and a forced break coming up, that are helping me keep a straight course.


  • Life and its Guiding Principles

    There are some things that I’m proud of – I don’t drink and I don’t go to strip clubs.

  • Socrates (469 – 399 BC)

    No citizen has a right to be an amateur in the matter of physical training…what a disgrace it is for a man to grow old without ever seeing the beauty and strength of which his body is capable.