Craig Mattson (Personal Website)

My Blog

Hey! Don't forget the people! (14/12/2009 08:38:06 PM)
To those who read my blog, it's nothing new when I talk about what an arduous task it is to work through the Software Development Life Cycle to deliver on a project. Even for what appears to be the smallest of tasks (for instance, a small spreadsheet-style application for storing small volumes of data), a considerable amount of analysis is required to ensure successful sign-off. Successful and useful analysis generally flows into a good system design, which flows on to a good, useful system. It sounds simple enough, right? So why then, particularly lately, am I stuck in a position where systems all around me (some with heavy investment) are failing? I've done a bit of ad-hoc research into why, and the common pitfall seems to come back to Human Computer Interaction (HCI).

This has plagued some of the ingenuity of systems I've developed myself, but it seems to be more of an issue now than back when systems were fairly basic (think: Windows 95 / 98 / XP). One system I worked on recently involved the automation of data entered by multiple users (particularly customers). Whilst a considerable amount of time went into analysing business requirements, and proper coding practices were followed, the system got off to a rocky start (as most systems do, and as we expected). Initial problems were to do with performance, accuracy of data, and the inability to do something that was easier to achieve under an old system. Nonetheless, these clients were quite positive about moving to a new system (albeit they didn't have much of an old system in the first place). With the system centralised, the real time saving was in the entire end-to-end process. Whilst certain parts of the system (particularly the initial transactions) were slower than before, once the automation component kicked in the time savings were considerable (or so we significantly over-estimated).

So why did this happen? Well, the application was data-driven and the platform was a website - a big no-no when heavy amounts of data entry are required. Whilst a desktop application may have made the initial transaction quicker, the small volume of data processed there wasn't much of a concern, so we had to look elsewhere. We optimised Stored Procedures and Code-Behind, created reports on alternative platforms, pulled features out, and so on. Whilst we were quite happy with the performance we were achieving, the client was still dissatisfied. It turns out one of the most basic, fundamental principles of the system was overlooked - or perhaps overshadowed when we discussed automation.

The fact is, whilst the system automates and validates most input, the client wasn't looking for 99% accuracy; they were looking for 100% accuracy. This is fair enough, and working towards 100% was what we were hoping to achieve. After the initial problems with the system, they were also expecting the system to store data inaccurately. This was proven not to be the case: the system stores exactly what you tell it. If you try to go around the perimeter of the application, or feed it garbage, of course the data returned is going to be garbage.

The issue was quite simple and could have been rectified if we were still working on the system. They wanted to check all data coming into the system, just to ensure any mistakes could be rectified before doing anything with the data. Since there are large volumes of data to glance over (say 94% of the input was accurate, with the most common error - incorrect spelling of suburbs or road names - making up the remaining 6%), the best way to present it is in an editable grid (something like Excel). This means you could download 100 or so records, check them, fix the data within the grid, and submit the changes once you're satisfied. Quite a simple task - and oddly, something that was provided in the primitive nature of their existing system.
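The suburb-spelling problem is a good candidate for a "suggest a fix" column in that editable grid. As a rough sketch (the suburb list, the edit-distance cutoff and the class name are all my own illustrative assumptions, not anything from the real system), flagging a bad record and proposing the nearest known suburb might look like:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: flag records whose suburb isn't in a known list, and suggest the
// closest known name so an operator can fix the ~6% of bad rows in the grid.
public class SuburbChecker {
    // Standard Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Returns the closest known suburb, or null if nothing is within 3 edits
    // (the cutoff of 3 is an assumption; tune it against real data).
    static String suggest(String input, List<String> known) {
        String best = null;
        int bestDist = 4;
        for (String k : known) {
            int dist = editDistance(input.toLowerCase(), k.toLowerCase());
            if (dist < bestDist) { bestDist = dist; best = k; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("Pakenham", "Werribee", "Epping");
        System.out.println(suggest("Packenham", known)); // Pakenham
        System.out.println(suggest("Eping", known));     // Epping
    }
}
```

The operator still makes the final call - the point is that the grid does the glancing-over for them, surfacing only the rows that need a human decision.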

The point of this is to show how something so simple can lead to a significant waste of time. So what happens when we scale up from a small management system to a large state-wide network? Maybe something like Myki (a smart-card ticketing system for public transport in Melbourne).

Myki itself, as far as I am concerned, is already a failed project - not because of its usefulness (or teething issues), but because it's already 3 years overdue and way over budget. For such a simple idea, there seem to be fundamental issues with the implementation and the HCI component of Myki. The two most common issues at the moment are to do with the decision engine for fare calculation and how long it takes to swipe a card. Myki is similar to my example above in that the end-to-end process is substantially quicker than Metcard (a magnetic-stripe, zone-based card system) - but particular issues are preventing users from appreciating these benefits.

Let's talk about the decision engine for fare calculation. There are quite clearly issues (such as double charges, incorrect zone charges, etc.). On paper, it sounds easy: take a GPS device, develop some software around it to pick up where the vehicle is, pick up the destination point, and away you go. Based on the start point and end point, you can calculate which zone you are in. This is fine for something like Melbourne's two-zone system (i.e. Zones 1 and 2 cover the entire Melbourne suburbs, from Pakenham to Werribee, and Epping / Hurstbridge in the north). There are points where Zones 1 and 2 overlap, but those are relatively straightforward to pick up.
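To make the "on paper" version concrete, here's a minimal sketch of zone-based charging where an overlap stop simply belongs to both zones, and the cheapest set of zones covering both ends is charged. The zone memberships and fare amounts are made-up illustrative values, not real Myki fares:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: each stop belongs to one or more zones; overlap stops belong to
// both. A trip is charged for the cheapest zones covering both ends.
public class ZoneFare {
    // Assumed per-zone fares - illustrative only.
    static final Map<Integer, Double> FARE = Map.of(1, 3.00, 2, 2.00);

    static double fare(Set<Integer> startZones, Set<Integer> endZones) {
        Set<Integer> common = new HashSet<>(startZones);
        common.retainAll(endZones);
        if (!common.isEmpty()) {
            // The whole trip fits in one zone: pick the cheapest shared one.
            return common.stream().mapToDouble(FARE::get).min().getAsDouble();
        }
        // The trip crosses zones: charge for every zone touched.
        Set<Integer> all = new HashSet<>(startZones);
        all.addAll(endZones);
        return all.stream().mapToDouble(FARE::get).sum();
    }

    public static void main(String[] args) {
        // Boarding in Zone 1, alighting at a Zone 1/2 overlap stop: Zone 1 only.
        System.out.println(fare(Set.of(1), Set.of(1, 2))); // 3.0
        // Zone 1 to Zone 2: both zones charged.
        System.out.println(fare(Set.of(1), Set.of(2)));    // 5.0
    }
}
```

The overlap case is exactly why the lookup has to be set-based rather than a single zone number per stop - a point the paragraph above hints is "straightforward to pick up", but only if the data model allows for it.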

Where Myki is failing is in the country on bus routes (where the primary testing is happening). Again, on paper, you could calculate how many kilometres have been travelled, how many "stops" have been passed on the journey, and so on - there are points on a map you can mark as a stop. But if you were to code it in an exact form, what happens when the vehicle is 1m ahead of the point you marked as the charge point? You could increase the variance of the stop (say, a radius of 300m?). This may be OK in a country town where stops are in excess of 1km apart, but what about the ones that are not? You end up with significant overlaps, which could cause the system to confuse nearby stops and charge for more distance than was actually covered. Another issue: what happens if a GPS is slightly faulty? What if the GPS signal returned jumps 20km away from where you are (promptly followed by snapping back again)? The system may assume some 40km was covered, and charge you accordingly! These types of anomalies will only be uncovered through extensive testing. I'm sure the developers behind Myki have seen what they would consider tiny issues as well.
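The two failure modes above - the exact charge point and the faulty GPS jump - both come down to tolerances. A rough sketch, using the 300m radius mentioned above and an assumed maximum plausible bus speed (both numbers are illustrative, and the flat-earth distance approximation is only reasonable at bus-route scale):

```java
// Sketch: match a fix to a stop only within a radius, and reject a fix
// that implies an impossible speed since the last accepted fix.
public class GpsFilter {
    static final double STOP_RADIUS_M = 300;  // assumed matching radius
    static final double MAX_SPEED_MS  = 40;   // ~144 km/h, assumed ceiling

    // Approximate distance in metres (flat-earth, fine at this scale).
    static double distanceM(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1) * Math.cos(Math.toRadians(lat1));
        return Math.sqrt(dLat * dLat + dLon * dLon) * 6_371_000;
    }

    static boolean atStop(double lat, double lon, double stopLat, double stopLon) {
        return distanceM(lat, lon, stopLat, stopLon) <= STOP_RADIUS_M;
    }

    // Reject a fix if reaching it from the previous accepted fix would
    // require travelling faster than any bus plausibly could.
    static boolean plausible(double prevLat, double prevLon, long prevMs,
                             double lat, double lon, long ms) {
        double seconds = (ms - prevMs) / 1000.0;
        if (seconds <= 0) return false;
        return distanceM(prevLat, prevLon, lat, lon) / seconds <= MAX_SPEED_MS;
    }

    public static void main(String[] args) {
        // A ~20km jump in 5 seconds gets rejected rather than billed.
        System.out.println(plausible(-37.8136, 144.9631, 0,
                                     -37.9936, 144.9631, 5000)); // false
    }
}
```

Neither check solves the overlapping-stops problem on its own - if two stops sit 200m apart with 300m radii, the matcher still needs a tie-break such as nearest stop along the known route - but rejecting implausible fixes at least stops the 40km phantom charge.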

In regards to swiping on and off: a contactless reader should realistically take less than a second to validate, and away you go. In peak hour, you could probably move 40 people per minute (compare that to 15 with Metcard at the moment). The problem is, the boasted time saving isn't happening right now. Why? There could be multiple explanations, from the system taking a moment to recognise the card, to finding an active connection to the network, etc. - or maybe it's to do with how the user is using the system? With Metcard, you swipe your card, the card comes out, you take the ticket with you, and you can read on your ticket the expiry date / time and the zones you are allowed to travel in. If you buy multi-use cards, you can also see how much "credit" you have left. But you do all of this away from the machine, as it's printed on the Metcard. With the recent media hype about lengthy delays in using Myki, it is no surprise that people are taking their time when using it. My own usage of Myki certainly suggests a fairly instant procedure (much quicker than a validator stamping my ticket to print the expiry date when boarding a bus). You swipe, and the Myki reader returns some information. Hold on - returning information? On a screen? That isn't portable?

The issue doesn't seem to be the hardware. Granted, slow network traffic won't help - but maybe the issue (given the media hype in particular) is that people are trying to read the information on the screen, because if they don't, they have no other way of accessing it! Maybe the best thing to do is remove the screens and information altogether - OR - replace the messages with a big smiley face that indicates your ticket was successful. With the number of internet-capable phones, free wifi hotspots and internet kiosks around, maybe the solution is to present the information by SMS to a mobile phone, by e-mail, or through an application (like Tram-Tracker for the Apple iPhone).

Human Computer Interaction is a big component that seems to be lacking in detail in lots of software development courses. At least in the course I completed, the HCI component was a pile of garbage. Researching how a user will use a system isn't exactly easy either. Having a "tester", particularly a qualified one, may yield different results compared to someone who has no idea about testing. One component of Project Management I feel many people have misunderstood is the difference between one-on-one research and a survey / group discussion. The basic idea is that one-on-one research is most useful for getting targeted answers, whereas a survey can be used as a preliminary discussion. What people seem to forget is that whilst a group discussion may yield only one or two main points, those one or two points could be mission critical to the success of the project. In Myki's instance, if a group of everyday transport users had been given a prototype to look at (just in the way of using the card as part of human instinct), they may have found a more appropriate way to present the interface.

So yeah - next time you are analysing something, ensure you consider your users and how they will use the system beyond the basic keyboard / mouse interaction. It may save a lot of hassle down the track. Similarly, don't get too disheartened if your system fails in User Acceptance Testing (UAT). Chances are, a flaw found in UAT will be significantly easier to pin down than one found later in a controlled environment.

Oh well - time to sign off for the night. I'll try and post a little more regularly now that I have time to.

- - Craig Mattson.

