Monday, March 27, 2006

Okay, final reflections.

Dan and I didn't do enough work. I'd like to thank Greg for carrying the project through the mid-term period, and for kicking us into gear.

I messed up the questionnaire. I still don't know why I added the option for the wrist thing; I somehow picked it up and put it in. I'm not the best at making questionnaires. I volunteered because I felt that I had done very little up to that point.

The prototype was quite impressive, possibly because I felt guilty about how far behind we were.

UCD is quite a reasonable design process, but it calls for creativity, which isn't a particularly strong point of mine. However, we managed to create a decent product out of it.

If I were to repeat the project, I'd probably go for a product that was more contemporary and less innovative. It'd make the comparisons a lot easier.
Hmm, I'm not sure why there's such a difference then. There are a few effects I can think of that would provoke a difference, but they wouldn't explain the gap between your data and Pete's, let alone between yours and mine. We did need a large sample, so we couldn't have afforded to do this as a group, but maybe we should have done the first 2 or 3 together to make sure that we were doing the same thing. That being said, I watched Pete do a test or two in scifi and he seemed to be doing the same thing I was, but we still have very different results. Odd.

Now for reflections. This is going to be quite tetchy, partly because it's after 3am, but mostly because this has been the most painful group project I've been involved in since cross-dressing for drama in secondary school (don't ask). However, the things to bear in mind are that I tend to be very cynical and critical of most things I'm involved in, and that everything has turned out okay in the end. That being said, here they are:

Reflections

I have been doing as many group projects as possible over the course of my university career; more than half of my modules this year have involved some kind of group work, and that’s fairly representative of my course as a whole. It seemed a good idea, since I believe that it most closely represents the way in which I’m likely to end up working once I finish university (or even later on in university, since I’ve already got someone I’m looking at co-authoring a paper with).

All of my experiences so far have been at worst average and at best fantastic, and I had not regretted the decision for a moment until this year. This has been, without a shadow of a doubt, the most hellish project I’ve ever had the displeasure to be involved in. I’m glad to be seeing the back of it.

Up until a couple of weeks ago I was feeling as if I had been carrying this group, which at best was not proving to be too much of a hindrance. I’d been the only one with any ideas, and I’d been doing most of the early tasks myself. I’d made more posts than anyone else, and they were longer on average as well; my writings account for at least half of the blog, and there are supposed to be three of us. The questionnaire would have been much improved (maybe even to the point of not being completely useless) if I’d just given up on trying to get any work out of Peter and done the damn thing myself, just like every other task up to that point. I felt that his final design looked identical to mine and had no thought put into it whatsoever. I was getting increasingly irate and sending stronger and stronger messages along the lines of “do some fucking work”, and then everything changed.

Peter came out with a marvellous prototype, with over 100 slides. I then started getting serious about project management: I stopped making vague comments like “do some work”, worked out what needed to be done and put a timetable to it. This would have allowed us to finish on time. Pete kept to this and produced a number of useful results, in some cases more than was required, and generally really helped to get the project to the state that it’s in now.

Dan, on the other hand, generally produced substandard work (prototypes that could not be tested on the core tasks, ignoring 2/3 of the data collection), which was often late (I had a bunch of participants lined up for Thursday, as timetabled, and the opportunity was wasted because the prototype to test on did not exist). I don’t doubt this would have been a better project without him.

I needed to get that out of my system.

Taking a step back and looking at it on a less personal level, the user-centred design approach is an interesting one. I think that it makes a lot of sense to design computerised solutions by focusing on who will be using them and for what, rather than just on the problem. Thinking about it, this approach is already prevalent in other industries; for example, the entire field of ergonomics would not exist without this sort of consideration.

I think that I started without much idea of what user-centred design was. Steps like writing the vignette about how the product might be used were performed from the point of view of someone used to traditional design. Re-reading it, I realise it made me think about the way in which someone might use the product, but at the time I felt more like I was writing it to dictate how someone should use the product. A subtle difference, but an important one.

The questionnaire was a bit of a disappointment; I think that we asked the wrong questions. What we really wanted to do was fix the aspects of the design that were dictated by what the product was and ask about everything else. Instead we included a question about the delivery mechanism, which was needed, but some of the options were incompatible with the core ideas behind the product. As soon as people picked an incompatible option (and most did), the rest of the results became very hard to use, because suddenly we didn’t have an evaluation of what people might want on this product; we had an evaluation of what people would want on some different product.

It’s like having a questionnaire about a wristwatch and asking “how many golf clubs should this product be able to carry?” as the first question. Anyone who doesn’t put 0 is suddenly thinking of the product as a golf bag, so when a question like “where should it be worn?” comes up later, people say ‘shoulder’ in a golf bag mentality and suddenly you’re designing a wristwatch that people wear over their shoulder. I think we did the right thing in discounting the sections of the data that seemed inappropriate, even if they were the most popular.

Developing two designs simultaneously would have been a really good move if we’d developed the two designs we were talking about. Sadly, the eye movement prototype is extremely poor, as it makes no attempt whatsoever to represent eye movement. I recognise that this would have been hard, but as the point was to compare eye movement to Bluetooth it was vital to make some attempt, even if it was as little as saying “always move the mouse cursor to where you’re looking on screen” at the start. As it is, I feel that we’ve got a comparison between the Bluetooth approach and a point-and-click mouse interface. I reckon that it made us choose the poorer design.

All of that being said, we have had some positive moments on the project. The keypad prototype is fantastic. Making the original really made me think about how someone might use this and what sort of things would make a user feel good about it, and I had a number of really good discussions about it later. Pete’s dramatic expansion has a lot of thought behind it too and really gets to the core of what this is about. Looking back, a lot of the earlier theoretical work was pretty good as well; trying different approaches to creativity and design gave us a solid starting point for designing this product, even if we didn’t capitalise on it as well as we might have.

I think that in the end we did manage to come out with a fairly solid design for a very novel product. It was hard to do because there are not many comparable products on the market so it really was a case of designing something from first principles. I think that despite the difficulty of the task and the numerous setbacks we’ve managed to generate a solution that’s pretty good.

Sunday, March 26, 2006

Yeah, the amount of difference we had is a great surprise. The only priming I gave was what the general product was, and the information on the first slide. However, the first slide was not updated to mention the new back button option and the suppression feature, which I told them about beforehand. Maybe I told them more specifically what to do. Some said they felt that guessing the correct category was luck, but this was quickly overcome once they had seen which functions were available in each category. I suppose if we had been able to test as a group we could have made sure the testing was done more consistently. I recorded going in the wrong direction in the menu as an error, i.e. clicking on multimedia rather than communications. This kind of error always seemed to be identified quickly, and with the back button the user got back on track very quickly. The ability to correct errors is the main advantage of the new prototype. The suppression feature, which was not really exercised in testing, is obviously an advantage in the real-world product, but it cannot be assessed with this model or this testing.
Okay, got Pete's data now. I can't use it on its own because it's only data for the new prototype (which is fine, because he wasn't down to get data for the old one). It's similar to neither mine nor Dan's, but it's closer to Dan's, so I integrated it with that.

This brought the probabilities that the differences in efficiency and effectiveness were the result of random rather than systematic variation down to 50% and 25% respectively. Neither of these is reliable, so it doesn't really change anything.

Integrating them with mine instead has similar effects (raising the chi-squared values). I think this is a symptom of the fact that our results vary dramatically depending on who was gathering them. This is obviously very worrying, but fortunately we've got results for both prototypes from the same experimenter, so it's not so much of an issue.

I think, in fairness, the problem is probably that I left some things out of the instructions, because after six years of psychology, experimental method when dealing with people seems like such common sense that it's not worth mentioning. I can't confirm this suspicion without getting a chance to speak to you both, but I'm guessing we had some expectancy effects, observer effects and priming effects creeping into the data.
I've elected not to integrate Dan's results with my results. They are clearly from completely different testing methods; I don't know what we did differently, but the results are so far apart as to be utterly incomparable (unless Dan's sample was entirely computer geniuses and my sample was entirely technophobes).

The biggest difference was in errors and error recovery.
What were you classing as an error?
As I understood it any click on a button other than the button that does what the user wants to do.
Or were they primed somehow? Did you use the same sample for both tests? Did you brief them before the thing started, or give step-by-step instructions?
The sort of differences we're talking about here are phenomenal! Your participants made a total of 11 errors over the whole thing. My first participant was up to 8 by the end of the second test!

Anyway, I figure the results are internally consistent. I used the same method for testing people on prototype 1 as I did for proto 2, and I assume the same holds true for yours. So I compared the tests I did on proto 1 with the tests I did on proto 2, and the tests you did on proto 1 with the tests you did on proto 2.
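A minimal Python sketch of that within-experimenter comparison, assuming results are held as hypothetical (experimenter, prototype, score) rows. The function name and the scores in the usage line are illustrative, not the project's data. Experimenters who only tested one prototype drop out, since there is nothing of theirs to pair.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, float]  # (experimenter, prototype 1 or 2, score); hypothetical layout


def within_experimenter_pairs(rows: List[Row]) -> Dict[str, Tuple[List[float], List[float]]]:
    """Group scores so each experimenter's proto 1 results are only ever
    compared with that same experimenter's proto 2 results."""
    groups: Dict[str, Dict[int, List[float]]] = defaultdict(lambda: {1: [], 2: []})
    for experimenter, prototype, score in rows:
        groups[experimenter][prototype].append(score)  # KeyError if prototype is not 1 or 2
    # Keep only experimenters with data for both prototypes.
    return {name: (g[1], g[2]) for name, g in groups.items() if g[1] and g[2]}


rows = [("Greg", 1, 0.4), ("Greg", 2, 0.6), ("Pete", 2, 0.7)]
print(within_experimenter_pairs(rows))  # {'Greg': ([0.4], [0.6])}
```

Each pair of lists can then go through whatever test is being used, one experimenter at a time, so differences in testing style between experimenters never leak into the prototype comparison.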

My psychology book is missing (it's a 2" thick A4 book; how can it go missing?)
So (after a quick google) I used the table here instead:
http://www.statsoft.com/textbook/sttable.html

Differences between proto 1 and proto 2
Dan:
Efficiency 75% chance it was random (not significant)
Effectiveness 90% chance it was random (not significant - incomplete data 1/3 measures used)
Usability Not enough results to test (none for proto 1)

Greg:
Efficiency 75% chance it was random (not significant)
Effectiveness 75% chance it was random (not significant)
Usability 5% chance it was random (significant - favoring the new prototype)

Pete:
No data at all. Grrr.

5% is a pretty rubbish significance level; in psychology we normally demand 0.5%, but in this case I think it's enough to demonstrate a clear increase in the number of people finding the software usable. In the other cases the measures are slightly better with the improved version, but not by enough to be a significant improvement.

I suspect the reason behind this is that the addition of the 'back' button only shaves a few seconds off operation compared to resetting the device, so it doesn't affect efficiency that much - however it feels a lot better to be able to go back compared to having to reset something. So I think that there's evidence to support that design decision.

In terms of qualitative feedback, the main complaint I got was that the menu system was really confusing. I don't think that this was necessarily the case, but the fact that large portions weren't implemented, and that the buttons for these sections sometimes seemed like the best options, caused problems. As in some cases buttons beyond the first one worked (and were necessary), it wasn't possible to give the "only choose the first item from a list" instruction effectively.

I don't feel that the prototype is that confusing, but it does look like the auxiliary tasks were programmed first in great depth, with the core tasks added later as an afterthought. This led to the core tasks being somewhat more obscure and thus harder to access; if the auxiliary tasks were being tested, I imagine it would perform much more effectively.

Anyway, conclusion from the analysis: the menu system could be improved; the main trip appeared to be "contacts" appearing under both communication and environment (but not working under environment). The main improvement has been in usability, mostly as a result of the addition of the back button.

Will post reflections before I go to bed tonight.

Saturday, March 25, 2006

Cool, all the results are there now.
I went and saw friends and family back home, so now I've got all of tomorrow to do the empirical evaluation and post my reflections on the project.
Pete, have you any results to throw into the pot before I do so? I'll wait until tomorrow afternoon before finalising the numbers in case you don't read this before then.
The "multimedia menu" entry was just a personal note to say they went into the multimedia menu instead of the correct menu; that was the error. I have taken it off and replaced it with just the number 1. Not sure about the other problem, but I have put the data back up.

Friday, March 24, 2006

Tried to do the stats again, but something seems determined to prevent it from happening. Can you upload the data again? It's displaying now, but the columns for participants 4+ are completely blank and the number of errors that participant one made is apparently "multimedia menu".
Evaluation

Well, I think the hardest problem with this project was that it was quite open, especially since we could design a product for which the technology was not currently available or in use. This caused a number of problems throughout the project, starting with the design ideas. A tough compromise was needed between lots of creativity and a product that people could and would want to use. Head-up display glasses that interact with your environment were, I think, a good compromise. It is quite a creative idea, but it has a sound grounding: this kind of technology is already available in many formats for military use. Looking at the technology we have today, you can see that much of it was born out of military design, with the best features then adapted for civilian life.


Even with this idea, the project could have gone a number of ways, with lots of ideas about what the product could do and how the user could best interact with it. This would normally be a good thing; however, our questionnaire was probably not up to the task of answering which functions and interactions should be implemented in the product. The questionnaire often asked the wrong questions and did not tell respondents enough about exactly what our product was for us to receive good information back. We were therefore unable to let people speak, as it were, and tell us what they wanted and how they would prefer to interact. This meant we had to decide between ourselves. However, being decisive was a problem, so it was decided that taking two of our ideas on to the next stage (prototype) was a better way to decide, as we could then get more informed user input. This did mean that twice as much work was needed for this part of the project as would normally be necessary. On the other hand, it allowed comparisons to be drawn between the two designs; as said earlier, because we are using future technology, our product is not really in regular use today, so it is difficult to compare it with existing products. Producing the prototype was also difficult. Although I could produce a prototype with similarities to the actual product, there was no way of producing an accurate prototype that actually recognised the eye's focus point as a method of user selection. Although this could be simulated, any simulation would introduce inaccuracies compared to a real-life product, so the results of testing were unlikely to match those of a real-world product exactly. It was still useful to obtain the data, as long as we recognise that the results do not describe a definitive product. This was also seen in the general user feedback.
Although many said they liked the product, they also stated that they would still need to see and properly use a final product. There was enough feedback to show that some improvements were needed to my prototype, such as adding a back button to speed things up when mistakes were made and to give the user more control. Also, if users were doing other things while wearing the product, they would need to suppress or hide the display from time to time rather than turning it off and on, so a button was added to allow this. The users seemed to tune into the organisation of functionality pretty quickly and became familiar with it once a few tasks were completed, so the time needed to turn a new user into a competent or expert user should be fairly short. Some of the comments also said that, although they thought it was an interesting gadget, they would not necessarily purchase the product straight away. This was due to a general feeling that the product would have a high retail price, which would mean it would most likely be purchased by early technology adopters: those who like to get the latest gadgets first, are willing to pay a high price and also accept a few flaws. However, I'm sure that over time such a product would become much more mass-marketable. The price would drop, any problems that had not been identified would be fixed, and the product would become more desirable to the masses once its benefits could be seen.

The product that we have designed is desirable and very usable, as many of the testers identified. However, I think one of the key problems has been testing and prototyping a technology that we do not have access to. I have also found it difficult throughout the project to give time to the design alongside completing my final year project. I think that if we had started the project with a better base, by fully understanding consumer needs and wants, the project might have been much more successful. We would then have been able to ensure that our prototype met consumer expectations before testing it, which, as mentioned before, has the limitation of not being the same as the actual finished product. Blogger is a difficult tool to organise a group around and depends very much on the group checking it regularly. It's a good way to share ideas and information, although integrated file storage would be beneficial, rather than using a number of methods to put our work on the internet, such as our Yahoo file store. Blogger is useful for displaying ideas, but it cannot substitute for proper team organisation. I feel this way of working slows down a project.

Finally, some comments from users on the final prototype:

"The product in question is a highly innovative idea, with several advances over current technology. It was easy to use and if produced I believe would be in high demand given its utility. "

"I thought that the model demonstrated a good idea of how computerised glasses could work. The glasses seemed like they would be useful to wear, without causing any unwanted hinderance to vision."
Oh sure, yes, I'll just add that to all of the other data...oh no wait. Hang on a second...there isn't any other data because the focus groups had been planned and assembled on Thursday and the FUCKING PROTOTYPE WAS LATE.

I didn't go to the meeting because I wasn't sure if I could keep my temper. Looks like a lack of proximity didn't help much with that. Once I've calmed down I'll throw this data at the spreadsheet and see what it comes up with.

*sigh*

Those results aren't in the file section of the Yahoo group; where are they? If they can be cut and pasted, that'll save a lot of time. I'm really annoyed because I do not own a laptop, so I can't easily go around bagging participants; I need to plan it in advance and make sure everyone I need is by a computer and expecting me. I'm going home tomorrow for Mother's Day (since Mom is leaving for Japan on Sunday, I figured it'd be good to visit early), so I can get some data off my friends and family at home (who all have PCs). It's just going to be after the official deadline, so I hope it's okay :S
Here are the results from the people I have tested the latest prototype on, if you want to add these to any others that have been collected. I was at Joe's today for the meeting, but I guess that was cancelled.


The spreadsheet is also up on the Yahoo file share.

In terms of qualitative data, the users were able to control the device with a little less frustration and more speed, with the newly introduced back button letting them correct errors more quickly. The only problem identified with the interaction is that initially people were unsure which category a function was stored in. If they got it right first time, they felt "lucky" that they had found it first go. However, once they had discovered it the first time and had a better understanding of where the different functions were, they could quickly navigate and use the prototype.
First prototype, qualitative information:

The qualitative feedback gained when testing the prototype using eye selection was that it lacked a back button. It was originally designed this way to reduce the number of buttons required and to make maximum use of the hands-free availability that eye-focus input gives. However, users found it annoying that if a wrong selection had been made, or a task was complete and another one was ready to be done, they had to start again from the home menu. Adding a back button would let the user take more control of the prototype and reduce the time lost when mistakes are made. It was also identified that, if the prototype were fully functioning, an option to hide the graphics in the user's view would be required to improve visibility when needed, without turning the entire device off, which costs more time. These were the main issues found with the preliminary prototype, and they should be fairly simple to fix in the current prototype. The users seemed to improve with every task, so task times improved quickly. This was mainly because functionality was separated into categories: four, to be exact, so that the screen could be split into North, South, East and West points and eye input would be simple. Once the users understood which functions could be accessed in which categories, they could navigate the device quite easily. Rewording these categories might improve the initial speed of tasks, although users picked them up very quickly.
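As a rough illustration of why four compass-point categories keep eye input simple, here is a Python sketch that maps a gaze point to North, South, East or West by whichever axis it lies further along. The function name, the coordinate convention (screen centre at (0, 0), edges at plus or minus 1) and the central dead zone are all assumptions; the prototype itself did not track eye movement.

```python
from typing import Optional


def gaze_to_category(x: float, y: float, dead_zone: float = 0.2) -> Optional[str]:
    """Map a gaze point to one of four compass regions.

    Hypothetical convention: x and y are normalised so the screen centre is (0, 0),
    right is +x, up is +y, and the screen edges sit at +/-1.
    """
    if max(abs(x), abs(y)) < dead_zone:
        return None  # looking near the centre: no category selected
    if abs(x) >= abs(y):
        return "East" if x > 0 else "West"
    return "North" if y > 0 else "South"


print(gaze_to_category(0.0, 0.9))   # North
print(gaze_to_category(-0.8, 0.3))  # West
```

Splitting on the dominant axis gives four large, non-overlapping targets, which is far more forgiving of imprecise gaze estimates than a grid of small buttons would be.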
I saw plenty of people who I could have collected data from today.
I have the evaluation sheet ready to give chi squareds for the three main areas comparing the old prototype to the new.
Where's the prototype?
Where is the data?
Where are any posts from you in the last week?
Tomorrow (well, today now) is the deadline.
I am not happy.

Thursday, March 23, 2006

Right, I've sorted out the analysis as best I can.
With only 1/3 of the data actually collected for proto 2 I could only do the efficiency measures.
I then thresholded all of the measures so that they can be divided into success or failure (for example, completing more than 70% of tasks with no errors counts as a success in this model) and did a chi-squared test, with the number of efficiency measures completed forming the columns, the prototypes forming the rows, and each individual participant having one place in each row.

The net result is that this gives a chi squared of 2.33333 with 3 degrees of freedom (Since there are four measures and two prototypes there are (2-1)(4-1) = 3 degrees of freedom)
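For reference, a minimal Python sketch of the Pearson chi-squared test of independence behind that figure. Each expected count is row total × column total ÷ grand total (the sum of every cell), and the degrees of freedom are (rows - 1)(columns - 1), which gives (2-1)(4-1) = 3 for two prototypes by four measures. The function name is illustrative, and the counts in the example table are made up; they are not the group's data.

```python
from typing import List, Tuple


def chi_squared_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)  # sum of every cell in the table
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("a row or column sums to zero; drop it before testing")
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up success counts: rows are prototypes 1 and 2,
# columns are four thresholded efficiency measures.
example = [
    [3, 2, 4, 1],
    [4, 4, 5, 3],
]
stat, dof = chi_squared_independence(example)
print(f"chi-squared = {stat:.3f}, df = {dof}")
```

Getting the grand total wrong skews every expected value at once, so it is worth checking that it equals the sum of the table's cells before trusting the statistic.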

Grabbing a copy of "Psychology: A Student's Handbook" and flicking to the appendices, we find that this is not a significant difference (not even at the forgiving 10% mark).
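Instead of flicking to the appendices, the tail probability can be computed directly. This Python sketch (an illustration, not what was used here) evaluates the chi-squared survival function for integer degrees of freedom using the standard recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Γ(k/2 + 1), starting from Q(x; 1) = erfc(√(x/2)) and Q(x; 2) = e^(-x/2). For 2.33333 with 3 df it gives a probability of roughly 0.5 that the difference is random, well above the 10% cut-off. For comparison, the 5% critical value at 3 df is 7.815.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """P(X >= x) for a chi-squared variable with integer dof >= 1."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 1:
        q = math.erfc(math.sqrt(half))  # base case: 1 degree of freedom
        k = 1
    else:
        q = math.exp(-half)  # base case: 2 degrees of freedom
        k = 2
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < dof:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


print(f"p = {chi2_sf(2.33333, 3):.2f}")
```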

Of course, I've abstracted away a lot of the data there, and we might find individual significance in the measures. Applying it to the number of power tasks completed gives a chi-squared of 0 with 0 df (this was a silly test, since everyone completed all tasks). Applying it to tasks completed in one gives a higher value of 3.333, which is still not good enough for significance.

I suspect we have too few participants. I admit I didn't code the data from all of the work on proto 1 (We've got 15 for that one) because I didn't see how it would help to have more data for one if not the other (We've only got 5 for that one)

I've got a decent number of people lined up for tomorrow, so I can put a few of them on the old prototype instead of the new one. Analysis will be dead easy and quick, since I can just plug data into the sheet now (the error with the previous calculations was that it was using the wrong figure for the total number of participants, so the expected values were all messed up). To make data entry quicker, could you please all record the data in table form, preferably in Excel itself if possible? The headings I'm using are as follows; they're pretty directly descended from the document I wrote about how to collect the data. The only processing of the data you need to do is to count the number of positive and negative adjectives in the descriptions at the end.


Participant
Errors T1 (T stands for task)
Errors T2
Errors T3
Errors T4
Errors T5
Rep ET1 (Rep ET = repeated errors during task)
Rep ET2
Rep ET3
Rep ET4
Rep ET5
Time T1 (Again T is task)
Time T2
Time T3
Time T4
Time T5
Time CW (Time spent on the cognitive walkthrough)
ErrorT CW (Time spent making/recovering from errors during the walkthrough)
Functions Remembered (How many functions are correctly remembered)
In control? (Did they feel in control of the product?)
Pos Adj (Number of positive adjectives used to describe)
Neg Adj (Number of negative adjectives used to describe)
Use (Would they use it if they had it)
Buy (Would they buy it if they had it)
Recommend (Would they recommend it to a friend)

You don't need to record the following, they are derived automatically from the above (or in some cases merely automatically copied to the calculations page):

Number of tasks performed
% complete in one
Persistent errors
Errors per time
Tot time for instructions
Time error correcting
Icons remembered
PosAdj
NegAdj
Use
Buy
Recommend
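A minimal Python sketch of how a few of the derived columns might be computed from the recorded ones. The field names and formulas are assumptions for illustration, since the spreadsheet's actual formulas aren't given here: tasks performed counts tasks with a recorded time, % complete in one is the share of those tasks with zero errors, persistent errors sums the repeated errors, and errors per time divides total errors by total task time. The success flag applies the 70% threshold used for the chi-squared thresholding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParticipantRecord:
    # Hypothetical field names mirroring the spreadsheet headings above.
    participant: str
    errors: List[int]                 # Errors T1..T5
    repeated_errors: List[int]        # Rep ET1..Rep ET5
    times: List[Optional[float]]      # Time T1..T5 in seconds; None if the task was not attempted


def derived_measures(r: ParticipantRecord) -> dict:
    attempted = [i for i, t in enumerate(r.times) if t is not None]
    tasks_performed = len(attempted)
    complete_in_one = sum(1 for i in attempted if r.errors[i] == 0)
    pct_complete_in_one = 100.0 * complete_in_one / tasks_performed if tasks_performed else 0.0
    total_errors = sum(r.errors[i] for i in attempted)
    total_time = sum(r.times[i] for i in attempted)
    return {
        "tasks_performed": tasks_performed,
        "pct_complete_in_one": pct_complete_in_one,
        "persistent_errors": sum(r.repeated_errors[i] for i in attempted),
        "errors_per_time": total_errors / total_time if total_time else 0.0,
        # Success/failure split using the 70% threshold on tasks completed with no errors.
        "complete_in_one_success": pct_complete_in_one > 70.0,
    }


record = ParticipantRecord("P1", [0, 1, 0, 0, 2], [0, 0, 0, 0, 1], [30.0, 45.0, 20.0, 25.0, 80.0])
print(derived_measures(record))
```

Keeping the derivations in one place means everyone only has to enter the raw columns, which is the point of the list above.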

How are you two doing? Haven't heard anything from you in a few days.

Monday, March 20, 2006

Okay, something's fishy in the chi-squared formula, or at least I think so, because the values do not add up. However, looking at the other statistics, Dan's proto is clearly the one to work with.

For example, % tasks complete first time varies from 20% to 80% in proto 1 and from 60% to 100% in proto 2. I'll sort out proper numbers and thresholds of statistical significance over the next few days, but the numbers are clear.

I still have reservations regarding how closely the prototype matches the product, but while I don't trust that these statistics will be remotely representative of the real world, I can't see a better alternative, so I guess we go with it.
Does anyone know how to get an "or" logical operator into Excel?

I figured that since I haven't been given any data for analysis (grrr), I'd sort out an Excel spreadsheet to do it all instantly once I had the data. For one of the chi-squared tests I want to use ranges (so I can see if there is a significant difference in the number of errors across the two prototypes without having to have one column for each possible error number). The only problem is that I can't work out how to do this.

=COUNTIF(range, criteria) is the command, but it seems to go tits up whenever I put logical operators into the criteria. It can deal with ">2" or "<6", but not ">2 AND <6", "AND(>2, <6)" or any other one of a million permutations.

I suppose I'll leave it and do it by hand. I was just trying to speed things up a little, since they appear to be slowing down again.
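For the record, COUNTIF accepts only one criterion, so a common workaround for a "between" count is to subtract one count from another: =COUNTIF(range,">2")-COUNTIF(range,">=6") counts values greater than 2 and less than 6. Excel 2007 and later also offer COUNTIFS, which takes several range/criteria pairs. The Python sketch below mirrors that subtraction and also bins error counts into ranges, which is what the range columns were meant to do. The function names, bin edges and error counts are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def count_between(values: Sequence[int], low: int, high: int) -> int:
    """Count values with low < v < high (assumes low < high), mirroring
    =COUNTIF(range,">low")-COUNTIF(range,">=high")."""
    return sum(1 for v in values if v > low) - sum(1 for v in values if v >= high)


def count_in_bins(values: Sequence[int], bins: List[Tuple[int, int]]) -> Dict[str, int]:
    """Count values in each half-open bin [low, high); values outside every bin are ignored."""
    counts = {f"{low}-{high - 1}": 0 for low, high in bins}
    for v in values:
        for low, high in bins:
            if low <= v < high:
                counts[f"{low}-{high - 1}"] += 1
                break
    return counts


# Hypothetical error counts per participant, not the project's data.
errors = [0, 1, 2, 3, 5, 6, 8]
print(count_between(errors, 2, 6))                        # 2
print(count_in_bins(errors, [(0, 3), (3, 6), (6, 100)]))  # {'0-2': 3, '3-5': 2, '6-99': 2}
```

Each bin then becomes one column of the chi-squared table, so a handful of ranges replaces one column per possible error count.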

Just a reminder: we've got one week left, and to be on target for having only an obscenely high amount of work to do next week, we need to have the analysis done and the decision made by tomorrow evening. I had the time to crunch the numbers tonight and I've a few hours tomorrow morning; if we fall behind the timetable, we will have to drop everything (I'm not kidding; I mean spending a 24-hour period doing nothing but this) at the end of the week to finish on time.

Sunday, March 19, 2006

Pete, can you get your results to me before this evening so I can have the analysis done by tomorrow's meeting (2:30 still, right?)? I'll do all of it for this half, since it's easier for one person to do it anyway, and you sorted over 100 slides for that prototype (even though it's mostly cut and paste, that's still an impressive figure). Talk to you all tomorrow.