In my long-running tradition of live blogging events, I'm going to do my part to try to live blog PNSQC and the observations, updates and other things I see while I am here. Rather than do a tweet stream that will just run endlessly, I decided a while ago that an imprecise, scattered, lousy-first-draft stream of consciousness, cleaned up later into a more coherent presentation, is a fairly good compromise. So, if you would like to follow along with me, please feel free to just load this page and hit refresh every hour or so :).
Here we go, day 2 of PNSQC, and we hit the ground running with Dale Emery's keynote talk "Testing Quality In". Dale starts out the conversation with the premise that testers are often underappreciated, and with that, their skills are often squandered. This is a tragedy, but one we can do something about. What if we could identify the variables that affect the value of our testing?
Dale makes the point that "You Can't Test Quality In" and asked whether people agreed or disagreed. Hands went up for both Yes and No… and Dale said he agreed with both groups. Huh? Let's see where this is going :). The key to this is a bit of reframing of the statement; a lot of it has to do with the way we look at the problem.
One of the most common statements testers make is that “Testing Merely Informs”. “Merely” in this case doesn’t mean that the testing isn’t important, it’s that there’s a limit to what we can inform and how far that informing can take us.
Dale walks us through an example of a discount value for a class of customers. Determining where we can add value by informing the requirements CAN be an example of testing quality in, in the sense that it prevents problems from being built into the product. On the other hand, when we say we CAN'T test quality in, we're looking at the end product, after it is made, and in that case, the statement is also true.
Can you test quality into products? How about testing quality into decisions? Does the information we provide allow our team(s) to make appropriate decisions? If so, do those decisions help to "test quality in"? Dale gave an example of one of the Mars Landers, where the flash file system filled up and the lander couldn't reboot. Was it a bug to get to that state? Yes. Was it known? Again, yes. Why wasn't it fixed? Because when the bug was found, it was determined that fixing it would push the launch date back by six weeks, which would mean Mars would be 28 million miles further away, and thus the launch team would be faced with a situation of "can't get there from here". With this knowledge, they went with it, knowing that they had a potential workaround if it were ever needed (which, it turns out, was the case, and they knew what to do in that event). Could that be seen as testing quality in? I'd dare say YES. Apparently, Dale would, too :).
So why do we say "You Can't Test Quality In"? The reason is that, if we start from a premise where we are running a lot of tests on a product that's already built, and the product team does nothing with the information we provide, then yes, the statement stands. It's also a gross caricature, but the caricature is based on some reality. Many of us can speak plainly and clearly to something very much like this. The problem is that the caricature has gotten more press than the reality. You absolutely can't test quality in at the end of a project when you have not considered quality at all during the rest of the process. Again, that's likely also a gross caricature, or at least I have not seen that kind of disregard on the teams I have been part of. Dale says that what we are actually doing is reacting to Ineffective Use of Testing (and often too late in the process). We also react to Blame ("Why didn't you find that bug?", or even "Why didn't you convince us to fix that?!"). Instead of fighting, we should seek to help (hilarious reference to Microsoft Office's Clippy… "It looks like you are trying to test a lousy product… How can I help?").
Another question is "why don't people act on our information?" It's possible they don't see the significance (or we are having trouble communicating it). Often there may be competing concerns. It's also possible that they just don't like you! So instead of just providing information, what if we were to give feedback? If feedback is accurate, relevant and timely, that also helps (scatological joke included 😉 ). A way to help is for us to notice what we are really good at, and try to see how we can apply those skills to give value in new ways. We are good at detecting ambiguity, we can detect ignorance, we can detect inconsistencies, we can imagine scenarios, consequences, etc.
What if we could test artifacts where we can find errors before we commit to them? This slides right into Elisabeth Hendrickson's advice in Explore It to explore requirements. Dale sums it up by saying we should test "anything about which your customers have intentions that have not been validated". We can test requirements, we can test features, we can test stories, and ultimately, we can test the team's shared understanding. We can test our understanding of our customers' concerns. If they are not reacting to our testing, it's possible we are not connecting with what is important to them. Dale makes reference to Elisabeth's "Horror Story Headlines"… what would be the most horrifying thing you could see about your company on the front page of the newspaper? Think of those concerns, and you can now get to the heart of which testing is really important, and what really matters to the team. Consider testing your testing. How well are your customers benefiting from the information you provide?
Dale then gave us some homework… What is one small thing you can do to increase the value or appreciation of testing in your organization? For me, I think it's finding a way to add value and help identify issues early in the process. Instead of doing the bulk of my testing after a story is delivered, I'll try to find ways to test the stories before they become problems. In short, BE EARLY!
I had the chance to moderate the Test track talks. When we offer to moderate, sometimes we get talks that are assigned to us, sometimes we get to moderate the talks we really want to hear. This morning’s talk is one of the latter, called “The Black Swan of Software Testing – An Experience Report on Exploratory Testing” delivered by Shaham Yussef.
Scripted testing adheres to pre-defined test cases, while Exploratory Testing is geared toward focusing on an area and letting the product inform us as to where we might want to go next. It's test execution and test design performed simultaneously.
Lisa Shepard from WebMD is the next speaker and her topic is “Avoiding Overkill in Manual Regression Tests“
’tis better to have tested and moved on than to have written an obsolete test’
‘Beware of Copy and Paste’
‘If you are worried about migrating your tests… YOU HAVE TOO MANY TESTS!!!’
‘Less Tests = More Quality’ (not less testING, just less test writing for the sake of test writing)
‘Friends don’t let friends fall off cliffs’ (pair with your testers and help make sure each other knows what’s going on).
As we are discussing this idea, it has struck me that, for someone like me, I have to seek out my mentoring, or my opportunities for mentoring, outside of my company. In some ways I am the sole practitioner for my "community", yet sometimes I wonder if the skills that I offer are of real or actual value. In some ways, we reward individualistic, isolationist behaviors, and it's only when those behaviors start to hurt us that we wonder "how could we let this happen?" We live and work in a global economy, and because of that, there are a lot of people who can do competent, functional work, and because of their economic realities, they can do that work for a lot less than I can. What should I do? Should I hoard my skill and knowledge so that they can't get to it? For what? So that when I leave the game eventually, all of my skills are lost from the community? If I believe I have to compete on a functional level, then yes, that's a real concern. What to do? Stop competing on a functional level, and start growing at a craftsmanship and artistry level. If we excel there, it will be much more compelling for us to say that we have genuine talent that is worth paying for.
What if we had a chance to change jobs with people from different companies? What if you could share ideas and skills with your competitors? What if two startups could trade team members so they go and work for the other team for a month or two, with the idea that they will come back and share their newfound knowledge with each other and their teams? It seems blasphemous, but in Chicago, two startup companies are doing exactly that. What have they discovered? They've boosted each other's business! It sounds counter-intuitive, but by sharing knowledge with their competitors, they are increasing the size of the pie they both get access to.
We've had an interesting follow-on discussion about how Agile teams actually work, how they effectively deal with issues of technical debt, and how they apply the processes they use. Agile and Scrum mean one thing on paper, but in practice they vary wildly between organizations. Returning a focus to craftsmanship, allowing others to step up and get into the groove, and balancing a well-functioning team with actual discipline growth is imperative. The trick is that this needs to happen at a grass-roots level. It's not going to happen if it's mandated from management. Well, it may happen, but it won't be effective. People in the trenches need to decide this is important to them, and their culture will help define how this works for them.
Jean Hartmann picks up after lunch with a talk about "30 Years of Regression Testing: Past, Present, & Future", and how things have changed dramatically, from how we determine the number of test cases to whether or not those cases are really doing anything for us. Regression testing is one of software testing's least glamorous activities, but it is one of the truly vital functions needed to make sure that really bad things don't happen.
Kurt Fischer proposed a mathematical approach where the goal is: "if I do regression test selection, I want to pick the minimum number of test cases to maximize my code coverage". That was in the early 80s. At the time, hardware and CPU speed were limiting factors; running a lot of regression tests was just not practical yet. Thus, in the 80s, large-scale regression testing was still relatively theoretical. Analysis capabilities were still limited, and getting more challenging with the advent and development of stronger and more capable languages (yep, C, I'm looking at you 🙂 ).
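That selection goal is essentially the classic set-cover problem, and a common way to approximate it is a greedy heuristic: keep picking the test that covers the most not-yet-covered code. Here's a minimal sketch in Python to make the idea concrete; the coverage data is entirely hypothetical, and this is my illustration of the general technique, not Fischer's actual algorithm.

```python
def select_regression_tests(coverage):
    """Greedy set-cover: repeatedly pick the test that covers the most
    not-yet-covered lines, stopping when no test adds new coverage."""
    remaining = set().union(*coverage.values())
    selected = []
    while remaining:
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # whatever is left is uncovered by any test
        selected.append(best)
        remaining -= gained
    return selected

# Hypothetical coverage map: test name -> set of covered line ids
coverage = {
    "test_login":    {1, 2, 3, 4},
    "test_logout":   {3, 4},
    "test_checkout": {4, 5, 6},
    "test_refund":   {5, 6},
}
print(select_regression_tests(coverage))  # ['test_login', 'test_checkout']
```

Two tests reach the same coverage as running all four, which is the whole appeal: the greedy answer isn't guaranteed minimal, but it is cheap to compute and usually close.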
While Exploratory Testing offers us a number of interesting avenues (free-style, scenario-based, and feedback-driven), there is also another aspect we can use to help drive our efforts: User Experience criteria. Think about the session-based test management approach (charter – test – retrospective – adaptation). Now imagine adding User Experience criteria to the list (create personas, create profile data, and follow them as though they are real people). We start to see the product as more than just the functions that need to be tested; we start to see it in the light of the people most likely to care about the success of the product.
So why does this matter?
Pete has decided to talk about ships that sank… figuratively. Thus, Pete decided to share some of his own disasters, and what we could learn from them. I couldn't help but chuckle when he was talking about being made a test lead for 100 developers… and he would be the sole tester for that initiative. No, really, I can relate! His first project was to replace an old-school mainframe system with a modern, feature-friendly windowed system, but using all of their old forms. Everyone loved it… except the people who had to enter the data! All the old forms were the same… except that error messages were rampant. The new forms spread one screen over five screens… and those five screens did not account for the fact that the form expected all the values to be filled in. The system was carefully tested, and the odd behavior was deemed by the developers to be "as designed". Did the users approve of the design? Yes… if you mean the people who parse the data from the screens. What about the people who actually entered the data? They were not asked! Whoops! In short, the visual appeal trumped the functionality, and lost big time.
This was a neat exercise to show that the "user" is an amorphous creature. They are not all the same. They are not necessarily doing the same job, nor do they have the same goals. Yet they are using the same application. In fact, the same person can do the same thing at different times of the day and have two totally different reactions. That happens more than we realize. We want our systems to support work, but often we make systems that impede it.
We are often guilty of systematic bias: we think that the outcome we desire is the one that should happen. This is also called "Works As Designed" Syndrome. In short, the people making the rules had better be involved in the actions that are being performed. If we want to make sure that the system works well for the users entering the data, we have to include them in the process. Pete used a colorful military metaphor to describe this: "when the metal meets the meat!" If the system is being designed for you, then YOU should be the one involved in making sure the design works for YOU.
Additionally, when you are dealing with systems, know that testing one component exhaustively while doing no testing on the other components is not going to end well. People do not see "the system" the same way you do. There is no single user. This is why personas are so important. Their experiences are intertwined, but they are different. Even people doing similar processes and jobs are very often unique, and have their own needs based on location and focus.
The question around UX is that there is a fair amount of training required to do the job well. How many testers have actually received any formal UX training? For that matter, how many testers have received any formal testing training from their company? [Hilarity ensues.]
On a more serious note, someone recommended handing the official user's manual to testers and letting them loose for a while… and that's actually not at all a bad idea for some introductory UX training. Why? Because a user's manual is (truthfully) the closest thing to a requirements doc many people will ever get their hands on! In my experience (and many other people's), a user's manual is both a good guide and almost always wrong (or at least inconsistent). A lot of questions will develop from this experience, and the tester will learn a lot by doing it. Interestingly, many services are being made with no manual whatsoever, so we're even losing that possibility.
Ultimately, we need to do all we can to represent our actual users, as many of them as humanly possible. There are many ways to do that, and it's important to cast the net as broadly as possible. By doing so, while we can't guarantee we will hit every possible issue, we can minimize a great number of them.
You all thought I must be done, but you'd be WRONG!!! One more session, and that is with Rose City SPIN. Michael "Doc" Norton of Lean Dog is presenting a talk called "Agile Velocity is NOT the Goal!" Agile velocity is a measure of a given number of stories and the amount of work time it takes to complete them. According to Doc, velocity is actually a trailing indicator. What this means is that we have to wait for something to happen before we know that it has happened. We use past data to help us predict future results.
Very often we end up "planning by velocity". Our prior performance helps us forecast what we might be able to do in our next iteration. In some ways, this is an arbitrary value, and about as effective as trying to predict tomorrow's weather based on the weather today, or yesterday. Do you feel confident making your plans this way? What if you are struggling? What if you are onboarding new team members? What if you are losing a team member to another company?
One possible approach to looking at all the information and forecasting is to instead use standard deviation (look it up, statistics geeks; I just saw the equation and vaguely remember it from college). Good news: the slides from this talk will be available so you can see the math, but in this case, what we are seeing is a representative example of what we know right now. Standard deviation puts us between 16 and 18 iterations. That may feel uncomfortable, but it's going to be much more realistic.
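To make the idea concrete, here's a rough sketch of forecasting with a one-sigma band instead of a single average velocity. The velocity numbers and backlog size are made up for illustration, so the resulting range won't match the 16-to-18 figure from Doc's slides.

```python
import statistics

velocities = [14, 22, 17, 25, 12, 20, 18, 16]  # hypothetical points per iteration
backlog = 300                                  # hypothetical points remaining

mean = statistics.mean(velocities)   # 18.0
sd = statistics.stdev(velocities)    # sample standard deviation, ~4.24

# Divide the backlog by the one-sigma band of velocity rather than the mean,
# which yields a range of iterations instead of a single (overconfident) number.
optimistic = backlog / (mean + sd)
pessimistic = backlog / (mean - sd)
print(f"mean velocity {mean:.1f}, standard deviation {sd:.1f}")
print(f"forecast: {optimistic:.0f} to {pessimistic:.0f} iterations")
```

The point isn't the specific arithmetic; it's that a range communicates the real uncertainty in the team's history, where a single mean hides it.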
One of the dangers in this, and other statistical approaches, is that we run a dangerous risk when we try to manage an organization on numbers alone. On one side, we have the Hawthorne Effect: "that which is measured will improve." The danger is that something that is not being measured gets sacrificed. Very often, when we measure velocity and try to improve it, quality gets sacrificed.
Another issue is that stories are often inconsistent. Even one-point or small stories can vary wildly. Also, there are other factors we need to consider. Do we want velocity to increase? Is increasing velocity the best choice? If we take a metric, which is a measure of health, and make it a target, we run the risk of doing more harm than good. This is Goodhart's Law, which says, effectively, "making a measure a target destroys the measure." There are a number of other measures we could use, and one that Doc showed was "Cumulative Flow". This was a measure of Deployed, Ready for Approval, In Testing, In Progress, and Ready to Start. It was interesting, because when we graphed this out, we could much more clearly see where the bottleneck was over time. While this is still a trailing indicator, it's a much more vivid one; it's a measure of multiple dimensions, and it shows over time what's really happening.
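As a toy illustration of reading a cumulative flow diagram, here's a sketch that tracks story counts per state across iterations and flags the band that keeps widening. The state names mirror the ones Doc listed, but the counts are invented.

```python
# Hypothetical story counts per workflow state at the end of each iteration.
# In a cumulative flow diagram the states are stacked; a band that keeps
# widening over time marks the bottleneck.
snapshots = [
    {"Deployed": 2, "Ready for Approval": 1, "In Testing": 2, "In Progress": 3, "Ready to Start": 12},
    {"Deployed": 4, "Ready for Approval": 2, "In Testing": 5, "In Progress": 3, "Ready to Start": 8},
    {"Deployed": 5, "Ready for Approval": 2, "In Testing": 9, "In Progress": 3, "Ready to Start": 4},
]

def widening_band(snapshots):
    """Return the state whose count grew the most between the first and
    last snapshot. (In practice you'd exclude the "done" band, since it
    is supposed to grow; it doesn't win in this toy data.)"""
    growth = {s: snapshots[-1][s] - snapshots[0][s] for s in snapshots[0]}
    return max(growth, key=growth.get)

print(widening_band(snapshots))  # "In Testing" is where work is piling up
```

Graphed as stacked areas, the same data would show the "In Testing" band swelling while "In Progress" stays flat, which is exactly the kind of multi-dimensional picture a single velocity number can't give you.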
Balanced metrics help out a lot here, too. In short, measure more than one value in tandem. Take hours to consider alongside velocity and quality. What is likely to happen is that we can keep hours steady for a bit, and increase velocity for a bit, but quality takes a hit. Add a value like “Team Joy” and you may well see trends that help tell the truth of what’s happening with your team.