Saturday, August 17, 2013

THE SCIENCE DOESN’T BEAR IT OUT - PART II





No amount of experimentation can ever prove me right; a single experiment can prove me wrong. - Albert Einstein

In the last blog I disclosed that doubt is perhaps my best trait, and then discussed what science is in a general sense.  Now I want to explain how doubt is the most important part of science and, for many of us, the most fun.  In experimental science, research based on direct experiment, we have a standard way to decide whether information is valid or not.  It is called the null hypothesis[1].  The quote above sums up the principle of the null hypothesis, or the “null”, quite nicely.  If you can disprove the null, then you have evidence that supports a theory.  If I want to learn something using the scientific method, I need to be careful about how I state what I want to know.  A common statement by force free trainers and clicker trainers is that force free training is the best training.  How could I learn more about that?  Let’s start with how I talk about it.  Notice that I don’t say that I want to prove that force free training is the best, but that I want to learn more about it.  If you try to prove something, you can gather a lot of supporting evidence, but you can never categorically say that there is no exception to the rule.  On the other hand, if you state your question in the form of a null, by saying that force free training is NOT the best way to train a dog, and then set out to disprove that null, you end up with much stronger support for your argument that force free is the way to go.
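To make that concrete, here is a minimal sketch of what testing a null looks like in practice.  The scores below are numbers I invented purely to show the mechanics (they are not real data), and the 0.05 cutoff is just the conventional threshold scientists tend to use.

```python
# A minimal sketch of null hypothesis testing, using made-up numbers.
# Null: force free training is NOT better, i.e. the mean obedience
# score of force-free-trained dogs equals the mean of the other group.
from scipy import stats

# Hypothetical obedience-test scores, invented for illustration only.
force_free_scores = [82, 75, 90, 68, 88, 79, 85, 72, 91, 77]
other_scores      = [70, 74, 65, 80, 69, 73, 78, 66, 71, 75]

# Welch's t-test asks: if the null were true, how surprising would a
# difference this large between the two groups be?
t_stat, p_value = stats.ttest_ind(force_free_scores, other_scores,
                                  equal_var=False)

if p_value < 0.05:
    print(f"p = {p_value:.3f}: the data are unlikely under the null, "
          "so we have evidence AGAINST it.")
else:
    print(f"p = {p_value:.3f}: we failed to disprove the null; "
          "that is NOT the same as proving it true.")
```

Notice the asymmetry: a small p-value gives evidence against the null, but a large one never proves the null true.  That is exactly Einstein’s point in the quote above.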


Unfortunately, most dog trainers are not scientists.  They don’t understand that you cannot prove anything outright; you can only disprove things, and in doing so achieve some degree of certainty.  As I write this I remember struggling with this idea in university; I couldn’t figure out why you had to go through this roundabout way of determining whether a fact was valid or not.  Leaning on science to support what you do can make you feel justified in your actions without a thorough examination of what you are actually doing.  Few dog trainers have any scientific training, and this means we are seeing more and more trainers throwing around information and calling it science.  At the moment of this writing, we have no studies that disprove the null hypothesis that force free training is not the best training.  We also don’t have studies that disprove the null that force based training is not the best.  We see a lot of trainers sharing blogs, articles and bits of information to support their pet ideas, but very little evidence that refutes the null hypothesis behind what they want to do.  This means that although there is a lot of science behind the training and learning that is being done, there is very little understanding of that science, and that is a big problem!



One of the popular pieces of information making the rounds at the moment is a “study” done by the Department for Environment, Food and Rural Affairs (DEFRA) in the United Kingdom about shock collars[2] and why they ought to be banned.  Blogs abound referring to this “study”.  Remember the process: we formulate a null hypothesis, we develop an experiment to try to disprove the null, and we draw a conclusion.  If the conclusion is robust, we send that research off to a relevant scientific journal.  The journal sends the study out to several scientists to read and review; it comes back to the original researcher with questions, goes back to the journal, and THEN, if the reviewers feel the information is sound, the study is published.  In this way there is a series of checks and balances to ensure that the research is sound and the results are reliable.  The next step is that a separate scientist can take the method the original scientist used and try to confirm or refute what was done the first time.  Science is a process of gathering knowledge and then testing it; if the results aren’t repeatable, they are not terribly robust.  The study done by DEFRA is just that: a study.  It has not been through the process of being submitted to a recognized journal and peer reviewed, so no second set of eyes has looked at it to find out whether it is valid.  When a study is done by an individual or a group, but no one else examines the data to see if the conclusions have problems, the study is not terribly credible.  It is, at best, a data set that can be interpreted various ways.  Nevertheless, people who have an agenda against using punishment are pointing at this study as though it were a great supporting argument against shock collars.  Until a second set of eyes evaluates it, it really isn’t valid in terms of supporting or refuting anything.


This information about how science works should give you some steps to follow when evaluating the information you are given about the science of training.  We now know that science is the collection of knowledge, based on agreed upon definitions, that helps us to learn facts.  Facts are gathered together to form theories: observable trends that have been repeated many times.  When scientists want to study something, they should form a null hypothesis, develop an experiment, carry out the experiment, and then send their results to a journal for a second set of eyes to examine.  If the experiment refutes the null hypothesis, we have good evidence that the null is not true, and we can use that information to support or refute ideas about the subject matter.  If the null is not refuted, we can say that more research needs to be done, and we need to find another null hypothesis to test.  Finally, if the study is found to be valid, it is published, and other scientists can repeat the research and either get the same results or different ones.  When a student does the research in the hopes of getting a master’s degree or a PhD, the process is the same, except that instead of submitting their work to a journal, they usually submit it to their university; their advisory committee examines them, and if the research is valid, they get their degree.


Science is a process.  Perhaps the most important part of the process is the part that people are the least comfortable with: challenging the information they are given.  Doubt.  Doubt is a good and treasured friend to a scientist because it makes you ask important questions about what you are seeing, hearing or being told.  When you are given information that you can observe is not true, doubt creeps in and you start to think about what you are seeing and experiencing.  When what you are being told doesn’t match your experience, you can look at the research itself and see whether there are problems with the method, the data, or the review process.  In this way, when we are told things that are based on research and studies, we can analyse the information and figure out whether it actually supports or refutes what we are being told.

Teasing out validity when you are reading about the science that underlies training is like solving a puzzle.  Approaching all information with doubt, and asking whether the source is credible, whether the researcher followed good protocols, and whether the research actually applies to what you want to know, is both an interesting challenge and an important step to take before you accept information.

Photo Credit: Ashwin Kharidehal Abhirama /123rf.com



Dog trainers use science, but often they don’t use the doubt part of the process very well.  Doubt needs to be your best friend when you are looking for scientific support for what you do.  You have to understand the process, and a bit more, in order to be effective.  The final piece of the puzzle that dog trainers need to learn is how to evaluate the science they are reading: looking at what was done in a study to see whether it is a valid piece of research at all.  Let’s start with sample size.  Suppose I wanted to know if jackpotting, the practice of giving a reward that is qualitatively better than the other rewards in the training session, will increase performance.  This is exactly what a master’s student studied recently.


At the University of North Texas, Kirsty Lynn Muir studied “The Effects of Jackpots on Responding and Choice in Two Domestic Dogs”.[3]  For dog trainers, this study would be really helpful if it were valid.  Many of us use larger than normal rewards to reinforce especially good iterations of a given behaviour.  Looking at this study, we hit a problem right in the title: there are only two subjects.  Although the results may be true for the two dogs studied, they may not be true for dogs in general, and two dogs are not nearly enough to convince me that every dog would respond the same way.  I have three dogs at home.  Two of them are German Shepherds and one of them is a Chesapeake Bay Retriever.  If I randomly selected two of them, happened to pick both German Shepherds, and then extrapolated the data to say that ALL dogs in the world were German Shepherds, you should doubt my conclusion.  When you are looking at a study, you need to make sure that enough dogs were studied to give a big enough sample size for the results to reasonably represent the population they are supposed to represent.
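To put a number on “big enough”, here is a rough power-analysis sketch.  The effect size (0.5, a “medium” effect by convention), the 0.05 significance level, and the 80% power target are all assumptions I chose for illustration; none of them come from Muir’s study.

```python
# How many dogs per group would we need to reliably detect a medium
# sized difference between jackpot and no-jackpot training?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,  # assumed medium effect
                                   alpha=0.05,       # significance level
                                   power=0.8)        # 80% chance to detect it
print(f"Dogs needed per group: {n_per_group:.0f}")   # roughly 64, not 2
```

Sixty-odd dogs per group is a long way from two, which is why a two-subject study can only ever be a starting point.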


Another problem in this study has to do with the definition of a jackpot.  Most of us use a higher value reward to reinforce better than average iterations of behaviours.  The study needed to standardize the responses so that the research could be replicated, so instead of using a criterion of “better than average”, it used a definition of jackpot that doesn’t match how most dog trainers think of jackpots.  The definition used was “a jackpot is a one time within session increase in the magnitude of reinforcement”.  Hmmm.  This is not how I would define a jackpot.  I would define a jackpot as a “higher than average VALUE of reinforcement paired with a higher than average LEVEL of performance”.  There are problems with my definition from a scientific perspective; I have used words that are subjective, not objective.  When you look further into the study, it turns out that the jackpot was given on a fixed schedule of reinforcement (a well defined term in the world of Applied Behaviour Analysis), and the increase in the magnitude of the reinforcement was not paired with a better than average performance.  Given that the increase in magnitude was not paired with anything the learner did differently, it would be hard to say that, by this definition, they were studying how I use a jackpot.
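The gap between the two definitions is easy to see if we sketch them out.  This toy example is entirely my own illustration; the trial counts, scores and thresholds are invented, not taken from the study.

```python
# A toy contrast between the study's jackpot and a trainer's jackpot.
import random

def study_style_jackpot(trial_number, jackpot_trial=5):
    """One within-session increase in reinforcement magnitude,
    delivered on a fixed schedule regardless of performance."""
    return "JACKPOT" if trial_number == jackpot_trial else "normal treat"

def trainer_style_jackpot(performance, session_average):
    """A bigger reward, contingent on a better than average
    iteration of the behaviour."""
    return "JACKPOT" if performance > session_average else "normal treat"

# Ten hypothetical trials with invented performance scores (0 to 10).
scores = [random.uniform(0, 10) for _ in range(10)]
average = sum(scores) / len(scores)
for trial, score in enumerate(scores, start=1):
    print(f"trial {trial:2d}: score {score:4.1f}  "
          f"study: {study_style_jackpot(trial):12s}  "
          f"trainer: {trainer_style_jackpot(score, average)}")
```

In the study’s version, the big reward lands on trial five no matter what the dog did; in the trainer’s version, it lands only when the dog outperforms its own average.  Those are two different procedures, so results from one don’t automatically tell us about the other.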


When the definitions proposed in a paper don’t match the information you want to know, you cannot say that the study supports or refutes what you wanted to know.  Nevertheless, this paper has been circulating on Facebook as though it refutes the use of jackpots in training.  The researcher didn’t study what we all wanted to know, and since few dog trainers actually read more than the abstract (a short paragraph describing the research and the conclusions), it looks like we might be using science to refute the use of jackpots.  Before you use a study to refute something, you need to know what was actually studied, and whether enough subjects were studied to really give us the information we want.


This sort of study, an experiment, needs to follow rules in order to be useful.  When the sample size is too small, you won’t get reliable results.  When the definitions don’t match what you want to know, you cannot use the information in the study to support or refute what you want to find out.  In the study cited above, the sample size is too small to be considered robust, and the definition of what was being studied doesn’t match what most dog trainers actually do.  This means that when a dog trainer quotes this study to tell their readers or students not to do something, they are basing their advice on some pretty shaky evidence.  What it doesn’t mean is that jackpotting is a good thing to do or not a good thing to do.  The jury is still out on that one.

After carefully reading through Kirsty Lynn Muir's study on jackpotting, we still don't know whether jackpotting is helpful or not: her sample size was too small to tell us whether what she saw applies to all dogs, her definition is not equivalent to the one most dog trainers use when they talk about jackpotting, and her experiment was designed so that the reinforcement of greater magnitude was not associated with a better than average iteration of the target behaviour.  Scientists spend a lot of time discussing and comparing their interpretations of studies, and this is an important part of good science.  Analyzing, evaluating and then comparing our thoughts to those of others is an important part of doing science, and it is becoming an important part of being a dog trainer too.  Understanding how science works helps people to understand that disagreement is not the same as an attack on the other person's opinion.

Photo Credit: Graça Victoria /123rf.com


The final thing about studies that we need to touch on here is how to tell whether a source is credible.  Just because something is written on the net doesn’t mean it is valid.  Even THIS blog about science is just my interpretation of what I learned in university about how to interpret scientific information, and I am talking only about experimental research; there is much more to the picture than I have included here.  Not only do we need to know whether the person who did the research is credible, we also need to know whether the journal or university that reviewed the research is credible.  If I do a study and publish it on my blog, I have given my readers a starting point, but my blog is not as credible as an article published in an academic journal such as Nature or the International Journal of Biological Sciences.  When you are presented with “science” in training, use doubt to work out whether the information is valid, and whether it genuinely supports the claim being made or merely reflects the author’s bias.

Before leaving the topic of how to evaluate research papers, it is important to add that scientists compare notes, disagree and discuss what they are reading all the time.  Evaluating what you read should not be an unpleasant or undesired activity.  It is an important part of doing good science.  


[1] http://en.wikipedia.org/wiki/Null_hypothesis
[2] http://randd.defra.gov.uk/Default.aspx?Menu=Menu&Module=More&Location=None&Completed=0&ProjectID=15332
[3] http://digital.library.unt.edu/ark:/67531/metadc28456/m1/2/
