ComplianceOnline

Design of Experiments (DoE) - Frequently Asked Questions (FAQs)



1. What exactly is Design of Experiments (DoE), and why should I care?

Think of DoE as a smarter approach to experimentation. The traditional method, changing one variable at a time, is slow and can be misleading because it cannot reveal how factors interact. With DoE you vary several factors at once and see how they affect one another. Whether your background is in research, engineering or marketing, DoE gets you to reliable answers faster, with fewer resources wasted on guesswork.

2. What's the difference between a factor, a level, and a response in DoE?

Think of a factor as the variable you are changing, be it temperature or dosage. A level is simply the particular value you set that factor to, say 100°C or 150°C. Then there is the response, which is the outcome you measure, whether that be a patient’s recovery rate or the yield of a product. These are the three essentials of any DoE study; get a handle on them and the rest will make sense.

3. How do I decide how many samples or test runs I actually need?

It comes down to statistical power, that is to say how likely your experiment is to detect an effect if one really exists. A larger sample size gives you greater confidence, of course, but it costs more time and money. DoE has its ways of dealing with this; a power analysis, for instance, estimates how many runs you need for a given effect size, noise level and significance level. And as a general rule, the more variability you see in the data, the more runs you are going to need to draw any firm conclusions.
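To make the trade-off concrete, here is a minimal sketch of a power calculation for the simplest case, a two-sided comparison of two groups, using the standard normal approximation. The function name `runs_per_group` and the effect and noise values are assumptions chosen for illustration; dedicated tools in JMP, Minitab or R handle more realistic designs.

```python
import math
from statistics import NormalDist


def runs_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Approximate runs per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) * sigma / effect)^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / effect) ** 2
    return math.ceil(n)  # round up to a whole number of runs


# Hypothetical numbers: the same effect of 5 units under two noise levels.
low_noise = runs_per_group(effect=5.0, sigma=5.0)
high_noise = runs_per_group(effect=5.0, sigma=10.0)
print(low_noise, high_noise)  # 16 63
```

Doubling the standard deviation roughly quadruples the runs needed, which is the "more variability, more runs" rule in numbers.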

4. What is randomization, and why do experimenters make such a big deal of it?

When you randomize, you run your treatments or conditions in a random order rather than in some set pattern. The point is to shield your results from hidden variables that you have not accounted for and that could distort what you find. If you don’t do it, you run the risk of crediting an effect to your own factor when in fact something else was at work, be it a change of shift or a machine that has been warming up.
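As a minimal sketch of what randomizing a run order looks like in practice, assume three hypothetical temperature settings with three replicates each. The seed is fixed only so the plan can be reproduced and audited later.

```python
import random

# Hypothetical treatments: three temperature settings, three replicates each.
treatments = ["100C", "150C", "200C"]
replicates = 3
runs = [t for t in treatments for _ in range(replicates)]

# A seeded generator keeps the randomized plan reproducible.
rng = random.Random(42)
run_order = rng.sample(runs, k=len(runs))  # random permutation of all runs

for position, treatment in enumerate(run_order, start=1):
    print(position, treatment)
```

Every treatment still appears the same number of times; only the sequence changes, so a slow drift such as a warming machine is spread across all treatments instead of lining up with one of them.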

5. When should I use a Randomized Complete Block Design (RCBD)?

Turn to RCBD when there is a known source of variability that you can’t eliminate but can control for, such as the day of the week, the operator on duty or the batch of raw material you are working with. You block on such a variable, running every treatment within each block in random order, so that its effect is separated out and kept from skewing your results. Think of it as ensuring every team in a tournament has to play on the same field; then you know the scores are down to skill and not the venue.
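A rough sketch of an RCBD layout, assuming three hypothetical treatments blocked by day of the week: every block receives every treatment, and the order is randomized separately inside each block. The helper name `rcbd_plan` is made up for this example.

```python
import random


def rcbd_plan(treatments, blocks, seed=None):
    """Return {block: randomized treatment order} with every treatment in every block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        # Independent random order within each block.
        plan[block] = rng.sample(treatments, k=len(treatments))
    return plan


# Hypothetical example: three treatments, blocked by day.
plan = rcbd_plan(["A", "B", "C"], ["Mon", "Tue", "Wed", "Thu"], seed=1)
for day, order in plan.items():
    print(day, order)
```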

6. What is ANOVA and how does it fit into DoE?

You will find that ANOVA, or Analysis of Variance, is the statistical workhorse for most DoE analysis. Its job is to test whether the differences you are seeing between groups are genuine or merely a matter of chance. Take a one-factor experiment with several levels for instance; a one-way ANOVA will give you the bottom line on whether altering that factor has any effect at all. If it does, you can then run follow-up comparisons to see where those differences are coming from.
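To show what a one-way ANOVA actually computes, here is a small sketch that builds the F statistic by hand from made-up data for one factor at three levels. It stops at the F statistic; turning it into a p-value needs the F distribution, which a statistics package provides (for example, `scipy.stats.f_oneway` in Python).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical yields at three levels of a single factor.
groups = [[10, 12, 11], [14, 15, 16], [20, 19, 21]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(round(f_stat, 2), df_b, df_w)  # 61.0 2 6
```

A large F means the spread between group means is large relative to the spread inside the groups, which is the signal that the factor matters.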

7. What are 'main effects' and 'interactions' and why do interactions matter so much?

A main effect is the direct effect of a single factor on an outcome, averaged over the levels of the other factors. An interaction is where one factor’s impact depends on the level of another. In my experience, that is where you will find the substance of the matter. Take a drug for instance: it may do its job at high doses in an adult yet be no good for a child. That is an interaction between age and dosage. And if you overlook those interactions, you are prone to drawing some very wrong conclusions.
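The arithmetic behind these terms is simple enough to sketch. Assume a two-factor experiment with each factor coded as -1 (low) and +1 (high), and four made-up responses. Each effect is the average response where a contrast is positive minus the average where it is negative.

```python
# Hypothetical responses for a 2x2 experiment, keyed by coded levels (A, B).
responses = {(-1, -1): 20.0, (1, -1): 30.0, (-1, 1): 25.0, (1, 1): 55.0}


def effect(contrast):
    """Mean response where the contrast is +1 minus mean response where it is -1."""
    high = [y for run, y in responses.items() if contrast(run) > 0]
    low = [y for run, y in responses.items() if contrast(run) < 0]
    return sum(high) / len(high) - sum(low) / len(low)


effect_a = effect(lambda run: run[0])            # main effect of A
effect_b = effect(lambda run: run[1])            # main effect of B
effect_ab = effect(lambda run: run[0] * run[1])  # A x B interaction

print(effect_a, effect_b, effect_ab)  # 20.0 15.0 10.0
```

In this data, raising A adds 10 units when B is low but 30 units when B is high. That gap is what the non-zero interaction term captures, and it is exactly what a one-variable-at-a-time study would miss.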

8. What statistical software is best for analysing DoE data?

You will find JMP, Minitab and R to be the workhorses of the trade. They are all popular for good reason. If you are in industry, you tend to see a lot of JMP and Minitab because their interfaces are so intuitive and easy for a novice to pick up. In an academic setting, on the other hand, R is the tool of choice; it is free and has more than enough power for research purposes. Then there is Python, which is seeing more use of late among data scientists. But in the end, there is no single best option. It comes down to your own background and what is already in use by your team.

9. What is a 2^k factorial design, and when is it the right choice?

With a 2^k factorial design you put k factors to the test, running each at exactly two levels, typically called the low and high settings. Testing every combination takes 2^k runs, so three factors need 8 runs. The method is well structured and efficient; in a workable number of runs it will show you the main effects and any interactions plainly enough. If you are in the early stages of a new process and need to zero in on the important factors before you get into the finer points of optimization, it makes an excellent place to begin.
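Listing the runs of a full two-level factorial is a short exercise. The sketch below assumes three hypothetical factors with made-up low and high settings, and enumerates every combination.

```python
from itertools import product

# Hypothetical factors, each with a (low, high) setting.
factors = {
    "temperature": (100, 150),
    "pressure": (1, 2),
    "catalyst": ("X", "Y"),
}

# Every combination of low/high settings: 2^3 = 8 runs.
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for run in design:
    print(run)
print(len(design))  # 8
```

In practice you would also randomize the order of these runs before carrying them out.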

10. What's a fractional factorial design, and when do I sacrifice some information for practicality?

With a large number of factors at play, a full factorial design soon stops being practical: twelve factors at two levels each would take 2^12 = 4,096 runs. The fractional approach is more sensible: you run only a carefully chosen subset of the combinations. You give up some information in the process, typically on higher-order interactions that are usually negligible in any case, but the savings in time and cost are considerable. When you have to be rigorous yet are short on resources, it is the pragmatic way to go.
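As an illustration of running only part of the combinations, here is a sketch of a half fraction for four two-level factors: 8 runs instead of 16. The generator D = ABC is a common textbook choice and an assumption here, not something this FAQ prescribes.

```python
from itertools import product

# Full factorial in the first three factors, coded -1 (low) and +1 (high).
base = list(product((-1, 1), repeat=3))

# Half fraction of a 2^4 design: the fourth factor is set by the generator D = ABC.
half_fraction = [(a, b, c, a * b * c) for a, b, c in base]

for run in half_fraction:
    print(run)
print(len(half_fraction))  # 8
```

The price of halving the runs is aliasing: with this generator the effect of D cannot be separated from the three-factor interaction ABC, which is the kind of higher-order information you are choosing to give up.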

11. What is Response Surface Methodology (RSM), and how is it different from basic factorial designs?

With RSM you can go beyond what basic factorial designs have to offer. It does more than simply tell you which factors are of consequence; it will show you the optimal way to set them. By fitting a curved model of your response against the various factors and charting it as a kind of surface, you can zero in on the peaks and valleys. That is where its real worth lies, say in product formulation or chemical engineering. When you are at the stage of wanting to know not if something works but what the very best version of it can be, RSM is an invaluable tool for process optimization.
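A deliberately simplified sketch of the idea: with one factor measured at three coded levels (-1, 0, +1), you can fit a quadratic curve y = b0 + b1*x + b2*x^2 and solve for its peak. Real RSM studies fit several factors at once using designs such as the central composite; the responses below are made up and the function names are hypothetical.

```python
def fit_quadratic(y_low, y_center, y_high):
    """Coefficients of y = b0 + b1*x + b2*x^2 through points at x = -1, 0, +1."""
    b0 = y_center
    b1 = (y_high - y_low) / 2
    b2 = (y_high + y_low) / 2 - y_center
    return b0, b1, b2


def stationary_point(b0, b1, b2):
    """Location and height of the curve's peak (b2 < 0) or valley (b2 > 0)."""
    if b2 == 0:
        raise ValueError("no curvature: the fitted model has no peak or valley")
    x_star = -b1 / (2 * b2)
    y_star = b0 + b1 * x_star + b2 * x_star ** 2
    return x_star, y_star


# Hypothetical yields at the low, center and high settings of one factor.
b0, b1, b2 = fit_quadratic(70.0, 82.0, 78.0)
x_star, y_star = stationary_point(b0, b1, b2)
print(b0, b1, b2)      # 82.0 4.0 -8.0
print(x_star, y_star)  # 0.25 82.5
```

Here the negative curvature term means the fitted curve has a peak, predicted a quarter of the way from the center setting toward the high setting.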

12. Who actually uses DoE in the real world - is it just for scientists?

No, far from it. Engineers put it to work optimizing manufacturing lines, while in pharma it is used to design clinical trials and formulate new drugs. Marketers rely on it for large-scale A/B testing, and agricultural researchers use it to get the most out of their crop yields. Even software teams are using DoE to run tests on product features. The truth is DoE is domain-agnostic; if you have variables and outcomes and need to make a sound decision, it is applicable.

13. Do I need a statistics background to learn DoE?

You don’t have to be a mathematician to learn DoE, though some statistical know-how is useful. For the most part, a DoE course will take you through the material at a manageable pace, especially the beginner-level ones. As long as you have a grasp of the fundamentals, such as what an average or a p-value is and the notion of variability, you are well grounded. In the end, you will do your best learning by putting these things to use on problems that matter to you.
