A Brief Overview of DOE

  1. Introduction: What are the BIG Ideas?
  2. More Advanced Concepts and Possibilities
  3. A Short DOE Glossary
  4. A Motivating Example: The Effect of Herbicide on Seedling Germination

Note: This overview is presented with numerous embedded hypertext references. These enable you to click on any reference, view the associated information, and then click on |Return| to return to where you left off.

What are the BIG Ideas

Statistical Design of Experiments -- DOE -- provides a rigorous and universal framework to design and analyze all comparative experiments. The major ideas were conceived and developed in the 1920s by the great British statistician and geneticist, Sir Ronald Fisher, chiefly to meet the needs of agricultural experimentation that he faced as statistician at the British agricultural experimental station at Rothamsted, England.

Fisher's discovery was that there is a simple underlying geometric structure to all such experimentation, and that a well-designed experiment is one that effectively utilizes this structure under the real constraints of time, money, and experimental environment in which the experiment must be conducted. Fisher and his colleagues further developed the underlying statistical modelling and analysis methods that became the standard procedures of the discipline. Here are some highlights of these procedures. The example provides a practical demonstration of how they are used.
The experimental geometry depends on:

  • the number of experimental factors;
  • the possible settings for the factors;
  • any constraints that may exist on how the settings can be combined.

Here's what this means. DOE allows experimenters to study simultaneously the individual and interactive effects of many factors. Five or ten factor experiments are not uncommon, and larger experiments sometimes are done, although controlling and manipulating so many experimental variables at one time can be difficult. Mathematically, each factor adds another dimension to the design space. For example, a 2-factor experiment can be represented as a 2-d design space (i.e., a plane); 3 factors give ordinary 3-d space; 4 factors give 4 dimensions, and so on. Although 10-dimensional spaces may sound very esoteric, in fact, because the structures considered are so simple, they are easy to work with. In general, only points on hypercubes with all their coordinates either +1 or -1 need to be considered in basic DOE. Here's a simple way to represent such figures in up to five dimensions that is adequate for DOE purposes. It is easy to see how the idea generalizes to any arbitrary number of dimensions, though actually drawing the figures for more would be extremely tedious. It is the concept that counts, anyway.

The letters A-E represent each of the five factors, which vary through the ranges of values coded by - and +. Note that in this design, the factors are either at the - or + levels, and all 32 possible combinations (corners) are included. Each combination is coded by a series of five + or - signs. For example, the upper left far corner of the upper left cube represents the run where A is set at its - level, B is at its + level, C is at its + level, D is at its - level, and E is at its + level. The other 31 corners are decoded in a similar manner.
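The coding just described can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the order in which the corners are listed is simply whatever `itertools.product` produces, not the layout of the figure.

```python
from itertools import product

FACTORS = "ABCDE"  # the five factors named in the text


def full_factorial(n_factors):
    """Return every corner of the n-dimensional -1/+1 hypercube as a tuple."""
    return list(product((-1, +1), repeat=n_factors))


def as_signs(run):
    """Render a run such as (-1, 1, 1, -1, 1) as the sign string '-++-+'."""
    return "".join("+" if level > 0 else "-" for level in run)


runs = full_factorial(len(FACTORS))
print(len(runs))  # number of corners in the 5-factor design
for run in runs[:4]:
    print(as_signs(run))
```

The run singled out in the text (A at -, B at +, C at +, D at -, E at +) is the tuple `(-1, 1, 1, -1, 1)`, which `as_signs` renders as `-++-+`.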

Complex models or constraints require more complicated geometries. For example, in experiments involving mixtures of ingredients like cake mixes, paint blending, or drug formulation, the proportions of ingredients are varied in order to find some "best" combination for whatever responses (texture of the cake, spreadability of the paint, speed with which the drug dissolves) are of interest. Such experiments with mixtures always have the fundamental constraint that the proportions cannot be independently varied -- they must always sum up to 100%. It turns out that this constraint changes the design geometry from a (hyper) cube to a (hyper) pyramid (technically known as a simplex). In three factors, this is an equilateral triangle; in four factors, this is a tetrahedron (regular pyramid); and so forth in higher dimensions. Of course, while interesting as background, such advanced matters are the province of experts and certainly aren't part of the downloadable modules that are used in the project.
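To see the sum-to-100% constraint at work, the sketch below lists candidate blends whose proportions are multiples of a fixed step and add up to 1. Choosing points on such a regular grid (often called a simplex lattice) is an assumption of this sketch, not something the overview prescribes, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(n_ingredients, steps):
    """All blends whose proportions are multiples of 1/steps and sum to 1."""
    points = []
    for counts in product(range(steps + 1), repeat=n_ingredients):
        if sum(counts) == steps:  # enforce the mixture constraint
            points.append(tuple(Fraction(c, steps) for c in counts))
    return points


# Three ingredients, proportions restricted to 0, 1/2, or 1.
blends = simplex_lattice(3, 2)
for blend in blends:
    print([str(p) for p in blend])
```

With three ingredients, the points returned are the three corners and the three edge midpoints of the triangle mentioned above.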

Experimental factors can be either continuous (like temperature, pH, voltage) or discrete (like variety of seed, type of microscope used, number of times the mixture is stirred). For continuous factors, the experimental space is the entire hypercube, since the factors can be set at any level within the range allowed (providing there are no constraints, of course). For discrete factors (also known as "categorical" factors), only the corners of the hypercubes are possible. With some factors discrete and some continuous, the experimental space will consist of edges, faces, hyperfaces, etc. of the hypercubes. In practice, determining the details is not hard and serves as an excellent exercise in geometric thinking for students.

The importance of design geometry is clearly seen in the case where there are just two continuous experimental factors. This gives a square in the plane as the factor space (see the example). The measured response that occurs at every pair of factor settings can be graphed as the height above the plane at each point. All such heights would form a surface, the response surface of results. This response surface completely describes how the response depends on the experimental factors. So the purpose of any experiment is actually to figure out what the response surface looks like (of course, describing it with mathematical models comes to the same thing), and experimental design is therefore the science of choosing points in the factor space to "best" map out the (unknown) experimental response surface.

Of course, in more dimensions (more factors), visualizing the response surface is difficult or impossible. However, the idea is the same, and it turns out that even when a large number of factors are studied, usually only a small subset of them are responsible for most of the observed results within the ranges of interest. This is a very important idea, known as the principle of parsimony, Occam's Razor, or The Pareto Principle in science. It is a driving principle in how scientists formulate and judge scientific models. Designed experiments specifically exploit this principle to save money and improve the informativeness of results. These are fundamental ideas that should and can easily be taught to students when they first encounter the "experimental method" in their high school science courses.

Another of Fisher's important contributions was to develop statistical methods to systematically deal with experimental variation. Here is why this is important. In the description of a response surface above, we pretended that the surface would be obtained (in theory, anyway) by running the experiment at each pair (combination, in general) of settings of the factors. In fact, we know that when experiments are repeated by possibly different people, using possibly different equipment, different raw materials, in different environments, somewhat different results occur at the same settings of the experimental variables. This is known as experimental variability or experimental error, and experimenters do their best to reduce it as much as possible by carefully controlling everything in an experiment that is not supposed to be deliberately changed. However, it is a fact of life that this can never be done perfectly, especially in complex systems (biology and ecology experiments, for example).

What this means is that it is important to distinguish between the true, theoretical response surface and the one that is actually obtained -- the fancy word is "estimated" -- from the experimental results, which contains variability. In practice, because of this variability, there is always some uncertainty in our knowledge of the true results. As is well known, to reduce this uncertainty, one should try to replicate the experiment several times and average the results. The more one can do this, the more precise the conclusions -- the response surface estimates -- become. Of course, this is expensive (think of replicating a car crash test or a physics accelerator experiment), and so, in real life, experimenters always try to design experiments that are as efficient as possible -- that is, produce the most information possible (i.e., the most precise results) per experimental trial. Although this is something that is ordinarily not considered in high school labs, it is an essential concept and should be. Again, it is easy to do so.
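The gain from averaging replicates can be made concrete with a standard statistical result: the standard error of the mean of n independent replicates is the per-run standard deviation divided by the square root of n. The sketch below assumes independent replicates with equal variability; the function name and the numbers are illustrative only.

```python
import math


def standard_error_of_mean(run_sd, n_replicates):
    """Standard error of the average of n independent replicates.

    Assumes every replicate has the same standard deviation run_sd
    and that replicates are independent of one another.
    """
    if n_replicates < 1:
        raise ValueError("need at least one replicate")
    return run_sd / math.sqrt(n_replicates)


# Quadrupling the number of replicates only halves the uncertainty.
for n in (1, 4, 16):
    print(n, standard_error_of_mean(2.0, n))
```

Because precision improves only with the square root of the number of runs, replication alone is an expensive route to precise results, which is why efficient designs matter.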

Fisher discovered that the way to achieve efficiency when studying more than one experimental factor (that is, all the time in real life, since many factors always affect real phenomena of interest) is to simultaneously vary them all in carefully prescribed (but quite simple) patterns. This is in direct opposition to the scientific culture of varying only One Factor At a Time (OFAT) while holding all other factors constant. The gains in efficiency can be quite large, permitting experiments that are a half, a quarter, or an even smaller fraction of the size of OFAT experiments while delivering the SAME precision. In fact, such multifactor experiments actually provide more information than their OFAT counterparts. In multifactor designed experiments, information on interactions is also obtained; OFAT experiments provide no interaction information, since when only one factor is changed, no interactions can occur!
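A small numerical sketch may help here. The response values below are invented purely for illustration. With all four corners of a 2-factor design, each main effect is an average over two comparisons and the interaction can be estimated as well; an OFAT experiment starting from the (-,-) baseline gets each effect from a single comparison and cannot see the interaction at all.

```python
# Hypothetical responses at the four corners of a 2-factor design, keyed by (A, B).
y = {(-1, -1): 10.0, (+1, -1): 14.0, (-1, +1): 11.0, (+1, +1): 19.0}


def effect(column):
    """Average response where the column is +1 minus the average where it is -1."""
    hi = [resp for run, resp in y.items() if column(run) > 0]
    lo = [resp for run, resp in y.items() if column(run) < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


# Factorial estimates: every run contributes to every estimate.
main_a = effect(lambda run: run[0])
main_b = effect(lambda run: run[1])
interaction_ab = effect(lambda run: run[0] * run[1])

# OFAT estimates from the (-,-) baseline: one comparison each, no interaction.
ofat_a = y[(+1, -1)] - y[(-1, -1)]
ofat_b = y[(-1, +1)] - y[(-1, -1)]

print(main_a, main_b, interaction_ab)
print(ofat_a, ofat_b)
```

In this made-up data set the OFAT estimates differ from the factorial ones precisely because an interaction is present: the effect of A depends on where B is set.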

Although Fisher developed these methods in the 1920s, many scientists and engineers are still unaware of them. It is hard to understand why this is so, especially given the provable superiority of DOE methods. However, in certain scientific disciplines -- psychology and agronomy are two examples -- DOE methods are the standard. DOE is increasingly used in industry as more and more managers discover its cost/benefit superiority to OFAT. This keeps a small army of consultants busy training working industrial scientists and engineers in DOE, but the fact remains that they should have learned it as a basic component of their scientific education.

There are several more important ideas that Fisher and his colleagues developed in order to deal with variability in experimental results. These include randomization, replication, blocking, and split-plotting.

Randomization means that the different factor combinations to be investigated, called experimental runs, should be done in random order if possible. Although there are important theoretical reasons why this is useful, the practical reason is that by doing the runs in random order, one lessens the risks that unknown outside factors (for example, increasing humidity in a chemical experiment, fatigue or learning in a psychological experiment, stretching of a spring in a piece of equipment) might systematically bias the results.
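A minimal sketch of randomizing the run order for a two-level design follows. The function name and the fixed seed are assumptions of this sketch; the seed is there only so the printed order is reproducible, and in a real experiment one would normally let the order be truly random.

```python
import random
from itertools import product


def randomized_run_order(n_factors, seed=None):
    """Return all 2**n_factors runs of a two-level design in shuffled order."""
    runs = list(product((-1, +1), repeat=n_factors))
    rng = random.Random(seed)  # private generator, so global state is untouched
    rng.shuffle(runs)
    return runs


order = randomized_run_order(3, seed=2024)
for position, run in enumerate(order, start=1):
    print(position, run)
```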

Replication refers to the practice of appropriately repeating experimental runs to gain a quantitative estimate of the amount of experimental variability. The word "appropriately" is very important here and is extensively discussed in the modules. Estimating variability is crucial in any good experiment. Without knowing how much unexplained variation is present (that is, variation that occurs when the experimental factors do not change), one cannot tell whether observed changes in the response are due to the experimental factor changes or the experimental variability of unknown, uncontrolled variables. In OFAT experiments, experimenters typically must replicate all experimental runs, and they often do so in a systematic, nonrandom way that can cause severe biases or spurious conclusions. In designed experiments, it is sufficient to replicate some (often just one) of the settings; it is even possible to forego replication entirely and build in hidden or "pseudo"-replicates. Surprisingly, the concepts and techniques to do this are quite straightforward and can be covered even in the high school math/science curriculum.

Blocking and split-plotting both reveal the agricultural heritage of DOE, as the terms derive from the blocks and plots in the fields where crop development and yield improvement experiments were conducted. Of course, despite the terminology, the concepts apply to any experiment. Blocking is a technique that allows the possible effects of known sources of experimental variation (like different pieces of experimental equipment, different locations in the experimental fields, locations in ovens in which results are baked) to be entirely removed from the experiment. This is in contrast to using randomization in which one attempts to spread out the variability evenly to avoid bias.

Split-plotting refers to a design and analysis technique that is used when experimental runs cannot be completely randomized. For example, in an experiment to determine how temperature, solute and solvent affect the amount of solute that can be dissolved in a solution, two solvents, glycerine and water, are used to dissolve two different solutes, sugar and salt, at two different temperatures, 20° C and 40° C. There are eight different combinations in all here. Getting the temperatures right is tricky, and it makes sense to do the four high temperature setups all together and the four low temperature setups all together so that even if the temperatures aren't exactly right, at least they will be the same. Unfortunately, this means the run order would not be fully random (although the order within each group of four can be random), and this could lead to problems. In this sort of situation, split-plot ideas would be just what is needed.
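The restricted run order described in this example can be sketched as follows: the order of the two temperature groups is randomized, and then the four solvent/solute combinations are randomized separately within each group. The function name and seed are illustrative assumptions, and the sketch covers only the run order, not the split-plot analysis.

```python
import random
from itertools import product


def split_plot_order(seed=None):
    """Run order with temperature held fixed within each group of four runs."""
    rng = random.Random(seed)
    temperatures = [20, 40]  # degrees C, the hard-to-change factor
    rng.shuffle(temperatures)  # randomize which temperature group goes first
    schedule = []
    for temp in temperatures:
        group = list(product(["glycerine", "water"], ["sugar", "salt"]))
        rng.shuffle(group)  # randomize the four runs inside this group
        schedule.extend((temp, solvent, solute) for solvent, solute in group)
    return schedule


schedule = split_plot_order(seed=1)
for run in schedule:
    print(run)
```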

These are some of the basic DOE ideas. It should be clear even from this cursory description that they are concerned with practical issues that arise in all experiments. Indeed, understanding how to design and analyze experiments to produce clear and meaningful results is at the heart of what is commonly called The Scientific Method. After all, the ability of anyone to "perform the experiment for themselves to verify what happens" is at the core of what makes science science and distinct from art or philosophy. So asking what it means to "verify" an experiment when no two experimental trials can ever be exactly repeated (how close is close enough?) is clearly essential. Introducing these issues as students are learning science has proven to be very useful. And discovering that the basic algebra and geometry taught in math classes are just what is needed to begin to systematically deal with them is a wonderful opportunity to link math and science. The herbicide example, though elementary, raises and (hopefully) clarifies some of these ideas.

 

Example:
The Effect of Herbicide on Seedling Germination


Some More Advanced (but Pretty Neat!) Ideas

Neat ideas are always fun, and in this short discussion, one very neat (and very useful) DOE idea is introduced. We cannot provide much explanation here as to why things work as they do. However, the DOE modules do contain a more complete explanation.

We have already mentioned that the experimental n-dimensional hypercube for n factors has 2^n corners. This means that if an experiment is run in which each factor is only run at 2 levels (call them - and +) and all the possible combinations are used, there will be 2^n separate experimental runs (i.e., trials), one at each corner. If n = 4, this is 16; for n = 5, this is 32; ... ; for n = 10, this is 1024! This seems like an impossibly large number, and yet it was stated at the beginning of this section that experiments with 10 or more factors using the DOE hypercube are not out of the realm of possibility. Does this mean that people actually run experiments with that many runs?

The answer is no. In fact, you can get along with far fewer runs and still get most (or even all) of the information that you need from the experiment. For example, it is possible to use the hypercube to experiment with up to 7 factors in only 8 runs and up to 15 factors in 16 runs! That is, only 8 of the 2^7 = 128 and only 16 of the 2^15 = 32,768 corners of their respective hypercubes are chosen. Of course, they have to be the right 8 or 16 -- not just any old ones! How they are chosen cannot be explained here, but there are some very elegant mathematical ideas involved. These can be discussed at any of several levels of difficulty, from basic algebra, to geometry, and even as an example of abstract group theory. For example, here are the 8 hypercube points (instead of 16) for a 4 factor, 2 level experiment.

 

 

Note that the points chosen form two opposite regular tetrahedrons in the two cubes that are used to represent the 4-dimensional hypercube. Very pretty, indeed!
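Although the overview leaves the selection rule to the modules, one standard construction can be sketched briefly. It is an assumption here that this is the construction behind the figure: let the first three factors run through all 8 combinations and set each extra factor equal to a product of those columns. The function names are illustrative.

```python
from itertools import product


def half_fraction_2_4():
    """8 of the 16 corners: A, B, C take all combinations and D is set to A*B*C."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, +1), repeat=3)]


def seven_factors_in_8_runs():
    """7 factor columns in 8 runs: A, B, C plus the products AB, AC, BC, ABC."""
    return [(a, b, c, a * b, a * c, b * c, a * b * c)
            for a, b, c in product((-1, +1), repeat=3)]


def columns_orthogonal(design):
    """True if every pair of columns has a zero dot product over the runs."""
    n_cols = len(design[0])
    return all(
        sum(run[i] * run[j] for run in design) == 0
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
    )


half = half_fraction_2_4()
for run in half:
    print("".join("+" if x > 0 else "-" for x in run))
print(columns_orthogonal(half), columns_orthogonal(seven_factors_in_8_runs()))
```

In the half fraction, the four runs with D at + and the four runs with D at - each pick out alternate corners of a cube, which is consistent with the pair of opposite tetrahedra noted above.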

In fact, these ideas are part of a very powerful sequential strategy of experimentation. R. A. Fisher once said that the best time to plan an experiment was after it was done! -- by which he meant that you know better how to look for things after you've already tried to find them. Although this may sound silly, it is actually a very valuable idea, because it implies that when experimental resources (time, money, people, equipment) are limited -- which is always -- one should try to experiment in stages. At each stage, you should do enough to get useful information to help you plan the next stage better. The thing you would like to avoid is planning a complete experimental program at the beginning when you are maximally ignorant. On the other hand, you must be careful to do enough at each stage so that the information is valid -- that is, experimental noise does not hide or mislead.

The ideas hinted at in this discussion are very useful for implementing such a strategy. In general, they allow one to design initial screening experiments to efficiently look at a great many experimental variables, thus allowing Mother Nature to help determine which "vital few" should be explored in more detail, and which "unimportant many" can be safely dropped from consideration. Further experimental effort can then be concentrated where it is likely to yield the greatest benefit.

Although some of these ideas may sound like overkill for high school students, in fact, they have little difficulty learning them, and they can use them as a firm base for future studies. When students do science projects as a component of their coursework, they can often put the concepts to good use right away! Even if they don't, since students rarely encounter these DOE ideas in more advanced science courses, the knowledge that they gain here could give them an advantage in later work. One of the goals of the project is to see whether this will be true for students who pursue further science studies.

 

A Short DOE Glossary


Blocking

An experimental technique that allows the possible effects of known but uncontrolled variables to be completely eliminated from an experiment. Here is a simple example. Suppose you wanted to do an experiment comparing basketball shooting accuracy with the dominant hand (e.g. right hand for right-handers) vs. the nondominant hand. One way to do the experiment would be to randomly select 10 people and the hand they shoot with, and then compare the overall right hand results with the overall left hand results. In this way of doing things, it is quite possible that any difference between the hands would get washed out by the large overall difference in individual basketball shooting ability.

A better way to do the experiment would be to randomly select 5 people and have each shoot both with their right and left hands (perhaps varying which hand they use in random order). One can then look at the difference for each person and average these differences as an overall measure of right vs. left ability. Because the variation between people cancels out when things are done this way, this variation is completely eliminated from the experiment -- it is as if it didn't even exist. Done this way, the experiment has been blocked on people. Of course, this is the simplest sort of blocking that can occur, but it does illustrate the idea.
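The following sketch works through the blocked version of the basketball experiment with invented scores (baskets made out of 20 tries; all numbers and names are hypothetical). It compares the spread of the within-person differences with the spread between people.

```python
from statistics import mean, stdev

# Hypothetical results: (dominant hand, nondominant hand) for each of 5 shooters.
shots = {
    "P1": (15, 11),
    "P2": (8, 5),
    "P3": (18, 15),
    "P4": (5, 1),
    "P5": (12, 9),
}

# Blocking on people: analyze each person's own difference between hands.
differences = [dom - non for dom, non in shots.values()]
avg_difference = mean(differences)
spread_of_differences = stdev(differences)

# For comparison: how much the shooters differ from one another.
dominant_scores = [dom for dom, _ in shots.values()]
spread_between_people = stdev(dominant_scores)

print(avg_difference, spread_of_differences, spread_between_people)
```

In this made-up data the person-to-person spread is roughly ten times the spread of the differences, so the hand effect stands out clearly once each person serves as their own block.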

|Return|


Comparative Experiment

Any experiment whose purpose is to determine the quantitative effect of input(s) that are deliberately changed (the experimental "variables" or "factors") on measured output(s) (the "response(s)").

|Return|


Confounding or Aliasing

Main and/or interaction effects are said to be confounded or aliased if only their combined effect, not their separate individual effects, can be determined from the experimental design. As a very simple example, suppose that one ran two experimental trials to determine the effect of teaching method and instructor on student performance. A group of 40 students are randomly split into two groups of 20. Half are assigned to Teacher A using method 1 to teach, say, factoring in a math class. The other half are assigned to Teacher B using method 2. After the factoring unit is covered, the performance of the two groups is assessed by comparing student scores on a common exam. Clearly, any systematic difference between the groups can only be ascribed to a combined effect of different teacher and different method, as both teacher and method change together. In DOE terminology, the effect of teacher and teaching method are fully confounded or aliased with one another. Note that no amount of data analysis can determine the separate effects. The aliasing is inherent to the design.

Although it may seem that one would always want to avoid such aliasing, it turns out that this is not the case -- and some aliasing is essentially unavoidable anyway. Indeed, proper control of aliasing turns out to be one of the keys in the sequential design strategy. Note, also, that one may partially confound effects in a design. This is an advanced (but quite useful) technique.
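The teacher/method example can be checked mechanically: two factors are fully aliased when their coded columns are identical (or exact opposites) in every run. In the sketch below, the -1/+1 coding and the crossed comparison design, in which each teacher uses both methods, are assumptions added for illustration.

```python
from itertools import product


def fully_aliased(design, i, j):
    """True if columns i and j are equal, or exact opposites, in every run."""
    same = all(run[i] == run[j] for run in design)
    opposite = all(run[i] == -run[j] for run in design)
    return same or opposite


# Columns are (teacher, method): Teacher A and method 1 are coded -1,
# Teacher B and method 2 are coded +1.
two_trial_design = [(-1, -1), (+1, +1)]

# Hypothetical alternative: each teacher uses each method (4 trials).
crossed_design = list(product((-1, +1), repeat=2))

print(fully_aliased(two_trial_design, 0, 1))
print(fully_aliased(crossed_design, 0, 1))
```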


Design Constraint

A mathematical or physical limitation that restricts the possible combinations of the factors that can be tried. For example, in a mixtures experiment (an experiment in which the factors are proportions of the mixture ingredients), any given combination of proportions must always add up to 100% (of course!). In a chemical experiment in which the factors are concentrations of various chemicals, certain concentrations may be explosive and so must be avoided.

|Return|


Continuous or Measurement-Type Factor

An experimental factor that can, in principle, be set anywhere within its experimental range for an experimental trial. Examples are temperature, time, pH, amount of fertilizer added to the soil, height, weight, and so forth.

|Return|


D-optimal design criterion

D-optimality is a mathematical technique that is sometimes useful in producing experimental designs, especially in nonstandard and irregular (e.g., not hypercubes or hyperspheres) design spaces. Special purpose computer software is required to use this method as the computations are far too extensive to be done by hand. For those who might care, finding D-optimal designs is an NP-complete problem, so that such designs are only approximated by the software.


Design Resolution

The degree of confounding present in a design. Design resolution refers to the amount of detail -- separate identification of factor effects and interactions -- that the design supports. This is only relevant for multifactor, not OFAT, experiments.


Discrete or Categorical Factor

An experimental factor that can be set only at distinct, separate levels. For example, male and female (in an experiment on fish behavior); metal, glass, or plastic stirrer in a chemical experiment; type of soil -- sandy, clay, loam, gravel (in an experiment on plant growth). Note that categorical factors can be either unordered (male/female) or ordered (number of times the rat previously traversed the maze).
|Return|


Efficiency of an Experimental Design

The amount of information generated by an experimental design. Equivalently, the precision in the fitted coefficients of the response surface. Although a complete explanation of this is rather technical, what it comes down to is a way of defining the amount of averaging that the design can achieve. Saying that a design is more efficient is equivalent to saying that it generates more information, that the response surface is known with greater precision, and that there is less uncertainty in the conclusions. The important idea is that for a fixed amount of experimental effort, usually the more efficient the design, the better.
|Return|


Experimental Precision

The amount of experimental variability that exists, usually determined from the variability of replicated trials. The greater the precision, the less variability there is and the less uncertainty there is in the results, including the fitted response surface.
|Return|


Experimental Bias

The tendency of an experiment to produce results that systematically differ from the true results. For example, a result may be "biased high" if an instrument is improperly operated. A biased measurement is a measurement that is either higher or lower on average than it should be.
|Return|


Experimental Variability,  Error, or Noise

These words are used synonymously to refer to the fact that when experimental trials are repeated without changing the settings of the factors, the response varies rather than remaining constant. This is due, of course, to the hopefully small effects of changes in many uncontrolled factors that exist in any experiment or measurement. It is never possible to exactly repeat anything. In order to quantify such variability -- which is necessary in order to properly assess how the response depends on the experimental factors -- statistical methods must be used.
|Return|


(Experimental) Factor or Variable

Variables that are deliberately manipulated in an experiment in order to assess their effect on the response. For example, in an experiment to assess the effect of various lengths, diameters, and materials on voltage drop across a length of wire, the experimental factors are length, diameter, and material of which the wire is made. The response is the measured voltage drop.
|Return|


Factor Level

The setting of an experimental factor. Typically, in DOE, continuous factors are "standardized" to the range from -1 to +1. For example, if temperature is an experimental factor that is to be varied between 30° and 60° C., then convert 30 to -1, 60 to +1 and linearly interpolate any value between (e.g., 50 interpolates to +1/3). This is equivalent to making a simple linear scale change (like Fahrenheit to centigrade, pounds to kilograms, and so forth).
Note, however, that with categorical factors, the ±1 standardization can only be done when there are exactly two categories. When there are more, it makes no sense because it would convert a non-ordered identifying label (which variety of 3 seed varieties) to an ordered scale (-1,0,+1). For this reason, advanced methods must be used to design experiments with categorical factors having more than two categories.
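The standardization of a continuous factor is a one-line linear map. The sketch below follows the temperature example in this entry; the function names are illustrative.

```python
def code_level(value, low, high):
    """Map a setting in [low, high] linearly onto the coded range [-1, +1]."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def decode_level(coded, low, high):
    """Inverse map: turn a coded level back into the original units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


# Temperature varied between 30 and 60 degrees C, as in the example.
for temp in (30, 45, 50, 60):
    print(temp, code_level(temp, 30, 60))
```

Here 30 maps to -1, the midpoint 45 maps to 0, 50 maps to +1/3, and 60 maps to +1.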
|Return|


Fisher, Sir Ronald A.

(1890-1962) The famous British geneticist and statistician who originated and developed the foundations of experimental design. His books, Statistical Methods for Research Workers and The Design of Experiments, are classics. Much standard statistical terminology -- like ANOVA and randomization test -- derives from his work. The "F" of the statistical F distribution (upon which the F test is based) is named after him.
|Return|


Foldover

A sequential design technique that produces a "mirror image" of a given design in order to separate confounded interactions. Generally, this converts designs of resolution III to designs of resolution IV.
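As a small sketch of the mirror-image idea, the code below takes a 4-run design for 3 factors in which the third column equals the product of the first two (an assumed starting design, chosen here only because it is tiny) and reverses every sign to obtain the foldover runs.

```python
from itertools import product


def foldover(design):
    """Mirror image of a two-level design: reverse the sign of every setting."""
    return [tuple(-level for level in run) for run in design]


# Starting design: A and B take all combinations, C is set to A*B.
original = [(a, b, a * b) for a, b in product((-1, +1), repeat=2)]
mirror = foldover(original)
combined = original + mirror

for run in combined:
    print("".join("+" if x > 0 else "-" for x in run))
```

In the original 4 runs C always equals A*B, so the effect of C cannot be separated from the A-by-B interaction; in the mirror runs C equals -(A*B), and the 8 combined runs break that tie. In this small case the combined runs happen to be all 8 corners of the cube.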


Fractional Factorial Design

A fractional factorial design is a design in which only a selected fraction of all the possible combinations of the design factors are run. For the two level hypercube designs, this means only a subset of all the hypercube corners are actually run.


Hidden Replication

An experimental design is said to permit hidden replication when, if some of the factors can be safely ignored, there is replication in the remaining factors. For example, suppose one did a 2-factor experiment in which runs at the four different combinations (-,-), (-,+), (+,-), and (+,+) were conducted. If the second of the two factors had essentially no effect, then, so far as Mother Nature was concerned, a one factor experiment in which the + and - settings were each replicated twice was actually done. This replication was "hidden" until it became clear that the second factor could be safely ignored. Hidden replication is most commonly used in "screening" experiments with many (more than 4, say) factors.
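The 2-factor example in this entry can be reproduced by dropping the ignored column and counting how often each remaining setting occurs. The helper name is illustrative.

```python
from collections import Counter
from itertools import product


def project(design, keep):
    """Keep only the listed factor columns of each run."""
    return [tuple(run[i] for i in keep) for run in design]


# The four runs (-,-), (-,+), (+,-), (+,+) of the 2-factor experiment.
design = list(product((-1, +1), repeat=2))

# If the second factor turns out not to matter, ignore it and count replicates.
counts = Counter(project(design, keep=[0]))
print(counts)
```

Each setting of the first factor appears twice, which is the hidden replication described above.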
|Return|


Hypercube

The equivalent of a cube in an arbitrary number of dimensions. In DOE, hypercubes are usually standardized so that all coordinate entries are ±1. Hence a 2-d hypercube is just the ordinary square with 4 corners at (-1,-1), (-1,+1), (+1,-1), and (+1,+1). To save writing, the 1's are usually omitted. Hence, we would give the corners simply as (-,-), (-,+), (+,-), and (+,+). Using this convention, a 3-d hypercube is just an ordinary cube with 8 corners at (-,-,-), (+,-,-), (-,+,-), (+,+,-), (-,-,+), (+,-,+), (-,+,+), and (+,+,+). And so on with 4, 5, and more dimensions. Note that in 2 dimensions, there are 2^2=4 corners; in 3, there are 2^3=8; in 4, there are 2^4=16; and, in general, in n dimensions, a hypercube has 2^n corners.
|Return|


Interaction

A 2-factor interaction (2 fi) is the difference in the response that occurs when both factors are changed simultaneously from what was expected to occur based on the effect of changing the factors individually. When the combined effect is significantly greater than the sum of the individual effects, it is often called symbiosis; when it is significantly less, it is often called interference. Algebraically, a 2-factor interaction is represented by the presence of a cross product term (factor_1 * factor_2) in the model. Graphically, 2 fi's are indicated by significant non-parallelism of the two lines in an interaction plot. An example of such a plot is provided in the herbicide example. Higher order interactions -- that is, interactions among 3 or more factors -- may occasionally be important. However, these require more complicated designs with more experimental trials than are typically used. So in the basic approach followed in the DOE project, they are not considered.
|Return|


Multifactor Design

An experiment with several experimental factors in which more than one factor at a time is changed.
|Return|


Parsimony, Occam's Razor, or the Pareto Principle

All of these terms are used equivalently here and refer to the "vital few; trivial many" principle. That is, in any real experiment in which many factors are considered, almost always, only a very small proportion of them will have most of the effect. The rest should be treated as the "trivial many" and considered to be indistinguishable from experimental noise. So in trying to build a model (= fit a simple algebraic equation in basic DOE) to describe the experimental results, one should try to find one that uses as few factors (= parameters = coefficients) as possible. That is, one should be as parsimonious in using coefficients as possible. Occam's Razor refers to the idea that if several models (theoretical or experimental) do equally well in explaining what is observed, then the simplest one (fewest parameters) should be chosen.
|Return|


Randomization

Running the experimental trials in a random order. This is done to protect against the systematic effects of unknown non-experimental variables (like environment) that might bias the experimental results. There are also other ways in which random assignment is used. For example, in doing clinical trials to determine efficacy and safety of new drugs, patients are almost always assigned randomly to the treatment (receive the drug) vs. the control (receive an inactive placebo) group. This prevents unconscious biases (for example, assigning sicker people to receive the drug) from influencing the experimental results.
|Return|


Replication

Repeating an experimental trial at constant factor settings in order to determine the amount of experimental variability. Since the settings of the experimental factors do not change, observed variability in the response must be due to the effects of other, uncontrolled factors that are present throughout the experiment. This includes measurement factors. It is important when doing replicates NOT to do them close together in time under nearly identical circumstances. Rather, the replicates should be done over the same range of conditions in which the entire experiment is performed. This allows all the experimental variability that is actually present to be observed and quantified.
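One simple way to quantify the variability seen in replicates is the sample standard deviation. The sketch below uses Python's standard statistics module; the four replicate values are invented for illustration.

```python
import statistics

# Hypothetical responses from four replicate trials run at identical factor
# settings, spread over the duration of the experiment.
replicates = [41.2, 39.8, 40.5, 42.1]

replicate_mean = statistics.mean(replicates)
replicate_sd = statistics.stdev(replicates)  # sample standard deviation (n - 1)
print(replicate_mean, replicate_sd)
```

The standard deviation gives a yardstick for experimental noise against which apparent factor effects can be judged.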
|Return|


Residual Analysis

Residuals are what's "left over" from the data after a model has been fit. That is, the residuals are defined as:

residual = actual data value - value predicted by fitted model

 

If the model fits well, then all systematic behavior is predicted by the model and the residuals should look like random noise. When residuals depart from this behavior and exhibit systematic trends or dependencies, the model may need to be modified. This, in turn, may require appropriate design changes at the next stage of the experimental process.
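The following Python sketch illustrates how a systematic pattern in the residuals can reveal a missing model term. The data and the coefficients of the main-effects-only model are invented for illustration, with factor settings coded as -1 and +1.

```python
# Hypothetical runs: (factor_1, factor_2, observed response).
runs = [(-1, -1, 20.0), (+1, -1, 30.0), (-1, +1, 25.0), (+1, +1, 55.0)]


def predict_main_effects(x1, x2, k=32.5, a=10.0, b=7.5):
    """Prediction from an assumed fitted model with no interaction term."""
    return k + a * x1 + b * x2


# residual = actual data value - value predicted by fitted model
residuals = [y - predict_main_effects(x1, x2) for x1, x2, y in runs]
for (x1, x2, _), r in zip(runs, residuals):
    print(x1, x2, r)
```

Here the residuals are positive when the two factors are at the same level and negative when they are at opposite levels. That is a systematic dependence on factor_1 * factor_2 rather than random noise, which suggests adding an interaction term to the model.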


Response Surface

The higher dimensional "surface" of true responses (that is, absent all extraneous experimental variability) obtained from all possible combinations of settings of the experimental factors (over their allowable experimental ranges). Knowledge of this surface is equivalent to a complete understanding of how the response depends on the experimental factors. If some or all of the factors are categorical, the "surface" may actually be isolated points, curves, or other lower dimensional structures.
|Return|


Response Variable

A measured experimental outcome. Response variables may come in many forms. For example, the response in a physics experiment exploring the effects of different numbers of windings and currents on the performance of an electromagnet could be the magnetic force generated. In an industrial experiment on a chemical process, the response might be the yield of the product. In an experiment to develop a new cake mix, the response variables might be taste and texture as rated by a panel of raters on a 1 to 10 scale. In an experiment to see what effect height, sex, and distance from the basket have on foul shooting accuracy, the response could be the number of baskets made out of ten tries.
A single experiment might have several response variables that characterize different aspects of the outcome. The key idea is that there must be some kind of "reliable" measurement that is made that can be used for analysis of the results. What is meant by "reliable" is, itself, a complex statistical issue.
|Return|


Response Surface Methods

A broad category of experimental design and analysis methods based on fitting models which are linear and quadratic equations in the experimental factors (this includes cross-terms for interactions). Such purely empirical models are useful for describing system behavior, improving processes, and often increasing understanding so that more detailed conceptual (mechanistic) models can be developed.


Screening Design

A screening design is one in which relatively few experimental runs are used to efficiently study a large number of experimental factors to "screen out" those few that are most active from the remainder that are relatively inactive over the ranges being considered. Such designs are very useful in the early stages of sequential experimentation in order to conserve resources and identify the most influential experimental factors for more detailed study. Other essentially synonymous terms for this are "Resolution III," "Plackett-Burman," and "Saturated" design.
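As a small illustration, the sketch below builds one of the simplest saturated Resolution III designs: a half fraction that studies three factors in four runs by setting the third factor equal to the product of the first two. The function names are invented for this sketch, and the example is far smaller than a typical screening experiment. Because factor C is set to A*B by construction, its main effect cannot be separated from the A-by-B interaction in these four runs. The sketch also shows a foldover (the same runs with every sign reversed), which in this small case completes the full 8-run factorial.

```python
from itertools import product


def half_fraction():
    """Four-run design for three factors: full factorial in A and B, C = A*B."""
    return [(a, b, a * b) for a, b in product((-1, +1), repeat=2)]


def foldover(design):
    """Reverse the sign of every factor setting in every run."""
    return [tuple(-x for x in run) for run in design]


def columns_orthogonal(design):
    """True if every pair of factor columns has a zero dot product."""
    n_factors = len(design[0])
    for i in range(n_factors):
        for j in range(i + 1, n_factors):
            if sum(run[i] * run[j] for run in design) != 0:
                return False
    return True


design = half_fraction()
combined = design + foldover(design)
for run in design:
    print(run)
print(columns_orthogonal(design), len(set(combined)))
```

Adding the foldover runs only if the first four runs leave questions open is an instance of building a design in stages.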
|Return|


Sequential Experimental Strategy

Sequential experimentation means investigations that are carried out in stages so that each successive experiment can be designed and executed in the light of information gained from previous ones. This is really a description of a scientific learning strategy that encourages the efficient expenditure of limited experimental resources. Although most experimenters intuitively try to do things this way, there are specific design and analytical tools in DOE that have been rigorously developed for this purpose. Some of these procedures are:

  • residual analysis,
  • fractional factorial designs,
  • design resolution,
  • sequential assembly,
  • foldover,
  • response surface methods,
  • D-optimality design criteria, and
  • steepest ascent/gradient optimization.

It is important to emphasize that this provides experimenters a systematic framework -- not merely an artful philosophy -- in which to execute the strategy. This gives greater control and improved likelihood of success.

     

    |Return|


    Sequential Assembly of Designs

    Building and performing complex experimental designs one stage at a time. Later stages are added only when and if needed. This conserves experimental resources while yielding the maximal information at each stage of the assembly process.


    Split-Plotting

    A method for running experiments in non-random fashion when not all experimental factors can be completely randomized. This is an advanced topic not covered in the DOE Modules.
    |Return|


    Statistical Model

    A statistical model is an algebraic equation that expresses how a response of interest is related to the experimental factors and the experimental variability. For example,
    (I): Resp = K + A*Factor_1 + B*Factor_2 + C*Factor_1*Factor_2 + random_variability
    is such a model. "K", "A", "B", and "C" are unknown "parameters" or "coefficients" that must be estimated from the experimental data. Factor_1 and Factor_2 are the (known) settings of the two experimental factors at which the response is actually measured in the experiment. This model is said to be linear because the response is a linear function of the unknown coefficients. This can be a bit confusing, because the roles of unknown coefficient and known variable setting are the reverse of what we are accustomed to in such equations. For example, the model
    (II): Resp = K + A*[Factor_1]^2 + B*Factor_1 + C*Factor_2 + random_variability
    is also linear for the same reason, even though Factor_1 now also appears as a squared term. However, the model
    (III): Resp= K + sin(A*Factor_1) + B*cos(C*Factor_2 + D) + random_variability
    is said to be "nonlinear" because the response is a nonlinear function (involving sines and cosines) of the unknown coefficients A, C, and D. Experimental design and analysis for nonlinear models is an advanced (but nevertheless useful) topic that is not considered in basic DOE. In fact, in basic DOE only models of type (I) are usually considered. These suffice for many applications.
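To make model (I) concrete, here is a minimal Python sketch that estimates K, A, B, and C from a 2x2 experiment. It assumes a balanced design with both factors coded as -1 and +1; under that assumption the model columns are orthogonal and each least-squares coefficient reduces to a signed average of the responses. The data and function names are invented for illustration, and this shortcut does not apply to unbalanced designs or uncoded factor settings.

```python
def fit_model_I(runs):
    """Estimate K, A, B, C of model (I) from runs given as (x1, x2, response).

    Assumes a balanced design with factors coded -1/+1, so each coefficient
    is the average of the responses weighted by the sign of its model column.
    """
    n = len(runs)
    k = sum(y for _, _, y in runs) / n
    a = sum(x1 * y for x1, _, y in runs) / n
    b = sum(x2 * y for _, x2, y in runs) / n
    c = sum(x1 * x2 * y for x1, x2, y in runs) / n
    return k, a, b, c


def predict(coeffs, x1, x2):
    """Response predicted by model (I), without the random variability term."""
    k, a, b, c = coeffs
    return k + a * x1 + b * x2 + c * x1 * x2


# Hypothetical runs: (Factor_1, Factor_2, Resp).
runs = [(-1, -1, 20.0), (+1, -1, 30.0), (-1, +1, 25.0), (+1, +1, 55.0)]
coeffs = fit_model_I(runs)
print(coeffs)
```

With only four runs and four coefficients, the fitted model reproduces the data exactly, so nothing is left over to estimate experimental variability. Replicated runs would be needed for that.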
    |Return|


    Steepest Ascent/Gradient Methods

    Methods for improvement based on experiments and analysis that model the response surface as a "mountain" (in n dimensions). The fastest way to climb such a mountain -- that is, the path of steepest ascent -- is to go straight up the sides. By mathematically determining this direction, one can determine how to change the experimental factors to effect the greatest possible change in the response.
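For a first-order model such as Resp = K + A*Factor_1 + B*Factor_2, the gradient with respect to the factors is simply (A, B), so the direction of steepest ascent is proportional to the fitted coefficients. The sketch below assumes such a model has already been fitted in coded units; the coefficient values, step size, and function names are invented for illustration.

```python
import math


def steepest_ascent_direction(coefficients):
    """Unit vector along the gradient of a first-order model."""
    norm = math.sqrt(sum(c * c for c in coefficients))
    if norm == 0:
        raise ValueError("all coefficients are zero; no ascent direction exists")
    return [c / norm for c in coefficients]


def path_points(start, direction, step, n_steps):
    """Factor settings at successive steps along the ascent direction."""
    return [
        [s + i * step * d for s, d in zip(start, direction)]
        for i in range(1, n_steps + 1)
    ]


# Hypothetical fitted coefficients A and B, in coded units.
direction = steepest_ascent_direction([3.0, 4.0])
candidates = path_points([0.0, 0.0], direction, step=1.0, n_steps=3)
for point in candidates:
    print(point)
```

In practice, trials would be run at successive points along this path until the response stops improving, at which point a new experiment can be designed around the best point found.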


    The Scientific Method

    A detailed discussion of this term would fill a book. However, some of the fundamental ways in which science is different from philosophy or art or literature surely must include:

     
    1. Science is about predicting observable phenomena. Merely giving explanations after the fact like stock market analysts explaining yesterday's rises or falls in prices is not good enough. You must predict what will be observed before it is observed.
      The concept of observable phenomena is also central. This means that given instructions on how to construct measurement equipment, anyone who produces the equipment should be able to measure the "same" results (within experimental variability). Science is democratic and replicable. That is, observation should not depend on who we are, what beliefs we hold, or what salary we make. Of course, the predictions may be probabilistic and involve a level of uncertainty: we only know that the likelihood of thunderstorms is higher under some conditions than others; or that a major earthquake will almost certainly occur along the San Andreas fault within 100 years; or that overuse of antibiotics will accelerate evolutionary development of antibiotic resistant strains of pneumococci. Although such predictions involve uncertainty, they are just as legitimate science as, say, the prediction of a space shuttle's orbit.
    2. Predictions are made by the development of scientific models. Science is not about discovering eternal truths; rather, it is about developing models from which precise and accurate predictions can be made. On the most fundamental level, science does not discuss truth or the underlying nature of reality at all -- this is the realm of philosophy. All scientific models, even famous ones like Newton's law of gravity, the Theory of Relativity, or the Theory of Evolution, can be flawed or incomplete in some respects but still be useful to make predictions within certain defined realms. Another way of saying this is that all scientific models are falsifiable, but none can be proven (unlike mathematics). There may always be another consequence that observation will contradict.
    3. Models are usually, but not always, quantitative and expressed mathematically. One can broadly distinguish two overlapping kinds of scientific models: mechanistic or conceptual models, in which some kind of theoretical construct is used to develop the model; and empirical models, which are based exclusively on observed data (and use statistical analysis to develop predictions of what will be observed in the future under other conditions). Overlap occurs because extended observation usually motivates the development of conceptual models, and conceptual models must always be criticized (i.e., put to the test) by real data.
    4. Because all science involves observable phenomena, the inevitable presence of some uncontrolled variability in all observations means that no scientific observation is exactly known -- "we see through a glass darkly." All scientific observation therefore involves uncertainty, and the uncertainty must explicitly and quantitatively be dealt with as part of the process of scientific learning. Contrast this with philosophy or religion or literature, for example.

     

     