- Introduction: What are the BIG Ideas?
- More Advanced Concepts and Possibilities
- A Short DOE Glossary
- A Motivating Example: The Effect of Herbicide on Seedling Germination
Note: This overview is presented with numerous embedded
hypertext references. These enable you to click on any reference,
view the associated information, and then click on |Return|
to return to where you left off.
What are the BIG Ideas?
-
Statistical Design of Experiments -- DOE -- provides a
rigorous and universal framework to design and analyze all comparative
experiments. The major ideas were conceived and developed
in the 1920's by the great British statistician and
geneticist, Sir Ronald
Fisher, chiefly to meet the needs of agricultural
experimentation that he faced as statistician at the Rothamsted
Experimental Station, an agricultural research station in England.
-
Fisher's discovery was that there is a simple underlying
geometric structure to all such experimentation, and that a
well-designed experiment is one that effectively utilizes this
structure under the real constraints of time, money, and
experimental environment in which the experiment must be
conducted. Fisher and his colleagues further developed the
underlying statistical modelling
and analysis methods that became the standard procedures of
the discipline. Here are some highlights of these procedures.
The example
provides a practical demonstration of how they are used.
The experimental geometry depends on
- the number of experimental factors;
- the possible settings for the factors;
- any constraints
that may exist on how the settings can be combined.
Here's what this means. DOE allows experimenters to study
simultaneously the individual and interactive
effects of many factors. Five or ten factor experiments are not
uncommon, and larger experiments sometimes are done, although
controlling and manipulating so many experimental variables at one
time can be difficult. Mathematically, each factor adds another
dimension to the design space. For example, a 2-factor experiment
can be represented as a 2-d design space (i.e., a plane);
3 factors is ordinary 3-d space; 4 factors is 4 dimensions, and so
on. Although 10-dimensional spaces may sound very esoteric, in
fact, because the structures considered are so simple, they are
easy to work with. In general, only points on hypercubes
with all their coordinates either +1 or -1 need to be considered
in basic DOE. Here's a simple way to represent such figures in up
to five dimensions that is adequate for DOE purposes. It is easy
to see how the idea generalizes to any arbitrary number of
dimensions, though actually drawing the figures for more would be
extremely tedious. It is the concept that counts, anyway.
The
letters A-E represent each of the five factors, which vary through
the ranges of values coded by - and +. Note that in this design,
the factors are either at the - or + levels, and all 32 possible
combinations (corners) are included. Each combination is coded by
a series of five + or - signs. For example, the upper left far
corner of the upper left cube represents the run where A is set at
its - level, B is at its + level, C is at its + level, D is at
its - level, and E is at its + level. The other 31 corners are
decoded in a similar manner.
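The coding scheme just described can be sketched in a few lines of Python. This is only an illustration: the names `corners` and `decode` are hypothetical, and the sign strings are assumed to list the factors in the order A to E.

```python
from itertools import product

factors = "ABCDE"  # the five factor labels used in the text

# Every corner of the 5-d hypercube: each factor at its "-" or "+" level.
corners = ["".join(signs) for signs in product("-+", repeat=len(factors))]

def decode(corner):
    """Map a sign string such as '-++-+' to a {factor: level} dictionary."""
    return dict(zip(factors, corner))

print(len(corners))      # number of runs in the full design
print(decode("-++-+"))   # the run described in the text
```

The same call with `repeat=n` lists the 2^n corners for any number of factors.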
Complex models or constraints require more complicated
geometries. For example, in experiments involving mixtures of
ingredients like cake mixes, paint blending, or drug formulation,
the proportions of ingredients are varied in order to
find some "best" combination for whatever responses
(texture of the cake, spreadability of the paint, speed with which
the drug dissolves) are of interest. Such experiments with
mixtures always have the fundamental constraint that the
proportions cannot be independently varied -- they must always sum
up to 100%. It turns out that this constraint changes the design
geometry from a (hyper) cube to a (hyper) pyramid (technically known
as a simplex). In three factors, this is an equilateral triangle;
in four factors, this is a tetrahedron (regular pyramid); and so
forth in higher dimensions. Of course, while interesting as
background, such advanced matters are the province of experts and
certainly aren't part of the downloadable
modules that are used in the project.
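For readers who are curious, the sum-to-100% constraint is easy to see in code. The sketch below lists candidate blends whose proportions are multiples of a chosen step and keeps only those that sum to 1. The function name `simplex_lattice` and the step size are assumptions made for illustration, not something taken from the modules.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(n_components, steps):
    """All blends whose proportions are multiples of 1/steps and sum to 1."""
    return [
        tuple(Fraction(k, steps) for k in counts)
        for counts in product(range(steps + 1), repeat=n_components)
        if sum(counts) == steps  # the mixture constraint
    ]

# Three ingredients, proportions restricted to 0, 1/2, or 1.
blends = simplex_lattice(3, 2)
for blend in blends:
    print([str(p) for p in blend])
```

With three ingredients the surviving points are the corners and edge midpoints of a triangle, which is the simplex mentioned above.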
Experimental factors can be either continuous
(like temperature, pH, voltage) or discrete
(like variety of seed, type of microscope used, number of times
the mixture is stirred). For continuous factors, the experimental
space is the entire hypercube, since the factors can be set at any
level within the range allowed
(providing there are no constraints, of course). For discrete
(also known as "categorical") factors, only the
corners of the hypercubes are possible. With some factors discrete
and some continuous, the experimental space will consist of edges,
faces, hyperfaces, etc. of the hypercubes. In practice,
determining the details is not hard and serves as an excellent
exercise in geometric thinking for students.
The importance of design geometry is clearly seen in the case
where there are just two continuous experimental factors. This
gives a square in the plane as the factor space (see the example).
The measured response that occurs at every pair of factor settings
can be graphed as the height above the plane at each
point. All such heights would form a surface, the response
surface of results. This response surface completely describes
how the response depends on the experimental factors. So the
purpose of any experiment is actually to figure out what the
response surface looks like (of course, describing it with
mathematical models comes to the same thing), and experimental
design is therefore the science of choosing points in the factor
space to "best" map out the (unknown) experimental
response surface.
Of course, in more dimensions (more factors), visualizing the
response surface is difficult or impossible. However, the idea is
the same, and it turns out that even when a large number of
factors are studied, usually only a small subset of them are
responsible for most of the observed results within the ranges of
interest. This is a very important idea, known as the principle of
parsimony, Occam's Razor,
or The Pareto Principle in science. It is a driving principle
in how scientists formulate and judge scientific models. Designed
experiments specifically exploit this principle to save money and
improve the informativeness of results. These are fundamental
ideas that should and can easily be taught to students when they
first encounter the "experimental method" in their high
school science courses.
Another of Fisher's important contributions was to develop
statistical methods to systematically deal with experimental
variation. Here is why this is important. In the description of a
response surface in the last paragraph, we pretended that the
surface would be obtained (in theory, anyway) by running the
experiment at each pair (combination, in general) of settings of
the factors. In fact, we know that when experiments are explicitly
repeated by possibly different people, using possibly different
equipment, different raw materials, in different environments,
somewhat different results occur at the same settings of the
experimental variables. This is known as experimental
variability or experimental error, and experimenters do their
best to reduce it as much as possible by carefully controlling
everything in an experiment that is not supposed to be
deliberately changed. However, it is a fact of life that this can
never be done perfectly, especially in complex systems (biology
and ecology experiments, for example).
What this means is that it is important to distinguish between
the true, theoretical response surface and the one that
is actually obtained -- the fancy word is "estimated" --
from the experimental results, which contains variability. In
practice, because of this variability, there is always some
uncertainty in our knowledge of the true results. As is well
known, to reduce this uncertainty, one should try to replicate the
experiment several times and average the results. The more one can
do this, the more precise
the conclusions -- the response surface estimates -- become. Of
course, this is expensive (think of replicating a car crash test
or a physics accelerator experiment), and so, in real life,
experimenters always try to design experiments that are as efficient
as possible -- that is, produce the most information possible
(i.e., the most precise results) per experimental trial. Although
this is something that is ordinarily not considered in high school
labs, it is an essential concept and should be. Again, it is easy
to do so.
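The gain from averaging can be made quantitative under a simple assumption that the text does not state explicitly: the n replicates are independent and share a common standard deviation σ. The uncertainty (standard error) of their average is then

```latex
\operatorname{SE}(\bar{y}) = \frac{\sigma}{\sqrt{n}}
```

so, under that assumption, cutting the uncertainty in half takes four times as many replicates. This is why efficiency per trial matters so much.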
Fisher discovered that the way to achieve efficiency when
studying more than one experimental factor (that is, all the time
in real life, since many factors always affect real phenomena of
interest), is to simultaneously vary them all in
carefully prescribed (but quite simple) patterns. This is in
direct opposition to the scientific culture of varying only One
Factor At a Time (OFAT) while holding all other factors constant.
The gains in efficiency can be quite large, permitting experiments
that are a half, a quarter, or even less the size of the OFAT
experiments needed to reach the SAME precision. In fact, such multifactor
experiments actually provide more information than
their OFAT counterparts. In multifactor designed experiments,
information on interactions is also obtained; OFAT experiments
provide no interaction information, since when only one factor is
changed, no interactions can occur!
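A small numerical sketch may help here. The response values below are invented purely for illustration. The code estimates both main effects and the interaction from the four runs of a 2-factor design, then shows what an OFAT comparison starting from the (-,-) baseline would report.

```python
# Hypothetical responses at the four corners of a 2-factor design,
# keyed by the (A, B) settings coded as -1 / +1.
y = {(-1, -1): 10.0, (+1, -1): 14.0, (-1, +1): 12.0, (+1, +1): 22.0}

def effect(contrast):
    """Average response where the contrast is +1 minus the average where it is -1."""
    hi = [v for k, v in y.items() if contrast(*k) > 0]
    lo = [v for k, v in y.items() if contrast(*k) < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

main_a = effect(lambda a, b: a)        # uses all four runs
main_b = effect(lambda a, b: b)        # reuses the same four runs
inter_ab = effect(lambda a, b: a * b)  # only available when both factors vary

# OFAT from the (-,-) baseline: each factor is seen at one setting of the other.
ofat_a = y[(+1, -1)] - y[(-1, -1)]
ofat_b = y[(-1, +1)] - y[(-1, -1)]

print(main_a, main_b, inter_ab)
print(ofat_a, ofat_b)
```

Every run contributes to every estimate in the factorial analysis, which is where the efficiency comes from. The OFAT differences each rest on just two runs and give no estimate of the interaction.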
Although Fisher developed these methods in the 1920's, many
scientists and engineers are still unaware of them. It is hard to
understand why this is so, especially given the obvious and
provable superiority of DOE methods. However, in certain
scientific disciplines -- psychology and agronomy are two examples
-- DOE methods are the standard. For obvious reasons, DOE is
increasingly used in industry as more and more managers discover
its obvious cost/benefit superiority to OFAT. This keeps a small
army of consultants busy training working industrial scientists
and engineers in DOE, but the fact remains that they should have
learned it as a basic component of their scientific education.
There are several more important ideas that Fisher and his
colleagues developed in order to deal with variability in
experimental results. These include randomization, replication,
blocking, and split-plotting.
Randomization means that the different factor combinations to be
investigated, called experimental runs, should be done in
random order if possible. Although there are important theoretical
reasons why this is useful, the practical reason is that by doing
the runs in random order, one lessens the risks that unknown
outside factors (for example, increasing humidity in a chemical
experiment, fatigue or learning in a psychological experiment,
stretching of a spring in a piece of equipment) might
systematically bias the results.
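In practice, randomizing the run order is a one-line operation. The sketch below assumes a 3-factor, 2-level design and uses a fixed seed only so that the plan can be reproduced. Both choices are illustrative.

```python
import random
from itertools import product

runs = list(product("-+", repeat=3))  # the 8 runs of a 3-factor, 2-level design

rng = random.Random(2024)  # fixed seed (an arbitrary choice) for a reproducible plan
run_order = runs[:]        # copy, so the standard order is kept for reference
rng.shuffle(run_order)

for position, run in enumerate(run_order, start=1):
    print(position, "".join(run))
```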
Replication refers to the practice of appropriately
repeating experimental runs to gain a quantitative estimate of the
amount of experimental variability. The word
"appropriately" is very important here and is
extensively discussed in the modules. Estimating variability is
crucial in any good experiment. Without knowing how much
unexplained variation is present (that is, variation that occurs
when the experimental factors do not change), one cannot
tell whether observed changes in the response are due to the
experimental factor changes or the experimental variability of
unknown, uncontrolled variables. In OFAT experiments,
experimenters typically must replicate all experimental
runs, and they often do so in a systematic, nonrandom way that
can cause severe biases or spurious conclusions. In designed
experiments, it is sufficient to replicate some (often just one)
of the settings; it is even possible to forego replication
entirely and build in hidden
or "pseudo"-replicates. Surprisingly, the concepts and
techniques to do this are quite straightforward and can be covered
even in the high school math/science curriculum.
Blocking and split-plotting both reveal the agricultural
heritage of DOE, as the terms derive from the blocks and plots in
the fields where crop development and yield improvement
experiments were conducted. Of course, despite the terminology,
the concepts apply to any experiment. Blocking is a technique that
allows the possible effects of known sources of
experimental variation (like different pieces of experimental
equipment, different locations in the experimental fields,
locations in ovens in which results are baked) to be entirely
removed from the experiment. This is in contrast to using
randomization in which one attempts to spread out the variability
evenly to avoid bias.
Split-plotting refers to a design and analysis technique that
is used when experimental runs cannot be completely randomized.
For example, in an experiment to determine how temperature, solute
and solvent affect the amount of solute that can be dissolved in a
solution, two solvents, glycerine and water, are used to dissolve
two different solutes, sugar and salt, at two different
temperatures, 20° C and 40° C. There are eight different
combinations in all here. Getting the temperatures right is
tricky, and it makes sense to do the four high temperature setups
all together and the four low temperature setups all together so
that even if the temperatures aren't exactly right, at least they
will be the same. Unfortunately, this means the run order would
not be fully random (although the order within each group of four
can be random), and this could lead to problems. In this sort of
situation, split-plot ideas would be just what is needed.
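The restricted randomization in this example can be sketched as follows. The run plan is only an illustration of the grouping described above (randomize the order of the two temperature groups, then randomize the four runs inside each group). It does not cover the split-plot analysis itself, and the seed is arbitrary.

```python
import random
from itertools import product

rng = random.Random(7)  # arbitrary seed, used only for reproducibility

temperatures = [20, 40]     # degrees C, as in the example
rng.shuffle(temperatures)   # random order of the two temperature groups

plan = []
for temp in temperatures:
    # The four solvent/solute combinations run at this temperature.
    group = list(product(["glycerine", "water"], ["sugar", "salt"]))
    rng.shuffle(group)      # random order within the group
    plan.extend((temp, solvent, solute) for solvent, solute in group)

for run in plan:
    print(run)
```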
These are some of the basic DOE ideas. It should be clear even
from this cursory description that they are concerned with
practical issues that arise in all experiments. Indeed,
understanding how to design and analyze experiments to produce
clear and meaningful results is at the heart of what is commonly
called The Scientific Method.
After all, the ability of anyone to "perform the experiment
for themselves to verify what happens" is at the core of what
makes science science and distinct from art or philosophy. So
asking what it means to "verify" an experiment when no
two experimental trials can ever be exactly repeated (how close is
close enough?) is clearly essential. Introducing these issues as
students are learning science has proven to be very useful. And
discovering that the basic algebra and geometry taught in math
classes are just what is needed to begin to systematically deal
with them is a wonderful opportunity to link math and science. The
herbicide example
though elementary, raises and (hopefully) clarifies some of these
ideas.
Example:
The Effect of Herbicide on Seedling Germination
Neat ideas are always fun, and in this short discussion, one very
neat (and very useful) DOE idea is introduced. We cannot provide
much explanation here as to why things work as they do. However,
the DOE
modules do contain a more complete explanation.
We have already mentioned that the experimental n-dimensional
hypercube for n factors has 2^n corners. This means that if an
experiment is run in which each factor is only run at 2 levels
(call them - and + ) and all the possible combinations are used,
there will be 2^n separate experimental runs (i.e., trials), one
at each corner. If n = 4, this is 16; for n = 5, this is 32; ... ;
for n = 10, this is 1024! This seems like an impossibly large
number, and yet it was stated at the beginning of this section
that experiments with 10 or more factors using the DOE hypercube
are not out of the realm of possibility. Does this mean that
people actually run experiments with that many runs?
The answer is no. In fact, you can get along with far fewer
runs and still get most (or even all) the information that
you need from the experiment. For example, it is possible to use
the hypercube to experiment with up to 7 factors in only 8 runs
and up to 15 factors in 16 runs! That is, only 8 of the 2^7=128
and only 16 of the 2^15 = 32,768 corners of their respective
hypercubes are chosen. Of course, they have to be the right 8 or
16 -- not just any old ones! How they are chosen cannot be
explained here, but there are some very elegant mathematical ideas
involved. These can be discussed at any of several levels of
difficulty, from basic algebra, to geometry, and even as an
example of abstract group theory. For example, here are the 8
hypercube points (instead of 16) for a 4 factor, 2 level
experiment.

-
Note that the points chosen form two opposite regular tetrahedrons
in the two cubes that are used to represent the 4-dimensional
hypercube. Very pretty, indeed!
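One standard way to pick such a half is to keep the corners whose four coordinates multiply to +1, that is, to set the fourth factor equal to the product of the first three. Whether the figure used this half or its mirror image is an assumption here. The sketch checks the tetrahedron claim by confirming that, within each cube, every pair of chosen corners differs in exactly two coordinates, so all six edges have the same length.

```python
from itertools import combinations, product

# Half fraction of the 4-factor design: D is set to the product A*B*C.
half = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def hamming(p, q):
    """Number of coordinates in which two corners differ."""
    return sum(x != y for x, y in zip(p, q))

# Split the 8 points into the two 3-d cubes (D = -1 and D = +1).
cubes = {d: [p[:3] for p in half if p[3] == d] for d in (-1, 1)}

# Pairwise distances inside each cube; a single value means a regular tetrahedron.
edge_lengths = {
    d: {hamming(p, q) for p, q in combinations(pts, 2)}
    for d, pts in cubes.items()
}
print(cubes)
print(edge_lengths)
```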
In fact, these ideas are part of a very powerful sequential
strategy of experimentation. R. A. Fisher once said that the
best time to plan an experiment was after it was done! -- by which
he meant that you know better how to look for things after you've
already tried to find them. Although this may sound silly, it is
actually a very valuable idea, because it implies that when
experimental resources (time, money, people, equipment) are limited
-- which is always -- one should try to experiment in stages. At
each stage, you should do enough to get useful information to help
you plan the next stage better. The thing you would like to avoid
is planning a complete experimental program at the beginning when
you are maximally ignorant. On the other hand, you must be careful
to do enough at each stage so that the information is valid --
that is, experimental noise does not hide or mislead.
The ideas hinted at in this discussion are very useful for
implementing such a strategy. In general, they allow one to design
initial screening experiments
to efficiently look at a great many experimental variables, thus
allowing Mother Nature to help determine which "vital
few" should be explored in more detail, and which
"unimportant many" can be safely dropped from
consideration. Further experimental effort can then be
concentrated where it is likely to yield the greatest benefit.
Although some of these ideas may sound like overkill for high
school students, in fact, they have little difficulty learning
them, and they can use them as a firm base for future studies.
When students do science projects
as a component of their coursework, they can often put the
concepts to good use right away! Even if they don't, since
students rarely encounter these DOE ideas in more advanced science
courses, the knowledge that they gain here could give them an
advantage in later work. One of the goals of the project is to see
whether this will be true for students who pursue further science
studies.
-
-
An experimental technique that allows the possible effects
of known but uncontrolled variables to be completely
eliminated from an experiment. Here is a simple example.
Suppose you wanted to do an experiment comparing basketball
shooting accuracy with the dominant hand (e.g. right hand for
right-handers) vs. the nondominant hand. One way to do the
experiment would be to randomly select 10 people and the hand
they shoot with, and then compare the overall right hand
results with the overall left hand results. In this way of
doing things, it is quite possible that any difference between
the hands would get washed out by the large overall difference
in individual basketball shooting ability.
A better way to do the experiment would be to randomly
select 5 people and have each shoot both with their
right and left hands (perhaps varying which hand they use in
random order). One can then look at the difference
for each person and average these differences as an overall
measure of right vs. left ability. Because the variation
between people cancels out when things are done this way, this
variation is completely eliminated from the experiment -- it
is as if it didn't even exist. Done this way, the experiment has
been blocked on people. Of course, this is the simplest sort
of blocking that can occur, but it does illustrate the idea.
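A tiny numerical sketch shows why this works. The shot counts below are invented for illustration (shots made out of 20 with each hand). The spread between people is large, but the spread of the within-person differences is small.

```python
from statistics import mean, stdev

# Hypothetical results: (dominant hand, nondominant hand) for 5 shooters.
shots = [(16, 13), (9, 6), (12, 10), (18, 14), (6, 4)]

dominant = [d for d, _ in shots]
diffs = [d - n for d, n in shots]  # one difference per person (the block)

print("spread between people:", stdev(dominant))
print("mean within-person difference:", mean(diffs))
print("spread of the differences:", stdev(diffs))
```

Because each difference is computed within a single person, that person's overall shooting ability drops out of the comparison.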
|Return|
-
Any experiment whose purpose is to determine the
quantitative effect of input(s) that are deliberately changed
(the experimental "variables" or "factors") on
measured output(s) (the "response(s)").
|Return|
-
Main and/or interaction effects are said to be confounded
or aliased if only their combined effect, not their
separate individual effects, can be determined from the
experimental design. As a very simple example, suppose that
one ran two experimental trials to determine the effect of
teaching method and instructor on student performance. A group
of 40 students are randomly split into two groups of 20. Half
are assigned to Teacher A using method 1 to teach, say,
factoring in a math class. The other half are assigned to
Teacher B using method 2. After the factoring unit is covered,
the performance of the two groups is assessed by comparing
student scores on a common exam. Clearly, any systematic
difference between the groups can only be ascribed to a
combined effect of different teacher and different method, as
both teacher and method change together. In DOE terminology,
the effect of teacher and teaching method are fully confounded
or aliased with one another. Note that no amount of
data analysis can determine the separate effects. The aliasing
is inherent to the design.
Although it may seem that one would always want to avoid
such aliasing, it turns out that this is not the case -- and
is essentially unavoidable anyway. Indeed, proper control of
aliasing turns out to be one of the keys in the sequential
design strategy. Note, also, that one may also partially
confound effects in a design. This is an advanced (but quite
useful) technique.
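The teacher/method example can be written out in coded form to show where the aliasing comes from. The -1/+1 coding of teachers and methods is an assumption made for illustration.

```python
# The two-trial design from the example, coded -1 / +1:
# (Teacher A, method 1) and (Teacher B, method 2).
design = [(-1, -1), (+1, +1)]

teacher = [t for t, _ in design]
method = [m for _, m in design]
aliased = teacher == method  # identical columns: the effects cannot be separated

# Adding the two missing combinations makes the columns distinguishable.
full = design + [(-1, +1), (+1, -1)]
dot = sum(t * m for t, m in full)  # 0 means the two columns are orthogonal

print(aliased, dot)
```

When two columns of the design are identical, no analysis of the responses can tell them apart. That is exactly what "inherent to the design" means.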
-
A mathematical or physical limitation that restricts the
possible combinations of the factors that can be tried. For
example, in a mixtures experiment (an experiment in which the
factors are proportions of the mixture ingredients), any given
combination of proportions must always add up to 100% (of
course!). In a chemical experiment in which the factors are
concentrations of various chemicals, certain concentrations
may be explosive and so must be avoided.
|Return|
-
An experimental factor that can, in principle, be set
anywhere within its experimental range for an experimental
trial. Examples are temperature, time, pH, amount of
fertilizer added to the soil, height, weight, and so forth.
|Return|
-
D-optimality is a mathematical technique that is sometimes
useful in producing experimental designs, especially in
nonstandard and irregular (e.g., not hypercubes or hyperspheres)
design spaces. Special purpose computer software is required
to use this method as the computations are far too extensive
to be done by hand. For those who might care, finding
D-optimal designs is an NP-complete problem, so that such
designs are only approximated by the software.
-
The degree of confounding present in a
design. Design resolution refers to the amount of detail --
separate identification of factor effects and interactions --
that the design supports. This is only relevant for
multifactor, not OFAT, experiments.
-
An experimental factor that can be set only at distinct,
separate levels. For example, male and female (in an
experiment on fish behavior); metal, glass, or plastic stirrer
in a chemical experiment; type of soil -- sandy, clay, loam,
gravel (in an experiment on plant growth). Note that
categorical factors can be either unordered (male/female) or
ordered (number of times the rat previously traversed the
maze).
|Return|
-
The amount of information generated by an experimental
design. Equivalently, the precision in the fitted coefficients
of the response surface. Although a complete explanation of
this is rather technical, what it comes down to is a way of
defining the amount of averaging that the design can achieve.
Saying that a design is more efficient is equivalent to saying that it
generates more information, that
the response surface is known with greater precision, and
that there is less uncertainty in the
conclusions. The important idea is that for a fixed amount of
experimental effort, usually the more efficient the design,
the better.
|Return|
-
The amount of experimental variability that exists, usually
determined from the variability of replicated trials. The
greater the precision, the less variability there is and the
less uncertainty there is in the results, including the fitted
response surface.
|Return|
-
The tendency of an experiment to produce results that
systematically differ from the true results. For example, a
result may be "biased high" if an instrument is
improperly operated. A biased measurement is a measurement
that is either higher or lower on average than it should be.
|Return|
-
These words are used synonymously to refer to the fact that
when experimental trials are repeated without changing the
settings of the factors, the response varies rather than
remaining constant. This is due, of course, to the hopefully
small effects of changes in many uncontrolled factors that
exist in any experiment or measurement. It is never possible
to exactly repeat anything. In order to quantify such
variability -- which is necessary in order to properly assess
how the response depends on the experimental factors --
statistical methods must be used.
|Return|
-
Variables that are deliberately manipulated in an experiment
in order to assess their effect on the response. For example,
in an experiment to assess the effect of various lengths,
diameters, and materials on voltage drop across a length of
wire, the experimental factors are length, diameter, and
material of which the wire is made. The response is the
measured voltage drop.
|Return|
-
The setting of an experimental factor. Typically, in DOE,
continuous factors are "standardized" to the range
from -1 to +1. For example, if temperature is an experimental
factor that is to be varied between 30° and 60° C., then
convert 30 to -1, 60 to +1 and linearly interpolate any value
between (e.g., 50 interpolates to +1/3). This is equivalent to
making a simple linear scale change (like Fahrenheit to
centigrade, pounds to kilograms, and so forth).
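The scale change can be written as a short function. The name `code_level` is hypothetical. The formula is the linear map that sends the low setting to -1 and the high setting to +1.

```python
def code_level(value, low, high):
    """Linearly map the range [low, high] onto [-1, +1]."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

# The temperature example from the text: 30 to 60 degrees C.
for temp in (30, 50, 60):
    print(temp, code_level(temp, 30, 60))
```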
Note, however, that with categorical factors, the ±1
standardization can only be done when there are exactly two
categories. When there are more, it makes no sense because it
would convert a non-ordered identifying label (which variety
of 3 seed varieties) to an ordered scale (-1,0,+1). For this
reason, advanced methods must be used to design experiments
with categorical factors having more than two categories.
|Return|
-
(1890-1962) The famous British geneticist and statistician
who originated and developed the foundations of experimental
design. His books, Statistical Methods for Research Workers
and The Design of Experiments are classic. Much
standard statistical terminology -- like anova and randomization
test -- derives from his work. The "F" of the
statistical F distribution (upon which the F test is based) is
named after him.
|Return|
-
A sequential design technique that produces a "mirror
image" of a given design in order to separate confounded
interactions. Generally, this converts designs of resolution
III to designs of resolution IV.
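As a rough sketch of the idea, the code below builds a 7-factor, 8-run design of resolution III, reverses every sign to get its mirror image, and checks how strongly main-effect columns overlap with 2-factor interaction columns before and after. The generators D=AB, E=AC, F=BC, G=ABC are a common textbook choice assumed here, and `max_main_vs_2fi` is a hypothetical helper name.

```python
from itertools import combinations, product

# Resolution III design: 7 factors in 8 runs, built on a full 2^3 in A, B, C
# with the assumed generators D=AB, E=AC, F=BC, G=ABC.
base = [(a, b, c, a * b, a * c, b * c, a * b * c)
        for a, b, c in product((-1, 1), repeat=3)]

# The foldover is the mirror image: every sign in every run is reversed.
mirror = [tuple(-x for x in run) for run in base]
combined = base + mirror

def max_main_vs_2fi(design):
    """Largest |dot product| between a main-effect column and a 2fi column."""
    k = len(design[0])
    return max(
        abs(sum(run[i] * run[j] * run[m] for run in design))
        for i in range(k)
        for j, m in combinations(range(k), 2)
        if i not in (j, m)
    )

print(max_main_vs_2fi(base))      # nonzero: some main effect is aliased with a 2fi
print(max_main_vs_2fi(combined))  # zero: main effects are clear of all 2fi's
```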
-
A fractional factorial design is a design in which only a
selected fraction of all the possible combinations of the
design factors are run. For the two level hypercube designs,
this means only a subset of all the hypercube corners are
actually run.
-
An experimental design is said to permit hidden replication
when, if some of the factors can be safely ignored, there is
replication in the remaining factors. For example, suppose one
did a 2-factor experiment in which runs at the four
different combinations (-,-), (-,+), (+,-), and (+,+) were
conducted. If the second of the two factors had essentially no
effect, then, so far as Mother Nature was concerned, a one
factor experiment in which the + and - settings were each replicated
twice was actually done. This replication was
"hidden" until it became clear that the second
factor could be safely ignored. Hidden replication is most
commonly used in "screening" experiments with many
(more than 4, say) factors.
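The 2-factor example can be checked mechanically: drop the second factor's column and count how often each remaining setting appears. This is a minimal sketch of that projection.

```python
from collections import Counter
from itertools import product

design = list(product("-+", repeat=2))  # the runs (-,-), (-,+), (+,-), (+,+)

# Suppose the second factor turns out to have no effect: project it away.
projected = Counter(first for first, _second in design)

print(projected)  # how many times each setting of the first factor was run
```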
|Return|
-
The equivalent of a cube in an arbitrary number of
dimensions. In DOE, hypercubes are usually standardized so that
all coordinate entries are ±1. Hence a 2-d hypercube is just
the ordinary square with 4 corners at (-1,-1), (-1,+1),
(+1,-1), and (+1,+1). To save writing, the 1's are usually
omitted. Hence, we would give the corners simply as (-,-),
(-,+), (+,-), and (+,+). Using this convention, a 3-d hypercube is just an ordinary
cube with 8 corners at (-,-,-), (+,-,-), (-,+,-), (+,+,-),
(-,-,+), (+,-,+), (-,+,+), and (+,+,+). And so on with 4,5,
and more dimensions. Note that in 2 dimensions, there are 2^2=4 corners; in 3,
there are 2^3=8; in 4, there are 2^4=16; and, in general, in n
dimensions, a hypercube has 2^n corners.
|Return|
-
A 2-factor interaction (2 fi) is the difference in the
response that occurs when both factors are changed
simultaneously from what was expected to occur based on the
effect of changing the factors individually. When the combined
effect is significantly greater than the sum of the
individual effects, it is often called symbiosis;
when it is significantly less, it is often called interference.
Algebraically, a 2-factor interaction is represented by the
presence of a cross product term (factor_1 * factor_2) in the
model. Graphically, 2 fi's are indicated by significant
non-parallelism of the two lines in an interaction plot. An
example of such a plot is provided in the herbicide example.
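Written out for two coded factors, a model with a cross product term takes the following form. The symbols (x_1 and x_2 for the coded factor settings, the β's for the coefficients, ε for experimental error) are notation assumed here for illustration.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2 + \varepsilon
```

When β_12 is zero, the effect of changing x_1 is the same at every setting of x_2, and the lines in an interaction plot are parallel. A nonzero β_12 tilts one line relative to the other.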
Higher order interactions -- that is, interactions among 3 or more
factors -- may occasionally be important. However, these
require more complicated designs with more experimental trials
than are typically used. So in the basic approach followed in
the DOE project, they are not considered.
|Return|
-
An experiment with several experimental factors in which
more than one factor at a time is changed.
|Return|
-
All of these terms are used equivalently here and refer to
the "vital few; trivial many" principle. That is, in
any real experiment in which many factors are considered,
almost always, only a very small proportion of them will have
most of the effect. The rest should be treated as the
"trivial many" and considered to be
indistinguishable from experimental noise. So in trying to
build a model (= fit a simple algebraic equation in basic DOE)
to describe the experimental results, one should try to find
one that uses as few factors (= parameters = coefficients) as
possible. That is, one should be as parsimonious in
using coefficients as possible. Occam's Razor refers to the
idea that if several models (theoretical or experimental) do
equally well in explaining what is observed, then the simplest
one (fewest parameters) should be chosen.
|Return|
-
Running the experimental trials in a random order. This is
done to protect against the systematic effects of unknown
non-experimental variables (like environment) that might bias
the experimental results. There are also other ways in which
random assignment is used. For example, in doing clinical
trials to determine efficacy and safety of new drugs, patients
are almost always assigned randomly to the treatment (receive
the drug) vs. the control (receive an inactive placebo) group.
This prevents unconscious biases (for example, assigning
sicker people to receive the drug) from influencing the
experimental results.
|Return|
-
Repeating an experimental trial at constant factor settings
in order to determine the amount of experimental variability.
Since the settings of the experimental factors do not change,
observed variability in the response must be due to the
effects of other, uncontrolled factors that are present
throughout the experiment. This includes measurement factors.
It is important when doing replicates NOT to do them
close together in time under nearly identical circumstances.
Rather, the replicates should be done over the same range of
conditions in which the entire experiment is performed. This
allows all the experimental variability that is actually
present to be observed and quantified.
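One common way to quantify the variability seen in replicates is a pooled standard deviation. The sketch below assumes made-up responses from three replicate runs at each of two hypothetical factor settings; it is an illustration, not a prescribed procedure.

```python
from math import sqrt
from statistics import variance

# Hypothetical responses: three replicate runs at each of two factor settings.
replicate_runs = {
    "low": [10.1, 9.7, 10.5],
    "high": [12.2, 12.9, 12.4],
}

# Within-setting sample variances reflect only uncontrolled variability,
# because the factor settings are held constant inside each group.
group_var = {k: variance(v) for k, v in replicate_runs.items()}
dof = {k: len(v) - 1 for k, v in replicate_runs.items()}

# Pool the variances, weighting each by its degrees of freedom.
pooled_var = sum(group_var[k] * dof[k] for k in replicate_runs) / sum(dof.values())
pooled_sd = sqrt(pooled_var)

print(f"pooled standard deviation: {pooled_sd:.3f}")
```

The estimate is only as honest as the replicates themselves: if they are run back to back under nearly identical conditions, the pooled value will understate the real experimental variability.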
|Return|
-
Residuals are what's "left over" from the
data after a model has been fit. That is, the residuals are
defined as:
- residual = actual data value - value predicted
by fitted model
-
If the model fits well, then all systematic behavior is
predicted by the model and the residuals should look like
random noise. When residuals depart from this behavior and
exhibit systematic trends or dependencies, the model may need
to be modified. This, in turn, may require appropriate design
changes at the next stage of the experimental process.
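The definition of a residual can be sketched with a straight-line fit. The data below are invented, and the one-factor linear model is an assumption made only to keep the arithmetic visible.

```python
# Hypothetical data: one factor setting (x) and the measured response (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least-squares slope and intercept for y = intercept + slope * x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# residual = actual data value - value predicted by fitted model
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, ri in zip(x, residuals):
    print(f"x = {xi:.1f}  residual = {ri:+.3f}")
```

Plotting such residuals against the factor settings or the run order is the usual way to look for the systematic trends described above.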
-
The higher dimensional "surface" of true responses
(that is, absent all extraneous experimental variability)
obtained from all possible combinations of settings of the
experimental factors (over their allowable experimental
ranges). Knowledge of this surface is equivalent to a complete
understanding of how the response depends on the experimental
factors. If some or all the factors are categorical, the
"surface" may actually be isolated points, curves,
or other lower dimensional structures.
|Return|
-
A measured experimental outcome. Response variables may come
in many forms. For example, the response in a physics
experiment exploring the effects of different numbers of
windings and currents on the performance of an electromagnet
could be the magnetic force generated. In an industrial
experiment on a chemical process, the response might be the
yield of the product. In an experiment to develop a new cake
mix, the response variables might be taste and texture as
rated by a panel of raters on a 1 to 10 scale. In an
experiment to see what effect height, sex, and distance from
the basket have on foul shooting accuracy, the response could
be the number of baskets made out of ten tries.
A single experiment might have several response variables that
characterize different aspects of the outcome. The key idea is
that there must be some kind of "reliable"
measurement that is made that can be used for analysis of the
results. What is meant by "reliable" is, itself, a
complex statistical issue.
|Return|
-
A broad category of experimental design and analysis methods
based on fitting models which are linear and quadratic
equations in the experimental factors (this includes
cross-terms for interactions). Such purely empirical models
are useful for describing systems behavior, process
improvement, and often increasing understanding so that more
detailed conceptual (mechanistic) models can be developed.
-
A screening design is one in which relatively few
experimental runs are used to efficiently study a large number
of experimental factors to "screen out" those few
that are most active from the remainder that are relatively
inactive over the ranges being considered. Such designs are
very useful in the early stages of sequential experimentation
in order to conserve resources and identify the most
influential experimental factors for more detailed study.
Other essentially synonymous terms for this are
"Resolution III," "Plackett-Burman," and
"Saturated" design.
|Return|
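To make the idea of a screening design concrete, here is a sketch of one standard construction: a saturated two-level design that studies 7 factors in 8 runs. The choice of this particular design and the generator assignment (D=AB, E=AC, F=BC, G=ABC) are assumptions for illustration; other screening designs exist.

```python
from itertools import product

# Base factors A, B, C form a full 2^3 factorial in coded units (-1, +1).
# The remaining four columns are products of base columns (generators
# D=AB, E=AC, F=BC, G=ABC), giving 7 factors in only 8 runs.
design = []
for a, b, c in product((-1, 1), repeat=3):
    design.append((a, b, c, a * b, a * c, b * c, a * b * c))

# Every pair of columns is orthogonal, so each main effect can be
# estimated independently of the other main effects.
n_factors = len(design[0])
orthogonal = all(
    sum(run[i] * run[j] for run in design) == 0
    for i in range(n_factors)
    for j in range(i + 1, n_factors)
)

for run in design:
    print(" ".join(f"{v:+d}" for v in run))
print("columns mutually orthogonal:", orthogonal)
```

The economy comes at a price: in a design this small, each main effect is confounded with two-factor interactions, which is acceptable for screening only if interactions are assumed to be small relative to the main effects.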
-
Sequential experimentation means investigations that are
carried out in stages so that each successive experiment can
be designed and executed in the light of information gained
from previous ones. This is really a description of a
scientific learning strategy that encourages the efficient
expenditure of limited experimental resources. Although most
experimenters intuitively try to do things this way, there are
specific design and analytical tools in DOE that have been
rigorously developed for this purpose. Some of these
procedures are:
It is important to emphasize that this provides
experimenters a systematic framework -- not merely an artful
philosophy -- in which to execute the strategy. This gives
greater control and improved likelihood of success.
-
|Return|
-
Building and performing complex experimental designs one
stage at a time. Later stages are added only when and if
needed. This conserves experimental resources while yielding
the maximal information at each stage of the assembly process.
-
A method for running experiments in non-random fashion when
not all experimental factors can be completely randomized.
This is an advanced topic not covered in the DOE Modules.
|Return|
-
A statistical model is an algebraic equation that expresses
how a response of interest is related to the experimental
factors and the experimental variability. For example,
(I): Resp = K + A*Factor_1 + B*Factor_2 + C*Factor_1*Factor_2
+ random_variability
is such a model. "K", "A", "B",
and "C" are unknown "parameters" or
"coefficients" that must be estimated from the
experimental data. Factor_1 and Factor_2 are the (known) settings
of the two experimental factors at which the response is
actually measured in the experiment. This model is said to be
linear because the response is a linear function of the
unknown coefficients. This can be a bit confusing, because the
roles of unknown coefficient and known variable setting are
the reverse of what we are accustomed to in such equations. For
example, the model
(II): Resp = K + A*[Factor_1]^2 + B*Factor_1 + C*Factor_2 +
random_variability
is also linear for the same reason, even though Factor_1 now
also appears as a squared term. However, the model
(III): Resp = K + sin(A*Factor_1) + B*cos(C*Factor_2 + D) +
random_variability
is said to be "nonlinear" because the response is a
nonlinear function (involving sines and cosines) of the
unknown coefficients A, C, and D. Experimental design and
analysis for nonlinear models
is an advanced (but nevertheless useful) topic that is not
considered in basic DOE. In fact, in basic DOE only models of
type (I) are usually considered. These suffice for many
applications.
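As an illustration of estimating the coefficients of a type (I) model, the sketch below uses a hypothetical 2x2 factorial with factor settings coded as -1 and +1 and invented responses. The coding and the shortcut formula are assumptions that hold for this orthogonal layout, not a general fitting routine.

```python
# Coded settings (-1, +1) for a hypothetical 2x2 factorial and made-up responses.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
resp = [20.0, 30.0, 25.0, 45.0]

# Columns of model (I): constant, Factor_1, Factor_2, Factor_1*Factor_2.
columns = {
    "K": [1 for _ in runs],
    "A": [f1 for f1, _ in runs],
    "B": [f2 for _, f2 in runs],
    "C": [f1 * f2 for f1, f2 in runs],
}

# Because these coded columns are mutually orthogonal, the least-squares
# estimate of each coefficient is simply sum(column * response) / n.
n = len(runs)
coef = {
    name: sum(c * y for c, y in zip(col, resp)) / n
    for name, col in columns.items()
}

print(coef)
```

With four runs and four coefficients the model reproduces the data exactly, so nothing is left over to estimate random_variability; replicate runs would be needed for that.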
|Return|
-
Methods for improvement based on experiments and analysis
that model the response surface as a "mountain" (in
n dimensions). The fastest way to climb such a mountain --
that is, the path of steepest ascent -- is to go straight up
the sides. By mathematically determining this direction, one
can determine how to change the experimental factors to effect
the greatest possible change in the response.
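The direction calculation can be sketched as follows, assuming a first-order model has already been fitted in coded units. The coefficients, the step size, and the number of steps are hypothetical.

```python
from math import sqrt

# Hypothetical fitted first-order model in coded units:
#   Resp = 50 + 3.0*x1 + 4.0*x2
coefficients = [3.0, 4.0]

# The gradient of a first-order model is just its coefficient vector;
# normalizing it gives the direction of steepest ascent.
norm = sqrt(sum(b * b for b in coefficients))
direction = [b / norm for b in coefficients]

# Candidate settings along the path, starting at the design center (0, 0).
step = 0.5
path = [[k * step * d for d in direction] for k in range(1, 5)]

print("direction:", direction)
for point in path:
    print("try coded settings:", [round(v, 2) for v in point])
```

In practice one would run trials at successive points along this path and stop when the response no longer improves, since the first-order model is only a local approximation of the surface.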
-
A detailed discussion of this term would fill a book.
However, some of the fundamental ways in which
science differs from philosophy or art or literature
surely must include:
-
- Science is about predicting observable phenomena. Merely
giving explanations after the fact, like stock market analysts
explaining yesterday's rises or falls in prices, is not good
enough. You must predict what will be observed before
it is observed.
The concept of observable phenomena is also central. This
means that given instructions on how to construct measurement
equipment, anyone who produces the equipment should be able to
measure the "same" results (within experimental
variability). Science is democratic and replicable. That is,
observation should not depend on who we are, what beliefs we
hold, or what salary we make. Of course, the predictions may
be probabilistic and involve a level of uncertainty: we only
know that the likelihood of thunderstorms is higher under some
conditions than others; or that a major earthquake will almost
certainly occur along the San Andreas fault within 100 years;
or that overuse of antibiotics will accelerate evolutionary
development of antibiotic-resistant strains of pneumococci.
Although such predictions involve uncertainty, they are just
as legitimate science as, say, the prediction of a space
shuttle's orbit.
- Predictions are made by the development of scientific models.
Science is not about discovering eternal truths;
rather, it is about developing models from which precise and
accurate predictions can be made. On the most fundamental
level, science does not discuss truth or the underlying nature
of reality at all -- this is the realm of philosophy. All
scientific models, even famous ones like Newton's law of
gravity, the Theory of Relativity, or the Theory of Evolution,
can be flawed or incomplete in some respects but still be
useful to make predictions within certain defined realms.
Another way of saying this is that all scientific models are
falsifiable, but none can be proven (unlike mathematics).
There may always be another consequence that observation will
contradict.
- Models are usually, but not always, quantitative and
expressed mathematically. One can broadly distinguish two
overlapping kinds of scientific models: mechanistic
or conceptual models, in which some kind of theoretical
construct is used to develop the model; and empirical
models which are based exclusively on observed data (and use
statistical analysis to develop predictions of what will be
observed in the future under other conditions). Overlap
occurs, because extended observation usually motivates the
development of conceptual models, and conceptual models must
always be criticized (i.e., put to the test) by
real data.
- Because all science involves observable phenomena, the
inevitable presence of some uncontrolled variability in all
observations means that no scientific observation is exactly
known -- "we see through a glass darkly." All
scientific observation therefore involves uncertainty, and the
uncertainty must explicitly and quantitatively be dealt with
as part of the process of scientific learning. Contrast this
with philosophy or religion or literature, for example.
|