Teaching American History

 

4.0 Glossary of Evaluation Terms

 

 

 

 

The following is a list of terms that you will need to know when planning and implementing your own
experimental or quasi-experimental program evaluation.

 

Experimental design

An experimental design is one in which participants are randomly assigned to either the intervention
group or the control group (the group that does not receive the intervention). This is often called the
“gold standard” in evaluation research because it is the best way to isolate program effect and reduce
biases that might confound evaluation results. It is also often the hardest design to achieve in
real-world settings, such as the classroom. Experimental evaluations are best used under the following
conditions: (1) the program is past the early stages of development (e.g., you must have some idea about
the kinds of questions to ask), (2) the environment is conducive to randomization (e.g., you have the
ability to randomly select students or teachers for program activities), and (3) ethical considerations
have been met (e.g., one group is not harmed by exposure or lack of exposure to the intervention).

Intervention
An intervention is what the intervention group will receive. It is the program whose effects you wish to
measure.

Intervention group
Intervention groups are the groups within your sample that will receive or have received the intervention
(e.g., have participated in the program in question). Outcome measures from intervention groups are
typically compared with the outcome measures of one or more control groups.

Matched comparison
If a random control group cannot be selected, evaluators may choose a matched comparison group design.
In the matched design, an intervention group is typically selected first; a comparison group that will
not receive the intervention is then purposively selected to match identified characteristics of the
intervention group. Knowing which characteristics to match on is important: because the groups are not
randomly assigned, they may still differ on unmatched characteristics, which leaves this design subject
to selection bias.
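
As an illustration only, the sketch below pairs each intervention unit with the closest unused candidate
on one characteristic. The function name, the `years_teaching` field, and the data are hypothetical, and
nearest-neighbor matching on a single variable is just one of several possible matching approaches.

```python
def match_comparison_group(intervention, candidates, key):
    """For each intervention unit, pick the unused candidate closest on `key`."""
    available = list(candidates)  # copy so the caller's list is not modified
    matches = []
    for unit in intervention:
        # Closest remaining candidate on the matching characteristic
        best = min(available, key=lambda c: abs(c[key] - unit[key]))
        available.remove(best)  # match without replacement
        matches.append((unit["id"], best["id"]))
    return matches


# Hypothetical teachers, matched on years of teaching experience
intervention = [
    {"id": "A", "years_teaching": 3},
    {"id": "B", "years_teaching": 12},
]
candidates = [
    {"id": "W", "years_teaching": 11},
    {"id": "X", "years_teaching": 25},
    {"id": "Y", "years_teaching": 4},
    {"id": "Z", "years_teaching": 1},
]
pairs = match_comparison_group(intervention, candidates, "years_teaching")
print(pairs)
```

Matching on one characteristic leaves the groups free to differ on every other characteristic, which is
the source of the selection bias noted above.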

Multiple regression
Multiple regression is a statistical technique for simultaneously estimating the effects of several
predictors (variables/measures) on a single outcome.
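
A minimal sketch of the idea, assuming ordinary least squares solved through the normal equations with
only the Python standard library. The function name, the two predictors (hours of professional
development and a prior-coursework indicator), and the scores are hypothetical; a real evaluation would
normally use a statistics package.

```python
def fit_multiple_regression(rows, outcomes):
    """Ordinary least squares via the normal equations (X'X)b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, outcomes))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: (hours of professional development, prior coursework 0/1)
rows = [(10, 1), (20, 0), (15, 1), (30, 0), (25, 1), (5, 0)]
# Scores constructed as 50 + 0.8 * hours + 5 * coursework, with no noise
outcomes = [63.0, 66.0, 67.0, 74.0, 75.0, 54.0]

intercept, hours_effect, coursework_effect = fit_multiple_regression(rows, outcomes)
print(round(intercept, 3), round(hours_effect, 3), round(coursework_effect, 3))
```

Each coefficient estimates the change in the outcome associated with one predictor while the other
predictor is held constant.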

Population
A population is the entire group of interest (the target population) from which sample groups are
selected in evaluation research. The sample is a select group that you want to be as representative of
your population as possible.

Pre-post comparison
A pre-post comparison is a quasi-experimental design in which one group is measured before the
intervention has been administered and again after the intervention has had time to take effect. The
main drawback of this design is that the evaluator cannot control for other influences on the target
group that might affect the before and after measurements.
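
The arithmetic behind a pre-post comparison can be sketched as a mean gain score. The function name and
the scores below are hypothetical, and the sketch assumes the same respondents are measured at both
points and listed in the same order.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average change from the pre-measurement to the post-measurement."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each respondent needs both a pre and a post score")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical test scores for four teachers
pre = [60, 72, 65, 80]
post = [68, 75, 70, 85]
gain = mean_gain(pre, post)
print(gain)
```

The gain describes how much scores changed, but without a control group it cannot show how much of that
change is due to the intervention rather than to other influences.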

Program effect
Program effect is the measured change that can be attributed to the intervention. Ideally, what you want
to measure in an evaluation is the size of the effect of the intervention in your sample group. The
purpose of an impact assessment is to isolate and measure program effect.

 

Quasi-experimental design
A quasi-experimental design is an evaluation in which the intervention and control groups cannot be
selected by random assignment. In quasi-experimental designs, participants who receive the intervention
are compared with a non-random control group. Quasi-experimental designs might include a pre-post
comparison, a matched comparison, or a time-series design.

Random assignment
Experimental designs require random assignment of evaluation participants to either the intervention
group or the control group. In random assignment, every participant in a target population should have
the same probability of being selected for either group. Random assignment is the ideal way in
evaluation research to isolate program effect, but in some settings random assignment is not possible,
in which case you may opt for a quasi-experimental design.
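
One simple way to carry out random assignment is to shuffle the participant list and split it in half,
as in the sketch below. The function name, the participant labels, and the fixed seed (used only so the
split can be reproduced) are assumptions for illustration.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is left intact
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical pool of 20 teachers
teachers = [f"teacher_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(teachers, seed=42)
print(len(groups["intervention"]), len(groups["control"]))
```

Because the order after shuffling is random, each teacher has the same chance of landing in either
group.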

Reliability
Reliability is the extent to which a measure produces the same results when applied repeatedly or by a
different researcher.
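
One common way to gauge reliability is to administer the same measure twice and correlate the two sets
of scores (test-retest reliability). The sketch below computes a Pearson correlation by hand; the
function name and the scores are hypothetical, and this is only one of several ways to assess
reliability.

```python
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    m1, m2 = mean(first), mean(second)
    cross = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cross / (ss1 * ss2) ** 0.5


# Hypothetical scores for five respondents on two administrations
first_administration = [70, 82, 65, 90, 77]
second_administration = [72, 80, 66, 91, 75]
r = pearson_r(first_administration, second_administration)
print(round(r, 2))
```

A correlation close to 1 suggests the measure ranks respondents consistently across administrations.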

Sample
A sample is a select group identified for participation in an evaluation, selected from a larger
population. In rigorous program evaluations, researchers want to reduce the possibility of biases that
might skew the results of an impact study by using randomized or rigorous comparison sample selection
techniques.

Selection bias
Selection bias is the systematic over- or under-estimation of program effects that results from not
controlling for differences between intervention and comparison groups. Random assignment and other
rigorous sample selection techniques are used to control for selection bias.

Time-series design
The time-series design is a reflexive control design in which a single measurement is taken of a target
group before the intervention has been administered and multiple measurements are taken of the same
target group after the intervention has had time to take effect. Time-series designs do not have to
measure the same respondents each time, but should measure the same target group multiple times after
an intervention occurs. This design is good for measuring long-standing trends.

Type I error
A Type I error is a statistical error that occurs when a program effect estimate is found to be statistically significant when no true effect exists (a false positive).

Type II error
A Type II error is a statistical error that occurs when a program effect estimate is not found to be statistically significant when a true effect does exist (a false negative).
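
Both error types can be illustrated with a small simulation. The sketch below assumes a two-group z-test
with a known standard deviation of 1, 30 participants per group, and a 0.05 significance level; the
function name and every parameter value are hypothetical choices made for the illustration.

```python
import random
from statistics import NormalDist, mean


def rejection_rate(true_effect, n=30, trials=2000, alpha=0.05, seed=0):
    """Share of simulated evaluations that find a statistically significant effect."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    standard_error = (2 / n) ** 0.5  # both groups have a known SD of 1
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        intervention = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        z = (mean(intervention) - mean(control)) / standard_error
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


# No true effect: every significant result is a Type I error
type_i_rate = rejection_rate(true_effect=0.0)
# True effect of 0.5 SD: every non-significant result is a Type II error
type_ii_rate = 1 - rejection_rate(true_effect=0.5)
print(type_i_rate, type_ii_rate)
```

Under these assumptions the Type I rate should land near the chosen significance level, while the Type
II rate depends on the size of the true effect and the number of participants.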

Validity
The term validity in evaluation research refers to the extent to which a measure (a variable) actually
measures what you intend it to measure. There are many different kinds of validity that are important
to control for. Validity can be controlled for either by choosing a random sample or by using several
different measures for the same concept. To control for validity, researchers should ensure that
evaluation questions clearly correspond to the criterion being measured; use several different sources
or methods for one concept when in doubt; and make sure that the evaluation questions, taken together,
cover the full range of possibilities of the target concept.

 

