
The Critical Role of Assessment in Evaluating TAH Projects

Evaluation requirements for Teaching American History (TAH) grants have varied from year to year, but in some years grants were required to use experimental or quasi-experimental designs. This requirement helps the federal government and local project directors learn whether TAH projects are accomplishing what they are ultimately expected to accomplish. Measuring those ultimate expectations, or outcomes, is a crucial part of the evaluation design. For many projects, outcomes are expressed as gains in student achievement as a direct result of the TAH project, so the assessment of student achievement can be the linchpin of an evaluation design.

The question then remains, "What is a valid assessment?" The key to answering that question, based on my experience evaluating several TAH grants, is the extent to which the assessments used in the evaluation are "content valid," or aligned to the content and goals of the project. There are many excellent and well-researched assessments available to TAH evaluators. For example, the National Assessment of Educational Progress (NAEP) publishes test items, including history items, on its website, and these items are in the public domain.

However, a second validity question must be asked: "What are my assessments valid measures of?" Content validity requires that the assessments used in evaluation are accurate and reliable measures of the specific entity being evaluated—in this case, student achievement of the content taught in the TAH project. If assessments are not aligned to the content and goals of the project, we will never be able to demonstrate that the project is successful, even if the project is highly successful.


Now we are faced with a dilemma. Do we use nationally validated assessments (assessments that have substantial research behind them), or do we develop our own assessments that are not nationally validated but are aligned to the specific goals and content of our project? I may be able to answer that question with two examples of TAH grant projects that I evaluated as an external evaluator.

National Versus Custom

Project A and Project B are similar in that both involved K–12 teachers in summer institutes where they learned history content that they delivered to their students once they returned to the classroom in the fall. Both projects maintained that student knowledge in American history would increase as a result of the project. Both projects were implemented as planned, with no significant setbacks, and participant feedback was positive in both. Both projects employed a quasi-experimental design for evaluation, which provided matched control groups. With control groups, we could determine whether knowledge of the history content increased more among students who were taught the content gained in the summer institutes than among students who did not receive such instruction.

The assessments used in Project A to measure American history content were NAEP items chosen from the NAEP website. The items chosen were as closely aligned to the content of the project as possible. They were technically sound, since they have been used nationally in NAEP assessments, but they clearly were not designed for TAH evaluation.

Project B went a different route in arriving at assessments. During the summer institutes, teachers were required to write at least five test items related to the content of the history instruction received during the institute. The items were organized into a Table of Specifications so that, when all items were received, we could see how well they covered the topics presented in the institute and the full range of thinking skills, from rote memorization to analysis to evaluation. These items were edited qualitatively, assembled into elementary, middle school, and high school versions of a test, and administered to a pilot group of students. The pilot tests were scored, and each item was analyzed quantitatively for difficulty and discrimination. Test reliability was also determined from this quantitative analysis. Poor items were revised or eliminated, and a final test was created that could be administered in one class period.
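The quantitative item analysis described for Project B can be sketched in a few lines of Python. The article does not say which statistics were computed, so this sketch assumes three common choices: item difficulty as the proportion of students answering correctly, an upper-lower discrimination index, and KR-20 as the reliability estimate. The function names and the pilot responses are made up for illustration and do not come from the project.

```python
from statistics import pvariance

# Hypothetical pilot data: one row per student, one column per item,
# 1 = correct, 0 = incorrect. These are NOT scores from Project B.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

def item_difficulty(rows, item):
    """Proportion of students who answered the item correctly (higher = easier)."""
    return sum(row[item] for row in rows) / len(rows)

def item_discrimination(rows, item, fraction=0.27):
    """Upper-lower index: proportion correct among the top scorers minus
    proportion correct among the bottom scorers (assumed 27% groups)."""
    ranked = sorted(rows, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return item_difficulty(ranked[:k], item) - item_difficulty(ranked[-k:], item)

def kr20(rows):
    """Kuder-Richardson 20 reliability for right/wrong items."""
    n_items = len(rows[0])
    total_variance = pvariance([sum(row) for row in rows])
    pq = 0.0
    for i in range(n_items):
        p = item_difficulty(rows, i)
        pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq / total_variance)

for i in range(len(responses[0])):
    print(i, round(item_difficulty(responses, i), 2),
          round(item_discrimination(responses, i), 2))
print("KR-20:", round(kr20(responses), 2))
```

In this made-up data the last item is answered equally often by high and low scorers, so its discrimination index is zero. An item like that would be a candidate for the "revised or eliminated" step.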

The statistical analyses from Project A and Project B are revealing. The pre-test and post-test means for the project group and the control group in each project are displayed in the tables below. For both projects, the data are from the 2006–07 school year.

Table 1

Project A

             Middle School         High School
             Project   Control     Project   Control
Pre-test     51.9      50.3        49.9      46.5
Post-test    49.0      50.0        49.7      53.9


A statistical analysis was conducted to determine whether the gains from pre-test to post-test in the project group were statistically different from the pre- to post-test gains of the control group. For both the middle school students and the high school students, the project group's gains were not statistically different from those of the control group. In other words, we could not statistically demonstrate that the project had any effect on student achievement beyond the effects realized by students who were not associated with the TAH project. The primary conclusion drawn by the evaluator and the project team was that the NAEP assessments, although validated and used nationally, were not valid measures of the content delivered in Project A.
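As a worked illustration of the comparison, the mean gain for each group and the difference between project and control gains can be computed directly from the Table 1 means. This is only descriptive arithmetic: the dictionary layout and function names are assumptions for the sketch, and the means alone cannot show whether a difference is statistically significant, since that depends on sample sizes and score spread, which the table does not report.

```python
# Mean scores copied from Table 1 (Project A), stored as (pre-test, post-test).
table_1 = {
    "Middle School": {"project": (51.9, 49.0), "control": (50.3, 50.0)},
    "High School": {"project": (49.9, 49.7), "control": (46.5, 53.9)},
}

def mean_gain(pre, post):
    """Post-test mean minus pre-test mean, rounded to one decimal place."""
    return round(post - pre, 1)

def gain_difference(groups):
    """Project gain minus control gain; positive favors the project group."""
    p_pre, p_post = groups["project"]
    c_pre, c_post = groups["control"]
    return round((p_post - p_pre) - (c_post - c_pre), 1)

for level, groups in table_1.items():
    print(level,
          "project gain:", mean_gain(*groups["project"]),
          "control gain:", mean_gain(*groups["control"]),
          "difference:", gain_difference(groups))
```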

Table 2

Project B

             Elementary            Middle School         High School
             Project   Control     Project   Control     Project   Control
Pre-test     48.1      50.1        51.0      53.2        49.2      50.5
Post-test    54.0      45.6        47.9      50.4        51.6      45.9


The same statistical analysis conducted for Project A was conducted for Project B. In the elementary and high school populations, the project group gained from pre-test to post-test while the control group declined. In the middle school, nominal declines were realized in both the project and control groups. The gains of the project group over the control group in elementary and high school were statistically significant at the .01 level of probability. That means a difference this large would be expected less than 1% of the time by chance alone if the project had no effect, which is strong evidence that the gains were due to the intervention of the project, TAH instruction. In the middle school, the difference in the declines between the project and control groups was not statistically significant. The primary conclusion drawn by the evaluator and the project team about the project gains was that the assessments were sensitive to instruction. In addition, teachers in the project group graded the assessments, since the assessments were part of, and sensitive to, instruction. That was not the case in the control group, since TAH instruction was not provided.
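One way to make "statistically significant at the .01 level" concrete is a permutation test on student gain scores. The article does not name the test that was actually used, so this is a swapped-in illustration, not the evaluator's method. The function name and the gain scores are hypothetical. The sketch asks how often randomly reshuffling students between the two groups produces a difference in mean gains at least as large as the one observed.

```python
import random
from statistics import mean

def permutation_p_value(project_gains, control_gains, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in mean gain scores.

    Returns the estimated probability of seeing a difference at least as
    large as the observed one if group membership made no difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(project_gains) - mean(control_gains))
    pooled = list(project_gains) + list(control_gains)
    n_project = len(project_gains)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_project]) - mean(pooled[n_project:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical per-student gains (post-test minus pre-test), NOT project data.
project_gains = [6, 5, 7, 4, 8, 6, 5, 7]
control_gains = [-4, -5, -3, -6, -4, -5, -2, -5]

p = permutation_p_value(project_gains, control_gains)
print("significant at the .01 level:", p < 0.01)
```

A p-value below .01 here says that chance reshuffling rarely reproduces the observed gap. It does not by itself rule out other explanations, which is why the matched control groups in the design matter.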

These two examples suggest that when we attempt to determine the ultimate effects of TAH projects on student achievement, the assessments used to measure those effects are a crucial part of the evaluation. If properly aligned, assessments can tell a true story of ultimate effects. If not properly aligned, even nationally recognized assessments fail to provide the information needed to determine whether a project has met its desired and expected outcomes.