A. Grace Martin

Author, Student Teacher, Optimist and Promoter of Self-Empowerment


Validity of Standardized Tests and Solutions for Change

Posted by agracemartin on March 5, 2015 at 11:30 AM

I wrote the following paper for my Evaluation of Student Learning class on the topic: Assessment of Validity in Standardized Tests.

Validity of Standardized Tests and Solutions for Change

Validity of Assessments

Authenticity arguments help us improve our assessment instruments so that they are both reliable and valid (Winke, 2011; Chapelle, 1999). Reliability means that the assessment results are reproducible and repeatable (Davies, 2011; Chapelle, 1999). Validity, according to Chapelle (1999), is the overall quality and acceptability of an assessment. It includes concurrent validity (the extent to which an assessment measures the same skills and knowledge as other assessments) and predictive validity (the extent to which it predicts future performance or skill development).
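To make these three terms more concrete: one common way to put a number on each of them is a correlation coefficient between two sets of scores. The sketch below is only an illustration under that assumption. The function name `pearson_r` and every score in it are invented for this example; none of the data comes from the studies cited here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six students (invented for illustration only).
first_sitting = [62, 71, 55, 88, 93, 47]
second_sitting = [65, 69, 58, 85, 95, 50]      # same test, written again later
other_test = [60, 75, 50, 90, 89, 52]          # a different test of the same skills
later_course_grade = [58, 80, 61, 84, 90, 55]  # performance in a later course

# Reliability (test-retest): do repeated sittings give similar results?
reliability = pearson_r(first_sitting, second_sitting)
# Concurrent validity: does the test agree with another test of the same skills?
concurrent = pearson_r(first_sitting, other_test)
# Predictive validity: do the scores track later performance?
predictive = pearson_r(first_sitting, later_course_grade)

print(f"reliability: {reliability:.2f}")
print(f"concurrent validity: {concurrent:.2f}")
print(f"predictive validity: {predictive:.2f}")
```

A coefficient near 1 only shows that two sets of scores rise and fall together. It says nothing about whether the test was fair, developmentally appropriate, or worth the class time, which is why these statistics give only part of the picture.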

Standardized testing may or may not be valid. Wiggins (1993) believes that conventional test design assumptions are false because they are based on being able to break knowledge down into elements and being able to know a particular concept in absolutely every context. Standardized tests do not assess whether all students everywhere have the same “knowledge” because genuine intellectual performance is individualized (Wiggins, 1993). A test’s reliability, concurrent validity, and predictive validity can be quantitatively measured, though these statistics show only a narrow perspective; teachers’ opinions are an important component in determining exam validity (Winke, 2011). For a particular language exam, teachers objected to the time dedicated to testing, the inappropriate length and difficulty of the test, and the singling out and social labeling of ELL students (Winke, 2011). Many teachers said, “the test stressed and frustrated some students, made them feel inadequate, humiliated, or embarrassed, or led them to question their self-worth” (Winke, 2011). Valid tests must be developmentally appropriate, fair, feasible, and practical for students as well as statistically reliable, concurrently valid, and predictively valid (Winke, 2011).

In contrast, Slomp et al. (2014) argue that content, concurrent, and predictive validity evidence has failed, and that we must now call upon construct and consequential validity evidence. In standardized tests, complex constructs, such as writing ability, are less likely to be assessed completely (Slomp et al., 2014). Studies found that “class time was diverted from regular instruction to focus on test preparation [for] low-level skill-and-drill work rather than higher-order literacy skills; teachers fell substantially behind in their course material; and teachers felt compelled to prepare students for the test even though they questioned its usefulness and validity” (Slomp et al., 2014, p. 294). Test standardization stunts the growth of innovative teachers but reinforces veteran teacher strategies that lack diversity (Slomp et al., 2014). Tests constrained writing as a construct by ignoring the importance of differentiated assessment (collecting multiple pieces of evidence of student learning over time), limited pedagogical diversity by encouraging convergent thinking, and marginalized students and teachers by undermining diversity in the classroom (Slomp et al., 2014).

Not everyone agrees on what it means to have valid assessment. Davies (2011) says that validity is the extent to which the evidence from several sources aligns with the learning objective. While the above articles debated the validity of standardized tests, a more practical approach can be taken in classrooms:

"Evidence of learning needs to be diverse because it requires performance and self-assessment or reflection to demonstrate application and the ability to articulate understandings. This means that written work or test results can never be enough. Observing application of knowledge, listening to students articulate understandings, and engaging students in demonstrating acquisition of knowledge can be valid evidence." (Davies, 2011)

Davies (2011) argues that triangulation of evidence increases both reliability and validity. Triangulation involves observations, conversations, and collecting products (Davies, 2011). In triangulation, standardized tests make up only a small portion of summative assessment under the category of collecting products, and therefore cannot be considered entirely valid on their own.

My Personal Connections

My grade 9 Social Studies teacher instructed our class heavily on Russian history, the implications of communism in the USSR, the industrial revolution, and the implications of capitalism for North American society. I remember studying dutifully; I could have been assessed as understanding all course content in impeccable detail. Yet, when I wrote my PAT (Provincial Achievement Test), it focused on applying ideologies to made-up economic situations. Everyone was frustrated: our teacher had thought that she had prepared us as well as she could, while my classmates and I felt like we had not been assessed on what we had learned. The standardized test caused a lot of anxiety and confusion. I ended up feeling that school put immense pressure on answering as many questions correctly as possible, which made me stop caring about what I had learned. Was the time taken to learn content and facts wasted? While I value the history that I learned, I certainly felt as though Alberta Education did not.

Many of my teachers since then have built their instruction around exam content. Ironically, I did not learn the subject matter in those courses as deeply as I did with my excellent grade 9 teacher, whose differentiated instruction was passionate and engaging. I have found that both high school students and teachers put far too much emphasis on test results. I placed a huge priority on my grade 12 Social Studies diploma exam, but afterward intentionally forgot all course content. If I had been re-assessed later, I doubt I would have received honors. I do not even remember what we covered. I do remember getting an 83% in grade 9 and a 95% in grade 12, but I believe that I learned more in the former and was not validly assessed in either. I do not feel that standardized testing promoted the longevity of my learning.

As a tutor, I have worked with students who experience test anxiety. In tutoring sessions, one girl seemed like a 70th-percentile student, and yet she was failing her quizzes and exams. I encouraged her to ask her teacher for an oral interview. As a student teacher, I watched my teacher associate give oral re-tests. Without the writing component and formal situation, many students were able to verbally explain their understandings. Yet we expect students to perform well on standardized tests even when the format disadvantages them. Why do this?

My Conclusions on Test Validity

I do not think that standardized tests are valid forms of assessment. Concurrent validity is meaningless to me because I do not care if a test can measure the same knowledge as another test; that way of thinking justifies an endless cycle of poor tests. I like the idea of using predictive validity to gain insight into future skill development, but why would I want to label my students’ future success based upon past exams?

I agree with Winke’s (2011) view that teachers’ professional opinions are important in determining assessment validity. Standardized tests encourage teachers to focus on getting students to achieve high marks instead of focusing on student learning and assessing students fairly. As Slomp et al. (2014) suggested, I think that teachers are pressured to focus on testable skills, narrowing their assessment and limiting student choice in demonstrating their knowledge. Not every student has an equal opportunity to perform well on standardized tests, which can have negative psychological impacts on students.

Valid assessment must include more collections of evidence than just examination statistics. I identify with Davies’s (2011) triangulation of observational evidence (such as watching the scientific method being applied during an experiment), conversational evidence (class discussions, student-teacher interviews, peer feedback, written conversations, and group work records), and collection of products (summative projects, exams, quizzes, and assignments, as well as formative assessment notebooks, journals, photos, student worksheets, graphs, and work-in-progress portfolios). To me, triangulation of evidence is the most valid form of assessing a student’s knowledge and understanding, something that standardized test results cannot fully convey.

Implications For My Assessment Practice

I plan on teaching high school physics, which means that I unfortunately cannot prevent standardized diploma exams from influencing my students’ grades. To work around this barrier, I intend to incorporate PAT-like questions into my regular assignments. While I will still require students to show all of their work, I can structure the practice problems in either multiple choice or numerical response formats. I hope that familiarity with the standardized format will alleviate some anxiety in this situation. I can validly assess student coursework through triangulation of evidence from multiple sources such as homework checks (for observed participation), experimental lab reports, assignments, projects, quizzes, unit tests, and presentations.

If I am pressured for higher student test achievement, I could compact my year-plan timeline to include a full week of review and test preparation at the end of the semester. I want to avoid this because I do not wish to transfer excess pressure onto my students. As well, this time could be better used to delve deeper into the key learning objectives. I will try to avoid teaching for the test, and instead strive to focus my attention on the learning processes of my students. This approach could arguably help my students do better on exams anyway, since their learning experiences have been richer. I feel that validity must help educators improve their assessment instruments, and that frequent assessment and feedback will enable learning for higher achievement in my future students.


Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–272. doi: 10.1017/S0267190599190135.

Davies, A. (2011). Making classroom assessment work. Bloomington, IN: Solution Tree.

Slomp, D. H., Corrigan, J. A., & Sugimoto, T. (2014). A framework for using consequential validity evidence in evaluating large-scale writing assessments: A Canadian study. Research in the Teaching of English, 48(3), 276–302.

Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75(3), 200-08.

Winke, P. (2011). Evaluating the validity of a high-stakes ESL test: Why teachers' perceptions matter. TESOL Quarterly, 45(4), 628–660.

Categories: Teaching Blog
