EP 300 EDUCATIONAL MEASUREMENT AND EVALUATION

Introduction

This course introduces students to the concepts of educational measurement, monitoring, assessment and evaluation as they are used in the education system. Specifically, the course intends to equip students with the basic knowledge and skills to:

i) Measure and assess educational outcomes

ii) Construct different types of test items and measurement scales

iii) Administer examinations, and score and grade examination results

iv) Judge critically the practice of examinations in Tanzania

v) Summarize examination results

vi) Analyze the use and misuse of examinations

MODULE ONE

1. BASIC CONCEPTS IN MEASUREMENT AND EVALUATION

In obtaining basic information for classroom instruction and instructional decisions, four basic terms are used: testing, measurement, evaluation and assessment.

Test

A test is an instrument or device used to gather information about students’ achievement or other cognitive skills. It is a set of questions set by the teacher to measure a sample of students’ behavior. It is like a balance used to obtain weight, or a foot rule used to obtain the height of an object or person. A test determines the degree to which an individual student performs in comparison with other students.

A test can be:

i. A diagnostic test, which is used to determine the areas of difficulty encountered by the learner so that the teacher can take corrective measures

ii. An aptitude test, which is a test of a person’s ability to learn or to perform a task

iii. An achievement test, which is given to the learner to determine how much the learner has learned

An effective teacher makes use of test results to make instructional decisions. Some of these decisions are:

i. Determination of the appropriateness of the teaching plans

ii. Grouping students for effective learning

iii. Determining learning difficulties that students face

iv. Identifying students who are underachieving

v. Determining the effectiveness of instruction

vi. Identifying students who have a poor self-understanding

vii. Identifying students who are in need of special assistance

viii. Guiding and counseling students to choose a career

ix. Selecting and promoting students from a lower level to a higher level, e.g. from primary school to secondary school.

Types of tests

Information that teachers collect and use in their classrooms comes from assessment procedures that are either standardized or nonstandardized.

A standardized test is a type of test which is administered, scored and interpreted in the same way for all students across schools, districts, states or the nation. Such tests are administered to students in many different classrooms but always under identical conditions of administration, scoring and interpretation. The main reason for standardizing testing is to ensure that the testing conditions and scoring procedures have a similar effect on the performance of students in different schools and states.

Examples of Standardized Tests

Aptitude tests. The aim of these tests is to measure general mental capacity or efficiency in learning given content. They are used to predict future performance. They include tests of language proficiency, mathematics and comprehension. Examples include the Potential Ability Test (PAT), the matriculation examination and the mature age entry examination.

Achievement tests. These measure students’ achievement of the objectives of instruction in a given course.

Intelligence tests. IQ (Intelligence Quotient) tests assess areas such as reading and reasoning. An important example is the Stanford-Binet intelligence test. For instance, if you want to be recruited to ITV, TBC or any radio station, you will be subjected to an interview that requires you to read clearly to listeners so that your ability can be identified.

Nonstandardized (teacher-made) tests

Nonstandardized tests are developed for a single classroom with a single group of students and are not used for comparison with other groups. They are prepared by the subject teacher.

Measurement

Measurement is the process of quantifying, or assigning a number to, a performance or trait. It involves assigning numerical scores to performance on a quiz or test, so that the scores represent the individual’s performance or trait. Put differently, it is the process of assigning numbers to test performance according to a specific rule, such as counting the correct answers. When we measure, we want to answer the question: how much has a student achieved?
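The counting rule mentioned above can be illustrated with a short Python sketch. The answer key, the student's responses and the function name score_quiz are hypothetical and chosen only for illustration; the rule applied is simply "one point for each correct answer".

```python
def score_quiz(responses, answer_key):
    """Measurement rule: count the responses that match the answer key."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


# Hypothetical five-item multiple-choice quiz (illustration only).
answer_key = ["B", "D", "A", "C", "A"]
responses = ["B", "D", "C", "C", "A"]

raw_score = score_quiz(responses, answer_key)   # number of correct answers
percent = 100 * raw_score / len(answer_key)     # same measure on a 0-100 scale

print(raw_score, percent)
```

The result is only a number. It says how much the student got right, not whether that amount is good or poor.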

Evaluation

Evaluation is the process of attaching a value judgment to a performance or measure. It involves judging whether the student is performing at a high or low level. For example: Alfera, who scored 70 out of 100 in mathematics, was above average, showed steady progress and is a bright student. Evaluation aims at answering the question: how good? Evaluation is more comprehensive than measurement because it includes testing, measurement and non-formal observation. Evaluation is not always based on measurement; sometimes it is based on information acquired through non-measurement techniques such as observation.
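The step from a measure to a judgment can be sketched in Python as follows. This is only one possible basis for judgment, assumed here for illustration: the score is compared with the class mean. The class scores and the function name evaluate are hypothetical; only the score of 70 comes from the example above.

```python
from statistics import mean


def evaluate(score, class_scores):
    """Attach a value judgment to a score by comparing it with the class mean."""
    average = mean(class_scores)
    if score > average:
        return "above average"
    if score < average:
        return "below average"
    return "average"


# Hypothetical mathematics scores out of 100 for a small class.
class_scores = [45, 52, 58, 61, 70]

judgment = evaluate(70, class_scores)
print(judgment)
```

A teacher's actual evaluation would normally draw on more than one number, for example progress over time and classroom observation.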

The functional roles of evaluation procedures in the classroom context are as follows:

Placement evaluation

This is concerned with the student’s entry performance and is intended to check whether the student possesses the knowledge and skills necessary to begin the programme. It can also suggest skipping topics that are already well known to the students, placing a student in a special class, or placing a student in a more advanced course of study.

Formative evaluation

This is the continuous evaluation of classroom teaching in order to find out whether students are following the lesson well. It is carried out during instruction for the purpose of monitoring and improving learning progress. Its purpose is to provide continuous feedback to students, teachers and parents with regard to success or failure.

Diagnostic evaluation

This is a highly specialized procedure aimed at detecting learning difficulties that are left unresolved during formative evaluation. When a student continues to perform poorly in any learning task, a more detailed diagnosis is needed. The purpose of diagnosis is to find out the causes of recurring learning difficulties so as to formulate a plan for remedial actions.

Summative evaluation

Summative evaluation is done at the end of a unit of instruction or at the end of the course to determine whether the instructional objectives have been achieved. It is used for grading courses, certifying students’ mastery of intended learning outcomes, and judging the appropriateness of course objectives and the effectiveness of the school programme.

Purpose of Evaluation

Generally, the purpose of evaluation is to determine performance, i.e. the skills attained and knowledge learned in the educational program. Specifically, the purpose of evaluation is to:

i. Monitor/check students’ progress

ii. Strengthen desired outcomes and behavior

iii. Guide students in choosing a career

iv. Group and place students

v. Aid in curriculum/instruction improvement

vi. Assess teachers’ effectiveness and efficiency

Effectiveness is the degree to which the teacher achieves the goals.

Efficiency refers to the achievement of the goals at the lowest cost.

Assessment

Assessment is the process of collecting, synthesizing, and interpreting information in order to make a decision. Testing, measurement and evaluation often contribute to the process of assessment. There are different meanings of educational assessment.

Assessment means sitting beside a learner in an attempt to help the learner understand what he/she knows or is able to do. Assessment is a process of gathering data by interacting with the learner in order to understand his/her needs. It also simply means talking to a learner in order to understand the learner’s needs or problems.

Assessment is judging. It is a process of determining the degree to which a learner has attained some standard or level of achievement. The assessor/teacher may require the learner to answer specific questions or perform a specific task, depending on what that standard is.

Assessment is coaching. Here assessment is there to help the learner achieve a specific objective. The assessor observes the learner and provides some direction on how to proceed. At the same time the assessor collects information about what the learner knows and can do, and also where the learner has difficulty or may need more instruction. The main argument of the coaching metaphor is that assessment occurs as part of the learning process.

How Do Teachers Gather Assessment Information?

Airasian and Russell (2008) have identified three ways through which teachers collect information for making decisions. These are student products (work), observation and oral questioning.

i. Student products include homework, written assignments, essays, science projects, etc. These products provide the teacher with information about students’ cognitive skills.

ii. Observation involves watching or listening to students as they carry out an activity or respond in a given situation. Observation helps the teacher to understand student behaviors such as:

· mispronouncing words in oral reading,

· interacting in groups,

· speaking out in class, bullying (or annoying) other students,

· losing concentration,

· having a puzzled look on their faces,

· raising their hands in class,

· failing to sit still for more than three minutes.

Observations can be formal or informal.

iii. Oral questions are used to collect formal and informal information about students. Teachers use oral questions during instruction in order to:

· review a prior topic,

· brainstorm a new one,

· find out how well the lesson is understood by students,

· engage a student who is not paying attention.

In this way the teacher can gather information without interrupting the lesson.

Types of assessment

There are three types of assessment.

i. Diagnostic assessment or early assessment. This is carried out before instruction to determine readiness or entry behavior. The collection of information is normally based on informal observation. The type of information collected is based on the cognitive, affective and psychomotor domains.

ii. Formative assessment. This is carried out during instruction for the purpose of monitoring and improving progress. The collection of information is based on both formal and informal observation and on student papers such as tests, homework and projects. The type of information collected is largely cognitive and affective, and it is kept in writing.

iii. Summative assessment. This is carried out at the end of instruction or of a programme for the purpose of grading, selection, placement, certification and judging whether the instructional objectives have been achieved. The information gathered is mainly based on the cognitive domain.

Purpose of assessment

1. To establish a favorable classroom environment that supports students’ learning by helping students to interact with one another, respect one another, cooperate and follow school rules and regulations.

2. To plan and conduct classroom instruction. Conducting instruction requires constant assessment and decision making. For example, when students seem not to understand the lesson or seem to be bored, the instructor has to decide how to solve the problem so that learning can proceed.

3. To place students in the groups that fit them, e.g. a higher reading group, a middle reading group, fast learners and slow learners, in order to assist them.

4. To provide feedback to students and their caregivers. Observation and feedback are intended to modify and improve students’ learning.

5. To diagnose students’ learning difficulties and disabilities so as to help them learn through remedial teaching, accommodations, or referral for more specialized diagnosis and intervention.

6. To summarize and grade academic learning and progress. The act of making a final decision about students’ learning at the end of instruction is termed summative assessment. Much of a teacher’s time is spent collecting information that will be used to grade students or summarize their academic progress.

The Uses of Classroom Assessment

Classroom assessment is used to monitor student progress by various education stakeholders at various levels namely:

National and State Policy makers

Classroom assessment assists national policy makers in:

i. Setting state and national standards of performance

ii. Developing policies based on assessment

iii. Tracking the progress of national achievement in education sector

iv. Providing resources to improve learning

v. Providing rewards or sanctions for student, school and state achievements

School Administrators

i. to identify program strengths and weaknesses

ii. to plan and improve instruction

iii. to monitor classroom teachers

iv. to identify instructional needs and programs

v. to monitor students’ achievement over time

Teachers

i. to monitor students’ progress

ii. to judge and change classroom instruction

iii. to identify students with special needs

iv. to motivate students to do well

v. to place students in groups

vi. to provide feedback to teachers and students

Parents

i. to judge strengths and weaknesses of students/program

ii. to monitor student’s progress

iii. to meet with teachers to discuss student classroom performance

iv. to judge teacher’s quality/effectiveness

Objectives

Objectives help a teacher to focus on what is important and on what the teacher wants to accomplish. They also describe the kinds of content, skills and behaviors teachers hope their students will develop through instruction. Other names for objectives are instructional objectives, learning targets, educational objectives, behavioral objectives, student outcomes and curriculum objectives. There are three levels of objectives:

i. Global objectives, also called goals. They are general and broad. Because of their breadth they are not used in planning classroom instruction and assessment.

ii. Educational objectives. They are more specific than global objectives.

iii. Instructional objectives. They are the most specific type of objectives and are used in planning classroom instruction and assessment.

Instructional Objectives and Evaluation

Instructional objectives are statements prepared by a teacher that describe a specific behavior or pattern of behavior that a learner is expected to demonstrate as a result of a lesson or series of lessons. The pupil’s behavior or achievement should be observable and measurable. An instructional objective is a precise description of the competencies the learner is expected to develop from the instruction (Ondiek, 1986).

These statements clearly define the desired learning outcome that is expected from teaching. Learning outcomes are learning products a student is expected to demonstrate as evidence that learning has occurred. Therefore during and after instruction, the teacher determines the extent to which instructional objectives are being achieved.

Instructional objectives are stated in terms of what we expect the learner to be able to do or express at the end of instruction.

The purpose of instructional objectives is to identify what students are expected to learn in order to help the teacher to:

i. communicate to others the purpose of instruction

ii. provide direction for the teaching process, e.g. select appropriate instructional methods and materials

iii. form the basis for evaluating students’ learning achievement, i.e. help the teacher to plan assessment that will allow him/her to decide whether or not students have learned the desired content and skills that are the focus of instruction.

Characteristics of good instructional objectives

i. They are related to the intended learning outcomes of instruction.

ii. They are concerned with students, i.e. they describe students’ performance and are stated in terms of what the student is to learn from instruction.

iii. They are specific because they describe the student’s actual action, behavior or skill that can be observed, measured, instructed and assessed.

iv. They indicate the condition under which the behavior must be performed and the level of performance the students must show.

e.g. Given a map of Tanzania, students should be able to shade all major lakes correctly.

· Given a map of Tanzania – the condition under which the behavior is to be demonstrated

· Shade all major lakes – the behavior that is to be demonstrated

· Correctly – the standard or level of performance

v. They are derived from general objectives, which are stated using general verbs such as understand, appreciate and know that are not measurable.

vi. They are stated using action verbs that indicate an observable response that can be evaluated by another person.

vii. They are time bound.

Components of Instructional objectives

An instructional objective is made up of two parts:

i) The behavioral part, which specifies the expected student behavior or achievement as a result of instruction.

ii) The content part, which specifies the context in which the behavior in the objective is to operate.

Example: At the end of the lesson students should be able to list three types of communicable diseases.

Behavior: to list

Content: Communicable diseases

CLASSIFICATION OF EDUCATIONAL OBJECTIVES

Benjamin Bloom and his colleagues (1956) classified educational objectives into three learning (behavioral) domains in order to promote higher forms of thinking. The three major areas of behavior are:

i. Cognitive objectives, because the behavior involved deals with recalling knowledge and applying intellectual skills.

ii. Affective objectives, because the behavior involved deals with attitudes and values.

iii. Psychomotor objectives, because the behavior involved deals with motor skills.

The objectives are described using narrower, more specific verbs (action verbs which can be evaluated at the end of the lesson).

These behavioral domains are divided into sub-categories which are arranged from simple to complex. Students are supposed to demonstrate mastery of each skill at the lower order before they move to the more advanced skills.

For example, in the cognitive domain the teacher should focus on helping students to remember information before they understand it, and to understand it before they apply it to a new situation. Each skill in the taxonomy represents a building block for the next.

These categories and sub-categories of learning domains are useful in preparing lesson objectives and identifying learning outcomes.

Cognitive Domain

Teachers use the cognitive skills to determine the level of thinking their students have achieved. The skills are ranked on a continuum from lower-order to higher-order thinking. The knowledge level of the taxonomy represents lower-level thinking because it focuses on memorization or recall of learned information, while the remaining levels represent higher-level thinking that calls for students to carry out thinking and reasoning processes more complex than memorization.

There are six major categories of cognitive processes which are listed in order starting from the simplest to the most complex.

Competence	Skills demonstrated
Knowledge	Deals with recall of information or remembering previously learned information. Examples: knowledge of dates, events, specific facts, procedures, concepts, methods, principles, terminologies, names, prices of commodities, formulae, places, major ideas; mastery of subject matter. Question cues: list, define, mention, name, label, collect, tabulate, what, who, when, where.
Comprehension	Deals with understanding of information or grasp of meaning. Examples: translating knowledge into a new context; interpreting facts/materials; comparing, contrasting, ordering words or items, grouping items; inferring causes and predicting consequences; paraphrasing, rewriting, explaining ideas, summarizing information. Key words: summarize, interpret, contrast, predict, differentiate, estimate, reason, brainstorm, expand, explain, qualify, propose.
Application	Use of information in a new situation. Examples: applying methods, procedures, theories and concepts in a new situation; solving problems using the required skills or knowledge learnt; constructing charts or graphs. Key words: apply, demonstrate, calculate, complete, illustrate, show, examine, organize, modify, relate, classify, change, discover, experiment, compare and contrast, construct, exercise.
Analysis	Separating materials or concepts into component parts. Examples: understanding the organizational structure; recognizing the organizational principles involved; analyzing the elements; uncovering unique characteristics; recognizing unstated assumptions. Key words: analyze, identify facts, separate, classify, select, compare, arrange, distinguish, differentiate.
Synthesis	Deals with building a structure or pattern from diverse elements. Examples: putting different parts together to form a whole; formulating new patterns or structures; relating knowledge from several areas; predicting; writing a creative story, poem or song; integrating learning from different areas into a plan for solving a problem. Key words: combine, integrate, modify, rearrange, substitute, plan, create, design, invent, compose, formulate, prepare, generalize, rewrite.
Evaluation	Making judgments about the value of ideas or materials. Key words: examine, assess, appraise, critique, criticize, defend, evaluate, justify, support, relate, rank, discriminate, rate, weigh, decide, determine, summarize.

Affective Domain of Learning

The affective domain of learning consists of changes in values, attitudes, ways of feeling, outlook, interests, emotions, appreciation, motivations and preferences. The affective taxonomy is based upon the degree of a person’s involvement in an activity or idea, e.g. classroom activities.

There are five levels of the affective learning domain, organized from simple to complex as shown in the table below:

Competence	Skills Demonstrated
Receiving phenomena	Willingness to hear, listen, select and pay attention. Examples: listening to others with respect; remembering the names of newly introduced people. Key verbs: choose, describe, follow, give, hold, identify, locate, name, point, select, use, reply, etc.
Responding	Active participation on the part of the learner; attending and reacting to a particular phenomenon. Examples: participating in class discussion; giving a presentation; questioning new ideas, concepts, models, theories and principles in order to understand them fully. Key verbs: answer, assist, discuss, greet, help, label, perform, practice.
Valuing	The worth or value a person attaches to a particular object, phenomenon or behavior. This ranges from simple acceptance to the more complex state of commitment. Examples: demonstrating belief in the democratic process; informing the school management of matters that one feels strongly about. Key words: complete, demonstrate, differentiate, explain, follow, form, initiate, invite, join, justify, propose, read, report, select, share, study, work.
Organization	Organizing values into priorities by contrasting different values, resolving conflicts between them and creating a unique value system. The emphasis is on comparing, relating and synthesizing. Key words: adhere (stick to), alter/adjust, arrange, combine, compare, complete, defend, explain, formulate, generalize, identify, integrate, modify, order, organize, prepare, relate, synthesize.
Internalizing values (characterization)	The newly developed values control the person’s behavior. The behavior is persistent, consistent and predictable. Examples: cooperating in group activities (displaying teamwork); displaying professional commitment to ethical practice on a daily basis; revising judgments and changing behavior in the light of new evidence; valuing people for what they are, not how they look. Key words: act, discriminate, display, influence, listen, modify, perform, practice, propose, qualify, question, revise, serve, solve, verify.

Psychomotor learning domain

The psychomotor domain of learning consists of changes in ways of acting, skill performance, physical productivity or manipulative skills, and body movement. The development of these skills requires practice, and they are measured in terms of speed, precision (accuracy/exactness), distance, procedures, or techniques in execution. The sub-categories of the psychomotor domain are arranged hierarchically from the simplest behavior to the most complex. The taxonomy ranges from a student showing readiness to perform a psychomotor task, through the use of trial and error to learn the task, to actually carrying out the task on his/her own.

There are seven levels of the psychomotor learning domain, organized from simple to complex as shown in the table below:

Competence	Skills demonstrated
Perception (awareness)	The ability to use the sensory organs to obtain cues that guide motor activity. Examples: detecting non-verbal communication cues/signals; adjusting the heat of a stove to the correct temperature by the smell and taste of food. Key words: choose, describe, differentiate, distinguish, identify, isolate, select, relate.
Mindset (readiness to act)	Includes mental, physical and emotional sets (mindset). Example: showing desire to learn a new process (motivation). Key words: begin, display, explain, move, process, react, show, state.
Guided response	Includes imitation, trial and error, and practice. Examples: working through mathematical questions as demonstrated; following instructions to build a model. Key words: copy, trace, follow, reproduce, respond.
Mechanism	The intermediate stage in learning complex skills. The learner’s responses have become habitual and the movements can be performed with confidence and proficiency. Example: using a personal computer. Key words: assemble, construct, dismantle, display, fasten, fix, grind, heat, manipulate, measure, mend, mix, organize, sketch.
Complex overt response	Acts that involve complex movements, such as performing without hesitation. Key words: assemble, construct, dismantle, display, fasten, fix, grind, heat, manipulate, measure, mend, mix, organize, sketch (here adverbs and adjectives indicate that the performance is quicker, better and more accurate).
Adaptation	Skills are well developed and the individual can modify movement patterns to fit special requirements. Examples: responding effectively to unexpected experiences; modifying instruction to fit the needs of the learners. Key words: adapt, adjust, change, rearrange, reorganize, revise.
Origination	Creating new movement patterns to fit a particular situation or specific problem. Learning outcomes emphasize creativity based on highly developed skills. Examples: constructing a new theory; developing a new comprehensive training programme. Key words: arrange, build, combine, compose, construct, create, design, initiate, make, originate.

Table 1. Examples of action verbs used to write educational objectives for each category of cognitive domain.

Knowledge	Comprehension	Application	Analysis	Synthesis	Evaluation
Count	Classify	Compute	Break down	Arrange	Appraise
Define	Compare	Construct	Differentiate	Combine	Conclude
Identify	Contrast	Demonstrate	Discriminate	Compile	Criticize
Label	Convert	Illustrate	Outline	Create	Critique
List	Discuss	Solve	Separate	Design	Grade
Match	Distinguish		Subdivide	Formulate	Judge
Name	Estimate			Generalize	Recommend
Quote	Explain			Generate	Support
Recite	Generalize			Group
Repeat	Give examples			Integrate
Reproduce	Infer			Organize
Select	Interpret			Relate
State	Paraphrase			Summarize
	Rewrite
	Summarize
	Translate

Below are sample instructional objectives derived from Bloom’s Taxonomy, with their taxonomic categories.

1. Knowledge: At the end of the lesson students should be able to list three types of family.

2. Comprehension: At the end of the lesson students should be able to distinguish democratic states from authoritarian states.

3. Application: At the end of an eighty (80) minute lesson students should be able to construct a bar graph using the information available in the table.

4. Analysis: At the end of a two-hour lecture, students should be able to analyze the main features of monopoly capitalism.

5. Synthesis: At the end of a two-hour lecture, students should be able to integrate information from the science experiment into a lab report.

6. Evaluation: At the end of a two-hour lecture, students should be able to judge the quality of varied persuasive essays.
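As a rough aid when drafting objectives, the verbs in Table 1 can be arranged as a lookup from an action verb to the cognitive level(s) it is listed under. The Python sketch below is an illustration only: the names TABLE_1 and levels_for_verb are hypothetical, the verb lists are copied from Table 1, and a verb that Table 1 does not list (such as "analyze") simply returns no level.

```python
# Action verbs from Table 1, keyed by cognitive level (lowest to highest).
TABLE_1 = {
    "Knowledge": {"count", "define", "identify", "label", "list", "match", "name",
                  "quote", "recite", "repeat", "reproduce", "select", "state"},
    "Comprehension": {"classify", "compare", "contrast", "convert", "discuss",
                      "distinguish", "estimate", "explain", "generalize",
                      "give examples", "infer", "interpret", "paraphrase",
                      "rewrite", "summarize", "translate"},
    "Application": {"compute", "construct", "demonstrate", "illustrate", "solve"},
    "Analysis": {"break down", "differentiate", "discriminate", "outline",
                 "separate", "subdivide"},
    "Synthesis": {"arrange", "combine", "compile", "create", "design", "formulate",
                  "generalize", "generate", "group", "integrate", "organize",
                  "relate", "summarize"},
    "Evaluation": {"appraise", "conclude", "criticize", "critique", "grade",
                   "judge", "recommend", "support"},
}


def levels_for_verb(verb):
    """Return every Table 1 level that lists the verb, in taxonomy order."""
    verb = verb.strip().lower()
    return [level for level, verbs in TABLE_1.items() if verb in verbs]


print(levels_for_verb("list"))        # verb of sample objective 1
print(levels_for_verb("construct"))   # verb of sample objective 3
print(levels_for_verb("summarize"))   # listed under two levels in Table 1
```

Some verbs appear under more than one level, so the verb alone does not fix the level of an objective; the content and the task required of the student must also be considered.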

NB. Instructional objectives should be SMART:

S = Specific

M = Measurable

A = Attainable

R = Realistic

T = Time bound

Assessment of the learning domains

Different assessment approaches characterize the different behavioral domains. For example:

Cognitive Domain

The most commonly taught and assessed educational objectives are those in the cognitive domain (Airasian and Russell, 2008). Nearly all tests that students take in schools are intended to measure one or more of these cognitive activities. Teachers’ instruction is usually focused on helping students to attain cognitive mastery of some content or subject area.

The cognitive domain is measured through paper-and-pencil tests of various types as well as oral questioning (e.g. comprehensive examination, research defense).

Affective Domain

The affective domain is assessed by observation and questionnaires, e.g. rating scales.

Affective behaviors are rarely assessed formally in schools and classrooms. Teachers assess affective behavior informally, especially when grouping students, for example to determine which students can work without supervision and which cannot.

Most classroom teachers can describe their students’ affective characteristics based on their informal observations of and interactions with their students.

Psychomotor Domain

The psychomotor domain is assessed by observing students carrying out the desired physical activity.

It is measured in terms of speed, precision, distance, procedures, or techniques in execution.

1.6. Table of specification of instructional objectives

What is a table of specification?

A table of specification (TOS) is a table that helps the teacher align objectives, instruction and assessment when teaching.

A table of specification has two dimensions:

i. The content dimension, which includes the main topics of instruction and assessment

ii. The process dimension, which includes the six categories of the cognitive domain related to each content topic or objective.

Content dimension	Process dimension/objectives
Content dimension	Knowledge	Comprehension	Application	Analysis	Synthesis	Evaluation
Stages of writing	X (L) Mention the three stages of the writing process	X (M) Explain the purposes of the three stages of the writing process
Topic sentences			X (M) Write down the topic sentences	X (L) Differentiate topic sentences from other types of sentences		X (H) How useful is the topic sentence in writing an essay?
Writing essays					X (H) Write an essay on the stages of the writing process
Note: L = low amount of time; M = middle amount of time; H = high amount of time; X = objective(s)

The intersection between the process dimension and the content dimension is referred to as an objective.

For example: X = Students should be able to mention the three stages of writing (prewriting, writing and editing), i.e. students will remember the three stages of the writing process.

L = the amount of time allotted to this objective. Since it is a simple memorization task, a low amount of time is spent teaching it.

A second example: students should be able to explain in their own words the purposes of the three stages of the writing process. The topic “Stages of writing” therefore relates to two different objectives.
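The two dimensions of the table can also be written down as a simple data structure, which makes it easy to see which cognitive levels each topic covers and how the time weights are distributed. The Python sketch below is an illustration only: the names TOS, LEVELS and objectives_for_topic are hypothetical, and the cells are copied from the table above.

```python
from collections import Counter

# Process dimension: the six cognitive levels, lowest to highest.
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

# Each cell: (content topic, cognitive level) -> (time weight, objective).
TOS = {
    ("Stages of writing", "Knowledge"): ("L", "Mention the three stages of the writing process"),
    ("Stages of writing", "Comprehension"): ("M", "Explain the purposes of the three stages"),
    ("Topic sentences", "Application"): ("M", "Write down the topic sentences"),
    ("Topic sentences", "Analysis"): ("L", "Differentiate topic sentences from other sentences"),
    ("Topic sentences", "Evaluation"): ("H", "Judge how useful the topic sentence is in an essay"),
    ("Writing essays", "Synthesis"): ("H", "Write an essay on the stages of the writing process"),
}


def objectives_for_topic(topic):
    """List (level, time weight) for every objective under a topic, in level order."""
    return [(level, TOS[(topic, level)][0])
            for level in LEVELS if (topic, level) in TOS]


# How many objectives receive a low, middle or high amount of time.
time_counts = Counter(weight for weight, _ in TOS.values())

print(objectives_for_topic("Stages of writing"))
print(dict(time_counts))
```

Reading the first row this way confirms the point made above: "Stages of writing" carries two objectives, one at the knowledge level and one at the comprehension level.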

Once objectives are identified and organized, the next step is to develop lesson plans for them. In selecting activities to achieve the objectives, the following should be taken into consideration:

i. The ability level of your students

ii. Their attention spans

iii. Suggestions made in the textbook.

iv. Additional resources available to supplement and reinforce the textbook.

For the first topic, the table reminds the teacher that both remembering and explaining activities are needed to attain the first two objectives. For example, the specific objectives could read as follows:

a. At the end of the lesson students should be able to mention the three stages of the writing process (knowledge).

b. At the end of the lesson students should be able to explain the purposes of the three stages of the writing process (comprehension).

Instruction

Once objectives and planned activities are identified, the next step is to instruct the students based on the objectives developed.

Assessment

When constructing a test, the teacher needs to be concerned with the class content (the subject matter) and the kind of thinking or response required on the test.

A test should also align with the level of thinking required of students during instruction. For example, if the teacher taught at the knowledge level (recall of information), the test should require students to recall the information learned.

Decisions in planning a test

When deciding what should be included in the assessment and the type of tasks students should take, the teacher needs to answer the following important questions.

1. What should I test?

In deciding what to test it is important to focus on both the objectives and actual classroom instruction that took place.

2. What type of assessment items or tasks should be given?

The type of assessment item or task should be based on the learning objectives; the assessment procedure chosen depends on the nature of the objective being assessed. For example, from the table above:

a. Comprehension (write in one’s own words)

b. Application (write a topic sentence)

c. Synthesis (integrate and write an essay)

3. How long should the test take?

a. The age of the students, the subject being tested and the length of the class period all affect the length of a test.

b. The number of questions per objective depends on the instructional time spent on each objective and its importance.

Stages in planning classroom test

Tests are used to evaluate students’ learning. Testing involves determining the behavior to be measured and designing test items that will elicit the desired performance. The goal of classroom testing is to improve teaching and learning. The following should be considered when planning a test.

i. Determine the purpose of the test.

The purpose of the test may be:

a). to determine whether students have the prerequisite skills needed for instruction and to find out what they know about the lesson to come (pre-testing).

b). to monitor the learning process, to detect learning problems and to provide feedback to students and the teacher (formative testing).

c). to measure the extent to which instructional objectives have been achieved (summative testing).

ii. Developing test specification

Test specification entails making the test a representative sample of the instructional objectives and of the content that was taught by using a table of specification. The table of specification is made up of the instructional (behavioral) objectives and the content.

Table 1. Specification of concepts in measurement and evaluation

Content/Objectives	Knowledge	Comprehension	Application	Analysis	Synthesis	Evaluation	No. of test items	%
testing	2	1	2	1	1	2	9	18
measurement	4	2	2	2	3	2	15	30
evaluation	2	2	4	3	1	1	13	26
assessment	2	3	2	2	2	2	13	26
Total items	10	8	10	8	7	7	50
% of items	20	16	20	16	14	14		100
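
The row totals, column totals and percentages in Table 1 can be checked with a few lines of code. The following is a minimal sketch, not part of the original notes: the item counts are copied from Table 1, while the names ITEMS, LEVELS, summarize and percent are illustrative assumptions.

```python
# Item counts copied from Table 1; each list follows the order in LEVELS.
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

ITEMS = {
    "testing":     [2, 1, 2, 1, 1, 2],
    "measurement": [4, 2, 2, 2, 3, 2],
    "evaluation":  [2, 2, 4, 3, 1, 1],
    "assessment":  [2, 3, 2, 2, 2, 2],
}


def summarize(items):
    """Return (row_totals, column_totals, grand_total) for a specification table."""
    row_totals = {topic: sum(counts) for topic, counts in items.items()}
    column_totals = [sum(column) for column in zip(*items.values())]
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


def percent(part, whole):
    """Share of `whole` taken by `part`, expressed as a percentage."""
    return 100 * part / whole


row_totals, column_totals, grand_total = summarize(ITEMS)

# Items and percentage weight per content area.
for topic, total in row_totals.items():
    print(f"{topic:12s} {total:3d} items  {percent(total, grand_total):5.1f}%")

# Items and percentage weight per cognitive level.
for level, total in zip(LEVELS, column_totals):
    print(f"{level:14s} {total:3d} items  {percent(total, grand_total):5.1f}%")
```

Running the sketch reproduces the last two columns and the last two rows of Table 1, which is a quick way to confirm that a specification table adds up before items are written.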

iii. Selecting Appropriate Test Items

There are two types of test items, namely objective items (multiple choice, matching and true/false items) and subjective test items (essay items).

According to Thungu et al. (2010), the allocation of marks for cognitive skills can be as follows:

Skills	Percentage
Knowledge	12%
Comprehension	16%
Application	32%
Analysis	20%
Synthesis	12%
Evaluation	8%
Total	100%
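
The percentages above can be turned into marks for a particular paper. The sketch below assumes a paper marked out of 50, which is an illustrative figure and not one given in the notes; the names WEIGHTS and allocate_marks are likewise hypothetical.

```python
# Percentage weights from the Thungu et al. (2010) table above.
WEIGHTS = {
    "Knowledge": 12,
    "Comprehension": 16,
    "Application": 32,
    "Analysis": 20,
    "Synthesis": 12,
    "Evaluation": 8,
}


def allocate_marks(total_marks, weights=WEIGHTS):
    """Split total_marks across the skills in proportion to their percentage weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must add up to 100%")
    return {skill: total_marks * pct / 100 for skill, pct in weights.items()}


# Assumed example: a test marked out of 50.
marks = allocate_marks(50)
for skill, value in marks.items():
    print(f"{skill:14s} {value:4.1f} marks")
```

For other totals the allocation may produce fractions of a mark, in which case the teacher would round and adjust by hand so that the marks still add up to the paper total.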

iv. Preparing a set of relevant items

The intended learning outcome will dictate the type of items to be used. If the intended learning outcome is to mention, name or list, supply-type items will be appropriate. If the intended outcome is to identify, selection-type items will be used. An item should be included only if it measures a sample of the intended learning outcome.

PRINCIPLES OF TEST CONSTRUCTION

Purpose of Testing

i). To identify what students have learned after the completion of a lesson or unit of instruction. These tests are also important when discussing student progress at parent-teacher conferences.

ii). To identify student strengths and weaknesses. This is effective when teachers use pretests at the beginning of units in order to find out what students already know and where the teacher's focus needs to be.

iii). To place students.

iv). To determine who will receive awards and recognition.

v). To assess the effectiveness of the teacher and/or the school.

vi). To show the depth of understanding of an idea or mastery of a skill.

vii). To show student growth over time in a particular area of knowledge.

viii). To compare one student’s or group’s achievement to another’s on the same task.

ix). To predict students’ future performance.

Construction of test items

General Rules for Writing Test Items

i). Use examination format as a guide to item writing. Examination format describes the scope and content coverage to be measured and the sample of tasks to include.

ii). Write more items than needed for a particular examination so as to allow the weaker items to be discarded during later review.

iii). Write items well in advance of the submission date. Setting items aside for several days and then reviewing them will help reveal any lack of clarity or ambiguity that was overlooked.

iv). Each test item should call forth the performance described in the intended learning outcome.

v). Write each item so that the task to be performed is clearly defined. In formulating questions use simple and direct language, correct punctuation and grammar, and avoid unnecessary wording.

vi). Write each item at an appropriate reading level. Pupils’ responses should be determined by the performance being measured and not by some factor the item was not designed to measure.

vii). Write each item so that it does not provide help in answering other items. For example, a name, date or fact called for in a short-answer item might be unintentionally included in the stem of a multiple-choice item in another part of the test.

viii). Write each item so that the answer is one that would be agreed upon by experts. This rule is easy to follow when measuring factual knowledge but harder when measuring complex outcomes that call for the best answer, such as the best reason, the best method, the best interpretation and the like. Be sure that experts would agree that the answer is clearly the best.

ix). Write each item so that it is at the proper level of difficulty. That is, the difficulty of the item should match the performance to be measured and the purpose of the test.

x). Whenever an item is revised, recheck its relevance, i.e. be sure that it still provides a relevant measure of the intended learning outcome.

Classification of Tests

There are two types of tests:

Objective tests (selection items) are those in which the examinee selects the correct answer from among a number of choices presented. Objective items include multiple choice items, true-false items and matching items. Objective tests are distinguished from subjective tests in that the task is highly structured and limits the type of response. Students are not free to redefine the problem or to organize and present the answer in their own words.

Subjective tests (supply items) require the student to supply or construct his or her own answer. The item format may require the student to structure a written response of up to several paragraphs. Subjective, supply or constructed-response items include restricted response items, short answer items, completion items and essay items.

Objective/selection tests

Multiple Choice Items

Multiple choice items consist of a stem (premise), which presents the problem or question to the student, and a set of options or choices from which students select an answer. There are usually four or five options, and there should be only one correct answer. Incorrect but reasonable options in multiple choice questions are called distracters. The problem may be stated as a question or as an incomplete statement. For example:

i. Direct question

Which of the following is not an element of weather?

a) Humidity

b) Leaching

c) Sunshine

d) Temperature

ii. Incomplete statement

………………… is one of the elements of weather.

a). Rainfall

b). Erosion

c). Deforestation

d). Leaching

Multiple choice items are used to measure simple and complex learning outcomes.

a). Knowledge of terminology/vocabulary/terms

Example: An organism living in or on another organism is

a. A predator.

b. Prey.

c. A parasite.

d. A host.

b). Knowledge of specific facts. Multiple choice items can be used to assess students’ grasp of discipline-based factual knowledge; they deal with what, who, where and when.

Example: Which of the following states does not border Oklahoma?

a. Colorado

b. Missouri

c. Nebraska

d. New Mexico

c). Knowledge of Procedure

Example: The correct procedure for combining acid and water is

a. Add acid to large amount of water

b. Add water to large amount of acid

c. Add acid to water, cool and swirl

d. Add water to acid, cool and swirl

d). Knowledge of Principles

Example. The principle of capillary action helps to explain how fluids:

a. Enter solutions of lower concentration

b. Escape through small openings

c. Pass through small semi-permeable membranes

d. Rise in fine tubes

Multiple choice questions also measure higher level outcomes such as:

i. Application

Faraday’s law can be used to explain

a).

b).

c).

d).

ii. Interpretation

The Majimaji war occurred in the southern part of Tanzania because

a).

b).

c).

d).

iii. Justification of methods and procedure

Why do farmers rotate their crops?

a).

b).

c).

d).

Guidelines for constructing multiple choice type of item

i. State the problem clearly in the stem

For example:

The components of a multiple-choice item are

ii. Include one correct or most defensible answer

For example

According to …….the most serious aspect of the energy crisis is the

iii. Select plausible distracters. Distracters should be attractive to examinees who do not know the correct answer.

iv. Options should be presented in a logical, systematic order. For example, dates of events should be arranged chronologically, numerical quantities in ascending order and names in alphabetical order.

v. Options should be grammatically parallel and consistent with the stem; otherwise they can provide a clue to the correct alternative.

Example 1

A test which can be scored by untrained person in the content area of the test is an

a. Diagnostic test

b. Criterion-referenced test

c. Objective test

d. Reliable test

e. Subjective test

Examinees take advantage of an inconsistency between the stem and the options to get the correct answer: here the article “an” at the end of the stem fits only option c. They then respond in terms of verbal skill, which may be quite different from the skill the item is intended to measure.

The item might be rewritten as follows:

A test which can be scored by untrained person in the content area of the test is said to be

a. Diagnostic

b. Criterion-referenced

c. Objective

d. Reliable

e. Subjective

vi. Options should be mutually exclusive, i.e. the item should contain only one option that is the correct or best answer.

vii. Ensure that the correct responses are not consistently shorter or longer than the distracters. A difference in length might give a clue to the correct answer.

viii. Options such as “none of these”, “none of the above”, “all of these” and “all of the above” should not be used when the examinee is to select the best, but not necessarily absolutely correct, answer.

ix. The position of the correct answer should vary randomly from item to item.

What are the advantages and disadvantages of multiple choice test items?

Matching Items

Matching items consist of two columns:

a. A column of stems or problems to be answered, called premises (Column A)

b. A column of responses (Column B).

Normally the column of premises is placed on the left hand side and the column of responses on the right. Matching items often measure recognition of factual knowledge based on simple associations that may include:

· Persons who are associated with events

· Dates with historical events

· Terms with definition

· Rules with examples

· Symbols with concepts

· Parts with functions

· Plants/animals with classification

Guidelines for writing matching items

i. Include homogeneous materials in each exercise

ii. Include at least three to five but no more than eight to ten items in a matching set. Why? A long set of matching items requires the examinee to do a good deal of work in keeping track of stems and searching for options. Furthermore, it is difficult to write long matching sets which are homogeneous. Thus three to eight items per matching set is a reasonable compromise.

iii. Eliminate irrelevant clues. There should not be verbal association clues, plural and singular clues between the stem and the correct option pair.

iv. Place each set of matching items on a single page

v. Reduce the influence of clues and thereby increase the difficulty of matching item. This can be accomplished through

a. Using a different number of options than there are items

b. Allowing each option to be used more than once.

vi. Compose the response list of single words or very short phrases

vii. Arrange the responses in a systematic order, such as alphabetical or chronological. This order enables examinees to find correct responses more quickly.

viii. One column should contain more entries than the other.

ix. Items in the columns should be grouped homogeneously

E.g.

LIST A	LIST B
1. Leonardo da Vinci	a. American Gothic
2. Edward Hopper	b. The Thinker
3. Michelangelo	c. Mona Lisa
4. Auguste Rodin	d. The Last Supper
5. Grant Wood

What are the advantages and disadvantages of matching test items?

True-False or Alternative Response Items

These are test items with only two possible answers. Each consists of a declarative statement that the pupil/student is asked to mark true or false, right or wrong, correct or incorrect, yes or no, agree or disagree, etc. Because of these different response pairs, they are called alternative response items.

Alternative response or true-false items are used in measuring:

a. The ability to identify the correctness of statements or definitions of terms.

b. The ability to distinguish facts from opinion.

c. The ability to recognize cause-and-effect relationships.

Guidelines for Constructing True-False/Alternative Items

i. Include only one idea in each item

ii. Eliminate partly true-partly false items

iii. Ensure that true and false items are approximately equal in length.

iv. Balance the number of true items and false items

v. Eliminate vague terms of degree or amount. E.g. words like “frequently” and “seldom” are open to interpretation in true-false items.

vi. Use caution in writing negative item statements.

What are the advantages and disadvantages of true-false or alternative response test items?

Subjective/Supply Test Items

Subjective items consist of completion (fill in the blank) items, short answer items and essay type items.

1. Short answer items present the problem as a direct question which requires the students to answer using their own constructed responses, e.g. What is the name of the first president of Tanzania?

What are the main parts of the human body?

2. Completion items present the problem as an incomplete sentence, e.g. The name of the first president of Tanzania is ……………………………………..

The main parts of the human body are i……………………….ii……………iii………………iv………….

Short answer and completion items primarily assess recall of factual knowledge (dates, places, specific persons) and comprehension.

Guidelines for writing good short answer items

i. Construct the stem so that the answer is definite and brief.

ii. Make sure that there is only one correct answer

iii. Avoid lifting sentences from textbook

iv. For completion and fill-in-the-blank items:

- Make response blanks of equal length

- Avoid grammatical clues preceding the blank.

- Do not use too many blanks in one item, usually no more than two

- Include enough information in the stem to ensure the desired response

Essay Items

Essay items allow students to communicate a unique constructed answer to a question.

There are two categories of essay type questions. These are:

i. The restricted-response questions are essay questions that limit content in terms of scope and response. A student is required to state or list factors, reasons, differences, similarities, merits and demerits. Such questions limit the student in terms of content of the answer and length of the response.

Example:

Limited content

List the types of leadership styles.

Limited response

Briefly state the advantages and disadvantages of each style.

ii. The extended-response items

These are test questions that require the students to select factual information, organize the answer in the way they like and integrate ideas as they deem appropriate. Extended response questions are used to measure the ability of students to select information and to organize, integrate and evaluate ideas.

Example

Describe the influence of climate change on agricultural development in Africa today

Evaluate the significance of participatory teaching techniques at primary school level in Tanzania.

Principles for construction of better essay questions/items

· Essay questions should measure learning outcomes that cannot be satisfactorily measured by objective test items.

· They should measure the achievement of instructional objectives.

· Each question should indicate clearly the task to be undertaken by students

· They should indicate the time limit for each question

What are the advantages and disadvantages of essay test items?

ASSEMBLING, ADMINISTRATION AND ANALYSIS OF TEST RESULTS

Classroom testing process

i. Identify the learning outcomes to be tested and measured

ii. Selection of appropriate test format

iii. Construct test items that are relevant to learning outcomes specified

iv. Assembling of the test questions

Assembling Classroom Test

Assembling classroom test refers to the process of grouping test items by type such as multiple choice, true-false etc.

The importance of grouping test items by type is:

i. To avoid the necessity of students shifting from one response mode to another as they move from item to item.

ii. To help students cover more items in a given time

iii. To make scoring easier

Organization of test items.

One of the important considerations in assembling the test is the order in which the item types are presented. In most tests selection items come first and supply items come last.

Guidelines for assembling test items

1. Record test items in an organized way, e.g. on paper

2. Review test items several times so as to make the items appropriate to the learning outcomes that are intended to be measured.

3. Arrange items in a logical manner according to the examination format

i. Organize test items by type, with selection items first and supply items last

ii. Do not split multiple choice or matching items across two pages of the test

iii. Separate multiple choice options from the stem by beginning the options on a new line.

iv. Number the test items

v. Space items for easy reading and writing responses.

vi. Make sure that you have enough copies of examination.

vii. Provide enough questions to ensure reliability

4. Prepare instructions to be followed by students in answering the test items

5. Each section of a test should have instructions that direct students what to do

Test Administration and Marking

Test administration involves establishing a conducive physical and psychological setting that allows students to demonstrate their best performance and to manage their time.

Guidelines for administering test

i. Creating a quiet, comfortable physical and psychological setting.

Physical setting

Examination environment should be quiet and comfortable. This can be achieved through minimizing interruption of any kind. Some of the ways to minimize interruption in the examination room are:

a. Posting a sign on the door indicating that testing/examination is in progress.

b. Proofreading the test items and directions before administering them

c. Ensuring that enough facilities such as desks, chairs and a clock are available

d. Making sure that there is enough ventilation and light

Psychological setting

This involves creating a psychological setting that reduces student anxiety. Test anxiety is diminished by informing students about the test in advance, giving them good instruction and providing a good review of the unit.

ii. Keeping track of time by informing students of the remaining time

While administering a test, the teacher should be alert to cheating. Cheating is a common problem in schools. Students cheat in examinations for various reasons such as:

a. Pressure from parents/teachers

b. Failure to prepare and study for the test

c. Internal pressure from being in a course that gives a limited number of high grades

d. Danger of losing a scholarship

Forms of Cheating

a. Copying from another student’s examination/test answers.

b. Dropping a test paper so that others can copy from it.

c. Writing test information on an eraser or a small piece of paper and passing it to another student or using it.

d. Developing codes, formulae or key words on objects for use in the test

e. Changing answers when the teacher allows students to grade each other.

f. Keeping test information in a toilet room

g. Writing test information on the arms or thighs.

h. Using programmed material in watches or calculators in the test room.

i. Looking at another student’s paper during a test

How can we discourage cheating?

a. Searching students as they enter the test room.

b. Providing students with good instruction and information about the test

c. Keeping students’ books and other materials away from the test room before testing

d. Observing students during testing

e. Knowing the common methods of students’ cheating

f. Spreading out students’ seats in the test room

g. Discouraging students from wearing caps in the test room.

h. Using different test forms.

i. Assigning students seats for a test

j. Giving more in-class tests and fewer take-home tests.

Scoring of Tests

The process of scoring a test involves measurement, that is, assigning a number to represent a student’s performance. It provides a summary of the student’s performance. The complexity of scoring varies with the type of test. A selection test is easier to score than a supply item test.

Scoring the selection test

A selection test consists of multiple choice, matching and true-false test items. Scoring a selection test is objective because the responses are brief and each item has only one correct answer. There are different methods of scoring objective items. One common method is to put a tick against each correct answer and a cross against each wrong answer. However, it is advisable to write the correct answer against a wrongly answered item instead of a cross, and the score against a correctly answered item instead of a tick.

Scoring Short Answer Test

Short answer and completion test items call for short responses such as a word, phrase, date or name. Therefore scoring is not difficult and can be quite objective.

Scoring essay test items

The essay item is the most complex item to score because essay questions allow each student to construct a unique and lengthy response to the question posed. Therefore no single answer key applies uniformly to all responses, and interpretation of the responses is necessary.

Factors which undermine a teacher’s ability to evaluate essays fairly and reliably are:

a. Halo effect, i.e. irrelevant factors can attract the attention of the marker, making an essay appear better than it really is. Such factors include:

i. Hand writing

ii. Style of writing such as sentence structure

iii. Spelling and grammar

iv. Neatness

b. Identity of the student

c. Location of one’s paper in the pile of test papers

d. Teacher’s dislike for a student

e. Teacher’s mood

How to minimize biases

a. Develop a scoring guide (rubric) or marking scheme. A scoring guide lists the key components of the essay that will be graded, together with a short description that defines each level of performance and the number of points that level will receive.

E.g. accuracy of the content; language/vocabulary; sources/citations; spelling/grammar; organization of the essay

b. Teacher should identify students by number when scoring essay responses.

c. Score students on the basis of present performance, not on the ability, interests or past performance of the student.

d. Inform the students of the demands of essay questions such as good handwriting, proper punctuation, spelling, accuracy and organization of the essay

e. Score the first essay for all students before moving to the next essay in order to be consistent and to do justice to students when scoring.

f. Describe in advance how you are going to handle factors that are not relevant to the learning outcome being measured. Such factors include: spelling, handwriting, sentence structures, punctuation and neatness.

g. Re-read each essay answer a second time after scoring so as to check objectivity.

Approaches to scoring essays.

There are two approaches to scoring essays

a. Holistic scoring, which provides a single overall score or grade for the complete essay. Holistic scoring is useful when an overall impression of student achievement is required.

b. Analytical scoring, which provides a separate score for each component of the essay, e.g. scores for accuracy, organization, supporting arguments, grammar and spelling. Analytical scoring provides students with detailed feedback that can help them improve different aspects of their essays. It is useful for determining the strengths and weaknesses in a student’s work or for assessing multiple objectives that are integrated in the essay.
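
The difference between the two approaches can be illustrated with a small sketch of analytical scoring. The components are the ones named above, but the maximum points per component and the sample scores are assumptions made only for illustration, as are the names RUBRIC and analytic_score.

```python
# Hypothetical rubric: component -> maximum points (the maxima are assumptions).
RUBRIC = {
    "accuracy": 10,
    "organization": 5,
    "supporting arguments": 5,
    "grammar and spelling": 5,
}


def analytic_score(component_scores, rubric=RUBRIC):
    """Check each component score against the rubric and return (total, maximum)."""
    for component, maximum in rubric.items():
        score = component_scores[component]
        if not 0 <= score <= maximum:
            raise ValueError(f"{component}: {score} is outside 0..{maximum}")
    total = sum(component_scores[component] for component in rubric)
    return total, sum(rubric.values())


# Hypothetical scores awarded to one student's essay.
essay = {
    "accuracy": 8,
    "organization": 4,
    "supporting arguments": 3,
    "grammar and spelling": 5,
}

total, maximum = analytic_score(essay)
print(f"Analytical score: {total}/{maximum}")
for component, score in essay.items():
    print(f"  {component}: {score}/{RUBRIC[component]}")
```

The per-component lines are the detailed feedback that analytical scoring offers; a holistic scorer would report only a single overall mark.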

ITEM ANALYSIS

Item analysis refers to the process of judging the quality of selected-response test items. It is a set of procedures designed to evaluate the quality of test items used for assessment. Item analysis is done after a test has been administered and scored so as to determine whether:

· each item in the test functioned as it was intended

· the item was capable of discriminating between the best and the weakest students in terms of achievement

· the item was able to measure the effect of the teaching and learning process

· the item was of appropriate difficulty.

Individual assessment items have unique characteristics, namely:

i. Item difficulty (how hard a test item is)

ii. Item discrimination (how much more frequently an item is answered correctly by those who perform well on the total test than by those who perform poorly). Item discrimination reflects the relationship between students’ responses on the total test and their responses to a particular test item.

Item Difficulty

Item difficulty is the ratio or percentage of individuals who answered an item correctly.

Item difficulty index = (number of correct answers) / (total number of students who answered the item)

The easier the item, the larger the item difficulty index. If item 1 is answered correctly by 15 out of 20 students, then the item difficulty index is 15/20, which is 0.75 or 75%.

Item difficulty is used as a measure of how hard an item is for all students, those who performed well overall and those who performed poorly. A good assessment is one that balances the difficulty of items to provide information about a range of student abilities and performance.
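
As a minimal sketch, the difficulty index can be computed as follows; the function name item_difficulty is an illustrative choice, and the figures are the 15-out-of-20 example given above.

```python
def item_difficulty(correct, answered):
    """Proportion of students who answered the item correctly."""
    if answered <= 0:
        raise ValueError("the number of students who answered must be positive")
    if not 0 <= correct <= answered:
        raise ValueError("correct answers must lie between 0 and the number answered")
    return correct / answered


# Example from the text: 15 of the 20 students answered item 1 correctly.
p = item_difficulty(15, 20)
print(f"difficulty index = {p:.2f} ({p:.0%})")  # difficulty index = 0.75 (75%)
```

Note that a higher value means an easier item, so an index near 1.0 indicates that almost everyone answered correctly.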

Item discrimination

Item discrimination is the degree to which an item differentiates those who have a higher level of achievement from those who have a lower level of achievement. The discriminating power of an item is a measure of the ability of the item to distinguish between those students who performed well overall on a test and those who did not.

Procedures for analyzing test items

i. Rank all of the test papers from the highest score to the lowest score.

ii. Identify three groups of students in the class: the higher, middle and lower performing students.

iii. Select about 25% of the papers from the top and 25% of the papers from the bottom.

iv. Put aside the middle papers as they will not be used for analysis.

v. For each test item, tabulate the number of students in the upper and lower groups who selected each alternative.

vi. Compute the difficulty index of each item for the upper and lower groups using the following formula:

Item difficulty index = (number of correct answers) / (total number of students who answered the item)

Example. Item number 1

Tanganyika attained its independence in

Option	High	Low
a. 1961	10	6
b. 1962	8	4
c. 1965	0	0
d. 1967	2	10
Total	20	20

Option (a) is the correct answer to item number 1 (question 1).

Calculate the item difficulty index of the item for the high and low groups.

High group. Item difficulty = (No. of students who answered the item correctly) / (No. of students in that group)

= 10/20

= 0.5 or 50%

Low group. Item difficulty = (No. of students who answered the item correctly) / (No. of students in that group)

= 6/20

= 0.3 or 30%

Item discrimination

Item discrimination index = item difficulty for the high group - item difficulty for the low group.

Item discrimination index = 0.5 - 0.3 = 0.2

Item discrimination values range from -1.00 to +1.00.

The discriminator can be positive, negative or zero (a non-discriminator).

A positive discriminator is an item that is answered correctly by more of the students who did well on the test than of those who performed poorly. The more positive the discriminator, the better the item is functioning in differentiating among the varying levels of achievement. Such an item is said to be a precise, useful and effective test item.

A negative discriminator is an item that is answered correctly by more of the poorly performing students than of those who did well overall. Such an item is undesirable.

A non-discriminator is an item which does not differentiate between the higher performing and the lower performing students.

The purpose of item discrimination is to compare the response rate of the high-performing students to the low performing students on individual items.
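
The grouping and discrimination steps described so far can be sketched in code. This is an illustration under stated assumptions: the eight total scores are hypothetical, the function names split_groups, discrimination_index and classify are not from the notes, and an index of exactly zero is treated as a non-discriminator. The final calculation uses the counts from item number 1 above.

```python
def split_groups(total_scores, fraction=0.25):
    """Rank students by total score and return (upper, lower) lists of student ids."""
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size], ranked[-size:]


def discrimination_index(upper_correct, upper_size, lower_correct, lower_size):
    """Difficulty index of the upper group minus that of the lower group."""
    return upper_correct / upper_size - lower_correct / lower_size


def classify(index):
    """Label an item as a positive, negative or non-discriminator."""
    if index > 0:
        return "positive discriminator"
    if index < 0:
        return "negative discriminator"
    return "non-discriminator"


# Hypothetical total test scores for eight students (not from the text).
scores = {"s1": 45, "s2": 38, "s3": 30, "s4": 27,
          "s5": 22, "s6": 18, "s7": 12, "s8": 9}
upper, lower = split_groups(scores)
print("upper group:", upper, "lower group:", lower)

# Item 1 from the text: 10 of 20 upper-group students and 6 of 20
# lower-group students chose the correct option.
d = discrimination_index(10, 20, 6, 20)
print(f"discrimination index = {d:.1f} ({classify(d)})")
```

With the figures from the text the index is 0.2, so item 1 is a positive discriminator, although only a modest one.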

vii. Evaluate the effectiveness of the distracters (the incorrect alternatives) in each item. This is achieved by inspecting the number of students in the upper and lower groups who selected the distracter being evaluated.

For example, the results for item number 1 of a test were as follows:

Tanganyika attained its independence in

Option	High	Low
a. 1961	10	6
b. 1962	8	4
c. 1965	0	0
d. 1967	2	10
Total	20	20

Option (a) is the correct answer to item number 1 (question 1).

Interpretations

a. Option A is a good option and functions as intended because it attracted more students from the upper group than from the lower group.

b. Distracter B is a poor distracter because it attracted more students from the upper group than from the lower group.

c. Distracter C is ineffective because it attracted no students.

d. Distracter D is a good distracter because it functions as intended by attracting more students from the lower group than from the upper group
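
The interpretation rules above can be applied mechanically. The sketch below uses the response counts from item number 1; the names RESPONSES, KEY and evaluate_distracters are illustrative, and the treatment of a tie between the two groups as "poor" is an assumption, since the notes do not cover that case.

```python
# Response counts for item 1 from the example: option -> (upper group, lower group).
RESPONSES = {"a": (10, 6), "b": (8, 4), "c": (0, 0), "d": (2, 10)}
KEY = "a"  # the correct option


def evaluate_distracters(responses, key):
    """Return a verdict for every incorrect option of one item."""
    verdicts = {}
    for option, (upper, lower) in responses.items():
        if option == key:
            continue  # the correct answer is not a distracter
        if upper + lower == 0:
            verdicts[option] = "ineffective: attracted no students"
        elif lower > upper:
            verdicts[option] = "good: attracted more lower-group students"
        else:
            # Assumption: a tie is treated the same as attracting more upper-group students.
            verdicts[option] = "poor: did not attract more lower-group students"
    return verdicts


for option, verdict in evaluate_distracters(RESPONSES, KEY).items():
    print(f"Distracter {option.upper()} is {verdict}")
```

The output agrees with the interpretations listed above: B is poor, C is ineffective and D is good.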

Effects of item analysis

i. Provides a basis for efficient discussion of test results

ii. Provides a basis for improving classroom instruction by revising the parts of the curriculum that proved to be difficult

iii. Provides a basis for improving skills in test construction

iv. Provides a basis for carrying out remedial teaching in areas that are difficult

Grading

Grading is the process of holistically evaluating student’s performance and assigning evaluative symbols to represent what learners know and can do or may not know or be able to do as evidenced by various assessments( Airassian, 2008). They represent teachers’ summary judgment about how well students have mastered the contents and processes taught in the subject area during a particular term or grading period. Grades are based on two dimensions such as:

i. Analysis of assessment data such as quizzes, homework, tests, assignments and others.

ii. Interpretation and communication of grades. Having gathered data from your students’ assessments, you need to make a judgment about the meaning of these data. The interpretation should be based on a set of criteria your school has established. Thomas Guskey and Jane Bailey (2003) identified three types of learning criteria used in grading and reporting. These are:

a) Product criteria. This is grading which is based on the final examination report (summative evaluation).

b) Process criteria. This is grading and reporting which is based on the course work and the final examination.

c) Progress criteria. This is grading which deals with how much students have gained from their learning experiences (e.g. oral comprehensive examination).

Why do we grade?

The purpose of grading includes:

i) To communicate students’ academic achievement to students, parents and others. However, grades become distorted when non-academic factors such as attendance, effort, attitude, class participation, group work, class discussion or behavior are included.

ii) Administratively, grades are used:

a) to determine the students’ ranks in class,

b) to give credit for graduation,

c) to determine suitability for promotion, graduation or employment.

iii) They are used to determine the strengths and weaknesses of the different teaching approaches used by teachers.

iv) They are used to motivate students and parents to improve students’ efforts.

v) They are used for guidance. They help teachers, students and counselors to choose appropriate courses and course levels.

vi) They help teachers to identify students who are in need of special services.

vii) They are used to sort out the best students from the rest.

How Do We Grade?

There are different forms of grading, namely:

i) Letter grades, e.g. A, B, C, D, E, F

ii) Standards-based achievement categories such as excellent, good, fair, poor

iii) Percentage or numerical grades such as 100%, 90% or 100, 80, 70

iv) Pass/fail system

v) Point system (tracking grades by adding the points received during the term, e.g. quiz = 6/10; group work = 8/10, etc.)

vi) Teacher’s written comments

Classroom grading is based on the teacher’s judgment. The teacher’s judgment is based on:

i) information about the performance being judged (test scores, book reports, performance assessments);

ii) a basis of comparison that can be used to translate that information into grading judgments (e.g. what level of performance is worth an A, B, C, D, etc.).

Approaches to Comparison for Grading.

A grade is a judgment about the quality of a student’s performance. Several bases of comparison can be used to assign grades to students. The most commonly used classroom grading approaches compare a student’s performance to:

i. The performance of other students

ii. Predefined standards of good or poor performance

iii. Student’s own ability

Comparing student’s performance with other students (Norm-referenced grading)

It refers to the process of assigning grades by comparing the performance of one student with the performance of other students. For example, when a teacher says that Helen has performed better than the rest of the students in the class, he/she is making a norm-referenced judgment.

Comparing student’s performance with pre-established performance standards (Criterion-referenced grading)

The performance standards define the level or score that a student must attain to receive a particular grade. All students who reach a given level get the same grade regardless of how many students reach that level. For example, suppose students’ assessment contains two parts, the course work and the final examination, and passing the course depends on getting 50 percent of the total marks. Then 50% is the performance standard: passing or failing depends on how your marks compare with the performance standard of 50 percent.
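
The difference between the two approaches can be illustrated with a short Python sketch. The student names, the marks and the 40/60 split between course work and final examination are hypothetical; only the 50 percent performance standard comes from the example above, and treating exactly 50 as a pass is an assumption.

```python
PASS_MARK = 50  # performance standard: 50 percent of the total marks


def criterion_referenced_result(total_marks, pass_mark=PASS_MARK):
    """Compare one student's total with the fixed standard (assumes 50 itself passes)."""
    return "pass" if total_marks >= pass_mark else "fail"


# Hypothetical students: (course work out of 40, final examination out of 60).
students = {
    "Student A": (25, 30),
    "Student B": (15, 30),
    "Student C": (30, 20),
}

totals = {name: coursework + exam for name, (coursework, exam) in students.items()}

# Criterion-referenced: each student is judged against the standard only.
results = {name: criterion_referenced_result(total) for name, total in totals.items()}

# Norm-referenced: students are judged against each other (here, by rank).
ranking = sorted(totals, key=totals.get, reverse=True)

print(results)
print(ranking)
```

Under the criterion-referenced approach any number of students may pass; under the norm-referenced approach a student’s position depends on how the others performed.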

Comparison with student’s own ability/Ability based grading approach

It involves comparing students’ actual performance with the performance expected of them based on the teacher’s judgment of their ability. The terms overachiever and underachiever describe students who do better or worse than the teacher expects them to do. Many teachers assign grades by comparing a student’s actual performance with their perception of the student’s ability.

Disadvantages of the perception-based grading system are:

a) The approach depends on the teacher having an accurate perception of each student’s ability. In reality, teachers do not know each student’s true ability.

b) Teachers have a difficult time differentiating a student’s ability from other characteristics such as self-assurance, motivation or responsiveness. Several studies have revealed multiple abilities that help students learn and perform in different modalities such as visual, oral and written; which one should a teacher focus on to judge a student’s ability?

c) The perception-based approach confuses parents and outsiders. For example, a high-ability student who attains 80% mastery of instruction might receive a C grade if perceived to be underachieving, while a low-ability student who attains 60% mastery might receive an A grade for exceeding expectations. An outsider might think that the low-ability student has mastered more of the course because he/she got the higher grade.

Grades are regarded as a prize that you receive when you study hard or a punishment that you get when you do not work hard. Some negative effects of grading are:

i. Students getting low grades may lose their self-esteem

ii. Failure to graduate if you receive low marks

iii. Detainment if you get a low grade

iv. School dropout

SUMMARIZING TEST RESULTS

Summarizing involves the synthesis of assessment information into a single grade. The steps involved in summarization are:

i. Combine information from various assessments into a single grade.

ii. Express each type of assessment information on the same scale so that all the information can be combined into a composite score.

iii. Compute the overall score by:

a) giving each kind of assessment the weight it deserves,

b) summing the scores,

c) dividing the total score by the number of assessments.
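
The steps above can be sketched as follows. The assessment names, scores and weights are hypothetical. The sketch also divides the weighted total by the sum of the weights rather than by the number of assessments, which is one common reading of step (c) when the weights differ; treat that choice as an assumption.

```python
# Hypothetical assessments: (score obtained, maximum possible score, weight).
assessments = {
    "quiz": (6, 10, 20),
    "group work": (8, 10, 30),
    "final examination": (60, 100, 50),
}


def composite_percentage(items):
    """Put every score on a 0-100 scale, weight it, sum, then divide by total weight."""
    weighted_sum = 0.0
    total_weight = 0
    for score, maximum, weight in items.values():
        percent = 100 * score / maximum  # step ii: same scale for all assessments
        weighted_sum += weight * percent  # steps iii(a) and iii(b)
        total_weight += weight
    return weighted_sum / total_weight  # step iii(c), using the total weight


overall = composite_percentage(assessments)
print(overall)
```

With these figures the quiz contributes 60%, the group work 80% and the final examination 60%, giving a composite of 66 percent.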

INTERPRETATION OF TEST RESULTS

Once you have scored and graded students’ tasks, you need to interpret the results in order to get meaning from them. Scores on an assessment tell only part of the story. To be meaningful, the scores must be interpreted with respect to other variables such as:

i. The scores of other students

ii. The student’s prior performance on similar assessments

iii. The contents of the items answered correctly.

Statistics provide a picture of individual and group performance as well as the effectiveness of the instructional method. This is because statistics help us to know the typical (average) student’s performance on an assessment, the overall performance and the spread of scores, i.e. from the lowest to the highest score.

Ways of Showing the Distribution of Scores

The distribution of scores shows the pattern or organization of the data so that meaning can be drawn from it. The distribution of scores can be indicated through:

i. Frequency table. It is developed by arranging scores from the lowest to the highest and then tallying the number of times each score occurred. From the table one can:

a. Compare the performance of an individual against the others

b. See the distribution of scores, i.e. the highest, the middle and the lowest scores

c. See which students performed poorly and which performed well.

ii. Histogram. This is a pictorial representation of data in the form of a bar graph. It is used to display a frequency distribution. It has two axes: the X-axis (horizontal line), which displays the scores, and the Y-axis (vertical line), which displays the frequency of each score.

iii. Frequency polygon. It is a line graph that displays the same frequency distribution as the histogram, with a point plotted at the frequency of each score and the points joined by lines.
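
As an illustration, a frequency table and a rough text histogram can be produced with Python’s standard library. The scores below are hypothetical, and the bars are printed sideways (one row per score) for simplicity.

```python
from collections import Counter

# Hypothetical test scores for a small class.
scores = [55, 45, 60, 50, 55, 70, 50, 55, 60]

# Tally how many times each score occurred, then order from lowest to highest.
frequency = Counter(scores)
table = sorted(frequency.items())

print("Score  Frequency")
for score, count in table:
    # One '|' per student gives a simple sideways histogram.
    print(f"{score:>5}  {count:>9}  {'|' * count}")
```

Reading down the table shows the lowest score (45), the highest score (70) and the score obtained by most students (55).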

Measures of central tendency

A measure of central tendency is a numerical summary of a set of scores. There are three measures of central tendency: the mean, the median and the mode. Each of these is a different way of summarizing scores into a single number.

i. The mean ($\bar{X}$). It is the arithmetic average of a set of scores. It is calculated by summing up the individual scores and dividing by the total number of scores. The formula is:

$\bar{X} = \frac{\sum X}{N}$

where $\sum X$ is the sum of the individual scores and $N$ is the number of students (or scores).

The mean uses all the scores in the set of data. Every assessment score is used to calculate the mean, including the scores of those who did extremely well and those who did extremely poorly. Scores that are quite different from the majority (either higher or lower) are called outliers. Outliers can distort the mean by pulling it lower or higher than what might be the typical or average performance on the test. A distribution that is pulled lower by outliers is called a negatively skewed distribution. A distribution that is pulled higher by outliers is called a positively skewed distribution.
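
The effect of an outlier on the mean can be seen in a small Python sketch. The scores are hypothetical and the function name `mean` is chosen only for this example.

```python
def mean(scores):
    """Arithmetic average: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


# Hypothetical class scores.
typical_scores = [70, 72, 74, 76, 78]
with_low_outlier = typical_scores + [20]  # one student scored far below the rest

mean_typical = mean(typical_scores)          # 370 / 5
mean_with_outlier = mean(with_low_outlier)   # 390 / 6

print(mean_typical, mean_with_outlier)
```

A single very low score pulls the mean from 74 down to 65, although five of the six students scored 70 or above.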

What is the importance of knowing the shape of the distribution? It helps the teacher to know whether or not students have grown from the entry point (a test done before instruction) to the end of instruction.

Advantages of the mean

a. It takes all of the scores into account. None of the scores is left out

b. It is simple to calculate

Disadvantages of the mean

a. It is affected by extreme values or outliers. Outliers pull the mean lower or higher than the typical performance.

b. The mean may be a value that does not actually occur in the data set

ii. The median. This is the middle score in a set of scores. It is calculated by following these steps:

a. Arrange the scores from the lowest to the highest

b. Determine the middle score or scores

c. If there is an odd number of scores, the median is the middle score; if there is an even number of scores, the median is obtained by adding the two middle scores and dividing by two. The median is best used when you are concerned that outliers might be affecting the mean, making it less representative of a group of scores.
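
The three steps can be written directly as a short Python function. The scores are hypothetical; the second set adds one very high score to show that the median hardly moves.

```python
def median(scores):
    """Middle score of an ordered set; average of the two middle scores if even."""
    ordered = sorted(scores)          # step a: lowest to highest
    n = len(ordered)
    middle = n // 2                   # step b: locate the middle
    if n % 2 == 1:                    # step c: odd number of scores
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2  # even number of scores


odd_set = [51, 35, 43, 45, 39]          # hypothetical scores
even_set = [51, 35, 43, 45, 39, 95]     # same scores plus one high outlier

print(median(odd_set))   # middle of 35, 39, 43, 45, 51
print(median(even_set))  # average of 43 and 45
```

Adding the outlier of 95 changes the median only from 43 to 44, whereas the mean of the same scores rises from 42.6 to about 51.3.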

Advantages of the median

a. It is not affected by outliers

b. It is easy to compute and comprehend.

c. It is useful when comparing a set of data

Disadvantages of the median

a. Sometimes the median is a number that is not actually present in the data set

b. It takes time to sort the scores from the smallest to the largest.

c. It does not take into account all the data in a data set, i.e. it does not use all the information available

iii. The mode. The mode is the most frequently occurring score in a set of scores (Popham, 2008; Musial et al, 2009). If two scores share the highest frequency, we call this a bimodal distribution. If there are more than two modes, we call it a multimodal distribution.

Example 1. Scores: 35, 56, 73, 67, 43, 62, 70, 39, 45, 51, 56, 61, 56, 71, 82, 80, 66, 58, 64, 54.

The mode is 56 (the most frequently occurring number in the set of scores)

Example 2. Scores: 35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 71, 82, 80, 61, 58, 64, 54.

The modes are 56 and 61 (bimodal distribution)

Example 3. Scores: 35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 73, 82, 80, 61, 58, 73, 54.

The modes are 56, 61 and 73 (multimodal distribution).

In a set of scores where no number occurs more frequently than the others, there is no mode.
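
The three examples can be checked with a short Python sketch. The function name `modes` is hypothetical, and treating a set in which every score occurs exactly once as having no mode is this sketch’s reading of the last sentence above.

```python
from collections import Counter


def modes(scores):
    """Return every score that shares the highest frequency (empty list if none repeats)."""
    if not scores:
        return []
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # no score occurs more often than the others
    return sorted(score for score, count in counts.items() if count == highest)


example_1 = [35, 56, 73, 67, 43, 62, 70, 39, 45, 51, 56, 61, 56, 71, 82, 80, 66, 58, 64, 54]
example_2 = [35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 71, 82, 80, 61, 58, 64, 54]
example_3 = [35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 73, 82, 80, 61, 58, 73, 54]

print(modes(example_1))  # one mode
print(modes(example_2))  # bimodal
print(modes(example_3))  # multimodal
```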

Advantages of the mode

a. It is simple to determine

b. It is not affected by extremely large or small values

c. It is useful for qualitative data

Disadvantages of the mode

a. It focuses only on the most frequent number in a data set, leaving out the other scores

Measures of Variability

Measures of variability tell us about:

i. The variability of student learning and the overall effectiveness of instruction.

ii. The consistency of student performance

iii. Whether the scores are spread out or bunched together

The measures of variability include the range, the standard deviation and the variance

Range

The range is the difference between the highest score and the lowest score. It indicates how consistent or diverse a set of scores is (Musial et al, 2009).

How to calculate: Range = Highest score – Lowest score

Example: 56, 67, 63, 38, 62, 66, 45, 51, 53, 43, 52, 44, 77, 58, 69.

The highest score = 77; the lowest score = 38

Range = 77 – 38 = 39

Advantage: it is easy and quick to estimate.

Disadvantage: it is greatly influenced by outliers, i.e. extremely high or low scores.
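
The calculation, and the sensitivity of the range to a single outlier, can be checked with a few lines of Python. The first list holds the scores from the example; the extra score of 5 is hypothetical.

```python
# Scores from the example above.
scores = [56, 67, 63, 38, 62, 66, 45, 51, 53, 43, 52, 44, 77, 58, 69]

score_range = max(scores) - min(scores)  # 77 - 38

# Hypothetical: one more student scores far below everyone else.
scores_with_outlier = scores + [5]
range_with_outlier = max(scores_with_outlier) - min(scores_with_outlier)  # 77 - 5

print(score_range, range_with_outlier)
```

One extreme score is enough to widen the range from 39 to 72, even though the other fifteen scores are unchanged.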

Standard Deviation

Standard deviation is a measure of the average distance of each individual score from the mean. It indicates how spread out the scores are around the mean. If the standard deviation is relatively small compared to the mean, then the scores are more homogeneous, that is, they are grouped together. This means that on average individual scores do not deviate much from the mean. When the standard deviation is large, we say the individual scores are heterogeneous (they are spread out), meaning that on average the individual scores deviate quite a bit from the mean.

The SD tells how spread out or clustered a set of scores is around the mean. This helps teachers to see how variable student performance is in a classroom.

How to calculate the SD

$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

$(X - \bar{X})^2$ = each individual score minus the mean, squared;

$\sum$ = the sum over all the scores;

$N$ = the number of scores

EXAMPLE

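Scores: 20, 20, 25, 25, 30, 30

i. Calculate the mean and the squared deviations

Individual Scores $X$	The Average (Mean) $\bar{X}$	Deviation $(X - \bar{X})$	Squared Deviation $(X - \bar{X})^2$
20	25	20 - 25 = -5	25
20	25	20 - 25 = -5	25
25	25	25 - 25 = 0	0
25	25	25 - 25 = 0	0
30	25	30 - 25 = 5	25
30	25	30 - 25 = 5	25

$\sum X = 20 + 20 + 25 + 25 + 30 + 30 = 150$; $N = 6$; $\bar{X} = \frac{\sum X}{N} = \frac{150}{6} = 25$

$\sum (X - \bar{X})^2 = 25 + 25 + 0 + 0 + 25 + 25 = 100$

ii. Calculate the variance and the SD

Variance = sum of the squared deviations divided by $N$ = $\frac{100}{6} \approx 16.7$

$SD = \sqrt{16.7} \approx 4.1$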

The average distance of each score from the mean of 25 is about 4.1; this means that on average the scores are approximately 4 points above or below the mean. The SD is an indicator of how spread out the scores are from the mean. The larger the SD, the more spread out the scores.
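
The worked example can be reproduced step by step in Python. The sketch divides by N, as the example does (the population standard deviation); a sample standard deviation would divide by N - 1 instead. The standard-library function `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics

scores = [20, 20, 25, 25, 30, 30]  # scores from the worked example

n = len(scores)
mean = sum(scores) / n                                   # 150 / 6 = 25
squared_deviations = [(x - mean) ** 2 for x in scores]   # 25, 25, 0, 0, 25, 25
variance = sum(squared_deviations) / n                   # 100 / 6
sd = math.sqrt(variance)

print(round(variance, 1), round(sd, 1))

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sd, statistics.pstdev(scores))
```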

The SD uses all the scores in a set, so it is likely to be representative of the spread of scores. It can also be used as a unit of measurement: for example, it can tell which students scored two SDs higher or lower than the mean.
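
Using the SD as a unit of measurement can be sketched as follows. The class scores are hypothetical (chosen so that the mean is 50 and the SD is 10), and the function name `sd_units_from_mean` is an assumption made for this illustration. The sketch assumes the scores are not all identical, since the SD would then be zero.

```python
import statistics


def sd_units_from_mean(score, scores):
    """How many (population) standard deviations a score lies above or below the mean."""
    mean = sum(scores) / len(scores)
    sd = statistics.pstdev(scores)
    return (score - mean) / sd


# Hypothetical class: mean 50, SD 10.
class_scores = [30, 50, 50, 50, 50, 50, 50, 70]

highest = sd_units_from_mean(70, class_scores)  # two SDs above the mean
lowest = sd_units_from_mean(30, class_scores)   # two SDs below the mean

print(highest, lowest)
```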

Conclusion

1. A teacher can display test scores in meaningful ways by using a frequency table, a histogram or a frequency polygon.

2. Measures of central tendency such as the mean, mode and median can be used to extract meaning from test scores

3. A teacher can determine the variability among scores by calculating the range and standard deviation

Generally, measures of central tendency and variability can be used to judge whether students met the learning objectives and how effective instruction was.

ASSESSMENT OF NON-COGNITIVE OUTCOMES AND INTELLIGENCE QUOTIENT (IQ)

Classroom observation techniques

Teaching is driven by what we observe in the classroom. Observation is the process of gaining information by watching and listening to students. It can be used to evaluate students’ knowledge, skills, dispositions and behavior. Through simple observation the teacher knows whether or not students follow directions to complete an assigned task.

What do teachers actually need to observe?

Teachers observe both appropriate behaviors, so as to increase them, and inappropriate behaviors, in order to decrease them. Teachers’ classroom observations are based on:

Academic skills such as reading, mathematics, science, social studies and language. Academic skills are assessed and stated in observable and measurable terms.

Psychomotor skills such as physical movement in various sports, dance, performing arts, singing, playing and so on.

Prosocial skills, which involve attitudes, feelings, beliefs and dispositions.

Approaches to Classroom Observation

Observation tools such as anecdotal notes, observation checklist and rating scales are widely used for observing and assessing student’s learning.

i. Anecdotal notes or records

This is a technique which is used to document observations of significant skills or behaviors of students. It records a factual description of incidents that the teacher has observed personally.

It provides a purposeful and detailed description of the strengths and weaknesses of a student’s performance based on pre-specified performance criteria, such as a student’s ability to transition to a new activity, follow instructions and focus on the task at hand.

The records can be

i. Anecdotal notes, which consist of the date, the name of the student, the setting and the incident(s), i.e. what happened.

Anecdotal Notes

Student name: Grace Luis	Date: 2/1/2015

Setting: Group poster project

What happened/incident

Today during the group project, Grace complained about the marker colors she was given. I reminded her of the rule, but she grabbed a marker and scribbled on the poster, ruining it.

Anecdotal A,B,C records

Date/time	Context/activity	Antecedent	Behavior	Consequences	Student reaction
2/1/2015 12.00 noon	Students were working on a group poster project	Grace was the materials manager for Grayson’s group. She gave him 4 light-colored markers	Grayson said these colors stink. He then grabbed a black marker from Grace and scribbled all over the poster	I stated the rule and punished him by taking him out of class for some time	Grayson returned to the classroom and rejoined the work group

NOTE:

Context= the setting

Antecedent= what happened before the behavior

Behavior = what the behavior looks like

Consequences= what happened after challenging behavior

The usefulness of anecdotal notes/records

They are useful when writing report card comments and in parent or student conferences

They are useful if an intervention such as acceleration or remedial teaching is needed for the student

Observation Checklist

It is a list of behaviors that is used to assess a student’s skills, such as academic, psychomotor and prosocial skills. The teacher observes the skills and marks them as present or absent, correct or incorrect. Each skill should be written in such a way that it is observable and measurable.

Observation checklist for school facilities (put a tick where appropriate)

School Name

District

Date

FACILITY	AVAILABLE	NOT AVAILABLE
Classrooms
Toilets
Teachers Houses
Library
Laboratories
Sports grounds
Text books
Reference books

Rating scales

It is a form of checklist which consists of a list of qualities that are judged according to a scale that indicates the degree to which each quality is present. Each characteristic can be observed according to some underlying degree of accomplishment.

Descriptive Rating Scale

It is a rating scale which is based on a series of adjectives or thumbnail sketches. They allow the teacher to rate the appropriateness or inappropriateness of a student’s behavior on the scale. In constructing a descriptive rating scale:

1. Specify the observable behaviors that are important in your case.

2. Write adjectives that describe the points on the scale. The best way to develop the adjectives is to determine the best and the worst likely performances and then choose in-between levels to create the full scale.

Example of a rating scale designed for a student working on math problems individually.

Student………………………Date……………………..Assignment: Math skills

Works on problems	Doesn’t start problems	Starts problems but abandons some without finishing	Works each problem until completed
Checks work	Doesn’t check work	Checks some work	Checks all work
Corrects mistakes	Doesn’t correct mistakes	Corrects some mistakes	Corrects all mistakes
Stays on task	Is distracted several times	Is distracted once	Stays on task during work time

Numerical rating scales

It is a rating scale which associates numbers with descriptions along the scale. The higher the number, the greater the accomplishment, and the lower the number, the lower the accomplishment. It is used when summarizing observations across some period of time. The number of points within a rating scale could be based on the number of times a particular behavior has been noted. This kind of rating could look as follows:

1. Never = behavior is not observed

2. Occasionally = behavior has been performed but repeated instances of nonperformance are observed

3. Usually = behavior is performed but a small number of instances of nonperformance are observed

4. Always = behavior is consistently and regularly performed
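
One way of turning a series of observations into a rating on this four-point scale is sketched below. The scale does not say how many instances of nonperformance count as "a small number", so the cut-off used here (the behavior performed in at least 75% of the observations) is purely an assumption, as is the function name `numerical_rating`.

```python
def numerical_rating(observations, usually_threshold=0.75):
    """Map a list of True/False observations to a (number, label) rating.

    True means the behavior was performed on that occasion. The 0.75
    threshold separating 'Usually' from 'Occasionally' is an assumption.
    """
    if not observations:
        raise ValueError("at least one observation is needed")
    performed = sum(observations)
    if performed == 0:
        return 1, "Never"
    if performed == len(observations):
        return 4, "Always"
    if performed / len(observations) >= usually_threshold:
        return 3, "Usually"
    return 2, "Occasionally"


# Hypothetical observation records for one behavior over ten lessons.
mostly_performed = [True] * 9 + [False]
rarely_performed = [True, False, False, False, True, False, False, False, False, False]

print(numerical_rating(mostly_performed))
print(numerical_rating(rarely_performed))
```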

EXAMPLE. Rating scale for a group project

Group work Rating scale

Project……………………………………………..

Rating scale

1 = Seldom or never

2 = Some/only part of the time

3 = Usually

4 = Always

	Group 1	Group 2	Group 3
Stays on work
Makes progress
Participates in group
Respect other groups
Cleans up

Rating scale for an individual project

Observed student……………………………………

Date…………………………………………

Activity……………………………………………

The student activities	1. Never	2. Seldom	3. Often	4. Always
Works with a wide range of peers, not just with close friends
Shares materials and ideas with others
Participates in discussions
Fulfills his or her responsibilities
Shows respect for others by listening and considering others’ points of view
Contributes ideas to the group discussions

Advantages of assessing through observation

1. Allows the teacher to assess and monitor progress and behavioral skills as part of normal teaching

2. Allows the teacher to discover unique information, such as skills and problems, that would be difficult to discover through other means

3. Permits the teacher to adapt other assessment methods so as to meet the needs of students.

4. Information gathered through observation can be used together with formal methods such as paper-and-pencil tests to assess students

Disadvantages

1. Errors can occur when judgment is based on a single observation

2. It is time consuming to obtain information through observation

3. If the teacher is not focused on the specific skill, he/she will end up observing unrelated behavior.

Peer Appraisal and Self Assessment

Peer appraisal

This is one of the ways in which students internalize the characteristics of quality work by evaluating the work of their peers. However, in order to offer helpful feedback, students must be given instructions of what they are to look for in their peers' work. The instructor must explain expectations clearly to them before they begin. One way to make sure students understand this type of evaluation is to give students a practice session with it. The instructor provides a sample writing or speaking assignment. As a group, students determine what should be assessed and how criteria for successful completion of the communication task should be defined. Then the instructor gives students a sample completed assignment. Students assess this using the criteria they have developed, and determine how to convey feedback clearly to the fictitious student. Students can also benefit from using rubrics or checklists to guide their assessments. At first these can be provided by the instructor; once the students have more experience, they can develop them themselves. The checklist asks the peer evaluator to comment primarily on the content and organization of the essay.

For peer evaluation to work effectively, the learning environment in the classroom must be supportive. Students must feel comfortable and trust one another in order to provide honest and constructive feedback. Instructors who use group work and peer assessment frequently can help students develop trust by forming them into small groups early in the semester and having them work in the same groups throughout the term. This allows them to become more comfortable with each other and leads to better peer feedback.
