EP 300 EDUCATIONAL MEASUREMENT AND EVALUATION



Introduction
This course introduces students to the concepts of educational measurement, monitoring, assessment and evaluation as they are used in the education system. Specifically, the course intends to equip students with the basic knowledge and skills to:
i)       measure and assess educational outcomes
ii)     construct different types of test items and measurement scales
iii)   administer examinations, and score and grade examination results
iv)   judge critically the practice of examinations in Tanzania
v)     summarize examination results
vi)   analyze the use and misuse of examinations


MODULE ONE
1. BASIC CONCEPTS IN MEASUREMENT AND EVALUATION
Four basic terms are used when obtaining information about classroom instruction and making instructional decisions: testing, measurement, evaluation and assessment.
Test
A test is an instrument or device used to gather/collect information about students' achievements or other cognitive skills. It is a set of questions prepared by the teacher to measure a sample of students' behavior. It is like a balance used to obtain weight, or a ruler used to measure the height of an object or person. A test determines the degree to which an individual student performs in comparison with other students.
A test can be one of the following:
i.                    Diagnostic test, which is used to determine the areas of difficulty encountered by the learner so that the teacher can take corrective measures
ii.                  Aptitude test, which is a test of a person's ability to learn a task or to perform a task
iii.                Achievement test, which is given to the learner to determine how much the learner has learned
An effective teacher uses test results to make instructional decisions. Some of these decisions are:
i.                    Determining the appropriateness of the teaching plans
ii.                  Grouping students for effective learning
iii.                Determining the learning difficulties that students face
iv.               Identifying students who are underachieving
v.                 Determining the effectiveness of instruction
vi.               Identifying students who have poor self-understanding
vii.             Identifying students who are in need of special assistance
viii.           Guiding and counseling students in choosing a career
ix.                Selecting and promoting students from a lower level to a higher level, e.g. from primary school to secondary school.

Types of tests
The information that teachers collect and use in their classrooms comes from assessment procedures that are either standardized or nonstandardized.
A standardized test is a test that is administered, scored and interpreted in the same way for all students across schools, districts, states or the nation. Such tests are administered to students in many different classrooms, but always under identical conditions of administration, scoring and interpretation. The main reason for standardizing tests is to ensure that the testing conditions and scoring procedures have a similar effect on the performance of students in different schools and states.
Examples of Standardized Tests
Aptitude tests. These aim to measure general mental capacity, or the ability to learn given content, and are used to predict future performance. They include language proficiency, mathematics and comprehension tests. Examples are the Potential Ability Test (PAT), the matriculation examination and the mature age entry examination.
Achievement tests. These measure students' achievement of the objectives of instruction in a given course.
Intelligence tests. IQ (Intelligence Quotient) tests measure areas such as reading and reasoning. A well-known example is the Stanford-Binet intelligence test. For instance, if you want to be recruited by ITV, TBC or any radio station, you will be subjected to an interview that requires you to read clearly to listeners, so that your ability can be identified.
Nonstandardized (teacher-made) tests
These are developed for a single classroom with a single group of students and are not used for comparison with other groups. Nonstandardized tests are prepared by the subject teacher.
Measurement
Measurement is the process of quantifying, or assigning a number to, a performance or trait. In the classroom it usually means assigning numerical scores to performance on a quiz or test, so that the score represents the individual's performance or trait. Put another way, measurement assigns numbers to test performance according to a specific rule, such as counting the correct answers. When we measure, we want to answer the question: how much has the student achieved?
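The counting rule mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming an objective test with a single correct option per item; the function name `score_test`, the answer key and the responses are hypothetical.

```python
def score_test(answer_key: dict[int, str], responses: dict[int, str]) -> int:
    """Return the raw score: one mark for each item answered correctly.

    Items missing from `responses` (left blank) earn no mark.
    """
    return sum(
        1
        for item, correct_option in answer_key.items()
        if responses.get(item) == correct_option
    )


# Hypothetical five-item multiple-choice test.
answer_key = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}
responses = {1: "B", 2: "A", 3: "A", 4: "C"}  # item 5 left blank

raw_score = score_test(answer_key, responses)
print(f"Raw score: {raw_score} out of {len(answer_key)}")  # Raw score: 3 out of 5
```

The number produced here only answers "how much?"; it says nothing yet about whether 3 out of 5 is good or poor.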
Evaluation
Evaluation is the process of attaching a value judgment to a performance or measure. It involves judging whether the student is performing at a high or low level. For example: Alfera, who scored 70 out of 100 in mathematics, was above average, showed steady progress and is a bright student. Evaluation aims at answering the question: how good? Evaluation is more comprehensive than measurement; it includes testing, measurement and non-formal observation. Evaluation is not always based on measurement. Sometimes it is based on information acquired from non-measurement techniques such as observation.
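To make the difference between measurement and evaluation concrete, the sketch below takes a raw score (the measurement) and attaches a judgment by comparing it with the class mean (the evaluation). The function `judge_against_class`, the class scores and the simple above/below-average rule are hypothetical illustrations, not a standard grading policy; as the paragraph notes, real evaluation may also draw on observation and other non-measurement information.

```python
from statistics import mean


def judge_against_class(score: float, class_scores: list[float]) -> str:
    """Attach a value judgment by comparing a score with the class mean.

    The three-way rule used here is a hypothetical illustration only.
    """
    class_mean = mean(class_scores)
    if score > class_mean:
        return "above average"
    if score < class_mean:
        return "below average"
    return "average"


# Hypothetical mathematics scores (out of 100), including Alfera's 70.
class_scores = [45, 52, 60, 70, 58, 63, 49]

print("Class mean:", round(mean(class_scores), 1))  # Class mean: 56.7
print("Alfera:", judge_against_class(70, class_scores))  # Alfera: above average
```

The same score of 70 could earn a different judgment in a stronger class, which is why evaluation is a separate step from measurement.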

The functional roles of evaluation procedures in the classroom are as follows:
Placement evaluation
This is concerned with students' entry performance and checks whether a student possesses the knowledge and skills needed to begin the programme. It can also suggest skipping topics that the students already know well, placing a student in a special class, or placing a student in a more advanced course of study.
Formative evaluation
This is the continuous evaluation of classroom teaching to find out whether students are following the lesson well. It is carried out during instruction for the purpose of improving or monitoring learning progress. Its purpose is to provide continuous feedback to students, teachers and parents about success or failure.
Diagnostic evaluation
This is a highly specialized procedure aimed at detecting learning difficulties that remain unresolved after formative evaluation. When a student continues to perform poorly in a learning task, a more detailed diagnosis is needed. The purpose of diagnosis is to find out the causes of recurring learning difficulties so as to formulate a plan for remedial action.
Summative evaluation
Summative evaluation is done at the end of a unit of instruction or at the end of a course to determine whether the instructional objectives have been achieved. Summative evaluation is used for grading courses, certifying students' mastery of intended learning outcomes, and judging the appropriateness of course objectives and the effectiveness of school programmes.
Purpose of Evaluation
Generally, the purpose of evaluation is to determine performance, i.e. the skills attained and knowledge learned in an educational program. Specifically, the purpose of evaluation is to:
i.                    monitor/check students' progress
ii.                  strengthen desired outcomes and behavior
iii.                guide students in choosing a career
iv.               group and place students
v.                  aid curriculum/instruction improvement
vi.               assess teachers' effectiveness and efficiency.
Effectiveness is the degree to which the teacher achieves the goals.
Efficiency refers to achieving the goals at the lowest cost.

Assessment
Assessment is the process of collecting, synthesizing and interpreting information in order to make a decision. Testing, measurement and evaluation often contribute to the process of assessment. Educational assessment has been described in several ways.
Assessment means sitting beside a learner in an attempt to help the learner understand what he/she knows or is able to do. It is a process of gathering data by interacting with the learner in order to understand his/her needs. Put simply, it means talking to a learner in order to understand the learner's needs or problems.
Assessment is judging. It is the process of determining the degree to which a learner has attained some standard or level of achievement. Depending on what that standard is, the assessor/teacher may require the learner to answer specific questions or perform specific tasks.
Assessment is coaching. Here assessment is there to help the learner achieve a specific objective. The assessor observes the learner and provides some direction on how to proceed. At the same time, the assessor collects information about what the learner knows and can do, and about where the learner has difficulty or may need more instruction. The main argument of the coaching metaphor is that assessment occurs as part of the learning process.

How Do Teachers Gather Assessment Information?
Airasian and Russell (2008) identified three ways through which teachers collect information for making decisions: student products/work, observation and oral questioning.
i.             Student products include homework, written assignments, essays, science projects, etc. These products provide the teacher with information about students' cognitive skills.
ii.           Observation involves watching or listening to students as they carry out an activity or respond to a given situation. Observation helps the teacher to understand student behaviors such as:
·         mispronouncing words in oral reading,
·         interacting in groups,
·         speaking out in class,
·         bullying (or annoying) other students,
·         losing concentration,
·         having a puzzled look on their faces,
·         raising their hands in class,
·         failing to sit still for more than three minutes.
Observations can be formal or informal.
iii.         Oral questions are used to collect formal and informal information about students. Teachers use oral questions during instruction to:
·         review a prior topic,
·         brainstorm a new one,
·         find out how well students have understood the lesson,
·         engage a student who is not paying attention.
Oral questioning lets the teacher gather information without interrupting the lesson.
Types of assessment
There are three types of assessment.
i.                    Diagnostic assessment (early assessment). This is carried out before instruction to determine readiness or entry behavior. The information is normally collected through informal observation and covers the cognitive, affective and psychomotor domains.
ii.                  Formative assessment. This is carried out during instruction for the purpose of improving or monitoring progress. The information is collected through both formal and informal observation and from student papers such as tests, homework and projects. It is largely cognitive and affective, and it is kept in writing.
iii.                Summative assessment. This is carried out at the end of instruction or a program for the purposes of grading, selection, placement, certification and judging whether the instructional objectives have been achieved. The information gathered comes mainly from the cognitive domain.

Purpose of assessment
1.     To establish a favorable classroom environment that supports students' learning by helping students to interact with one another, respect one another, cooperate, and follow school rules and regulations.
2.     To plan and conduct classroom instruction. Conducting instruction (the teaching process) requires constant assessment and decision making. For example, when students seem not to understand the lesson or seem bored, the instructor has to decide how to solve the problem so that learning can proceed.
3.     To place students in the groups that suit them, e.g. a higher reading group, a middle reading group, fast learners and slow learners, in order to assist them.
4.     To provide feedback to students and their caregivers. Observation and feedback are intended to modify and improve students' learning.
5.     To diagnose students' learning difficulties and disabilities so as to help them learn, by carrying out remedial teaching, making accommodations, or referring them for more specialized diagnosis and intervention.
6.     To summarize and grade academic learning and progress. Making final decisions about students' learning at the end of instruction is termed summative assessment. Much of a teacher's time is spent collecting information that will be used to grade students or summarize their academic progress.


The Uses of Classroom Assessment
Various education stakeholders at different levels use classroom assessment to monitor student progress, namely:
National and state policy makers
Classroom assessment assists national policy makers in:
i.                    setting state and national standards of performance
ii.                  developing policies based on assessment
iii.                tracking the progress of national achievement in the education sector
iv.               providing resources to improve learning
v.                 providing rewards or sanctions for student, school and state achievements
School administrators
i.                    to identify program strengths and weaknesses
ii.                  to plan and improve instruction
iii.                to monitor classroom teachers
iv.               to identify instructional needs and programs
v.                 to monitor students' achievement over time
Teachers
i.                    to monitor students' progress
ii.                  to judge and change classroom instruction
iii.                to identify students with special needs
iv.               to motivate students to do well
v.                 to place students in groups
vi.               to provide feedback to teachers and students
Parents
i.                    to judge the strengths and weaknesses of students/programs
ii.                  to monitor their children's progress
iii.                to meet with teachers to discuss students' classroom performance
iv.               to judge teachers' quality/effectiveness

Objectives
Objectives help a teacher to focus on what is important and what he/she wants to accomplish. They also describe the kinds of content, skills and behaviors that teachers hope their students will develop through instruction. Other names for objectives are instructional objectives, learning targets, educational objectives, behavioral objectives, student outcomes and curriculum objectives. There are three levels of objectives:
i.                    Global objectives, also called goals, are general and broad. Because of their breadth they are not used in planning classroom instruction and assessment.
ii.                  Educational objectives are more specific than global objectives.
iii.                Instructional objectives are the most specific type of objectives; they are used in planning classroom instruction and assessment.

Instructional Objectives and Evaluation
Instructional objectives are statements prepared by a teacher that describe a specific behavior or pattern of behavior that a learner is expected to demonstrate as a result of a lesson or series of lessons. The pupil's behavior or achievement should be observable and measurable. An instructional objective is a precise description of the competencies the learner is expected to develop from the instruction (Ondiek, 1986).
These statements clearly define the desired learning outcomes expected from teaching. Learning outcomes are learning products that a student is expected to demonstrate as evidence that learning has occurred. Therefore, during and after instruction, the teacher determines the extent to which the instructional objectives are being achieved.
Instructional objectives are stated in terms of what we expect the learner to be able to do or express at the end of instruction.

The purpose of instructional objectives is to identify what students are expected to learn, in order to help the teacher to:
i.                    communicate to others the purpose of instruction
ii.                  provide direction for the teaching process, e.g. selecting appropriate instructional methods and materials
iii.                form the basis for evaluating students' learning achievement, i.e. plan assessment that will allow him/her to decide whether or not students have learned the desired content and skills that are the focus of instruction.

Characteristics of good instructional objectives
i.                    They are related to intended learning outcomes of instruction
ii.                  They are concerned with students, i.e. they describe students' performance and are stated in terms of what the student is to learn from instruction.
iii.                They are specific, because they describe the student's actual actions, behaviors or skills that can be observed, measured, taught and assessed.
iv.               They indicate the conditions under which the behavior must be performed and the level of performance the students must show.
e.g. Given a map of Tanzania, students should be able to shade all major lakes correctly.
·         Given a map of Tanzania – the condition under which the behavior is to be demonstrated
·         Shade all major lakes – the behavior to be demonstrated
·         Correctly – the standard or level of performance
v.                 They are derived from general objectives, which are stated using general verbs such as understand, appreciate and know that are not measurable.
vi.               They are stated using action verbs that indicate observable responses that can be evaluated by another person.
vii.             They are time-bound.
Components of Instructional Objectives
An instructional objective is made up of two parts:
i)                   The behavioral part, which specifies the expected student behavior or achievement as a result of instruction.
ii)                 The content part, which specifies the context in which the behavior in the objective is to operate.
Example. At the end of the lesson students should be able to list three types of communicable diseases.
Behavior: to list
Content: Communicable diseases
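The parts described above can be captured in a small data structure. The sketch below is a hypothetical illustration (the class `InstructionalObjective` and its fields are assumptions, not part of any standard): it stores the behavior, content, condition and standard separately, rebuilds the objective sentence, and flags objectives written with the general verbs that the text describes as not measurable.

```python
from dataclasses import dataclass

# General verbs that the text describes as not measurable.
NON_MEASURABLE_VERBS = {"understand", "appreciate", "know"}


@dataclass
class InstructionalObjective:
    behavior: str        # action verb, e.g. "list" (the behavioral part)
    content: str         # what the behavior operates on (the content part)
    condition: str = ""  # optional condition, e.g. "Given a map of Tanzania"
    standard: str = ""   # optional level of performance, e.g. "correctly"

    def is_measurable(self) -> bool:
        """Return False when the objective uses a general, non-observable verb."""
        return self.behavior.lower() not in NON_MEASURABLE_VERBS

    def sentence(self) -> str:
        """Assemble the parts into one objective statement."""
        if self.condition:
            opening = f"{self.condition}, students"
        else:
            opening = "At the end of the lesson students"
        text = f"{opening} should be able to {self.behavior} {self.content}"
        if self.standard:
            text += f" {self.standard}"
        return text + "."


lakes = InstructionalObjective(
    behavior="shade",
    content="all major lakes",
    condition="Given a map of Tanzania",
    standard="correctly",
)
diseases = InstructionalObjective(behavior="list", content="three types of communicable diseases")
vague = InstructionalObjective(behavior="understand", content="communicable diseases")

print(lakes.sentence())
print(diseases.sentence())
print(vague.is_measurable())  # False: "understand" cannot be observed directly
```

Keeping the parts separate makes it easy to see when an objective is missing its condition or its standard.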

CLASSIFICATION OF EDUCATIONAL OBJECTIVES
Benjamin Bloom and his colleagues published a taxonomy of educational objectives for the cognitive domain in 1956; taxonomies for the affective and psychomotor domains followed in later work. These taxonomies of learning (behavioral) domains were developed in order to promote higher forms of thinking. Educational objectives are divided into three major areas of behavior:
i.                    Cognitive objectives, whose behaviors deal with recalling knowledge and applying intellectual skills.
ii.                  Affective objectives, whose behaviors deal with attitudes and values.
iii.                Psychomotor objectives, whose behaviors deal with motor skills.
The objectives are described using narrower, more specific verbs (action verbs whose attainment can be evaluated at the end of the lesson).
These behavioral domains are sub-divided into sub-categories arranged from simple to complex. Students are supposed to demonstrate mastery of each lower-order skill before they move to more advanced skills.
For example, in the cognitive domain the teacher should help students to remember information before they understand it, and to understand it before they apply it to a new situation. Each skill in the taxonomy is a building block for the next.
These categories and sub-categories of learning domains are useful in preparing lesson objective and identifying learning outcomes.

Cognitive Domain
Teachers use the cognitive levels to determine the level of thinking their students have achieved. The skills are ranked on a continuum from lower-order to higher-order thinking. The knowledge level represents lower-level thinking because it focuses on memorization or recall of learned information, while the remaining levels represent higher levels of thinking and reasoning that call for processes more complex than memorization.
There are six major categories of cognitive processes. They are listed below from the simplest to the most complex, with the skills demonstrated at each level.
Knowledge
Deals with recalling information, or remembering previously learned information.
Examples: knowledge of dates, events, specific facts, procedures, concepts, methods, principles, terminologies, names, prices of commodities, formulas, places, major ideas and mastery of subject matter.
Question cues: list, define, mention, name, label, collect, tabulate, what, who, when, where.
Comprehension
Deals with understanding information, or grasping its meaning.
Examples: translating knowledge into a new context, interpreting facts/materials, comparing, contrasting, ordering words or items, grouping items, inferring causes and predicting consequences, paraphrasing, rewriting, explaining ideas, summarizing information.
Key words: summarize, interpret, contrast, predict, differentiate, estimate, reason, brainstorm, expand, explain, qualify, propose.
Application
Using information in a new situation.
Examples: applying methods, procedures, theories and concepts in a new situation;
solving problems using the required skills or knowledge learnt;
constructing charts or graphs.
Key words: apply, demonstrate, calculate, complete, illustrate, show, examine, organize, modify, relate, classify, change, discover, experiment, compare and contrast, construct, exercise.
Analysis
Separating materials or concepts into component parts.
Examples:
- understanding the organizational structure
- recognizing the organizational principles involved
- analyzing the elements
- uncovering unique characteristics
- recognizing unstated assumptions
Key words: analyze, identify facts, separate, classify, select, compare, arrange, distinguish, differentiate.
Synthesis
Deals with building a structure or pattern from diverse elements.
Examples:
- putting different parts together to form a whole
- formulating new patterns or structures
- relating knowledge from several areas, predicting
- writing a creative story, poem or song
- integrating learning from different areas into a plan for solving a problem
Key words: combine, integrate, modify, rearrange, substitute, plan, create, design, invent, compose, formulate, prepare, generalize, rewrite.
Evaluation
Making judgments about the value of ideas or materials.
Key words: examine, assess, appraise, critique, criticize, defend, evaluate, justify, support, relate, rank, discriminate, rate, weigh, decide, determine, summarize.


Affective Domain of Learning
The affective domain of learning consists of changes in values, attitudes, ways of feeling, outlook, interests, emotions, appreciation, motivations and preferences. The affective taxonomy is based on the degree of a person's involvement in an activity (e.g. classroom activities) or an idea.

There are five levels of the affective learning domain. They are organized below from simple to complex, with the skills demonstrated at each level.
Receiving phenomena
Willingness to hear/listen/select and to pay attention.
Example: the ability of a student to listen to others with respect, or to remember the names of newly introduced people.
Key verbs: choose, describe, follow, give, hold, identify, locate, name, point, select, use, reply, etc.
Responding
Active participation on the part of the learner, who attends and reacts to a particular phenomenon.
Example: learners' participation in class discussion, giving presentations, questioning new ideas, concepts, models, theories and principles in order to understand them fully.
Key verbs: answer, assist, discuss, greet, help, label, perform, practice.
Valuing
The worth or value a person attaches to a particular object, phenomenon or behavior. This ranges from simple acceptance to the more complex state of commitment.
Example: demonstrating belief in the democratic process; informing the school management about matters one feels strongly about.
Key words: complete, demonstrate, differentiate, explain, follow, form, initiate, invite, join, justify, propose, read, report, select, share, study, work.
Organization
Organize values into priorities by contrasting different values, resolving conflict between them and creating a unique value system. The emphasis is on comparing, relating, and synthesizing.
Key words: adhere, alter/adjust, arrange, combine, compare, complete, defend, explain, formulate, generalize, identify, integrate, modify, order, organize, prepare, relate, synthesize.
Internalizing values [characterization]
The person's newly developed values control his/her behavior. The behavior is persistent, consistent and predictable.
Examples: cooperating in group activities [displaying teamwork]; displaying professional commitment to ethical practice on a daily basis; revising behavior in the light of new evidence; valuing people for what they are, not how they look.
Key words: act, discriminate, display, influence, listen, modify, perform, practice, propose, qualify, question, revise, serve, solve, verify.

Psychomotor learning domain
The psychomotor domain of learning consists of changes in ways of acting, skill performance, physical productivity or manipulative skills, and body movement. The development of these skills requires practice, and the skills are measured in terms of speed, precision (accuracy/exactness), distance, procedures or techniques in execution. The domain has seven sub-categories, arranged hierarchically from the simplest behavior to the most complex. The taxonomy ranges from a student showing readiness to perform a psychomotor task, to using trial and error to learn the task, to actually carrying out the task on his/her own.
The seven levels are organized below from simple to complex, with the skills demonstrated at each level.

Perception [awareness]
The ability to use the sensory organs to obtain cues that guide motor activity.
Example: detecting non-verbal communication cues/signals; adjusting the heat of a stove to the correct temperature by the smell and taste of the food.
Key words: choose, describe, differentiate, distinguish, identify, isolate, select, relate.
Set [mindset]
Readiness to act. It includes mental, physical and emotional sets (mindsets).
Example: showing a desire to learn a new process [motivation].
Key words: begin, display, explain, move, proceed, react, show, state.
Guided response
This includes imitation, trial and error, and practice.
Example: solving mathematical problems as demonstrated; following instructions to build a model.
Key words: copy, trace, follow, reproduce, respond.
Mechanism
This is the intermediate stage in learning complex skills. Learner’s responses have become habitual and the movement can be performed with confidence and proficiency.
Example: using a personal computer.
Key words: assemble, construct, dismantle, display, fasten, fix, grind, heat, manipulate, measure, mend, mix, organize, sketch.
Complex overt response
The skillful performance of motor acts that involve complex movement patterns, for example performing without hesitation.
Key words: assemble, construct, dismantle, display, fasten, fix, grind, heat, manipulate, measure, mend, mix, organize, sketch [the same verbs as for mechanism, qualified by adverbs or adjectives indicating that the performance is quicker, better and more accurate].
Adaptation
Skills are well developed and the individual can modify movement patterns to fit special requirements.
Example: responding effectively to unexpected experiences; modifying instruction to fit the needs of the learners.
Key words: adapt, adjust, change, rearrange, reorganize, revise.
Origination
Creating new movement patterns to fit a particular situation or specific problem. Learning outcomes emphasize creativity based on highly developed skills.
Examples: constructing a new theory; developing a new comprehensive training programme.
Key words: arrange, build, combine, compose, construct, create, design, initiate, make, originate.

Table 1.  Examples of action verbs used to write educational objectives for each category of the cognitive domain.
Knowledge: count, define, identify, label, list, match, name, quote, recite, repeat, reproduce, select, state
Comprehension: classify, compare, contrast, convert, discuss, distinguish, estimate, explain, generalize, give examples, infer, interpret, paraphrase, rewrite, summarize, translate
Application: compute, construct, demonstrate, illustrate, solve
Analysis: break down, differentiate, discriminate, outline, separate, subdivide
Synthesis: arrange, combine, compile, create, design, formulate, generalize, generate, group, integrate, organize, relate, summarize
Evaluation: appraise, conclude, criticize, critique, grade, judge, recommend, support

Below are sample instructional objectives derived from Bloom's Taxonomy, labeled with their taxonomic categories.
1.     Knowledge: At the end of the lesson students should be able to list three types of family.
2.     Comprehension: At the end of the lesson students should be able to distinguish democratic states from authoritarian states.
3.     Application: At the end of an eighty (80) minute lesson students should be able to construct a bar graph using the information in the table.
4.     Analysis: At the end of a two-hour lecture, students should be able to analyze the main features of monopoly capitalism.
5.     Synthesis: At the end of a two-hour lecture, students should be able to integrate information from the science experiment into a lab report.
6.     Evaluation: At the end of a two-hour lecture, students should be able to judge the quality of various persuasive essays.
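A teacher checking draft objectives against Table 1 can look up the action verb that follows "able to". The sketch below is a hypothetical helper, assuming every objective contains the phrase "able to" followed by a single-word verb; the verb sets are the single-word entries of Table 1, with "analyze" added from the Analysis key words. Because some verbs (such as generalize and summarize) appear under more than one category, the lookup returns every matching level rather than forcing a single answer.

```python
# Single-word action verbs from Table 1, grouped by cognitive level.
# "analyze" is added from the Analysis key words; multi-word entries are omitted.
BLOOM_VERBS = {
    "Knowledge": {"count", "define", "identify", "label", "list", "match", "name",
                  "quote", "recite", "repeat", "reproduce", "select", "state"},
    "Comprehension": {"classify", "compare", "contrast", "convert", "discuss",
                      "distinguish", "estimate", "explain", "generalize", "infer",
                      "interpret", "paraphrase", "rewrite", "summarize", "translate"},
    "Application": {"compute", "construct", "demonstrate", "illustrate", "solve"},
    "Analysis": {"analyze", "differentiate", "discriminate", "outline", "separate",
                 "subdivide"},
    "Synthesis": {"arrange", "combine", "compile", "create", "design", "formulate",
                  "generalize", "generate", "group", "integrate", "organize",
                  "relate", "summarize"},
    "Evaluation": {"appraise", "conclude", "criticize", "critique", "grade", "judge",
                   "recommend", "support"},
}


def action_verb(objective: str) -> str:
    """Return the word after 'able to' (raises ValueError if the phrase is absent)."""
    marker = "able to "
    start = objective.lower().index(marker) + len(marker)
    return objective[start:].split()[0].strip(".,;:").lower()


def levels_for(objective: str) -> list[str]:
    """List every cognitive level whose Table 1 verbs include the objective's verb."""
    verb = action_verb(objective)
    return [level for level, verbs in BLOOM_VERBS.items() if verb in verbs]


samples = [
    "At the end of the lesson students should be able to list three types of family.",
    "At the end of a two-hour lecture, students should be able to judge the quality "
    "of various persuasive essays.",
    "Students should be able to summarize the main points of the story.",
]
for objective in samples:
    print(action_verb(objective), "->", levels_for(objective))
```

An objective that matches two levels, like the third sample, is a cue for the teacher to decide from the rest of the statement which level is really intended.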
NB. Instructional objectives should be SMART
S = Specific
M = Measurable
A = Attainable
R = Realistic
T = Time-bound

Assessment of the learning domains
Each behavioral domain calls for a different assessment approach, for example:
Cognitive Domain
The most commonly taught and assessed educational objectives are those in the cognitive domain (Airasian and Russell, 2008). Nearly all tests that students take in school are intended to measure one or more of these cognitive activities. Teachers' instruction usually focuses on helping students to attain cognitive mastery of some content or subject area.
The cognitive domain is measured through paper-and-pencil tests of various types as well as oral questioning (e.g. comprehensive examinations, research defenses).

Affective Domain
The affective domain is assessed by observation and questionnaires, e.g. rating scales.
Affective behaviors are rarely assessed formally in schools and classrooms. Teachers assess affective behavior informally, especially when grouping students, for example to determine which students can work without supervision and which cannot.
Most classroom teachers can describe their students' affective characteristics based on their informal observations of and interactions with their students.
Psychomotor Domain
The psychomotor domain is assessed by observing students carrying out the desired physical activity.
It is measured in terms of speed, precision, distance, procedures or techniques in execution.
1.6. Table of specification of instructional objectives
What is table of specification?
A table of specification (TOS) is a table that helps the teacher align objectives, instruction and assessment.
A table of specification has two dimensions:
i.                    The content dimension, which includes the main topics of instruction and assessment.
ii.                  The process dimension, which includes the six categories of the cognitive domain related to each topic or objective.

| Content dimension \ Process dimension (objectives) | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation |
|---|---|---|---|---|---|---|
| Stages of writing | X (L) Mention the three stages of the writing process | X (M) Explain the purposes of the three stages of the writing process | | | | |
| Topic sentences | | | X (M) Write topic sentences | X (L) Differentiate topic sentences from other types of sentences | | X (H) How useful is the topic sentence in writing an essay? |
| Writing essays | | | | | X (H) Write an essay on the stages of the writing process | |

Note:
L = low amount of time
M = medium amount of time
H = high amount of time
X = objective(s)

The intersection of the process dimension and the content dimension is referred to as an objective.
For example, X = students should be able to mention the three stages of writing (prewriting, writing and editing), i.e. students will remember the three stages of the writing process.
L refers to the amount of time allotted to this objective. Since it is a simple memorization task, only a low amount of time is spent teaching it.
Similarly, the second objective states that students should be able to explain in their own words the purposes of the three stages of the writing process. Thus the content area "stages of writing" relates to two different objectives.
Once objectives are identified and organized, the next step is to develop lesson plans for them. In selecting the activities needed to achieve the objectives, the following should be taken into consideration:
i.                    The ability level of your students
ii.                  Their attention spans
iii.                Suggestions made in the textbook.
iv.               Additional resources available to supplement and reinforce the textbook.
For the first content area, the table reminds the teacher that he/she needs both remembering and explaining activities to attain the first two objectives. For example, the specific objectives could read as follows:
a.     At the end of the lesson, students should be able to mention the three stages of the writing process (knowledge).
b.     At the end of the lesson, students should be able to explain the purposes of the three stages of the writing process (comprehension).
Instruction
Once objectives and planned activities are identified, the next step is to teach the students based on the objectives developed.
Assessment
When constructing a test, the teacher needs to be concerned with the class content/subject matter and the kind of thinking/responses required on the test.
A test should also align with the level of thinking required of students during instruction. For example, if the teacher taught at the knowledge level (recalling information), the test should require students to recall the information learned.

Decisions in planning a test
When deciding what should be included in the assessment and the type of tasks students should take, there are four important questions the teacher needs to answer.
1.     What should I test?  
In deciding what to test, it is important to focus on both the objectives and the actual classroom instruction that took place.
2.     What type of assessment items or tasks should be given?
The type of assessment items or tasks should be based on the learning objectives, because the type of assessment procedure chosen depends on the nature of the objective being assessed. For example, from the table of specification above:
a.     Comprehension (write in one's own words)
b.     Application (write a topic sentence)
c.     Synthesis (integrate and write an essay)

3.     How long should the test take?
a.     The age of the students, the subject being tested and the length of the class period all affect the length of a test.
b.     The number of questions per objective depends on the instructional time spent on each objective and its importance.
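Rule b above can be made concrete as a simple proportional allocation. The sketch below is only an illustration under stated assumptions: instructional time is the sole weight (the importance of each objective is ignored), and fractional shares are rounded with the largest-remainder method so that the total comes out exact. The function name `items_per_objective` and the teaching times in the usage line are hypothetical.

```python
def items_per_objective(minutes_taught, total_items):
    """Split total_items across objectives in proportion to teaching time.

    minutes_taught: dict mapping an objective to the minutes spent teaching it.
    Shares are floored first; any items left over go to the objectives with
    the largest fractional parts, so the allocation always adds up exactly.
    """
    total_time = sum(minutes_taught.values())
    if total_time <= 0:
        raise ValueError("total teaching time must be positive")
    shares = {obj: total_items * t / total_time for obj, t in minutes_taught.items()}
    allocation = {obj: int(share) for obj, share in shares.items()}
    leftover = total_items - sum(allocation.values())
    # Objectives whose exact share was cut the most get the remaining items.
    by_remainder = sorted(shares, key=lambda obj: shares[obj] - allocation[obj], reverse=True)
    for obj in by_remainder[:leftover]:
        allocation[obj] += 1
    return allocation


# Hypothetical teaching times (minutes) for three content areas, 20-item test.
print(items_per_objective({"stages of writing": 60, "topic sentences": 90, "writing essays": 150}, 20))
# {'stages of writing': 4, 'topic sentences': 6, 'writing essays': 10}
```

A teacher who also wants to weight importance could multiply each time by a weight before calling the function; that refinement is not part of the notes.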

Stages in planning classroom test
Tests are used to evaluate students' learning. Testing involves determining the behavior to be measured and designing test items that will elicit the desired performance. The goal of classroom testing is to improve teaching and learning. The following should be considered when planning a test:
i.             Determine the purpose of the test.
The purpose of the test may be:
a). To determine whether students have the prerequisite skills needed for instruction and to find out what they already know about the lesson to come (pre-testing).
b). To monitor the learning process, to detect learning problems and to provide feedback to students and the teacher (formative testing).
c). To measure the extent to which instructional objectives have been achieved (summative testing).
ii.            Developing test specifications
Test specification entails making the test a representative sample of the instructional objectives (the content that was taught) by using a table of specification. The table of specification is made up of the instructional/behavioral objectives and the content.
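Table 1. Specification of concepts in measurement and evaluation

| Content \ Objectives | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | No. of test items | % |
|---|---|---|---|---|---|---|---|---|
| Testing | 2 | 1 | 2 | 1 | 1 | 2 | 9 | 18 |
| Measurement | 4 | 2 | 2 | 2 | 3 | 2 | 15 | 30 |
| Evaluation | 2 | 2 | 4 | 3 | 1 | 1 | 13 | 26 |
| Assessment | 2 | 3 | 2 | 2 | 2 | 2 | 13 | 26 |
| Total items | 10 | 8 | 10 | 8 | 7 | 7 | 50 | |
| % of items | 20 | 16 | 20 | 16 | 14 | 14 | | 100 |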
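The totals and percentages in Table 1 follow directly from the cell counts: each row total and each column total is divided by the grand total of 50 items. A minimal Python sketch that recomputes them is shown below; the names `LEVELS`, `TABLE_1` and `summarize` are illustrative only and not part of any standard tool.

```python
# Cell counts from Table 1: rows are content areas, columns are the six
# cognitive levels in the order given by LEVELS.
LEVELS = ["Knowledge", "Comprehension", "Application", "Analysis", "Synthesis", "Evaluation"]
TABLE_1 = {
    "testing":     [2, 1, 2, 1, 1, 2],
    "measurement": [4, 2, 2, 2, 3, 2],
    "evaluation":  [2, 2, 4, 3, 1, 1],
    "assessment":  [2, 3, 2, 2, 2, 2],
}


def summarize(table):
    """Return grand total, row totals, row %, column totals and column %."""
    grand_total = sum(sum(row) for row in table.values())
    row_totals = {topic: sum(row) for topic, row in table.items()}
    row_percent = {topic: 100 * n / grand_total for topic, n in row_totals.items()}
    col_totals = [sum(col) for col in zip(*table.values())]  # one sum per level
    col_percent = [100 * n / grand_total for n in col_totals]
    return grand_total, row_totals, row_percent, col_totals, col_percent


total, row_totals, row_pct, col_totals, col_pct = summarize(TABLE_1)
for topic in TABLE_1:
    print(f"{topic:<12} {row_totals[topic]:>3} items {row_pct[topic]:>5.1f}%")
print("items per level:", dict(zip(LEVELS, col_totals)))
```

Recomputing the margins this way is a quick check that a hand-built table of specification is internally consistent before items are written.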




iii.              Selecting appropriate test items
There are two types of test items, namely objective items (multiple choice, matching and true/false items) and subjective items (e.g. essay items).


According to Thungu et al. (2010), the allocation of marks for cognitive skills can be as follows:

| Skill | Percentage of marks |
|---|---|
| Knowledge | 12% |
| Comprehension | 16% |
| Application | 32% |
| Analysis | 20% |
| Synthesis | 12% |
| Evaluation | 8% |
| Total | 100% |
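To turn these percentages into marks for a particular paper, each percentage is multiplied by the paper's total marks and divided by 100. The short sketch below assumes a 50-mark paper purely for illustration; the names `THUNGU_2010` and `marks_per_skill` are hypothetical. When a share is not a whole number (for example 12% of 30 marks is 3.6), the teacher still has to decide how to round it.

```python
# Percentages from Thungu et al. (2010), as listed in the table above.
THUNGU_2010 = {
    "Knowledge": 12, "Comprehension": 16, "Application": 32,
    "Analysis": 20, "Synthesis": 12, "Evaluation": 8,
}


def marks_per_skill(percentages, total_marks):
    """Convert a percentage allocation into marks for a paper of total_marks."""
    if sum(percentages.values()) != 100:
        raise ValueError("the percentages must add up to 100")
    return {skill: total_marks * pct / 100 for skill, pct in percentages.items()}


print(marks_per_skill(THUNGU_2010, 50))
```

For a 50-mark paper this gives 6, 8, 16, 10, 6 and 4 marks respectively.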

iv.                Preparing a set of relevant items
The intended learning outcome will dictate the type of items to be used. If the intended learning outcome is to mention, name or list, supply-type items will be appropriate, because the student must produce the answer. If the intended outcome is to identify, selection-type items will be used, because the student only has to recognize the answer among those given. An item should be included only if it measures a sample of the intended learning outcome.

PRINCIPLES OF TEST CONSTRUCTION

Purpose of Testing
i). To identify what students have learned after the completion of a lesson or unit of instruction. These tests are also useful when discussing student progress at parent-teacher conferences.
ii). To identify students' strengths and weaknesses. This is effective when teachers use pretests at the beginning of units in order to find out what students already know and where the teacher's focus needs to be.
iii). To place students.
iv). To determine who will receive awards and recognition.
v). To assess the effectiveness of the teacher and/or the school.
vi). To show the depth of understanding of an idea or mastery of a skill.
vii). To show student growth over time in a particular area of knowledge.
viii). To compare one student's or group's achievement to another's on the same task.
ix). To predict students' future performance.
Construction of test items
General Rules for Writing Test Items
i). Use the examination format as a guide to item writing. The examination format describes the scope and content coverage to be measured and the sample of tasks to include.
ii). Write more items than needed for a particular examination so that the weaker items can be discarded during later review.
iii). Write items well in advance of the submission date. Setting items aside for several days and then reviewing them will help reveal any lack of clarity or ambiguity that was overlooked.
iv). Each test item should call forth the performance described in the intended learning outcome.
v). Write each item so that the task to be performed is clearly defined. In formulating questions, use simple and direct language and correct punctuation and grammar, and avoid unnecessary wording.
vi). Write each item at an appropriate reading level. Pupils' responses should be determined by the performance being measured and not by some factor the item was not designed to measure.
vii). Write each item so that it does not provide help in answering other items. For example, a name, date or fact called for in a short-answer item might be unintentionally included in the stem of a multiple-choice item in another part of the test.
viii). Write each item so that the answer is one that experts would agree upon. This rule is easy to follow when measuring factual knowledge but more difficult when measuring complex outcomes that call for the best answer, such as the best reason, the best method or the best interpretation. Be sure that experts would agree that the answer is clearly the best.
ix). Write each item so that it is at the proper level of difficulty, that is, the difficulty of the item should match the performance to be measured and the purpose of the test.
x). Whenever an item is revised, recheck its relevance, i.e. make sure that it still provides a relevant measure of the intended learning outcome.


Classification of Tests
There are two types of tests:
Objective tests/selection items are those in which examinees select the correct answer from among a number of choices presented to them. Objective items include multiple choice items, true-false items and matching items. Objective tests are distinguished from subjective tests in that the task is highly structured and limits the type of response: students are not free to redefine, organize and present the answer in their own words.
Subjective/supply item tests require the student to supply or construct his or her own answer, which may range from a single word to a rather long written response of several paragraphs. Subjective/supply or constructed-response items include restricted-response items, short answer items, completion items and essay items.

Objective/selection tests
Multiple Choice Items
Multiple choice items consist of a stem (premise), which presents the problem or question to the student, and a set of options or choices from which the student selects an answer. There are usually four or five options, and there should be only one correct answer. Incorrect but plausible options in a multiple choice question are called distracters. The problem may be stated as a direct question or as an incomplete statement. For example:
i.                    Direct question
Which of the following is not an element of weather?
a)    Humidity
b)    Leaching
c)    Sunshine
d)    Temperature

ii.                  Incomplete statement
………………… is one of the elements of weather.
a)    Rainfall
b)    Erosion
c)    Deforestation
d)    Leaching

Multiple choice items can be used to measure both simple and complex learning outcomes, for example:
a). Knowledge of terminology/vocabulary
Example: An organism living in or on another organism is
a.     A predator.
b.     Prey.
c.     A parasite.
d.     A host.
b). Knowledge of specific facts. Multiple choice items can be used to assess students' grasp of discipline-based factual knowledge, i.e. what, who, where and when.
Example: Which of the following states does not border Oklahoma?
a.     Colorado
b.     Missouri
c.     Nebraska
d.     New Mexico
c). Knowledge of Procedure
Example: The correct procedure for combining acid and water is
a.     Add acid to large amount of water
b.     Add water to large amount of acid
c.     Add acid to water, cool and swirl
d.     Add water to acid, cool and swirl
d). Knowledge of Principles
Example: The principle of capillary action helps to explain how fluids:
a.     Enter solutions of lower concentration
b.     Escape through small openings
c.     Pass through small semi-permeable membranes
d.     Rise in fine tubes
Multiple choice questions can also measure higher-level outcomes such as:
i.             Application
Faraday's law can be used to explain
a).
b).
c).
d).

ii.           Interpretation
The Majimaji war occurred in the southern part of Tanzania because
a).
b).
c).
d).
iii.         Justification of methods and procedure
Why do farmers rotate their crops?
a).
b).
c).
d).

Guidelines for constructing multiple choice items
i.             State the problem clearly in the stem
For example:
The components of a multiple-choice item are 
ii.           Include one correct or most defensible answer
For example:
According to ……. the most serious aspect of the energy crisis is the
iii.                Select attractive distracters. Distracters should be plausible enough to attract examinees who have not mastered the content.
iv.               Options should be presented in a logical, systematic order. For example, dates of events should be arranged chronologically, numerical quantities in ascending order and names in alphabetical order.
v.           Options should be grammatically parallel and consistent with the stem; otherwise they can provide a clue to the correct alternative.
Example 1
A test which can be scored by an untrained person in the content area of the test is an
a.     Diagnostic test
b.     Criterion-referenced test
c.     Objective test
d.     Reliable test
e.     Subjective test
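In this example, the article "an" agrees only with option c, "Objective test", so examinees can take advantage of the inconsistency between the stem and the options to get the correct answer. They then respond on the basis of verbal skill, which may be quite different from the skill the item is intended to measure.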
The item might be rewritten as follows
A test which can be scored by an untrained person in the content area of the test is said to be
a.     Diagnostic
b.     Criterion-referenced
c.     Objective
d.     Reliable
e.     Subjective
vi.               Options should be mutually exclusive, i.e. they should not overlap, so that only one option is the correct or best answer.
vii.             Ensure that correct responses are not consistently shorter or longer than the distracters. The difference in length might give a clue to the correct answer.
viii.           Options such as "none of these", "none of the above", "all of these" and "all of the above" should not be used when the examinee is to select the best, but not necessarily absolutely correct, answer.
ix.                The position of the correct answer should vary randomly from item to item in a test.
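One practical way to check guideline ix is to inspect the answer key itself: count how often each option letter is the correct answer and find the longest run of consecutive items with the same correct letter. The sketch below is a hypothetical checking aid (the function name `audit_answer_key` and the sample key are illustrative); how uneven a count or how long a run is acceptable is left to the teacher's judgment.

```python
from collections import Counter


def audit_answer_key(key):
    """Return (counts, longest_run) for an answer key such as "ACBDDA".

    counts shows how often each option is the correct answer; longest_run is
    the largest number of consecutive items sharing the same correct option.
    """
    counts = Counter(key)
    longest_run, current_run, previous = 0, 0, None
    for option in key:
        current_run = current_run + 1 if option == previous else 1
        longest_run = max(longest_run, current_run)
        previous = option
    return counts, longest_run


counts, run = audit_answer_key("ACBDDDACBA")
print(dict(counts), run)  # {'A': 3, 'C': 2, 'B': 2, 'D': 3} 3
```

A long run such as "DDD" is not wrong in itself, but it is the kind of pattern test-wise examinees may notice.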

What are the advantages and disadvantages of multiple choice test items?

Matching Items
Matching items consist of two columns:
a.     A column of stems/problems to be answered, called premises (Column A)
b.     A column of responses (Column B)
Normally the column of premises is placed on the left-hand side and the column of responses on the right. Matching items often measure recognition of factual knowledge based on simple associations, which may include:
·         Persons who are associated with events
·         Dates with historical events
·         Terms with definition
·         Rules with examples
·         Symbols with concepts
·         Parts with functions
·         Plants/animals with classification
Guidelines for writing matching items
i.                    Include homogeneous material in each exercise.
ii.                  Include at least three to five but no more than eight to ten items in a matching set. Why? A long set of matching items requires examinees to do a good deal of work keeping track of stems and searching for options. Furthermore, it is difficult to write long matching sets that are homogeneous. Thus three to eight items per matching set is a reasonable compromise.
iii.                Eliminate irrelevant clues. There should be no verbal association clues or singular/plural clues between a stem and its correct option.
iv.               Place each set of matching items on a single page
v.                 Reduce the influence of clues and thereby increase the difficulty of matching items. This can be accomplished by:
a.   Using a different number of options than there are items
b.   Allowing each option to be used more than once.
vi.               Compose the response list of single words or very short phrases.
vii.             Arrange the responses in a systematic order, e.g. alphabetical or chronological. This order enables examinees to find correct responses more quickly.
viii.           The column of responses should have more items than the column of premises.

ix.                Items in the columns should be grouped homogeneously
E.g.  LIST A                    LIST B
1. Leonardo da Vinci         a. American Gothic
2. Edward Hopper             b. The Thinker
3. Michelangelo              c. Mona Lisa
4. Auguste Rodin             d. The Last Supper
5. Grant Wood

What are the advantages and disadvantages of matching test items?

True-False or Alternative Response Items

These are test items with only two possible answers. Each consists of a declarative statement that the pupil/student is asked to mark true/false, right/wrong, correct/incorrect, yes/no, agree/disagree, etc. Because of these different responses, they are called alternative-response items.
Alternative-response or true-false items are used to measure:
a.     The ability to identify the correctness of statements or definitions of terms.
b.     The ability to distinguish facts from opinion.
c.     The ability to recognize cause-and-effect relationships.

Guidelines for Constructing True-False/Alternative Items
i.                    Include only one idea in each item
ii.                  Eliminate partly true-partly false  items
iii.                Ensure that true and false items are approximately equal in length.
iv.               Balance the number of true items and false items
v.                 Eliminate vague terms of degree or amount. E.g. words like "frequently" and "seldom" are open to interpretation in true-false items.
vi.               Use caution in writing negative item statements.

What are the advantages and disadvantages of true-false or alternative response test items?

Subjective/Supply Test Items
Subjective items consist of completion (fill-in-the-blank) items, short answer items and essay items.
1.     Short answer items present the problem as a direct question, which requires students to answer using their own constructed responses, e.g. What is the name of the first president of Tanzania?
What are the main parts of the human body?
2.     Completion items present the problem as an incomplete sentence, e.g. The name of the first president of Tanzania is ……………………………………..
The main parts of the human body are i………………………. ii…………… iii……………… iv………….
Short answer and completion items primarily assess factual knowledge (recall of dates, places, specific persons) and comprehension.
Guidelines for writing good short answer items
i.                    Construct the stem so that the answer is definite and brief.
ii.                  Make sure that there is only one correct answer
iii.                Avoid lifting sentences from textbook
iv.               For completion and fill-in-the-blank items:
-          Make response blanks equal length
-          Avoid grammatical clues preceding the blank.
-          Do not use too many blanks in one item-usually no more than two
-          Include enough information in the stem to ensure the desired response
Essay Items
Essay items allow students to communicate a unique constructed answer to a question.
There are two categories of essay type questions. These are:
i.             Restricted-response questions are essay questions that limit content in terms of scope and response. A student is required to state or list factors, reasons, differences, similarities, merits and demerits. Such questions limit the student in terms of the content of the answer and the length of the response.

Example:
Limited content
List the types of leadership style.
Limited response
Briefly explain the advantages and disadvantages of each style.

ii.           The extended-response items
These are test questions that require students to select factual information, organize the answer in the way they like and integrate ideas as they deem appropriate. Extended-response questions are used to measure the ability of students to select information and to organize, integrate and evaluate ideas.
Example
Describe the influence of climate change on agricultural development in Africa today
Evaluate the significance of participatory teaching techniques at primary school level in Tanzania.

Principles for construction of better essay questions/items
·         Essay questions should measure learning outcomes that cannot be satisfactorily measured by objective test items.
·         They should measure the achievement of instructional objectives.
·         Each question should indicate clearly the task to be undertaken by students
·         They should indicate the time limit for each question


What are the advantages and disadvantages of essay test items?
ASSEMBLING, ADMINISTRATION AND ANALYSIS OF TEST RESULTS
Classroom testing process
i.                    Identify the learning outcomes to be tested and measured
ii.                  Select an appropriate test format
iii.                Construct test items that are relevant to the specified learning outcomes
iv.                Assemble the test questions
Assembling Classroom Test
Assembling classroom test refers to the process of grouping test items by type such as multiple choice, true-false etc.
Grouping test items by type is important:
i.                    To avoid students having to shift from one response mode to another as they move from item to item.
ii.                  To help students cover more items in a given time.
iii.                To make scoring easier.
Organization of test items.
One of the important considerations in assembling the test is the order in which the item types are presented. In most tests selection items come first and supply items come last.
Guidelines for assembling test items
1.      Record test items in a systematic way, e.g. on paper.
2.      Review test items several times to make sure they are appropriate to the learning outcomes that are intended to be measured.
3.      Arrange items in a logical manner according to the examination format:
i.                    Organize test items by type, with selection items first and supply items last.
ii.                  Do not split multiple choice or matching items across two pages of the test
iii.                Separate multiple choice option from the stem by beginning the options on a new line.
iv.               Number the test items
v.                 Space items for easy reading and writing responses.
vi.               Make sure that you have enough copies of the examination.
vii.             Provide enough questions to ensure reliability
4.      Prepare instructions to be followed by students in answering the test items
5.      Each section of a test should have instructions that direct students what to do
Test Administration and Marking
Test administration involves establishing a conducive physical and psychological setting that allows students to demonstrate their best performance, as well as managing time.
Guidelines for administering test
i.                    Creating a quiet comfortable Physical and psychological setting. 
Physical setting
The examination environment should be quiet and comfortable. This can be achieved by minimizing interruptions of any kind. Some of the ways to minimize interruptions in the examination room are:
a.                   Posting a sign on the door indicating that testing/examination is in progress.
b.                  Proofreading the test items and directions before administering it
c.                   Ensuring that enough facilities such as desks, chairs, clock are available
d.                  Making sure that there is enough ventilation and light
Psychological setting
This involves creating a psychological setting that reduces student anxiety. Test anxiety is reduced by informing students about the test in advance, giving them good instruction and providing a good review of the unit.
ii.                  Keeping track of time by informing students of the time remaining.
While administering a test, the teacher should be alert to cheating. Cheating is a common problem in schools. Students cheat in examinations for various reasons, such as:
a.       Pressure from parents/teachers
b.      Failure to prepare and study for the  test
c.       Internal pressure from being in a course that awards a limited number of high grades
d.      Danger of losing a scholarship
Forms of Cheating
a.       Copying from another student’s examination/test answers.
b.      Dropping a test paper so that others can copy from it.
c.       Writing test information on an eraser or a  small piece of paper and passing it to another student or using it.
d.      Developing codes, formula, key words on object for use in the test
e.       Changing answers when the teacher allows students to grade each other.
f.       Keeping test information in the toilet.
g.      Writing test information on the arms or thighs.
h.      Using programmed material in watches or calculators in the test room.
i.        Looking at another student's paper during a test.
How can we discourage cheating?
a.       Searching students as they enter the test room.
b.      Providing students with good instruction and information about the test.
c.       Keeping students' books and other materials out of the test room before testing begins.
d.      Observing students during testing.
e.       Knowing the common methods of cheating.
f.       Spreading out students' seats in the test room.
g.      Discouraging students from wearing caps in the test room.
h.      Using different test forms.
i.         Assigning students seats for a test.
j.        Giving more in-class tests and fewer take-home tests.
Scoring of Tests
Scoring a test involves measurement, that is, assigning a number to represent a student's performance. The score provides a summary of the student's performance. The complexity of scoring varies with the type of test: selection tests are easier to score than supply item tests.
Scoring the selection test
Selection tests consist of multiple choice, matching and true-false items. Scoring selection tests is objective because the responses are brief and each item has only one correct answer. There are different methods of scoring objective items. One common method is to put a tick against a correct answer and a cross against a wrong answer. However, it is advisable to write the correct answer next to a wrongly answered item instead of a cross, and to record the score instead of a tick for a correctly answered item.
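Because each selection item has a single keyed answer, scoring reduces to comparing each response with the key. The sketch below assumes one mark per item and treats an unanswered item as wrong; for each wrongly answered item it returns the correct answer, mirroring the advice to write the correct answer on the script. The function name `score_selection_test` and the sample key are hypothetical.

```python
def score_selection_test(responses, key):
    """Score one student's objective test against the answer key.

    responses and key map item numbers to answers; an unanswered item counts
    as wrong. Each correct item earns one mark (an assumption). For wrong
    items the correct answer is returned so it can be written on the script.
    """
    score = 0
    corrections = {}
    for item, correct_answer in key.items():
        if responses.get(item) == correct_answer:
            score += 1
        else:
            corrections[item] = correct_answer
    return score, corrections


key = {1: "a", 2: "c", 3: "True", 4: "b"}
print(score_selection_test({1: "a", 2: "b", 3: "True"}, key))  # (2, {2: 'c', 4: 'b'})
```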
Scoring Short Answer Test
Short answer and completion items call for short responses such as a word, phrase, date or name. Therefore scoring is not difficult and can be quite objective.
Scoring essay test items 
The essay item is the most complex item to score because essay questions allow each student to construct a unique and lengthy response to the question posed. Therefore, there is no single answer key that fits all responses, and the marker must interpret each response.

Factors which undermine teacher’s ability to evaluate essay fairly and reliably are:
a.       Halo effect, i.e. irrelevant factors can attract the attention of the marker, making an essay appear better than it really is. Such factors include:
i.                    Handwriting
ii.                  Style of writing such as sentence structure
iii.                Spelling and grammar
iv.                Neatness
b.      Identity of the student
c.       Location of one’s paper in the pile of test papers
d.      Teacher’s dislike for a student
e.       Teacher’s mood
How to minimize biases
a.       Develop a scoring guide (rubric)/marking scheme. A scoring guide lists the key components of the essay that will be graded and the levels of performance that will receive points; each level is defined by a short description along with the number of points it will receive.
E.g. accuracy of the content; language/vocabulary; sources/citations; spelling/grammar; organization of the essay.
b.      Identify students by number, not by name, when scoring essay responses.
c.       Score students on the basis of present performance, not on the ability, interests or past performance of the student.
d.      Inform students of the demands of essay questions, such as good handwriting, proper punctuation, spelling, accuracy and organization of the essay.
e.       Score the first essay question for all students before moving to the next question, in order to be consistent and to do justice to students when scoring.
f.       Describe in advance how you are going to handle factors that are not relevant to the learning outcome being measured. Such factors include: spelling, handwriting, sentence structures, punctuation and neatness.
g.      Re-read essay answers a second time after scoring to check objectivity.
Approaches to scoring essays.
There are two approaches to scoring essays
a.       Holistic scoring provides a single overall score/grade for the complete essay. Holistic scoring is useful when an overall impression of student achievement is needed.
b.      Analytical scoring provides a separate score for each component of the essay, e.g. scores for accuracy, organization, supporting arguments, grammar and spelling. Analytical scoring gives students detailed feedback that can help them improve different aspects of their essays. It is useful for determining the strengths and weaknesses in a student's work or for assessing multiple objectives that are integrated in the essay.
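Analytical scoring can be sketched as adding up separate component scores from a rubric, whereas holistic scoring would record only one overall mark. In the sketch below the rubric components and their maximum points are hypothetical, and flagging a component as a weakness when it earns less than half of its maximum is an illustrative assumption, not a rule from these notes.

```python
# Hypothetical analytic rubric: component -> maximum points.
RUBRIC = {
    "accuracy of content": 10,
    "organization": 5,
    "supporting arguments": 5,
    "grammar and spelling": 5,
}


def analytic_score(component_scores, rubric=RUBRIC):
    """Add up separate component scores and flag weak components.

    A component is flagged when it earns less than half of its maximum;
    this threshold is an illustrative assumption.
    """
    if set(component_scores) != set(rubric):
        raise ValueError("give exactly one score for every rubric component")
    for component, points in component_scores.items():
        if not 0 <= points <= rubric[component]:
            raise ValueError(f"{component}: {points} is outside 0..{rubric[component]}")
    total = sum(component_scores.values())
    weaknesses = [c for c, points in component_scores.items() if points < rubric[c] / 2]
    return total, weaknesses


scores = {"accuracy of content": 8, "organization": 2,
          "supporting arguments": 4, "grammar and spelling": 3}
print(analytic_score(scores))  # (17, ['organization'])
```

The list of weak components is the kind of detailed feedback that a single holistic mark cannot give.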


ITEM ANALYSIS
Item analysis refers to the process of judging the quality of selected-response test items. It is a set of procedures designed to evaluate the quality of the test items used for assessment. Item analysis is done after a test has been administered and scored, to determine whether:
·         each item in the test functioned as intended
·         the item was capable of discriminating between the best and the weakest students in terms of achievement
·         the item was able to measure the effect of the teaching and learning process
·         the item was of appropriate difficulty.
Individual assessment items can be described by two characteristics, namely:
i.                    Item difficulty (how hard a test item is)
ii.                  Item discrimination (how well an item separates those who perform well on the total test from those who do not). Item discrimination reflects the relationship between students' responses on the total test and their responses to a particular test item.
Item Difficulty
Item difficulty is the ratio or percentage of individuals who answered an item correctly.
Item difficulty index: $p = \dfrac{R}{T}$, where $R$ is the number of students who answered the item correctly and $T$ is the total number of students who answered the item.
The easier the item, the larger the item difficulty index. For example, if item 1 is answered correctly by 15 out of 20 students, then $p = \dfrac{15}{20} = 0.75$, or 75%.
Item difficulty is used as a measure of how hard an item is for all students, those who performed well overall and those who performed poorly. A good assessment is one that balances the difficulty of items to provide information about a range of student abilities and performance.
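The difficulty index is a simple ratio, so it is easy to compute for a single item or for every item at once. The sketch below shows both; the second function assumes a hypothetical 0/1 score matrix in which every student attempted every item, and the function names are illustrative.

```python
def difficulty_index(num_correct, num_answered):
    """p = number who answered correctly / number who answered the item."""
    if num_answered <= 0:
        raise ValueError("at least one student must have answered the item")
    if not 0 <= num_correct <= num_answered:
        raise ValueError("num_correct must lie between 0 and num_answered")
    return num_correct / num_answered


def difficulty_indices(results):
    """p for every item, from a 0/1 score matrix.

    results[s][i] is 1 if student s answered item i correctly, else 0.
    Assumes every student attempted every item.
    """
    n_students = len(results)
    return [sum(item_column) / n_students for item_column in zip(*results)]


print(difficulty_index(15, 20))  # 0.75, the example in the text
print(difficulty_indices([[1, 0], [1, 1], [0, 1], [1, 0]]))  # [0.75, 0.5]
```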

Item discrimination
Item discrimination is the degree to which an item differentiates those who have a higher level of achievement from those who have a lower level of achievement. The discriminating power of an item is a measure of its ability to distinguish between students who performed well overall on a test and those who did not.


Procedures for analyzing test items
h.      Identify the three groups of students in the classroom, the higher, middle and lower performing students. Ranks order all of the test papers from the highest score to the lowest score.
iii.                 select about 25% of papers from the top and 25% of the papers from the bottom
iv.                Put aside the middle papers as they will  not be used for analysis
v.                  For each test item tabulate the number of students in the upper and lower group who selected each alternative
vi.                Compute the difficulty index of each item for the upper and lower group using the following formula

$$\text{Item difficulty index} = \frac{\text{number of correct answers}}{\text{total number of students who answered the item}}$$
Example. Item number 1
Tanganyika attained its independence in

Option           High group    Low group
a. 1961              10             6
b. 1962               8             4
c. 1965               0             0
d. 1967               2            10
Total                20            20

Option (a) is the correct answer to item number 1 (question 1).
Calculate the item difficulty index of the item for the high and low groups.
High group:
$$\text{Item difficulty} = \frac{\text{No. of students who answered the item correctly}}{\text{No. of students in that group}} = \frac{10}{20} = 0.5 \text{ or } 50\%$$
Low group:
$$\text{Item difficulty} = \frac{\text{No. of students who answered the item correctly}}{\text{No. of students in that group}} = \frac{6}{20} = 0.3 \text{ or } 30\%$$

Item discrimination


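Item discrimination is the item difficulty for the high group minus the item difficulty for the low group:
$$\text{Item discrimination index } (D) = p_H - p_L$$
where $p_H$ and $p_L$ are the item difficulty indices for the high and low groups. For item 1:
$$D = 0.5 - 0.3 = 0.2$$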
Item discrimination values range from -1.00 to +1.00.
An item can be a positive discriminator, a negative discriminator or a non-discriminator.
A positive discriminator (D > 0) is an item that is answered correctly by a larger proportion of the students who did well on the test than of those who performed poorly. The more positive the discrimination index, the better the item is functioning in differentiating among the varying levels of achievement. Such an item is said to be a precise, useful and effective test item.
A negative discriminator (D < 0) is an item that is answered correctly by a larger proportion of the poorly performing students than of those who did well overall. Such an item is undesirable.
A non-discriminator (D = 0) is an item that does not differentiate between the higher performing and the lower performing students.
The purpose of item discrimination is to compare the rate of correct responses of the high-performing students with that of the low-performing students on individual items.

vii.              Evaluate the effectiveness of the distracters (the incorrect alternatives) in each item. This is achieved by inspecting the number of students in the upper and lower groups who selected the distracter being evaluated.
For example, the results for item number 1 of a test were as follows:
Example. Item number 1
Tanganyika attained its independence in

Option           High group    Low group
a. 1961              10             6
b. 1962               8             4
c. 1965               0             0
d. 1967               2            10
Total                20            20

Option (a) is the correct answer to item number 1 (question 1).
Interpretations
a.       Option A (the correct answer) functions as intended because it attracted more students from the upper group (10) than from the lower group (6).
b.      Distracter B is a poor distracter because it attracted more students from the upper group (8) than from the lower group (4).
c.       Distracter C is ineffective because it attracted no student.
d.      Distracter D is a good distracter because it functions as intended, attracting more students from the lower group (10) than from the upper group (2).
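The whole procedure, from ranking the papers to judging the distracters, can be sketched in Python. This is an illustrative sketch rather than a standard tool: the names `Paper`, `upper_lower_groups`, `tabulate`, `analyse_item` and `describe_discrimination` are hypothetical, the groups are assumed to be the top and bottom 25% of papers, and treating a distracter chosen equally often by both groups as poor is our own assumption (the text only covers the unequal cases). The usage lines reuse the tabulated counts for item 1.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    total_score: int  # the student's score on the whole test
    choice: str       # option chosen on the item being analysed, e.g. "a"


def upper_lower_groups(papers, fraction=0.25):
    """Rank papers from highest to lowest total score and return the top and
    bottom `fraction` of them; the middle papers are set aside."""
    ranked = sorted(papers, key=lambda p: p.total_score, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


def tabulate(group, options):
    """Count how many papers in a group chose each option (every choice
    must be one of `options`)."""
    counts = {opt: 0 for opt in options}
    for paper in group:
        counts[paper.choice] += 1
    return counts


def analyse_item(high, low, key):
    """Given option counts for the upper and lower groups and the correct
    option `key`, return (p_high, p_low, D, verdict on each distracter)."""
    p_high = high[key] / sum(high.values())
    p_low = low[key] / sum(low.values())
    d = p_high - p_low  # discrimination index, between -1.00 and +1.00
    verdicts = {}
    for opt in high:
        if opt == key:
            continue
        if high[opt] == 0 and low[opt] == 0:
            verdicts[opt] = "ineffective: attracted no student"
        elif low[opt] > high[opt]:
            verdicts[opt] = "good: attracted more of the lower group"
        else:
            verdicts[opt] = "poor: attracted at least as many of the upper group"
    return p_high, p_low, d, verdicts


def describe_discrimination(d):
    if d > 0:
        return "positive discriminator"
    if d < 0:
        return "negative discriminator"
    return "non-discriminator"


# Tabulated counts for item 1 (Tanganyika's independence); key = "a".
high = {"a": 10, "b": 8, "c": 0, "d": 2}
low = {"a": 6, "b": 4, "c": 0, "d": 10}
p_high, p_low, d, verdicts = analyse_item(high, low, key="a")
print(f"p_high={p_high:.2f} p_low={p_low:.2f} D={d:.2f} ({describe_discrimination(d)})")
# p_high=0.50 p_low=0.30 D=0.20 (positive discriminator)
for opt, verdict in verdicts.items():
    print(opt, verdict)
```

When raw scripts are available instead of tallies, `upper_lower_groups` followed by `tabulate` on each group produces the `high` and `low` dictionaries used above.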
Uses of item analysis
i.                    Provides a basis for efficient discussion of test results
ii.                  Provides a basis for improving classroom instruction by revising the parts of the curriculum that proved difficult
iii.                Provides a basis for improving skills in test construction
iv.                Provides a basis for carrying out remedial teaching in areas that are difficult
Grading
Grading is the process of holistically evaluating students’ performance and assigning evaluative symbols to represent what learners know and can do, or may not know or be able to do, as evidenced by various assessments (Airasian, 2008). Grades represent teachers’ summary judgment about how well students have mastered the content and processes taught in the subject area during a particular term or grading period. Grading involves two dimensions:
i.        Analysis of assessment data such as quizzes, homework, tests, assignments and others.
ii.       Interpretation and communication of grades. Having gathered data from students’ assessments, the teacher needs to make a judgment about the meaning of these data. The interpretation should be based on a set of criteria that the school has established. Thomas Guskey and Jane Bailey (2003) identified three types of learning criteria used in grading and reporting. These are:
a)      Product criteria. This is a type of grading based on the final examination results (summative evaluation).
b)      Process criteria. This is grading and reporting based on both the course work and the final examination.
c)      Progress criteria. This is grading that deals with how much students have gained from their learning experiences (e.g. an oral comprehensive examination).

Why do we grade?
The purpose of grading includes:
i)      To communicate students’ academic achievement to students, parents and others. However, grades become distorted when non-academic factors such as attendance, effort, attitudes, class participation, group work, class discussion or behavior are included.
ii)     Administratively, grades are used:
a)      to determine students’ ranks in class,
b)      to award credit towards graduation,
c)      to determine suitability for promotion, graduation or employment.
iii)    To determine the strengths and weaknesses of the different teaching approaches used by teachers.
iv)     To motivate students and parents to improve students’ efforts.
v)      For guidance. Grades help teachers, students and counselors to choose appropriate courses and course levels.
vi)     To help teachers identify students who are in need of special services.
vii)    To sort out the best students from the rest.
How Do We Grade?
There are different forms of grading, namely:
i)                    Letter grade e.g. A, B, C, D, E, F,
ii)                  Using standard based achievement categories such as excellent, good, fair, poor
iii)                Using percentage or numerical grades such as 100%, 90% or 100, 80, 70
iv)                Using pass/fail system
v)                  Using a point system (tracking grades by adding the points received during the term, e.g. quiz = 6/10, group work = 8/10, etc.)
vi)                Use of teachers’ written comments
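A point system simply adds up the points received during the term. The sketch below uses the quiz and group-work marks from item v) above; the function name `point_system_total` and the dictionary layout are hypothetical, and converting the total to a percentage is an optional extra step.

```python
def point_system_total(records):
    """records maps each task to (points received, points possible)."""
    earned = sum(received for received, _ in records.values())
    possible = sum(maximum for _, maximum in records.values())
    return earned, possible, 100 * earned / possible


term = {"quiz": (6, 10), "group work": (8, 10)}
earned, possible, percent = point_system_total(term)
print(f"{earned}/{possible} points = {percent:.0f}%")  # 14/20 points = 70%
```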
Classroom grading is based on teachers’ judgment. Teachers’ judgment is based on:
i). information about the performance being judged (test scores, book reports, performance assessments);
ii). a basis of comparison that can be used to translate that information into grading judgments (e.g. what level of performance is worth an A, B, C, D, etc.).
Approaches to Comparison for Grading.
A grade is a judgment about the quality of a student’s performance. Several bases of comparison can be used to assign grades to students. The most commonly used classroom grading approaches compare a student’s performance to:
i.                    The performance of other students
ii.                  Predefined standards of good or poor performance
iii.                Student’s own ability
Comparing a student’s performance with other students (norm-referenced grading)
This refers to the process of assigning grades by comparing the performance of one student with the performance of other students. For example, when a teacher says that Helen has performed better than the rest of the students in the class, he/she is making a norm-referenced grading judgment.
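Norm-referenced grading can be illustrated by ranking a class and working out what share of classmates each student outscored. In this sketch the class list and scores are hypothetical (only Helen comes from the text), tied scores share the same rank, and "percent of classmates scoring lower" is just one simple way of expressing a student’s relative standing.

```python
def class_ranks(scores):
    """Rank 1 = highest score; students with equal scores share the same rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


def percent_scoring_lower(scores, name):
    """Percentage of the other students whose score is below this student's."""
    others = [s for n, s in scores.items() if n != name]
    return 100 * sum(s < scores[name] for s in others) / len(others)


# Hypothetical class results.
scores = {"Helen": 88, "Juma": 74, "Asha": 74, "Peter": 61}
print(class_ranks(scores))  # {'Helen': 1, 'Juma': 2, 'Asha': 2, 'Peter': 4}
print(percent_scoring_lower(scores, "Helen"))  # 100.0
```

The grade a student receives here depends entirely on the rest of the class: the same score of 74 would rank higher in a weaker class.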
Comparing a student’s performance with pre-established performance standards (criterion-referenced grading)
The performance standards define the level or score that a student must attain in order to receive a particular grade. All students who reach a given level get the same grade, regardless of how many students reach that level. For example, suppose students’ assessment contains two parts, the course work and the final examination, and passing the course depends on getting 50 percent of the total marks. Then 50% is the performance standard, and passing or failing depends on how a student’s total compares with that standard of 50 percent.
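Criterion-referenced grading compares each total with a fixed standard rather than with classmates. The sketch below uses the 50% pass mark from the example above; the function name and the student totals are hypothetical, and the totals are assumed to be course work plus final examination marks out of 100.

```python
PASS_MARK = 50.0  # performance standard from the text: 50% of the total marks


def criterion_referenced_result(total_marks, max_marks=100, pass_mark=PASS_MARK):
    """Pass if the student's percentage reaches the standard, whatever others scored."""
    percent = 100 * total_marks / max_marks
    return "Pass" if percent >= pass_mark else "Fail"


# Hypothetical totals (course work + final examination, out of 100).
totals = {"Student A": 53, "Student B": 49, "Student C": 50}
for student, total in totals.items():
    print(student, criterion_referenced_result(total))
```

Unlike the norm-referenced case, the number of passes is not fixed in advance: if every student reached 50 marks, every student would pass.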
Comparing a student’s performance with his or her own ability (ability-based grading)
It involves comparing a student’s actual performance with the performance expected of him or her, based on the teacher’s judgment of the student’s ability. The terms overachiever and underachiever describe students who do better or worse than the teacher expects them to do. Many teachers assign grades by comparing a student’s actual performance with their perception of the student’s ability.
The disadvantages of this perception-based grading approach are:
a)      The approach depends on the teacher having an accurate perception of each student’s ability. In reality, teachers do not know each student’s true ability.
b)      Teachers find it difficult to differentiate a student’s ability from other characteristics such as self-assurance, motivation or responsiveness. Moreover, several studies have revealed multiple abilities that help students learn and perform in different modalities, such as visual, oral and written; which one, then, should a teacher focus on to judge a student’s ability?
c)      The perception-based approach confuses parents and outsiders. For example, a high-ability student might attain 80% mastery of instruction and receive a C grade if perceived to be underachieving, while a low-ability student who attains 60% mastery might receive an A grade for exceeding expectations. An outsider might think that the low-ability student has mastered more of the course because he/she got a higher grade.
Grades are regarded as a prize that you receive when you study hard or a punishment you get when you do not work hard. Some negative effects of grading are:
i.                    Students who get low grades may lose their self-esteem
ii.                  Failure to graduate if you receive low marks
iii.                Being detained (made to repeat a class) if you get low grades
iv.                School dropout
SUMMARIZING TEST RESULTS
Summarizing involves synthesizing assessment information into a single grade. The steps involved in summarizing are:
i.                    Combine information from various assessments into a single grade.
ii.                  Express each type of assessment information on the same scale so that all the information can be combined into a composite score.
iii.                Compute the overall score by:
a). giving each kind of assessment the weight it deserves (multiplying each score by its weight),
b). summing the weighted scores,
c). dividing the total by the sum of the weights (when all assessments carry equal weight, this is simply the number of assessments).
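The summarizing steps can be sketched as follows: each result is first converted to a percentage (the common scale), then a weighted average is taken. The assessments, marks and weights are hypothetical, as are the helper names `to_percent` and `composite_score`; a school would apply its own weighting policy.

```python
def to_percent(marks, max_marks):
    """Step ii: put every assessment on the same 0-100 scale."""
    return 100 * marks / max_marks


def composite_score(percents, weights):
    """Step iii: weight each score, sum, and divide by the sum of the weights."""
    weighted_total = sum(percents[a] * weights[a] for a in percents)
    return weighted_total / sum(weights[a] for a in percents)


# Hypothetical term record: (marks obtained, maximum marks) and weights.
raw = {"quiz": (8, 10), "test": (30, 50), "homework": (18, 20)}
weights = {"quiz": 1, "test": 2, "homework": 1}
percents = {a: to_percent(m, mx) for a, (m, mx) in raw.items()}
print(composite_score(percents, weights))  # 72.5
```

With equal weights of 1 the same function reduces to the plain average, (80 + 60 + 90) / 3.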
INTERPRETATION OF TEST RESULTS
Once you have scored and graded students’ tasks, you need to interpret the results in order to get meaning from them. Scores on an assessment tell only part of the story. To be meaningful, the scores must be interpreted with respect to other variables such as:
i.                    The scores of other students
ii.                  The student’s prior performance on similar assessments
iii.                The contents of the items answered correctly.
Statistics provide a picture of individual and group performance as well as of the effectiveness of the instructional method. This is because statistics help us to know the typical (average) student’s performance on an assessment, the overall performance, and the spread of scores, i.e. the lowest and the highest scores.
Ways of Showing the Distribution of Scores
The distribution of scores shows the pattern or organization of data so as to detect meaning from it. The distribution of scores can be indicated through:
i.                    Frequency table. It is developed by arranging the scores from the lowest to the highest and then tallying the number of times each score occurred. From the table one can:
a.       Compare the performance of an individual against the others
b.      See the distribution of scores, i.e. the highest, the middle and the lowest scores
c.       See which students performed poorly and which performed well.
ii.                  Histogram. This is a pictorial representation of data in the form of a bar graph. It is used to display a frequency distribution. It has two axes: the X-axis (horizontal line), which displays the scores, and the Y-axis (vertical line), which displays the frequency of each score.
iii.                Frequency polygon. It is a line graph that plots the frequency of each score (or of the midpoint of each class of scores) and joins the points with straight lines, showing the same information as a histogram.
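A frequency table, and a rough text version of a histogram, can be produced as below. The quiz scores are hypothetical, and drawing the bars with `#` characters is only a stand-in for a proper chart.

```python
from collections import Counter


def frequency_table(scores):
    """(score, frequency) pairs arranged from the lowest to the highest score."""
    return sorted(Counter(scores).items())


def text_histogram(table):
    """One bar of '#' marks per score: a histogram drawn sideways."""
    return "\n".join(f"{score:>3} | {'#' * freq}" for score, freq in table)


# Hypothetical quiz scores out of 10.
scores = [7, 5, 8, 7, 6, 7, 9, 5, 8, 7]
table = frequency_table(scores)
print(table)  # [(5, 2), (6, 1), (7, 4), (8, 2), (9, 1)]
print(text_histogram(table))
```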
Measures of central tendency
A measure of central tendency is a numerical summary of a set of scores. There are three measures of central tendency: the mean, the median and the mode. Each is a different way of summarizing scores into a single number.
i. The mean ($\bar{X}$) is the arithmetic average of a set of scores. It is calculated by summing the individual scores and dividing by the total number of scores. The formula is:
$$\bar{X} = \frac{\sum X}{N}$$
where $\sum X$ is the sum of the individual scores and $N$ is the number of students (scores).
The mean uses all the scores in the data set. Every assessment score, including those of students who did extremely well and those who did extremely poorly, is used to calculate the mean. Scores that are quite different from the majority (either higher or lower) are called outliers. Outliers can distort the mean by pulling it lower or higher than what might be the typical or average performance on the test. A distribution that is pulled lower by outliers is called a negatively skewed distribution. A distribution that is pulled higher by outliers is a positively skewed distribution.
What is the importance of knowing the shape of the distribution? It helps the teacher to know whether or not students have grown from the entry point (a test done before instruction) to the end of instruction.
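The pull of an outlier on the mean can be checked numerically. The sketch uses the six scores from the standard deviation example in this section plus one hypothetical extreme score of 95; the function name is our own.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


scores = [20, 20, 25, 25, 30, 30]
with_outlier = scores + [95]  # hypothetical extreme high score

print(mean(scores))        # 25.0
print(mean(with_outlier))  # 35.0, pulled upwards by the single outlier
```

Six of the seven scores are 30 or below, yet the mean rises to 35; this upward pull is what produces a positively skewed distribution.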


Advantages of the mean
a.            It takes all of the scores into account. None of the scores is left out
b.            It is simple to calculate
Disadvantages of the mean

a.            It is affected by extreme values or outliers. Outliers pull the mean towards themselves, making it lower (or higher) than the typical score; when there are no outliers, the mean is more representative of the group.
b.            The mean may not be one of the actual scores in the data set.


ii.        The median. This is the middle score in a set of scores. It is found by following these procedures:
a.       Arrange the scores from the lowest to the highest.
b.      Determine the middle score(s).
c.       If there is an odd number of scores, the median is the middle score; if there is an even number of scores, the median is obtained by adding the two middle scores and dividing by two. The median is best used when you are concerned that outliers might be affecting the mean, making it less representative of a group of scores.
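The three median steps translate directly into code. This is a sketch with a hypothetical function name; Python’s standard `statistics.median` gives the same results. The first example reuses the fifteen scores from the range example in this section, and the last shows that adding an extreme score does not move the median.

```python
def median(scores):
    if not scores:
        raise ValueError("the median of an empty set of scores is undefined")
    ordered = sorted(scores)  # a. arrange the scores from the lowest to the highest
    n = len(ordered)
    mid = n // 2              # b. locate the middle position
    if n % 2 == 1:            # c. odd number of scores: take the middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two middle scores


print(median([56, 67, 63, 38, 62, 66, 45, 51, 53, 43, 52, 44, 77, 58, 69]))  # 56
print(median([20, 20, 25, 25, 30, 30]))      # 25.0
print(median([20, 20, 25, 25, 30, 30, 95]))  # 25 (the outlier does not move it)
```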
Advantages of the median
a.            It is not affected by outliers
b.            It is easy to compute and comprehend.
c.            It is useful when comparing a set of data
Disadvantages of the median 
a.            Sometimes the median is a number that is not actually present in the data set (for example, when there is an even number of scores)
b.            It consumes a lot of time to sort out scores from the lowest to the highest.
c.            Does not take into account all the data in a data set/does not use all information available
The mode.  Mode is the most frequently occurring score in a set of scores (Popham, 2008; Musial et al, 2009).  In a set of scores there can be two frequently occurring numbers, and then we call this a bimodal distribution. In case there are more than two modes we would call these multimodal distributions.
Example 1. Scores: 35, 56, 73, 67, 43, 62, 70, 39, 45, 51, 56, 61, 56, 71, 82, 80, 66, 58, 64, 54.
                                    The mode is 56 (the most frequently occurring number in the set of scores).
Example 2. Scores: 35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 71, 82, 80, 61, 58, 64, 54.
                                    The modes are 56 and 61 (bimodal distribution).
Example 3. Scores: 35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 73, 82, 80, 61, 58, 73, 54.
                                    The modes are 56, 61 and 73 (multimodal distribution).
In a set of scores where no score occurs more than once, there is no mode.
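Finding the mode means counting how often each score occurs. In this sketch (the function name is hypothetical) every score that shares the highest count is returned, so a bimodal or multimodal set gives two or more modes; following the text, an empty list means that no score occurs more than once and there is no mode.

```python
from collections import Counter


def modes(scores):
    """Return every score that occurs most often; [] if no score repeats."""
    if not scores:
        return []
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # no frequently occurring score, so there is no mode
    return sorted(score for score, count in counts.items() if count == highest)


example_1 = [35, 56, 73, 67, 43, 62, 70, 39, 45, 51, 56, 61, 56, 71, 82, 80, 66, 58, 64, 54]
example_2 = [35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 71, 82, 80, 61, 58, 64, 54]
example_3 = [35, 56, 73, 67, 43, 62, 61, 39, 45, 51, 56, 61, 56, 73, 82, 80, 61, 58, 73, 54]
print(modes(example_1))  # [56]
print(modes(example_2))  # [56, 61]
print(modes(example_3))  # [56, 61, 73]
```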
Advantages of the mode
a.            It is simple to determine
b.            It is not affected by extremely large or small values
c.            It is useful for qualitative data

Disadvantages of the mode

a.            It focuses only on the most frequent number(s) in a data set, ignoring the other scores
b.            A set of scores may have no mode, or more than one mode
Measures of Variability
Measures of variability tell us about:
i.                    The variability of student learning and the overall effectiveness of instruction
ii.                  The consistency of student performance
iii.                Whether the scores are spread out or bunched together
The measures of variability include the range, the standard deviation and the variance.
Range
Range is the difference between the highest score and the lowest score. It indicates how consistent or diverse a set of scores is (Musial et al, 2009).
       How to calculate: Range = Highest score - Lowest score
       Example: 56, 67, 63, 38, 62, 66, 45, 51, 53, 43, 52, 44, 77, 58, 69.
       The highest score = 77; the lowest score = 38
Range = 77 - 38 = 39
Advantage: it is easy and quick to estimate.
Disadvantage: it is greatly influenced by outliers, i.e. extremely high or low scores.
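The range needs only the highest and lowest scores. The sketch (hypothetical function name) reproduces the example above and then adds one hypothetical low score of 5 to show how strongly a single outlier changes the range.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


scores = [56, 67, 63, 38, 62, 66, 45, 51, 53, 43, 52, 44, 77, 58, 69]
print(score_range(scores))        # 39
print(score_range(scores + [5]))  # 72: one very low score almost doubles the range
```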
Standard Deviation
Standard deviation is a measure of the average distance of each individual score from the mean. It indicates how spread out the scores are around the mean. If the standard deviation is relatively small compared to the mean, the scores are more homogeneous, that is, they are grouped together. This means that on average individual scores do not deviate much from the mean. When the standard deviation is large, we say the individual scores are heterogeneous (they are spread out), meaning that on average the individual scores deviate quite a bit from the mean.
The SD tells how spread out or clustered a set of scores is around the mean. This helps teachers to see how variable student performance is in a classroom.
How to calculate the SD
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
where $(X - \bar{X})^2$ represents each individual score minus the mean, squared, and $N$ is the number of scores.
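EXAMPLE
Scores: 20, 20, 25, 25, 30, 30
i.                    Calculate the mean:
$$\sum X = 20 + 20 + 25 + 25 + 30 + 30 = 150, \qquad N = 6, \qquad \bar{X} = \frac{\sum X}{N} = \frac{150}{6} = 25$$
ii.                  Calculate the deviation of each score from the mean and square it:

Score (X)   Mean (X̄)   Deviation (X - X̄)   Squared deviation (X - X̄)²
20          25          20 - 25 = -5          25
20          25          20 - 25 = -5          25
25          25          25 - 25 = 0            0
25          25          25 - 25 = 0            0
30          25          30 - 25 = 5           25
30          25          30 - 25 = 5           25

$$\sum (X - \bar{X})^2 = 25 + 25 + 0 + 0 + 25 + 25 = 100$$
iii.                Calculate the variance, which is the sum of the squared deviations divided by N:
$$\text{Variance} = \frac{\sum (X - \bar{X})^2}{N} = \frac{100}{6} \approx 16.7$$
iv.                Take the square root of the variance:
$$SD = \sqrt{16.7} \approx 4.1$$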
The average distance of each score from the mean of 25 is 4.1; this means that on average the scores are approximately 4 points above or below the mean. The SD is an indicator of how spread out the scores are from the mean: the larger the SD, the more spread out the scores.
The SD uses all the scores in a set, so it is likely to be representative of the spread of scores. It can also be used as a unit of measurement: for example, it can tell which students scored two SDs higher or lower than the mean.
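The calculation above can be reproduced step by step. This sketch divides by N, as the formula in this section does, so it matches Python’s `statistics.pstdev` (the population SD); `statistics.stdev` divides by N - 1 instead and would give a slightly larger value. The function names are hypothetical, and `sds_from_mean` illustrates the remark about students scoring a given number of SDs above or below the mean.

```python
import math


def mean_variance_sd(scores):
    """Return the mean, the variance (dividing by N) and the standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / n
    return mean, variance, math.sqrt(variance)


def sds_from_mean(score, mean, sd):
    """How many standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


mean, variance, sd = mean_variance_sd([20, 20, 25, 25, 30, 30])
print(f"mean = {mean}, variance = {variance:.1f}, SD = {sd:.1f}")
# mean = 25.0, variance = 16.7, SD = 4.1
print(f"A score of 30 is {sds_from_mean(30, mean, sd):.2f} SDs above the mean")
```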
Conclusion
1.      A teacher can display test scores in meaningful ways by using frequency tables, histograms or frequency polygons.
2.       Measures of central tendency such as the mean, mode and median can be used to extract meaning from test scores
3.      A teacher can determine the variability among scores by calculating the range and standard deviation
Generally measures of central tendency and variability can be used to judge whether students met learning objectives and how effective instruction was.
ASSESSMENT OF NON-COGNITIVE OUTCOMES AND INTELLIGENCE QUOTIENT (IQ)

Classroom observation techniques

Teaching is driven by what we observe in the classroom. Observation is the process of gaining information by watching and listening to students; it can be used to evaluate students’ knowledge, skills, dispositions and behavior. Through simple observation, a teacher knows whether or not students follow directions to complete an assigned task.
What do teachers actually need to observe?
Teachers observe both appropriate behaviors, so as to increase them, and inappropriate behaviors, so as to decrease them. Teachers’ classroom observations are based on:
Academic skills such as reading, mathematics, science, social studies and language. Academic skills are assessed and stated in observable and measurable terms.
Psychomotor skills such as physical movement in various sports, dance, performing arts, singing, playing and so on.

Prosocial skills, which involve attitudes, feelings, beliefs and dispositions.

Approaches to Classroom Observation
Observation tools such as anecdotal notes, observation checklist and rating scales are widely used for observing and assessing student’s learning.

i.                    Anecdotal notes or records

This is a technique used to document observations of significant skills or behaviors of students. It records factual descriptions of incidents that the teacher has observed personally.
It provides a purposeful and detailed description of the strengths and weaknesses of a student’s performance based on pre-specified performance criteria, such as a student’s ability to transition to a new activity, follow instructions or focus on the task at hand.
The records can be:
i.                    Anecdotal notes, which consist of the date, the name of the student, the setting and the incident(s), i.e. what happened.
Anecdotal Notes
Student name: Grace Luis          Date: 2/1/2015
Setting: Group poster project
What happened/incident:
Today during the group project, Grace complained about the marker colors she was given. I reminded her of the rule, but she grabbed a marker and scribbled on the poster, ruining it.
Anecdotal A,B,C records

Date/time:          2/1/2015, 12.00 noon
Context/activity:   Students were working on a group poster project.
Antecedent:         Grace was the materials manager for Grayson’s group. She gave him 4 light-colored markers.
Behavior:           Grayson said, “These colors stink.” He then grabbed a black marker from Grace and scribbled all over the poster.
Consequences:       I stated the rule and punished him by taking him out of class for some time.
Student reaction:   Grayson returned to the classroom and rejoined the work group.
NOTE:
Context= the setting
 Antecedent= what happened before the behavior
Behavior     = what the behavior looks like
Consequences= what happened after challenging behavior

The usefulness of anecdotal notes/records
They are useful when writing report card comments and in parent or student conferences.
They are useful when an intervention such as acceleration or remedial teaching is needed for a student.

Observation Checklist
It is a list of behaviors used to assess a student’s skills, such as academic, psychomotor and prosocial skills. The teacher observes the skills and marks them as present or absent, or correct or incorrect. Each skill should be written in such a way that it is observable and measurable.

Observation checklist for school facilities (put a tick where appropriate)
School name: ……………………    District: ……………………    Date: ……………………

FACILITY             AVAILABLE     NOT AVAILABLE
Classrooms
Toilets
Teachers’ houses
Library
Laboratories
Sports grounds
Textbooks
Reference books
Rating scales
A rating scale is a form of checklist which consists of a list of qualities that are judged according to a scale indicating the degree to which each quality is present. Each characteristic is rated against some underlying degrees of accomplishment.
Descriptive Rating Scale
It is a rating scale based on a series of adjectives or thumbnail sketches, which allow the teacher to rate the adequacy or inappropriateness of a student’s behavior on the scale. In constructing a descriptive rating scale:
1.      Specify the observable behaviors that are important in your case.
2.      Write adjectives that describe points on the scale. The best way to develop the adjectives is to determine the best and the worst likely performances and then choose in-between levels to create the full scale.

Example of a rating scale designed for a student working on math problems individually.
Student………………………Date……………………..Assignment: Math skills

Behavior              | Worst performance            | In-between level                                    | Best performance
Working on problems   | Doesn’t start problems       | Starts problems but abandons some without finishing | Works each problem until completed
Checking work         | Doesn’t check work           | Checks some work                                    | Checks all work
Correcting mistakes   | Doesn’t correct mistakes     | Corrects some mistakes                              | Corrects all mistakes
Staying on task       | Is distracted several times  | Is distracted once                                  | Stays on task during work time
Numerical rating scales
It is a rating scale which associates numbers with descriptions along the scale. The higher the number, the greater the accomplishment, and the lower the number, the lower the accomplishment. It is used when summarizing observations across some period of time. The point given on the scale could be based on the number of times a particular behavior has been observed. Such a rating could look as follows:
1.      Never   = behavior is not observed
2.      Occasionally = behavior has been performed but repeated instances of nonperformance  are observed
3.      Usually = behavior is performed but a small number of instances of nonperformance are observed
4.      Always = behavior is consistently and regularly performed

EXAMPLE. Rating scale for a group project
Group work rating scale
Project……………………………………………..
Rating scale:
1 = seldom or never
2 = some/only part of the time
3 = usually
4 = always

Behavior                  Group 1    Group 2    Group 3
Stays on task
Makes progress
Participates in group
Respects other groups
Cleans up

Rating scale for an individual project

Observed student……………………………………
Date…………………………………………
Activity……………………………………………

Student activity                                                              1 Never   2 Seldom   3 Often   4 Always
Works with a wide range of peers, not just with close friends
Shares materials and ideas with others
Participates in discussions
Fulfills his or her responsibilities
Shows respect for others by listening to and considering others’ points of view
Contributes ideas to the group discussions
Advantages of assessing through observation
1.      Allows the teacher to assess and monitor progress and behavioral skills as part of normal teaching
2.      Allows the teacher to discover unique information, such as skills and problems, that would be difficult to discover through other means
3.      Permits the teacher to adapt other assessment methods so as to meet the needs of students
4.      Information gathered through observation can be used together with formal methods such as paper-and-pencil tests to assess students
Disadvantages
1.      Errors can occur when a judgment is based on a single observation
2.      It is time-consuming to obtain information through observation
3.      If the teacher is not focused on the specific skill, he/she will end up observing unrelated behavior
Peer Appraisal and Self Assessment
Peer appraisal
This is one of the ways in which students internalize the characteristics of quality work by evaluating the work of their peers. However, in order to offer helpful feedback, students must be given instructions on what they are to look for in their peers’ work. The instructor must explain expectations clearly to them before they begin. One way to make sure students understand this type of evaluation is to give students a practice session with it. The instructor provides a sample writing or speaking assignment. As a group, students determine what should be assessed and how the criteria for successful completion of the communication task should be defined. Then the instructor gives students a sample completed assignment. Students assess this using the criteria they have developed, and determine how to convey feedback clearly to the fictitious student. Students can also benefit from using rubrics or checklists to guide their assessments. At first these can be provided by the instructor; once the students have more experience, they can develop them themselves. For an essay, for example, such a checklist might ask the peer evaluator to comment primarily on the content and organization of the essay.
For peer evaluation to work effectively, the learning environment in the classroom must be supportive. Students must feel comfortable and trust one another in order to provide honest and constructive feedback. Instructors who use group work and peer assessment frequently can help students develop trust by forming them into small groups early in the semester and having them work in the same groups throughout the term. This allows them to become more comfortable with each other and leads to better peer feedback.