Automating Cognitive Model Improvement by A* Search and Logistic Regression

Learning from Learning Curves: Item Response Theory & Learning Factors Analysis

Ken Koedinger

Human-Computer Interaction Institute

Carnegie Mellon University

Cen, H., Koedinger, K., Junker, B. Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement. 8th International Conference on Intelligent Tutoring Systems. 2006.

Cen, H., Koedinger, K., Junker, B. Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor. 13th International Conference on Artificial Intelligence in Education. 2007.

Domain-Specific Cognitive Models

Question: How do students represent knowledge in a given domain?
Answering this question involves deep domain analysis
The product is a cognitive model of students’ knowledge

Recall cognitive models drive ITS behaviors & instructional design decisions

Knowledge Decomposability Hypothesis

Human acquisition of academic competencies can be decomposed into units, called knowledge components (KCs), that predict student task performance & transfer

Performance predictions

If item I1 only requires KC1
& item I2 requires both KC1 and KC2,
then item I2 will be harder than I1
If student can do I2, then they can do I1

Transfer predictions

If item I1 requires KC1,
& item I3 also requires KC1,
then practice on I3 will improve I1
If item I1 requires KC1,
& item I4 requires only KC3,
then practice on I4 will not improve I1

Fundamental EDM idea:

We can discover KCs (cog models) by working these predictions backwards!

KC1 add

KC2 carry

KC3 subt

I1: 5+3

I2: 15+7

I3: 4+2

I4: 5-3

Example of Items & KCs
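The performance and transfer predictions above can be written as simple set operations on an item-to-KC mapping. This is a minimal sketch, assuming the mapping implied by the example items; the function names are hypothetical and not part of LFA itself.

```python
# Item-to-KC mapping from the example (KC1 = add, KC2 = carry, KC3 = subt).
ITEM_KCS = {
    "I1: 5+3": {"add"},
    "I2: 15+7": {"add", "carry"},
    "I3: 4+2": {"add"},
    "I4: 5-3": {"subt"},
}

def predicted_harder(item_a, item_b):
    """True if item_a requires every KC of item_b plus at least one more."""
    return ITEM_KCS[item_a] > ITEM_KCS[item_b]  # proper superset test

def predicted_transfer(practiced, target):
    """True if the two items share a KC, so practice on one should help the other."""
    return bool(ITEM_KCS[practiced] & ITEM_KCS[target])

print(predicted_harder("I2: 15+7", "I1: 5+3"))   # I2 needs add + carry, I1 only add
print(predicted_transfer("I3: 4+2", "I1: 5+3"))  # both need add
print(predicted_transfer("I4: 5-3", "I1: 5+3"))  # no shared KC
```

Working these predictions backwards means searching for the mapping under which such predictions best match observed student data.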

Student Performance As They Practice with the LISP Tutor

Mean Error Rate

Evidence for Production Rule as an appropriate unit of knowledge acquisition

Production Rule Analysis

Using learning curves to evaluate a cognitive model

Lisp Tutor Model

Learning curves used to validate cognitive model
Fit better when organized by knowledge components (productions) rather than surface forms (programming language terms)

But, curves not smooth for some production rules

“Blips” in learning curves indicate the knowledge representation may not be right

Corbett, Anderson, O’Brien (1995)

Let me illustrate …

Curve for “Declare Parameter” production rule

How are steps with blips different from others?
What’s the unique feature or factor explaining these blips?

What’s happening on the 6th & 10th opportunities?

Can modify cognitive model using unique factor present at “blips”

Blips occur when to-be-written program has 2 parameters
Split Declare-Parameter by parameter-number factor:

Declare-first-parameter
Declare-second-parameter

(defun second (lst)
  (first (rest lst)))

(defun add-to (el lst)
  (append lst (list el)))

Can learning curve analysis be automated?

Learning curve analysis

Identify blips by hand & eye
Manually create a new model
Qualitative judgment

Need to automatically:

Identify blips by system
Propose alternative cognitive models
Evaluate each model quantitatively

Learning Factors Analysis

Learning Factors Analysis (LFA): A Tool for KC Analysis

LFA is a method for discovering & evaluating alternative cognitive models

Finds knowledge component decomposition that best predicts student performance & learning transfer

Inputs

Data: Student success on tasks in domain over time
Codes: Factors hypothesized to drive task difficulty & transfer

A mapping between these factors & domain tasks

Outputs

A rank ordering of most predictive cognitive models
For each model, a measure of its generalizability & parameter estimates for knowledge component difficulty, learning rates, & student proficiency

Learning Factors Analysis (LFA) draws from multiple disciplines

Machine Learning & AI

Combinatorial search (Russell & Norvig, 2003)
Exponential-family principal component analysis (Gordon, 2002)

Psychometrics & Statistics

Q Matrix & Rule Space (Tatsuoka 1983, Barnes 2005)
Item response learning model (Draney, et al., 1995)
Item response assessment models (DiBello, et al., 1995; Embretson, 1997; von Davier, 2005)

Cognitive Psychology

Learning curve analysis (Corbett, et al 1995)

Steps in Learning Factors Analysis

Representing Knowledge Components as factors of items

Problem: How to represent KC model?
Solution: Q-Matrix (Tatsuoka, 1983)

Items X Knowledge Components (KCs)

Item | Add | Sub | Mul | Div
2*8 | 0 | 0 | 1 | 0
2*8 - 3 | 0 | 1 | 1 | 0

Single-KC item = a row with exactly one 1 (e.g., 2*8)

Multi-KC item = a row with more than one 1 (e.g., 2*8 - 3)

What good is a Q matrix? Can predict student accuracy on items not previously seen, based on KCs involved
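A Q matrix can be held as a plain item-by-skill table of 0/1 flags. The sketch below assumes the cell values follow from the operations each item requires (2*8 needs only Mul; 2*8 - 3 needs Sub and Mul); the helper names are hypothetical.

```python
SKILLS = ["Add", "Sub", "Mul", "Div"]

# Rows are items, columns follow SKILLS; 1 means the item requires that KC.
Q = {
    "2*8": [0, 0, 1, 0],
    "2*8 - 3": [0, 1, 1, 0],
}

def kcs_for(item):
    """Names of the KCs an item requires, read off its Q-matrix row."""
    return [skill for skill, flag in zip(SKILLS, Q[item]) if flag]

def is_single_kc(item):
    """True when the item's row contains exactly one 1."""
    return sum(Q[item]) == 1

print(kcs_for("2*8 - 3"))    # KCs of the multi-KC item
print(is_single_kc("2*8"))   # single-KC check
```

A new item is handled the same way: once its row is coded, its predicted difficulty comes from the parameters of the KCs in that row rather than from data on the item itself.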

The Statistical Model

Generalized Power Law to fit learning curves

Logistic regression (Draney, Wilson, Pirolli, 1995)

Assumptions

Some skills may be easier from the start than others

=> use an intercept parameter for each skill

Some skills are easier to learn than others

=> use a slope parameter for each skill

Different students may initially know more or less

=> use an intercept parameter for each student

Students generally learn at the same rate

=> no slope parameters for each student

These assumptions are reflected in a statistical model …

Prior Summer School project!

Simple Statistical Model of Performance & Learning

Problem: How to predict student responses from model?
Solution: Additive Factor Model (Draney, et al. 1995)
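The four assumptions combine into one logistic model: a student intercept, plus a skill intercept and a skill slope times practice opportunities for each KC the item requires. The sketch below is one way to compute the predicted success probability under that reading. The function and argument names are hypothetical, the skill parameters are the Pentagon-area values from the results table later in this deck, and the student intercept of 0.0 is an assumed placeholder rather than a fitted value.

```python
import math

def afm_probability(theta, q_row, beta, gamma, opportunities):
    """Success probability under the additive factor model.

    logit(p) = theta + sum over KCs k of q_row[k] * (beta[k] + gamma[k] * opportunities[k])
    theta: student intercept; beta: skill intercepts; gamma: skill slopes.
    """
    logit = theta
    for kc, required in q_row.items():
        if required:
            logit += beta[kc] + gamma[kc] * opportunities.get(kc, 0)
    return 1.0 / (1.0 + math.exp(-logit))

beta = {"Pentagon-area": -2.16}   # skill intercept (from the deck's results table)
gamma = {"Pentagon-area": 0.45}   # skill slope (from the deck's results table)
q_row = {"Pentagon-area": 1}      # single-KC item

# theta = 0.0 is a hypothetical student, used only for illustration.
p_first = afm_probability(0.0, q_row, beta, gamma, {"Pentagon-area": 0})
p_later = afm_probability(0.0, q_row, beta, gamma, {"Pentagon-area": 4})
print(p_first < p_later)  # positive slope: success becomes more likely with practice
```

There is no per-student slope in the sum, which is how the "students learn at the same rate" assumption shows up.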

Comparing Additive Factor Model to other psychometric techniques

Instance of generalized linear regression, binomial family, or “logistic regression”

R code: glm(success~student+skill+skill:opportunity, family=binomial,…)

Extension of item response theory

IRT has simply a student term (theta-i) + item term (beta-j)
R code: glm(success~student+item, family=binomial,…)
The additive factor model behind LFA is different because:

It breaks items down in terms of knowledge component factors
It adds term for practice opportunities per component

Model Evaluation

How to compare cognitive models?

A good model minimizes prediction risk by balancing fit with data & complexity (Wasserman 2005)

Compare BIC for the cognitive models

BIC is the “Bayesian Information Criterion”
BIC = -2*log-likelihood + numPar * log(numOb)
Lower BIC is better: the model is expected to predict unseen data more accurately

Mimics cross validation, but is faster to compute
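The BIC formula above is a one-liner. The sketch below uses made-up log-likelihoods and parameter counts (hypothetical, not results from the deck) to show how the complexity penalty can outweigh a small gain in fit.

```python
import math

def bic(log_likelihood, num_params, num_obs):
    """BIC = -2 * log-likelihood + numPar * log(numOb); lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_obs)

# Hypothetical comparison: the larger model fits slightly better
# but pays for 10 extra parameters.
small = bic(log_likelihood=-2000.0, num_params=40, num_obs=5000)
large = bic(log_likelihood=-1995.0, num_params=50, num_obs=5000)
print(small < large)  # the smaller model wins on BIC in this made-up case
```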

Item Labeling & the “P Matrix”: Adding Alternative Factors

Problem: How to improve existing cognitive model?
Solution: Have experts look for difficulty factors that are candidates for new KCs. Put these in P matrix.

Q Matrix

Item | Add | Sub | Mul
2*8 | 0 | 0 | 1
2*8 - 3 | 0 | 1 | 1
2*8 - 30 | 0 | 1 | 1
3+2*8 | 1 | 0 | 1

P Matrix

Item | Deal with negative | Order of Ops | …
2*8 | 0 | 0 | …
2*8 - 3 | 0 | 0 | …
2*8 - 30 | 1 | 0 | …
3+2*8 | 0 | 1 | …

Using P matrix to update Q matrix

Create a new Q’ by using elements of P as arguments to operators

Add operator: Q’ = Q + P[,1]
Split operator: Q’ = Q[, 2] * P[,1]

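Q matrix after adding P[,1]

Item | Add | Sub | Mul | Div | neg
2*8 | 0 | 0 | 1 | 0 | 0
2*8 - 3 | 0 | 1 | 1 | 0 | 0
2*8 - 30 | 0 | 1 | 1 | 0 | 1

Q matrix after splitting Q[,2] by P[,1]

Item | Add | Sub | Mul | Div | Sub-neg
2*8 | 0 | 0 | 1 | 0 | 0
2*8 - 3 | 0 | 1 | 1 | 0 | 0
2*8 - 30 | 0 | 0 | 1 | 0 | 1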
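The Add and Split operators can be sketched on a small Q matrix. This assumes Q cell values that follow from the operations each item requires and a P column "Deal with negative" that is 1 only for 2*8 - 30. The function names are hypothetical, and the treatment of the original Sub column after a split (clearing it where the new KC applies) is one reading of the Split operator.

```python
SKILLS = ["Add", "Sub", "Mul", "Div"]
Q = {
    "2*8": [0, 0, 1, 0],
    "2*8 - 3": [0, 1, 1, 0],
    "2*8 - 30": [0, 1, 1, 0],
}
P_NEG = {"2*8": 0, "2*8 - 3": 0, "2*8 - 30": 1}  # P[,1]: deal with negative

def add_factor(q, skills, p_col, name):
    """Add operator: append the P column to Q as a new KC."""
    new_q = {item: row + [p_col[item]] for item, row in q.items()}
    return new_q, skills + [name]

def split_skill(q, skills, skill, p_col, name):
    """Split operator: new KC = Q[, skill] * P column; the old KC keeps the rest."""
    idx = skills.index(skill)
    new_q = {}
    for item, row in q.items():
        both = row[idx] * p_col[item]   # 1 only where the skill and the factor co-occur
        new_row = list(row)
        new_row[idx] = row[idx] - both  # remove the split-off cases from the old KC
        new_q[item] = new_row + [both]
    return new_q, skills + [name]

q_add, skills_add = add_factor(Q, SKILLS, P_NEG, "neg")
q_split, skills_split = split_skill(Q, SKILLS, "Sub", P_NEG, "Sub-neg")
print(skills_split, q_split["2*8 - 30"])
```

Each operator returns a new matrix and leaves Q untouched, so a search procedure can apply several candidate operators to the same parent model.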

LFA: KC Model Search

Problem: How to find best model given Q and P matrices?
Solution: Combinatorial search

A best-first search algorithm (Russell & Norvig, 2003)

Guided by a heuristic, such as BIC

Goal: Do model selection within logistic regression model space

Steps:

Start from an initial “node” in search graph using given Q
Iteratively create new child nodes (Q’) by applying operators with arguments from P matrix
Employ heuristic (BIC of Q’) to rank each node
Select best node not yet expanded & go back to step 2
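The four steps can be sketched as a generic best-first loop over a priority queue. The scoring and expansion functions here are toy stand-ins (hypothetical): in LFA the score would be the BIC of a fitted logistic regression and the children would come from applying operators with P-matrix arguments.

```python
import heapq
from itertools import count

def best_first_search(initial, expand, score, max_expansions=100):
    """Expand the lowest-scoring unexpanded model first; return the best model seen."""
    tie = count()  # tie-breaker so the heap never compares models directly
    frontier = [(score(initial), next(tie), initial)]
    seen = {initial}
    best_score, best_model = frontier[0][0], initial
    expansions = 0
    while frontier and expansions < max_expansions:
        node_score, _, model = heapq.heappop(frontier)  # step 4: best unexpanded node
        expansions += 1
        if node_score < best_score:
            best_score, best_model = node_score, model
        for child in expand(model):                     # step 2: create child models
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (score(child), next(tie), child))  # step 3
    return best_model, best_score

# Toy stand-ins: a model is the set of splits applied so far; each split has a
# made-up fit gain and every split pays a fixed complexity penalty.
GAIN = {"a": 10, "b": 1, "c": 5}
PENALTY = 3

def toy_score(model):
    return 100 - sum(GAIN[s] for s in model) + PENALTY * len(model)

def toy_expand(model):
    return [model | {s} for s in GAIN if s not in model]

best_model, best_score = best_first_search(frozenset(), toy_expand, toy_score)
print(sorted(best_model), best_score)
```

The `max_expansions` cap is an assumption added for the sketch; some stopping rule is needed in practice because each node requires fitting a regression.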

Learning Factors Analysis: Example in Geometry Area

Area Unit of Geometry Cognitive Tutor

Original cognitive model in tutor: 15 skills

Circle-area
Circle-circumference
Circle-diameter
Circle-radius
Compose-by-addition
Compose-by-multiplication
Parallelogram-area
Parallelogram-side
Pentagon-area
Pentagon-side
Trapezoid-area
Trapezoid-base
Trapezoid-height
Triangle-area
Triangle-side

Log Data Input to LFA

Log data columns: Student, Step (Item), Skill (KC), Opportunity, Success

Step (Item) | Skill (KC)
p1s1 | Circle-area
p2s1 | Circle-area
p2s2 | Rectangle-area
p2s3 | Compose-by-addition
p3s1 | Circle-area

Items = steps in tutors with step-based feedback

Q-matrix in single column: works for single-KC items

Opportunity = number of opportunities the student has had to learn the KC
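The Opportunity column can be derived from the ordered log by counting, per student and KC, how many earlier steps used that KC. In the sketch below the student label "A" is hypothetical, and counting from 0 on the first encounter is an assumption; some data formats count from 1.

```python
from collections import defaultdict

def add_opportunities(rows):
    """Append to each (student, step, skill) row the number of prior
    opportunities that student has had on that skill."""
    prior = defaultdict(int)
    out = []
    for student, step, skill in rows:
        out.append((student, step, skill, prior[(student, skill)]))
        prior[(student, skill)] += 1
    return out

# Steps and skills from the log example; student "A" is a made-up label.
log = [
    ("A", "p1s1", "Circle-area"),
    ("A", "p2s1", "Circle-area"),
    ("A", "p2s2", "Rectangle-area"),
    ("A", "p2s3", "Compose-by-addition"),
    ("A", "p3s1", "Circle-area"),
]

with_opps = add_opportunities(log)
for row in with_opps:
    print(row)
```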

AFM Results for original KC model

Skill | Intercept | Slope | Avg Opportunities | Initial Probability | Avg Probability | Final Probability
Parallelogram-area | 2.14 | -0.01 | 14.9 | 0.95 | 0.94 | 0.93
Pentagon-area | -2.16 | 0.45 | 4.3 | 0.2 | 0.63 | 0.84

Student | Intercept
student0 | 1.18
student1 | 0.82
student2 | 0.21

Model Statistics

AIC | 3,950
BIC | 4,285
MAD | 0.083

Higher intercept of skill -> easier skill

Higher slope of skill -> faster students learn it

Higher intercept of student -> student initially knew more

The AIC, BIC & MAD statistics provide alternative ways to evaluate models

MAD = Mean Absolute Deviation

Application: Use Statistical Model to improve tutor

Some KCs over-practiced, others under-practiced
(Cen, Koedinger, Junker, 2007)

Initial error rate 76%, reduced to 40% after 6 practice opportunities

Initial error rate 12%, reduced to 8% after 18 practice opportunities

“Close the loop” experiment

In vivo experiment: New version of tutor with updated knowledge tracing parameters vs. prior version
Reduced learning time by 20%, same robust learning gains
Knowledge transfer: Carnegie Learning using approach for other tutor units

Example in Geometry of split based on factor in P matrix

Table columns: Student, Step, Skill, Opportunity (plus Embed, the factor in the P matrix)

Original Q matrix, with the Embed factor from the P matrix

Step | Skill | Embed
p1s1 | Circle-area | alone
p2s1 | Circle-area | embed
p2s2 | Rectangle-area |
p2s3 | Compose-by-add |
p3s1 | Circle-area | alone

New Q matrix after splitting Circle-area by Embed (Opportunity is revised to match)

Step | Skill
p1s1 | Circle-area-alone
p2s1 | Circle-area-embed
p2s2 | Rectangle-area
p2s3 | Compose-by-add
p3s1 | Circle-area-alone
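Splitting a skill by a P-matrix factor amounts to relabeling the affected rows and then recounting opportunities under the new labels. In this minimal sketch the student label "A" is hypothetical, and opportunities are assumed to count prior encounters starting from 0.

```python
from collections import defaultdict

def split_and_recount(rows, skill_to_split):
    """Relabel rows of skill_to_split as '<skill>-<factor value>' and
    recompute the per-student opportunity count for every row."""
    prior = defaultdict(int)
    out = []
    for student, step, skill, factor in rows:
        if skill == skill_to_split and factor is not None:
            skill = f"{skill}-{factor}"
        out.append((student, step, skill, prior[(student, skill)]))
        prior[(student, skill)] += 1
    return out

# (student, step, skill, Embed value); None where the factor does not apply.
log = [
    ("A", "p1s1", "Circle-area", "alone"),
    ("A", "p2s1", "Circle-area", "embed"),
    ("A", "p2s2", "Rectangle-area", None),
    ("A", "p2s3", "Compose-by-add", None),
    ("A", "p3s1", "Circle-area", "alone"),
]

revised = split_and_recount(log, "Circle-area")
for row in revised:
    print(row)
```

Under the split model, p2s1 is the first opportunity for Circle-area-embed rather than the second for Circle-area, which is what changes the learning curves being fit.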

LFA: Model Search Process

Automates the process of hypothesizing alternative KC models & testing them against data

Search algorithm guided by a heuristic: BIC
Start from an existing KC model (Q matrix)

LFA Results 1: Applying splits to original model

Common results:

Compose-by-multiplication split based on whether it was an area or a segment being multiplied
Circle-radius is split based on whether it is being done for the first time in a problem or is being repeated

Made sense, but less than expected …

Model 1
Number of Splits: 3
Binary split compose-by-multiplication by figurepart segment
Binary split circle-radius by repeat repeat
Binary split compose-by-addition by backward backward
Number of Skills: 18

Model 2
Number of Splits: 3
Binary split compose-by-multiplication by figurepart segment
Binary split circle-radius by repeat repeat
Binary split compose-by-addition by figurepart area-difference
Number of Skills: 18

Model 3
Number of Splits: 2
Binary split compose-by-multiplication by figurepart segment
Binary split circle-radius by repeat repeat
Number of Skills: 17

BIC: 4,248.86

BIC: 4,251.07
