Learning from Learning
Curves: Item Response Theory & Learning Factors Analysis
Ken Koedinger
Human-Computer Interaction
Institute
Carnegie Mellon University
Cen, H., Koedinger, K., Junker, B. Learning Factors Analysis - A General Method for Cognitive Model Evaluation and Improvement. 8th International Conference on Intelligent Tutoring Systems. 2006.
Cen, H., Koedinger, K., Junker, B. Is Over Practice Necessary? Improving Learning Efficiency with the Cognitive Tutor. 13th International Conference on Artificial Intelligence in Education. 2007.
Domain-Specific Cognitive
Models
- Question: How do students
represent knowledge in a given domain?
- Answering this question involves deep domain analysis
- The product is a cognitive
model of students’ knowledge
- Recall cognitive models drive
ITS behaviors & instructional design decisions
Knowledge Decomposability
Hypothesis
- Human acquisition of academic
competencies can be decomposed into units, called knowledge components
(KCs), that predict student task performance & transfer
- Performance predictions
- If item I1 only requires KC1
& item I2 requires both KC1 and KC2,
then item I2 will be harder than I1
- If student can do I2, then
they can do I1
- Transfer predictions
- If item I1 requires KC1,
& item I3 also requires KC1,
then practice on I3 will improve I1
- If item I1 requires KC1,
& item I4 requires only KC3, then practice on I4 will not
improve I1
- Fundamental EDM idea:
- We can discover KCs (cog
models) by working these predictions backwards!
Example of Items & KCs
Item     | KC1 add | KC2 carry | KC3 subt
I1: 5+3  | 1       | 0         | 0
I2: 15+7 | 1       | 1         | 0
I3: 4+2  | 1       | 0         | 0
I4: 5-3  | 0       | 0         | 1
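The performance and transfer predictions can be read directly off this table. A minimal Python sketch, assuming each item is represented as the set of KCs marked 1 in its row (the function names are illustrative, not from the original slides):

```python
# Item -> set of required KCs, copied from the example table.
ITEM_KCS = {
    "I1: 5+3": {"add"},
    "I2: 15+7": {"add", "carry"},
    "I3: 4+2": {"add"},
    "I4: 5-3": {"subt"},
}


def predicts_transfer(practiced, target):
    """Practice on one item is predicted to help another if they share a KC."""
    return bool(ITEM_KCS[practiced] & ITEM_KCS[target])


def predicts_harder(item_a, item_b):
    """item_a is predicted harder if it needs every KC of item_b plus more."""
    return ITEM_KCS[item_a] > ITEM_KCS[item_b]  # strict superset


if __name__ == "__main__":
    print(predicts_transfer("I3: 4+2", "I1: 5+3"))  # shared KC "add"
    print(predicts_transfer("I4: 5-3", "I1: 5+3"))  # no shared KC
    print(predicts_harder("I2: 15+7", "I1: 5+3"))   # I2 adds "carry"
```

Working these predictions backwards means searching for the item-to-KC assignment under which such predictions best match observed student data.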
Student Performance As They Practice with the LISP Tutor
[Figure: Production Rule Analysis; y-axis: Mean Error Rate]
Evidence for Production Rule as an appropriate unit of knowledge acquisition
Using learning curves
to evaluate a cognitive model
- Lisp Tutor Model
- Learning curves used to validate
cognitive model
- Fit better when organized
by knowledge components (productions) rather than surface forms (programming
language terms)
- But, curves not smooth for
some production rules
- “Blips” in learning curves
indicate the knowledge representation may not be right
- Corbett, Anderson, O’Brien
(1995)
- Let me illustrate …
Curve for “Declare
Parameter” production rule
- How are steps with blips different
from others?
- What’s the unique feature
or factor explaining these blips?
What’s happening on the 6th & 10th
opportunities?
Can modify cognitive
model using unique factor present at “blips”
- Blips occur when to-be-written
program has 2 parameters
- Split
Declare-Parameter by parameter-number factor:
- Declare-first-parameter
- Declare-second-parameter
(defun second (lst)
  (first (rest lst)))
(defun add-to (el lst)
  (append lst (list el)))
Can learning curve
analysis be automated?
- Learning curve analysis
- Identify blips by hand &
eye
- Manually create a new model
- Qualitative judgment
- Need to automatically:
- Identify blips by system
- Propose alternative cognitive
models
- Evaluate each model quantitatively
Learning Factors Analysis
Learning Factors Analysis
(LFA): A Tool for KC Analysis
- LFA is a method for discovering
& evaluating alternative cognitive models
- Finds knowledge component
decomposition that best predicts student performance & learning
transfer
- Inputs
- Data: Student success on tasks
in domain over time
- Codes: Factors hypothesized
to drive task difficulty & transfer
- A mapping between these factors
& domain tasks
- Outputs
- A rank ordering of most predictive
cognitive models
- For each model, a measure
of its generalizability & parameter estimates for knowledge component
difficulty, learning rates, & student proficiency
Learning Factors Analysis
(LFA) draws from multiple disciplines
- Machine Learning & AI
- Combinatorial search (Russell
& Norvig, 2003)
- Exponential-family principal
component analysis (Gordon, 2002)
- Psychometrics & Statistics
- Q Matrix & Rule Space
(Tatsuoka 1983, Barnes 2005)
- Item response learning model
(Draney, et al., 1995)
- Item response assessment
models (DiBello, et al., 1995; Embretson, 1997; von Davier, 2005)
- Cognitive Psychology
- Learning curve analysis (Corbett,
et al 1995)
Steps in Learning Factors
Analysis
Representing Knowledge
Components as factors of items
- Problem: How to represent
KC model?
- Solution: Q-Matrix (Tatsuoka,
1983)
- Single KC item = when a row
has one 1
- Multi-KC item = when a row
has many 1’s
Item | Skills: | Add | Sub | Mul | Div
2*8            | 0   | 0   | 1   | 0
2*8 - 3        | 0   | 1   | 1   | 0
What good is a Q matrix? Can predict
student accuracy on items not previously seen, based on KCs involved
The Statistical Model
- Generalized Power Law to fit
learning curves
- Logistic regression (Draney,
Wilson, Pirolli, 1995)
- Assumptions
- Some skills may be easier from
the start than others
- Some skills are easier to
learn than others
- Different students may initially
know more or less
- Students generally learn at
the same rate
- These assumptions are reflected
in a statistical model …
Prior Summer School project!
Simple Statistical
Model of
Performance & Learning
- Problem: How to predict student
responses from model?
- Solutions: Additive Factor
Model (Draney, et al. 1995)
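As a rough illustration of how the Additive Factor Model turns a Q-matrix row into a predicted response, here is a minimal Python sketch. It assumes the usual AFM form: log-odds of success = student proficiency + the sum, over the KCs an item requires, of KC easiness plus learning rate times prior opportunities. The parameter names (`theta`, `beta`, `gamma`) and the example values are illustrative assumptions, not values from the slides.

```python
import math


def afm_probability(theta, q_row, beta, gamma, opportunities):
    """P(success) = logistic(theta + sum_k q_k * (beta_k + gamma_k * T_k)).

    theta         -- student proficiency (student intercept)
    q_row         -- Q-matrix row for the item (1 if KC k is required)
    beta          -- per-KC easiness (skill intercepts)
    gamma         -- per-KC learning rates (skill slopes)
    opportunities -- prior practice opportunities T_k per KC
    """
    logit = theta + sum(
        q * (b + g * t)
        for q, b, g, t in zip(q_row, beta, gamma, opportunities)
    )
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Hypothetical two-KC model; the item requires only the first KC.
    for t in (0, 2, 4):
        p = afm_probability(0.5, [1, 0], [-1.0, 0.2], [0.3, 0.1], [t, 0])
        print(t, round(p, 3))
```

Because only KCs with a 1 in the Q-matrix row contribute, the same fitted parameters can predict accuracy on an item that was never seen, as long as its KCs are known.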
Comparing Additive
Factor Model to other psychometric techniques
- Instance of generalized linear
regression, binomial family
or “logistic regression”
- R code: glm(success~student+skill+skill:opportunity,
family=binomial,…)
- Extension of item response
theory
- IRT has simply a student term
(theta-i) + item term (beta-j)
- R code: glm(success~student+item,
family=binomial,…)
- The additive factor model
behind LFA is different because:
- It breaks items down in terms
of knowledge component factors
- It adds term for practice
opportunities per component
Model Evaluation
- How to compare cognitive
models?
- A good model minimizes prediction
risk by balancing fit with data & complexity (Wasserman 2005)
- Compare BIC for the cognitive
models
- BIC is the “Bayesian Information Criterion”
- BIC = -2*log-likelihood + numPar * log(numOb)
- Lower (better) BIC = better prediction of data not yet seen
- Mimics cross validation, but
is faster to compute
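The BIC formula above can be computed directly from a model's predicted success probabilities. A minimal Python sketch, assuming 0/1 outcomes and a Bernoulli likelihood (the function names are illustrative):

```python
import math


def log_likelihood(outcomes, predicted):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted P(success)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, predicted)
    )


def bic(loglik, num_par, num_ob):
    """BIC = -2*log-likelihood + numPar * log(numOb); lower is better."""
    return -2.0 * loglik + num_par * math.log(num_ob)


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1]           # made-up responses
    predicted = [0.8, 0.3, 0.6, 0.9]  # made-up model predictions
    ll = log_likelihood(outcomes, predicted)
    # Same fit, more parameters: the larger model pays a higher penalty.
    print(bic(ll, num_par=2, num_ob=len(outcomes)))
    print(bic(ll, num_par=3, num_ob=len(outcomes)))
```

The penalty term grows with the number of parameters, which is how BIC trades fit against model complexity.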
Item Labeling &
the “P Matrix”: Adding Alternative Factors
- Problem: How to improve existing
cognitive model?
- Solution: Have experts look
for difficulty factors that are candidates for new KCs. Put these
in P matrix.
Q Matrix
Item | Skill | Add | Sub | Mul
2*8          | 0   | 0   | 1
2*8 - 3      | 0   | 1   | 1
2*8 - 30     | 0   | 1   | 1
3+2*8        | 1   | 0   | 1
P Matrix
Item | Skill | Deal with negative | Order of Ops | …
2*8          | 0                  | 0            |
2*8 - 3      | 0                  | 0            |
2*8 - 30     | 1                  | 0            |
3+2*8        | 0                  | 1            |
Using P matrix to update
Q matrix
- Create a new Q’ by using elements of P as arguments to operators
- Add operator: Q’ = Q + P[,1]
- Split operator: Q’ = Q[,2] * P[,1]
Q-Matrix after add P[,1]
Item | Skill | Add | Sub | Mul | Div | neg
2*8          | 0   | 0   | 1   | 0   | 0
2*8 - 3      | 0   | 1   | 1   | 0   | 0
2*8 - 30     | 0   | 1   | 1   | 0   | 1
Q-Matrix after splitting P[,1], Q[,2]
Item | Skill | Add | Sub | Mul | Div | Sub-neg
2*8          | 0   | 0   | 1   | 0   | 0
2*8 - 3      | 0   | 1   | 1   | 0   | 0
2*8 - 30     | 0   | 0   | 1   | 0   | 1
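The two operators can be sketched in a few lines of Python. This is one reading of the tables, not the original implementation: "add" appends the P column as a new KC, and "split" moves the rows where both the KC and the factor are 1 into a new KC column, clearing them from the original KC column (as the Sub and Sub-neg columns show). Matrices are plain lists of rows, and the function names are illustrative.

```python
def add_factor(q, p_col):
    """Add operator: append the P column to Q as a new KC column."""
    return [row + [p] for row, p in zip(q, p_col)]


def split_kc(q, kc_index, p_col):
    """Split operator: rows where both the KC and the factor are 1 move
    to a new KC column and are cleared from the original KC column."""
    new_q = []
    for row, p in zip(q, p_col):
        both = row[kc_index] * p
        new_row = list(row)
        new_row[kc_index] = row[kc_index] - both
        new_q.append(new_row + [both])
    return new_q


# Columns: Add, Sub, Mul, Div
Q = [
    [0, 0, 1, 0],  # 2*8
    [0, 1, 1, 0],  # 2*8 - 3
    [0, 1, 1, 0],  # 2*8 - 30
]
P_NEG = [0, 0, 1]  # "Deal with negative" column of P

if __name__ == "__main__":
    print(add_factor(Q, P_NEG))   # new "neg" column appended
    print(split_kc(Q, 1, P_NEG))  # Sub split into Sub and Sub-neg
```

Both functions return a new matrix and leave Q untouched, so many candidate Q’ matrices can be generated from the same starting model.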
LFA: KC Model Search
- Problem: How to find best
model given Q and P matrices?
- Solution: Combinatorial search
- A best-first search algorithm
(Russell & Norvig 2003)
- Guided by a heuristic, such
as BIC
- Goal: Do model selection
within logistic regression model space
1. Start from an initial “node” in search graph using given Q
2. Iteratively create new child nodes (Q’) by applying operators with arguments from P matrix
3. Employ heuristic (BIC of Q’) to rank each node
4. Select best node not yet expanded & go back to step 2
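The search loop can be sketched generically in Python. This is a minimal sketch under stated assumptions, not the LFA implementation: `expand` and `score` are hypothetical callbacks standing in for "apply operators with P-matrix arguments" and "fit the logistic regression and return BIC", and `max_expansions` is an assumed stopping rule.

```python
import heapq
import itertools


def best_first_search(initial_q, expand, score, max_expansions=50):
    """Best-first search over KC models; lower score is better.

    expand(q) -> iterable of child models (Q') derived from q
    score(q)  -> heuristic value for a model (e.g. its BIC)
    """
    counter = itertools.count()  # tie-breaker so the heap never compares models
    best_score, best_q = score(initial_q), initial_q
    frontier = [(best_score, next(counter), initial_q)]
    seen = {repr(initial_q)}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, q = heapq.heappop(frontier)  # best node not yet expanded
        for child in expand(q):
            key = repr(child)
            if key in seen:
                continue
            seen.add(key)
            s = score(child)
            if s < best_score:
                best_score, best_q = s, child
            heapq.heappush(frontier, (s, next(counter), child))
    return best_q, best_score


if __name__ == "__main__":
    # Toy stand-in: "models" are integers and the best score is at 7.
    print(best_first_search(
        0,
        expand=lambda n: [n + 1, n + 2] if n < 10 else [],
        score=lambda n: abs(n - 7),
    ))
```

In a real run the expensive part is `score`, since each call fits a regression model, which is why a cheap-to-compute heuristic such as BIC matters.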
Learning Factors Analysis:
Example in Geometry Area
Area Unit of Geometry Cognitive Tutor
- Original cognitive model in tutor: 15 skills
Circle-area
Circle-circumference
Circle-diameter
Circle-radius
Compose-by-addition
Compose-by-multiplication
Parallelogram-area
Parallelogram-side
Pentagon-area
Pentagon-side
Trapezoid-area
Trapezoid-base
Trapezoid-height
Triangle-area
Triangle-side
Log Data Input to LFA
Student | Step (Item) | Skill (KC)          | Opportunity | Success
A       | p1s1        | Circle-area         | 0           | 0
A       | p2s1        | Circle-area         | 1           | 1
A       | p2s2        | Rectangle-area      | 0           | 1
A       | p2s3        | Compose-by-addition | 0           | 0
A       | p3s1        | Circle-area         | 2           | 0
Step (Item): Items = steps in tutors with step-based feedback
Skill (KC): Q-matrix in single column; works for single KC items
Opportunity: Opportunities the student has had to learn the KC
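The Opportunity column can be derived from the other columns by counting, per student and skill, how many earlier steps used that skill. A minimal Python sketch, assuming the log rows are already in chronological order (the function name is illustrative):

```python
from collections import defaultdict


def add_opportunities(rows):
    """rows: (student, step, skill, success) tuples in chronological order.

    Returns (student, step, skill, opportunity, success) tuples, where
    opportunity is the 0-based count of prior steps by the same student
    on the same skill.
    """
    seen = defaultdict(int)
    out = []
    for student, step, skill, success in rows:
        out.append((student, step, skill, seen[(student, skill)], success))
        seen[(student, skill)] += 1
    return out


LOG = [
    ("A", "p1s1", "Circle-area", 0),
    ("A", "p2s1", "Circle-area", 1),
    ("A", "p2s2", "Rectangle-area", 1),
    ("A", "p2s3", "Compose-by-addition", 0),
    ("A", "p3s1", "Circle-area", 0),
]

if __name__ == "__main__":
    for row in add_opportunities(LOG):
        print(row)
```

Since the count depends on the skill labels, any change to the KC model changes the Opportunity column as well.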
AFM Results for original
KC model
Skill              | Intercept | Slope | Avg Opportunities | Initial Probability | Avg Probability | Final Probability
Parallelogram-area | 2.14      | -0.01 | 14.9              | 0.95                | 0.94            | 0.93
Pentagon-area      | -2.16     | 0.45  | 4.3               | 0.2                 | 0.63            | 0.84
Student  | Intercept
student0 | 1.18
student1 | 0.82
student2 | 0.21
Model Statistics
AIC | 3,950
BIC | 4,285
MAD | 0.083
Higher intercept of skill -> easier
skill
Higher slope of skill -> faster students
learn it
Higher intercept of student ->
student initially knew more
The AIC, BIC & MAD statistics provide
alternative ways to evaluate models
MAD = Mean Absolute Deviation
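For reference, AIC and MAD can be computed alongside BIC. A minimal Python sketch: AIC uses the standard definition (-2*log-likelihood + 2*numPar), and MAD is assumed here to be the mean absolute difference between paired observed and predicted values. The slides do not say at what granularity the reported MAD was computed, so that pairing is an assumption.

```python
def aic(loglik, num_par):
    """AIC = -2*log-likelihood + 2*numPar; lower is better."""
    return -2.0 * loglik + 2.0 * num_par


def mean_absolute_deviation(observed, predicted):
    """Mean of |observed - predicted| over paired values (assumed pairing)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Made-up error rates per opportunity vs. made-up model predictions.
    observed = [0.60, 0.45, 0.30]
    predicted = [0.55, 0.40, 0.35]
    print(mean_absolute_deviation(observed, predicted))
    print(aic(-120.0, num_par=10))
```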
Application: Use Statistical
Model to improve tutor
- Some KCs over-practiced, others under-practiced
(Cen, Koedinger, Junker, 2007)
- Initial error rate 76%, reduced to 40% after 6 times of practice
- Initial error rate 12%, reduced to 8% after 18 times of practice
“Close the loop” experiment
- In vivo experiment:
New version of tutor with updated knowledge tracing parameters vs. prior
version
- Reduced learning time by
20%, same robust learning gains
- Knowledge transfer: Carnegie
Learning using approach for other tutor units
Example in Geometry
of split based on factor in P matrix
Original Q matrix, with Embed factor from P matrix:
Student | Step | Skill          | Opportunity | Embed
A       | p1s1 | Circle-area    | 0           | alone
A       | p2s1 | Circle-area    | 1           | embed
A       | p2s2 | Rectangle-area | 0           |
A       | p2s3 | Compose-by-add | 0           |
A       | p3s1 | Circle-area    | 2           | alone
New Q matrix, after splitting Circle-area by Embed factor in P matrix (with revised Opportunity):
Student | Step | Skill             | Opportunity
A       | p1s1 | Circle-area-alone | 0
A       | p2s1 | Circle-area-embed | 0
A       | p2s2 | Rectangle-area    | 0
A       | p2s3 | Compose-by-add    | 0
A       | p3s1 | Circle-area-alone | 1
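On log data in this single-column form, a split amounts to relabeling the skill by the factor value and then recounting opportunities under the new labels. A minimal Python sketch, assuming chronologically ordered rows and an empty string where the factor does not apply (the function name is illustrative):

```python
from collections import defaultdict


def split_and_recount(rows, target_skill):
    """rows: (student, step, skill, factor_value) in chronological order.

    Relabels target_skill as "<skill>-<factor_value>" and recomputes the
    0-based opportunity count per student and (new) skill label.
    """
    counts = defaultdict(int)
    out = []
    for student, step, skill, factor in rows:
        label = f"{skill}-{factor}" if skill == target_skill and factor else skill
        out.append((student, step, label, counts[(student, label)]))
        counts[(student, label)] += 1
    return out


ROWS = [
    ("A", "p1s1", "Circle-area", "alone"),
    ("A", "p2s1", "Circle-area", "embed"),
    ("A", "p2s2", "Rectangle-area", ""),
    ("A", "p2s3", "Compose-by-add", ""),
    ("A", "p3s1", "Circle-area", "alone"),
]

if __name__ == "__main__":
    for row in split_and_recount(ROWS, "Circle-area"):
        print(row)
```

The recount is what makes a split testable: under the new model, practice on Circle-area-embed no longer counts as an opportunity for Circle-area-alone.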
LFA: Model Search Process
Automates the process of hypothesizing
alternative KC models & testing them against data
- Search algorithm guided by
a heuristic: BIC
- Start from an existing KC
model (Q matrix)
LFA Results 1: Applying
splits to original model
- Common results:
- Compose-by-multiplication
split based on whether it was an area or a segment being multiplied
- Circle-radius is split based
on whether it is being done for the first time in a problem or is being
repeated
- Made sense, but less than
expected …
Model 1
Number of Splits: 3
- Binary split compose-by-multiplication by figurepart segment
- Binary split circle-radius by repeat repeat
- Binary split compose-by-addition by backward backward
Number of Skills: 18
BIC: 4,248.86
Model 2
Number of Splits: 3
- Binary split compose-by-multiplication by figurepart segment
- Binary split circle-radius by repeat repeat
- Binary split compose-by-addition by figurepart area-difference
Number of Skills: 18
BIC: 4,248.86
Model 3
Number of Splits: 2
- Binary split compose-by-multiplication by figurepart segment
- Binary split circle-radius by repeat repeat
Number of Skills: 17
BIC: 4,251.07
Other Geometry problem
examples
Example of Tutor Design
Implications
- LFA search suggests distinctions
to address in instruction & assessment
With these new distinctions, tutor
can
- Generate hints better directed
to specific student difficulties
- Improve knowledge tracing
& problem selection for better cognitive mastery
- Example: Consider Compose-by-multiplication before LFA
Skill | Intercept | Slope | Avg Practice Opportunities | Initial Probability | Avg Probability | Final Probability
CM    | -.15      | .1    | 10.2                       | .65                 | .84             | .92
With final probability .92, many students
are short of .95 mastery threshold
Making a distinction
changes assessment decision
- CM-area and CM-segment look
quite different
- CM-area is now above .95 mastery
threshold (at .96)
- But CM-segment is only at
.60
- Implications:
- Original model penalizes students
who have key idea about composite areas (CM-area) -- some students solve
more problems than needed
- CM-segment is not getting
enough practice
- Instructional design choice:
Add instructional objective & more problems or not?
Skill     | Intercept | Slope | Avg Practice Opportunities | Initial Probability | Avg Probability | Final Probability
CM        | -.15      | .1    | 10.2                       | .65                 | .84             | .92
CMarea    | -.009     | .17   | 9                          | .64                 | .86             | .96
CMsegment | -1.42     | .48   | 1.9                        | .32                 | .54             | .60
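To see why the split matters for practice decisions, one can ask how many opportunities a KC would need to reach the .95 mastery threshold. A rough Python sketch, assuming success log-odds grow linearly with opportunities at the KC's slope, as in the additive factor model. The probabilities in the table are averages over students, so this single-curve calculation is an approximation and will not reproduce the table's columns exactly; the function names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def opportunities_to_mastery(initial_probability, slope, threshold=0.95):
    """Smallest whole number of opportunities T such that
    logit(initial_probability) + slope * T >= logit(threshold).
    Returns None if the threshold is never reached (slope <= 0)."""
    gap = logit(threshold) - logit(initial_probability)
    if gap <= 0:
        return 0
    if slope <= 0:
        return None
    return math.ceil(gap / slope)


if __name__ == "__main__":
    # Initial probability and slope taken from the CMsegment row above.
    print(opportunities_to_mastery(0.32, 0.48))
```

Comparing such an estimate with the average practice a KC actually receives is one way to flag KCs that are under-practiced.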
Perhaps original model
is good enough -- Can LFA recover it?
- Merge some skills in original
model, to produce 8 skills:
- Circle-area, Circle-radius
=> Circle
- Circle-circumference, Circle-diameter
=> Circle-CD
- Parallelogram-area, Parallelogram-side
=> Parallelogram
- Pentagon-area, Pentagon-side
=> Pentagon
- Trapezoid-area, Trapezoid-base,
Trapezoid-height => Trapezoid
- Triangle-area, Triangle-side
=> Triangle
- Compose-by-addition
- Compose-by-multiplication
- Does splitting by “backward”
(or otherwise) yield a better model? Closer to original?
LFA Results 2: Recovery
Model 1
Number of Splits: 4
Circle*area
Circle*radius*initial
Circle*radius*repeat
Compose-by-addition
Compose-by-addition*area-difference
Compose-by-multiplication*area-combination
Compose-by-multiplication*segment
Number of skills: 12
BIC: 4,169.315
Model 2
Number of Splits: 3
All skills are the same as those in model 1 except that
1. Circle is split into Circle*backward*initial, Circle*backward*repeat, Circle*forward
2. Compose-by-addition is not split
Number of skills: 11
BIC: 4,171.523
Model 3
Number of Splits: 4
All skills are the same as those in model 1 except that
1. Circle is split into Circle*backward*initial, Circle*backward*repeat, Circle*forward
2. Compose-by-addition is split into Compose-by-addition and Compose-by-addition*segment
Number of skills: 12
BIC: 4,171.786
- Only 1 recovery: Circle-area
vs. Circle-radius
- More merged model fits better
- Why? More transfer going on
than expected or not enough data to make distinctions?
Other relevant data sets …
Research Issues &
Summary
Open Research Questions:
Technical
- What factors to consider?
P matrix is hard to create
- Enhancing human role: Data
visualization strategies
- Other techniques: Principal
Component Analysis +
- Other data: Do clustering
on problem text
- Interpreting LFA output can
be difficult
- LFA outputs many models with
roughly equivalent BICs
- How to select from large equivalence
class of models?
- How to interpret results?
=> Researcher can’t just “go by
the numbers”
1) Understand the domain, the tasks
2) Get close to the data
DataShop Case Study
video
- “Using DataShop to discover
a better knowledge component model of student learning”
Summary of Learning
Factors Analysis (LFA)
- LFA combines statistics, human
expertise, & combinatorial search to discover cognitive models
- Evaluates a single model in
seconds,
searches 100s of models in hours
- Model statistics are meaningful
- Improved models suggest tutor
improvements
- Other applications of LFA
& model comparison
- Used by others:
- Individual differences in
learning rate (Rafferty et al., 2007)
- Alternative methods for error
attribution (Nwaigwe, et al. 2007)
- Model comparison for DFA data
in math (Baker; Rittle-Johnson)
- Learning transfer in reading
(Leszczenski & Beck, 2007)
Open Research Questions:
Psychology of Learning
- Test statistical model assumptions:
Right terms?
- Is student learning rate really
constant?
- Does a Student x Opportunity
interaction term improve fit?
- What instructional conditions
or student factors change rate?
- Is knowledge space “uni-dimensional”?
- Does a Student x KC interaction
term improve fit?
- Need different KC models for
different students/conditions?
- Right shape: Power law or
an exponential?
- Long-standing hot debate
- Has focused on “reaction
time” not on error rate!
- Other predictor & outcome
variables (x & y of curve)
- Outcome: Error rate =>
Reaction time, assistance score
- Predictor: Opportunities =>
Time per instructional event
Open Research Questions:
Instructional Improvement
- Do LFA results generalize
across data sets?
- Is BIC a good estimate for
cross-validation results?
- Does a model discovered with
one year’s tutor data generalize to a next year?
- Does model discovery work
in ill-structured domains?
- Use learning curves to compare
instructional conditions in experiments
- Need more
“close the loop” experiments
- EDM => better model =>
better tutor => better student learning
END