Optimization for Economics, a Visual Approach

Mike Carr

Contents

0 Calculus Reference
  0.1 Graphs of Functions
  0.2 Limits and Derivatives
  0.3 Multivariable Functions

1 Unconstrained Optimization
  1.1 Single-Variable Optimization
  1.2 Concavity
  1.3 Multivariable Optimization

2 Constrained Optimization
  2.1 Equality Constraints
  2.2 Inequality Constraints
  2.3 The Kuhn-Tucker Conditions

3 Comparative Statics
  3.1 The Implicit Function Theorem
  3.2 The Envelope Theorem

4 Sufficient Conditions
  4.1 The Extreme Value Theorem
  4.2 The Bordered Hessian
  4.3 Separation
  4.4 Concave Programming
  4.5 Quasiconcavity
Note to the Reader

For the last several years I have taught a course in optimization at Emory University for seniors majoring in mathematics and economics. While mathematics and economics is the standard preparation for a doctoral program in economics, the vast majority of students in this course are bound for careers in finance. One of the challenges in designing this course has been balancing the interests and needs of students on both tracks.

These notes have evolved from my own teaching materials. Their goal is to present methods for optimization and the reasoning behind them. This reasoning makes students resilient to the myriad modifications that these tools endure at the hands of economists.

As an outsider to economics education, I was impressed to see the extent to which economics has embraced visual methods of instruction. More than any other field whose foundation lies in mathematics, the standard economics curriculum provides students not only with descriptions and formulas, but with diagrams to depict core principles. Microeconomics students are inundated with supply curves, budget lines, indifference curves, and graphs in marginal space. This is certainly a lifeline for visual learners, but I suspect it produces deeper understanding for all.

As a geometer, my first inclination is to lean heavily upon visual reasoning when presenting the methods of optimization. In our efforts to find a textbook for our admittedly niche curriculum, we have not found an economics-focused text that takes this approach. I have made these notes available in the hope that this visual viewpoint can become an effective supplement to existing techniques. Like the graphical arguments in undergraduate microeconomics, I think it can bring a complete and rigorous understanding of optimization methods to a broader set of learners.

The audience for this text is anyone who wants to understand the mathematical methods for finding maximizers in economic theory. The best-prepared reader will have mastered the techniques of differentiation, including partial derivatives. They will also be familiar with the foundations of mathematical logic: set notation, functions, and methods of proof like contradiction and the contrapositive. Finally, they will need a basic proficiency in vector and matrix operations: sums, products and determinants. A competent high-school treatment may suffice for this requirement.

Anyone who sets out to teach or learn the methods of optimization from this text should be aware of its limitations. It does not contain the economic applications that my economist colleagues present each semester. These must be provided in order to create genuine enthusiasm for the material. This is a deficiency that I would like to rectify eventually, if there is some consensus about which examples to include. We have used one widely-available source of economic examples: Optimization for Economic Theory by Avinash Dixit. This excellent book has a rigorous, traditional treatment of most of the material here and good explanations of some advanced applications. Dixit's book has proved too difficult for all but the strongest undergraduates to read independently, but the examples are compelling and comprehensible with some extra exposition.

Finally I would like to acknowledge my economist colleagues Blake Allison and Teddy Kim, who were essential partners in developing this course. Three students, Alexia Witthaus, Jacob Sugarmann and Aarya Aamer, carefully read my early drafts and asked many questions. Their feedback allowed me to identify the most opaque exposition (and clarify it, I hope). There is still a long road of revision and improvement ahead for this document. I would appreciate any feedback you have.

Mike Carr
May 2023
Chapter 0: Calculus Reference

0.1 Graphs of Functions

Goals:
1. Graph algebraic functions.
2. Graph transformations of functions.

0.1.1 Graphs of Basic Functions
Definition 0.1 The graph of a function $f(x)$ is the set of points $(x, y)$ whose coordinates satisfy the equation $y = f(x)$.

In this section we review several basic functions and their graphs. These will be important as examples and counterexamples for our methods of optimization. In addition, knowing the shape of a graph is an efficient way to memorize the behavior of functions that appear frequently in economic models.
Definition 0.2 Linear functions can be written in slope-intercept form:
$$f(x) = mx + b.$$
The graph of a linear function is a line. $m$ is the slope, which is the change in $y$ over the change in $x$ between any two points on the line. $(0, b)$ is the $y$-intercept.

If we have the slope and a known point $(a, b)$ on a line, we can write its equation in point-slope form:
$$y - b = m(x - a)$$
Definition 0.3 A monomial is a function of the form:
$$f(x) = x^n$$
where $n$ is an integer greater than $0$.

For $n \geq 2$ the graph $y = x^n$ curves upward over the positive values of $x$. Greater values of $n$ have lower values of $f(x)$ when $0 < x < 1$ but higher values when $x > 1$.

For even values of $n$ the graph is symmetric across the $y$-axis, curving up when $x$ is negative. For odd values of $n$ the graph curves down when $x$ is negative. It is antisymmetric across the line $x = 0$.

Figure 1: Graphs of even-powered monomials ($x^2$, $x^4$, $x^6$)

Figure 2: Graphs of odd-powered monomials ($x$, $x^3$, $x^5$)
Monomials of Negative Power

Monomials of negative power have the form $f(x) = x^{-n}$. They are also commonly written $f(x) = \frac{1}{x^n}$.

The graph $y = \frac{1}{x^n}$ has a vertical asymptote at $x = 0$. The graph approaches the $x$-axis, $y = 0$, as $x$ gets large. For even values of $n$, the graph is above the $x$-axis. For odd values of $n$, the graph is above the $x$-axis for positive $x$ and below it for negative $x$. A greater choice of $n$ makes the function approach the $x$-axis more quickly.

Figure 3: Graphs of $y = x^{-2}$, $y = x^{-4}$ and $y = x^{-6}$

Figure 4: Graphs of $y = x^{-1}$, $y = x^{-3}$ and $y = x^{-5}$
Definition 0.4 A root function is a function of the form:
$$f(x) = \sqrt[n]{x}$$
where $n$ is an integer greater than $0$.

The domain of $\sqrt[n]{x}$ is $[0, \infty)$ if $n$ is even and all real numbers if $n$ is odd. The $x$- and $y$-intercept of $y = \sqrt[n]{x}$ is at $(0, 0)$. Root functions are increasing. At $x = 0$, they travel straight up.

Figure 5: Graphs of root functions ($\sqrt{x}$, $\sqrt[3]{x}$)
Definition 0.5 An exponential function has the form:
$$f(x) = a^x$$
where $a$ is a number greater than $0$. $a$ is called the base of the exponential function.

The graph $y = a^x$ passes through $(0, 1)$. If $a > 1$ then $f(x)$ increases quickly as $x$ takes on positive values. Greater values of $a$ give a steeper increase. $f(x)$ approaches $0$ as $x$ goes to $-\infty$. Greater values of $a$ give a faster approach. The graph does not touch or cross the $x$-axis. If $a < 1$, then the above is reversed.

$e$ is a commonly used base. $e$ is approximately $2.718$.

Figure 6: Graphs of $y = 2^x$, $y = e^x$ and $y = 3^x$
Definition 0.6 A logarithmic function has the form:
$$f(x) = \log_a x$$
where the base $a$ is a number greater than $1$. $\log_a x$ is the number $b$ such that $a^b = x$.

The natural logarithm is the logarithm with base $e$. It is denoted $f(x) = \ln x$.

$a^b$ can never be $0$ or less. The domain of $f(x) = \log_a x$ is $(0, \infty)$. As $x$ approaches $0$, $\log_a x$ goes to $-\infty$. $y = \log_a x$ has an $x$-intercept at $(1, 0)$. $y = \log_a x$ grows more and more slowly as $x$ increases. This effect is more pronounced for larger values of $a$.

Figure 7: Graphs of $y = \log_2 x$, $y = \ln x$ and $y = \log_{10} x$

Logarithms and exponents are inverse functions. We solve exponential equations by applying a logarithm to both sides. We solve logarithm equations by exponentiating both sides.
$$a^x = c \implies x = \log_a c \qquad\qquad \log_a x = c \implies x = a^c$$
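The two inverse relationships above can be checked numerically. The following is an illustrative sketch (the particular bases and constants are arbitrary), using Python's two-argument `math.log`:

```python
import math

# Solve a^x = c by taking the base-a logarithm of both sides: x = log_a(c).
def solve_exponential(a, c):
    return math.log(c, a)

# Solve log_a(x) = c by exponentiating both sides: x = a^c.
def solve_logarithm(a, c):
    return a ** c

print(solve_exponential(2, 32))   # 2^x = 32 gives x = 5
print(solve_logarithm(10, 3))     # log_10(x) = 3 gives x = 1000
```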
0.1.2 Graphs of Transformations

Suppose we would like to transform the graph $y = f(x)$. Here are four ways we can.

1. The graph of $y = af(x)$ is stretched by a factor of $a$ in the $y$ direction.
2. The graph of $y = f(x) + b$ is shifted by $b$ in the positive $y$ direction.
3. The graph of $y = f(cx)$ is compressed by a factor of $c$ in the $x$ direction.
4. The graph of $y = f(x + d)$ is shifted by $d$ in the negative $x$ direction.

We can perform multiple transformations on a single function.

Figure 8: The graph of $y = f(x)$ and its transformation $y = af(cx + d) + b$
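The combined transformation can be checked point by point: each point $(p, f(p))$ on the original graph corresponds to $\left(\frac{p - d}{c},\, af(p) + b\right)$ on the transformed graph. A minimal numerical sketch, using $f(x) = x^2$ as a stand-in for an arbitrary function and arbitrary parameter values:

```python
# Illustrative check of y = a*f(c*x + d) + b with f(x) = x^2 (any f works).
def f(x):
    return x ** 2

a, b, c, d = 3, 1, 2, 5

def transformed(x):
    return a * f(c * x + d) + b

# Each original point (p, f(p)) lands at ((p - d)/c, a*f(p) + b):
# shift left by d, compress by c, then stretch by a and shift up by b.
for p in [-2.0, 0.0, 1.5]:
    x_new = (p - d) / c
    y_new = a * f(p) + b
    assert abs(transformed(x_new) - y_new) < 1e-12

print("transformation check passed")
```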
0.2 Limits and Derivatives

Goals:
1. Verify that a function is continuous.
2. Compute derivatives.
3. Use derivatives to understand graphs and vice versa.

0.2.1 The Limit of A Function

Calculus is the study of change. Our most important rate of change cannot be computed directly, but exists only as a limit of rates.
Definition 0.7 The limit as $x$ approaches $a$ (or $x \to a$) of a function $f(x)$ is denoted $\lim_{x \to a} f(x)$.

$\lim_{x \to a} f(x) = L$ means that we can make $f(x)$ arbitrarily close to $L$ by restricting $x$ to a small enough neighborhood surrounding $a$. If there is no $L$ such that $\lim_{x \to a} f(x) = L$, we say that $\lim_{x \to a} f(x)$ does not exist.
Remark "Arbitrarily close" means any amount of closeness demanded. We need to be able to make $f(x)$ within $\frac{1}{10}$ of $L$, within $\frac{1}{1000}$ of $L$, within $\frac{1}{10000000}$ of $L$, and so on. When proving that a limit exists, mathematicians traditionally model this closeness with the variable $\epsilon$. We indicate or verify arbitrary closeness with the inequality $|f(x) - L| < \epsilon$.

By a neighborhood we mean an open interval that contains $a$. The set $\{a\}$ is not a neighborhood. If it were, then every function would limit to $f(a)$ as $x \to a$. Mathematicians generally restrict to neighborhoods of the form $(a - \delta, a + \delta)$; then they need a way to produce a valid, positive $\delta$ for any given positive $\epsilon$.

Figure 9: A neighborhood of $a$ that keeps $f(x)$ within $\epsilon$ of $L$
0.2.2 Continuity

Limits give us a formal approach to defining continuity. Many of our results will rely on the fact that a function is continuous.

Definition 0.8 A function $f(x)$ is continuous at $a$ if
$$\lim_{x \to a} f(x) = f(a).$$
$f(x)$ can also be continuous on an interval or other set of points if it is continuous at each $a$ in that set. If it is continuous on $\mathbb{R}$, we say $f(x)$ is a continuous function.

Proving that a function is continuous requires us to verify its limit at every point $a$. This is too much work for a case-by-case basis. Instead mathematics adopts a constructive approach. First we show that a few basic functions are continuous. Next we prove that sums, differences, products and other combinations preserve continuity.
Theorem 0.9 The following functions are continuous on their domains:
1. Constant functions
2. Linear functions
3. Polynomials
4. Roots
5. Exponential functions
6. Logarithms
7. Trigonometric functions
8. $f(x) = |x|$

Theorem 0.10 If $f(x)$ and $g(x)$ are continuous on their domains, then the following are also continuous on their domains:
1. $f(x) + g(x)$
2. $f(x) - g(x)$
3. $f(x)g(x)$
4. $\frac{f(x)}{g(x)}$ (note that any $x$ where $g(x) = 0$ is not in the domain)
5. $f(x)^{g(x)}$ as long as $f(x) > 0$
6. $f(g(x))$

We can use these theorems together to argue that complicated functions are continuous.
12
Example
The
function
f
(
x
)
=
4
√
3
x
2
−
17
x
+
2
−
e
x
log
5
x
is
the
difference
of
tw
o
functions.
The
first
is
a
comp
o-
sition
of
a
ro
ot
function
and
p
olynomial
(b
oth
continuous
on
their
domains).
The
second
is
a
quotient
of
an
exp
onential
and
a
loga
rithm
(b
oth
continuous
on
their
domains).
Thus
f
(
x
)
is
the
difference
of
t
wo
continuous
functions
and
is
continuous
on
its
domain.
Rema
rk
Just
about
any
function
w
e
can
write
using
algebraic
expressions
is
continuous
on
its
domain.
This
do
es
not
mean
it
is
continuous
everywhere.
f
(
x
)
=
1
x
is
not
continuous
at
x
=
0
,
for
example.
0.2.3 The Intermediate Value Theorem

One early intuition for continuity is that the graph of a continuous function can be drawn without any breaks. There are many ways to formalize this idea. One of the most important is the following theorem.

Theorem 0.11 [The Intermediate Value Theorem] If $f$ is a continuous function on $[a, b]$ and $K$ is a number between $f(a)$ and $f(b)$, then there is some number $c$ between $a$ and $b$ such that $f(c) = K$.

Intuitively, a continuous graph cannot get from one side of the line $y = K$ to the other without intersecting $y = K$. Notice that this theorem does not say exactly where this intersection must occur, only that it must occur somewhere in the interval $(a, b)$. It also does not rule out the possibility of more than one such $c$ existing.
Example Show that $f(x) = e^x - 3x$ has a root between $0$ and $1$.

Solution A root is a number $c$ such that $f(c) = 0$. To prove such a root exists, we check the conditions of the intermediate value theorem.

$f(x)$ is a sum of continuous functions, so it is continuous on its domain.
$$f(0) = 1 \qquad f(1) = e - 3 < 0$$
$0$ is between $f(0)$ and $f(1)$.

We conclude there is some $c$ between $0$ and $1$ such that $f(c) = 0$.

Figure 10: A root $c$ of $y = e^x - 3x$, with the points $(0, 1)$ and $(1, 3 - e)$ marked
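The theorem guarantees a root in $(0, 1)$ but does not locate it. A standard way to find it numerically is bisection, which repeatedly halves the interval while keeping the half on which $f$ changes sign; this is a minimal sketch, not a method developed in the text:

```python
import math

# Bisection: shrink [lo, hi] while preserving the sign change,
# so the IVT keeps guaranteeing a root inside the interval.
def bisect(f, lo, hi, tol=1e-10):
    assert f(lo) * f(hi) < 0  # opposite signs: the IVT applies
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

f = lambda x: math.exp(x) - 3 * x
c = bisect(f, 0.0, 1.0)
print(c, f(c))  # c is near 0.619, and f(c) is very close to 0
```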
0.2.4 The Derivative

The derivative is a method for measuring the rate of change of a function.

Definition 0.12 Given a function $f(x)$, the derivative of $f(x)$ at $a$ is the number
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
The derivative function of $f(x)$ is the function
$$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

Here are two different notations for the derivative at $a$:
1. $\frac{df}{dx}(a)$ (Leibniz)
2. $f'(a)$ (Lagrange)

The ratio $\frac{f(a + h) - f(a)}{h}$ can be interpreted two ways:
1. The average rate of change of $f$ between $a$ and $a + h$
2. The slope of a secant line of $y = f(x)$ from $(a, f(a))$ to $(a + h, f(a + h))$

Since the derivative is the limit of these, we interpret $f'(a)$ as:
1. The instantaneous rate of change of $f$ at $a$
2. The slope of a tangent line to $y = f(x)$ at $(a, f(a))$

Figure 11: A secant line and a tangent line to $y = f(x)$
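The secant-slope interpretation can be watched numerically: as $h$ shrinks, the secant slopes settle toward the tangent slope. An illustrative sketch with $f(x) = x^2$ at $a = 3$, where the derivative is $6$ (the function and point are arbitrary):

```python
# Secant slope (f(a+h) - f(a)) / h for shrinking values of h.
def secant_slope(f, a, h):
    return (f(a + h) - f(a)) / h

f = lambda x: x ** 2
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, secant_slope(f, 3, h))
# For this f the secant slope is exactly 6 + h, so the printed
# slopes approach the tangent slope 6 as h goes to 0.
```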
We can take higher order derivatives by taking derivatives of derivatives. The derivative function of $f$ in this context is called the first derivative. Its derivative function is the second derivative. The second derivative's derivative function is the third derivative, and so on.

Notation The following notations are used for higher order derivatives:

    name                 Lagrange notation    Leibniz notation
    first derivative     $f'(x)$              $df/dx$
    second derivative    $f''(x)$             $d^2f/dx^2$
    third derivative     $f'''(x)$            $d^3f/dx^3$
    fourth derivative    $f^{(4)}(x)$         $d^4f/dx^4$
    fifth derivative     $f^{(5)}(x)$         $d^5f/dx^5$

The sign of a higher order derivative tells us how the derivative of one order lower is changing. For example if $\frac{d^5f}{dx^5} < 0$, then $\frac{d^4f}{dx^4}$ is decreasing.
0.2.5 Computing Derivatives

The limit definition of a derivative is too unwieldy to use every time. Instead calculus students learn the derivatives of some basic functions. They then use theorems to compute derivatives when those functions are combined.

Derivatives of Basic Functions
$$\frac{d}{dx} c = 0 \quad \text{(the derivative of a constant is } 0\text{)}$$
$$\frac{d}{dx} x^n = nx^{n-1} \quad \text{for any } n \neq 0 \quad \text{(the Power Rule)}$$
$$\frac{d}{dx} e^x = e^x \qquad \frac{d}{dx} a^x = a^x \ln a \quad \text{for } a > 0$$
$$\frac{d}{dx} \ln x = \frac{1}{x} \qquad \frac{d}{dx} \log_a x = \frac{1}{x \ln a}$$

The following rules allow us to differentiate functions made of simpler functions whose derivatives we already know.

Differentiation Rules

Sum Rule: $(f(x) + g(x))' = f'(x) + g'(x)$

Constant Multiple Rule: $(cf(x))' = cf'(x)$

Product Rule: $(f(x)g(x))' = f'(x)g(x) + g'(x)f(x)$

Quotient Rule: $\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - g'(x)f(x)}{(g(x))^2}$ unless $g(x) = 0$

Chain Rule: $(f(g(x)))' = f'(g(x))g'(x)$
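These rules can be spot-checked against central difference quotients. A minimal numerical sketch, using $f(x) = e^x$ and $g(x) = \ln x$ (any smooth functions would do):

```python
import math

# Central difference approximation to a derivative at x.
def numeric_deriv(fn, x, h=1e-6):
    return (fn(x + h) - fn(x - h)) / (2 * h)

x = 2.0
f, df = math.exp, math.exp          # f = e^x, f' = e^x
g, dg = math.log, lambda t: 1 / t   # g = ln x, g' = 1/x

# Product rule: (f*g)' = f'g + g'f
product = lambda t: f(t) * g(t)
assert abs(numeric_deriv(product, x) - (df(x) * g(x) + dg(x) * f(x))) < 1e-4

# Chain rule: (f(g(x)))' = f'(g(x)) g'(x).  Here f(g(t)) = e^(ln t) = t,
# so both sides should be 1 at any positive x.
composed = lambda t: f(g(t))
assert abs(numeric_deriv(composed, x) - df(g(x)) * dg(x)) < 1e-4
print("rules verified numerically")
```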
0.2.6 Computing a Derivative

Compute
$$\frac{d^2}{dx^2}\left(4\sqrt{75 - 3x^2} + 10\right)$$

Solution We begin by computing the first derivative. We use the chain rule with $4x^{1/2} + 10$ as the outer function and $75 - 3x^2$ as the inner function.
$$\frac{d}{dx}\left(4\sqrt{75 - 3x^2} + 10\right) = 2(75 - 3x^2)^{-1/2}(-6x) = \frac{-12x}{\sqrt{75 - 3x^2}}$$

We compute the second derivative by differentiating the first derivative. We use the quotient rule. We need the chain rule again when we differentiate the denominator.
$$\begin{aligned}
\frac{d^2}{dx^2}\left(4\sqrt{75 - 3x^2} + 10\right) &= -\frac{d}{dx}\frac{12x}{\sqrt{75 - 3x^2}} \\
&= -\frac{\frac{d}{dx}(12x)\sqrt{75 - 3x^2} - \frac{d}{dx}\left(\sqrt{75 - 3x^2}\right)(12x)}{\left(\sqrt{75 - 3x^2}\right)^2} \\
&= -\frac{12\sqrt{75 - 3x^2} + \frac{3x}{\sqrt{75 - 3x^2}}(12x)}{75 - 3x^2} \\
&= -\frac{12\sqrt{75 - 3x^2} + \frac{36x^2}{\sqrt{75 - 3x^2}}}{75 - 3x^2}
\end{aligned}$$

We can obtain a common denominator to simplify this expression.
$$\begin{aligned}
\frac{d^2}{dx^2}\left(4\sqrt{75 - 3x^2} + 10\right) &= -\frac{\frac{12\left(\sqrt{75 - 3x^2}\right)^2}{\sqrt{75 - 3x^2}} + \frac{36x^2}{\sqrt{75 - 3x^2}}}{75 - 3x^2} \\
&= -\frac{12(75 - 3x^2) + 36x^2}{(75 - 3x^2)^{3/2}} \\
&= \frac{-900}{(75 - 3x^2)^{3/2}}
\end{aligned}$$
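A quick sanity check on the worked example: a central second-difference quotient of the original function should agree with the closed form $\frac{-900}{(75 - 3x^2)^{3/2}}$. An illustrative sketch at the arbitrary point $x = 2$:

```python
import math

def f(x):
    return 4 * math.sqrt(75 - 3 * x ** 2) + 10

# Central second-difference approximation to f''(x).
def second_deriv(fn, x, h=1e-4):
    return (fn(x + h) - 2 * fn(x) + fn(x - h)) / h ** 2

x = 2.0
closed_form = -900 / (75 - 3 * x ** 2) ** 1.5
print(second_deriv(f, x), closed_form)  # both values are near -1.8
```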
0.3 Multivariable Functions

Goals:
1. Compute partial derivatives.
2. Recognize continuous multivariable functions.
3. Apply the chain rule to compositions of multivariable functions.

0.3.1 Multi-Variable Functions

Most interesting phenomena are not described by a single variable. We will need to develop methods for optimizing multivariable functions. There are many ways to denote multivariable domains and the functions on them. This is how we will denote them.

Notation An $n$-vector is an ordered set of $n$ numbers called components. For instance $\vec{a} = (5, -\sqrt{17}, 12.31)$ is a $3$-vector. We add a vector arrow, as in $\vec{a}$, to indicate that $a$ is a vector.

The components of a vector can be denoted abstractly by subscripts: $\vec{x} = (x_1, x_2, \ldots, x_n)$. The $x_i$ do not have arrows because they are numbers, not vectors.

$n$-space, the set of all $n$-dimensional vectors, is denoted $\mathbb{R}^n$.

$\vec{0}$ is the zero vector: $(0, 0, 0, \ldots, 0)$. The dimension should be clear from context.

The vectors $\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n$ are the standard basis vectors of $\mathbb{R}^n$. $\vec{e}_i$ has $1$ in the $i$th component and $0$ in the others. For example $\vec{e}_2 = (0, 1, 0, \ldots, 0)$.

An $n$-variable function $f(x_1, x_2, \ldots, x_n)$ can be written $f(\vec{x})$.
Remark Using a common letter with an index variable ($x_1, x_2, x_3, \ldots$) is a good choice for a large or an unknown number of variables. We will use this notation when developing the theory of multivariable optimization. In many economics problems, there is a fixed, small number of variables. In these problems, it is more convenient to use different letters for each variable ($x, y, z, \ldots$), to avoid keeping track of subscripts. Even better, we should try to choose descriptive variable names like $q$ for quantity or $p$ for price.

We visualize functions with their graphs. The height of the graph over a point in the domain represents the value of the function at that point. This allows us to detect visually where the function is large or small, increasing or decreasing.

Definition 0.13 Given an $n$-variable function $f(\vec{x})$, the graph of $f$ is the set of points $(x_1, x_2, \ldots, x_n, y)$ in $\mathbb{R}^{n+1}$ that satisfy $y = f(x_1, x_2, \ldots, x_n)$.

Figure 12: The graph $y = \sqrt{36 - 4x_1^2 - x_2^2}$ and the height showing $f(1, 4)$
Remark In general, a graph of the form $y = f(x_1, x_2)$ will be hard to visualize. For more than three variables, this visualization becomes impossible. So why bother? Graphs are a useful visual aid to the reasoning behind our methods. As we progress, it is useful to have a prototypical two-variable graph in your head. You can apply our methods to that graph, whether or not you have an algebraic expression to go with it.
0.3.2 Partial Derivatives

Our optimization tools rely on the ability to measure rates of change. For a function of multiple variables, there are many rates of change, because there are many ways in which the input variables can change. The simplest are those where only one variable is changing while the others remain fixed.

Definition 0.14 The partial derivative of $f$ with respect to $x_i$ is a function of $\vec{x}$. It measures the rate of change of $f$ at $\vec{x}$ as $x_i$ increases but the other coordinates remain constant. The formula is
$$\lim_{h \to 0} \frac{f(\vec{x} + h\vec{e}_i) - f(\vec{x})}{h}$$

Here are two different notations for the partial derivative:
1. $\frac{\partial f}{\partial x_i}(\vec{x})$ (Leibniz)
2. $f_{x_i}(\vec{x})$ (Lagrange)

Each has advantages, so we will use both. When it will not cause confusion, we can shorten Lagrange's notation from $f_{x_i}(\vec{x})$ to $f_i(\vec{x})$.

In the two variable case, we can interpret $f_1(\vec{x})$ as the slope of the tangent line to the graph $y = f(x_1, x_2)$ in the $x_1$-direction. Higher-dimensional partial derivatives are also slopes, but are harder to visualize.
Figure 13: The tangent line to $y = f(x_1, x_2)$ in the $x_1$-direction

Computing a partial derivative requires us to treat the non-changing variables as constants. Then we can perform ordinary single-variable differentiation with respect to the variable that is changing.

0.3.3 Computing a Partial Derivative

The profit of a firm with a Cobb-Douglas production function might be modeled by
$$\pi(L, K) = pL^\alpha K^\beta - wL - rK.$$
We can compute the partial derivative $\pi_L(L, K)$, which measures the marginal effect of hiring more labor.

Solution Since $\pi_L(L, K)$ is a partial derivative, we can treat $K$ as a constant. That means that neither $K^\beta$ nor $rK$ is changing. We treat $K^\beta$ as a constant multiple of the monomial function $L^\alpha$. We treat $rK$ as a constant term with derivative $0$.
$$\pi_L(L, K) = p\alpha L^{\alpha - 1}K^\beta - w$$
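The marginal product computed above can be checked by a finite difference that holds $K$ fixed and perturbs only $L$. An illustrative sketch, with arbitrary parameter values:

```python
# Hypothetical parameter values, chosen only for illustration.
p, alpha, beta, w, r = 2.0, 0.3, 0.6, 1.0, 0.5

def profit(L, K):
    return p * L ** alpha * K ** beta - w * L - r * K

# Central difference in L only: K stays constant, as in the definition.
def partial_L(L, K, h=1e-6):
    return (profit(L + h, K) - profit(L - h, K)) / (2 * h)

L, K = 4.0, 9.0
closed_form = p * alpha * L ** (alpha - 1) * K ** beta - w
print(partial_L(L, K), closed_form)  # the two values agree closely
```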
0.3.4 Multivariable Limits and Continuity

Definition 0.15 A multivariable function $f(\vec{x})$ is continuous at $\vec{a}$ if
$$\lim_{\vec{x} \to \vec{a}} f(\vec{x}) = f(\vec{a})$$

In order to verify this limit, we must check that $f(\vec{x})$ can be made arbitrarily close to $f(\vec{a})$ by restricting to a sufficiently small neighborhood of $\vec{a}$. This neighborhood allows for travel in infinitely many directions from $\vec{a}$, rather than just forwards and backwards like a one-variable limit. This makes multivariable limits difficult to compute rigorously and multivariable continuity difficult to verify directly. Fortunately, we can use the same approach we used for single-variable functions.

Variant of Theorem 0.9 The following multivariable functions are continuous on their domains:
1. Constant functions
2. Linear functions
3. Polynomials
4. Roots
5. Exponential functions
6. Logarithms
7. Trigonometric functions
Variant of Theorem 0.10 If $f(\vec{x})$ and $g(\vec{x})$ are continuous on their domains, and $c$ is a constant, then the following are also continuous on their domains:
1. $f(\vec{x}) + g(\vec{x})$
2. $f(\vec{x}) - g(\vec{x})$
3. $f(\vec{x})g(\vec{x})$
4. $\frac{f(\vec{x})}{g(\vec{x})}$ (note that any $\vec{x}$ where $g(\vec{x}) = 0$ is not in the domain)
5. $f(\vec{x})^{g(\vec{x})}$ as long as $f(\vec{x}) > 0$
6. $f(g(\vec{x}))$ where $f(x)$ is a one-variable function

Multivariable continuity becomes important when discussing derivatives. Partial derivatives do not use multivariable limits. They use a limit as a single variable $h$ goes to $0$. For this reason, we are not guaranteed that partial derivatives reliably model the shape of a function.

Example Consider the function
$$f(x_1, x_2) = \begin{cases} 0 & \text{if } x_2 \leq 0 \\ x_1 & \text{if } x_2 > 0 \end{cases}$$
This function is $0$ when $x_1 = 0$ or $x_2 = 0$. Thus the partial derivatives at $(0, 0)$ are
$$f_1(0, 0) = 0 \qquad f_2(0, 0) = 0$$
If we increase $x_1$ while holding $x_2$ constant or increase $x_2$ while holding $x_1$ constant, then the function stays constant at $0$. This does not reflect the fact that if we increase both $x_1$ and $x_2$ at $(0, 0)$, the function will have a positive slope.

Many theorems rely on a function behaving consistently with its partial derivatives no matter which direction we travel. The following property will usually serve that purpose.

Definition 0.16 A function $f(\vec{x})$ is continuously differentiable if all the partial derivative functions $f_i(\vec{x})$ are continuous functions. If instead they are all continuous at a point $\vec{a}$, we say $f(\vec{x})$ is continuously differentiable at $\vec{a}$.
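The pathology in the piecewise example above is easy to see numerically: difference quotients along either axis vanish, yet the quotient along the diagonal does not. A minimal sketch:

```python
# The piecewise function from the example: 0 when x2 <= 0, x1 when x2 > 0.
def f(x1, x2):
    return x1 if x2 > 0 else 0.0

h = 1e-6
# Along the axes the difference quotients vanish...
print((f(h, 0) - f(0, 0)) / h)  # 0.0
print((f(0, h) - f(0, 0)) / h)  # 0.0
# ...but along the diagonal path (x1, x2) = (t, t) the slope is positive.
print((f(h, h) - f(0, 0)) / h)  # 1.0
```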
0.3.5 The Chain Rule

How do we model the change of a multivariable function when more than one input variable is changing? We write each input variable as a function of a parameter. For instance, if $x_1$ and $x_2$ are both changing we can write each as a function of a parameter $t$. We can combine these to define a vector function: $\vec{x}(t) = (x_1(t), x_2(t))$.

$f(\vec{x}(t))$ is a composition of functions. If we have defined $x_1(t)$ and $x_2(t)$ to correctly model the change we want in $x_1$ and $x_2$, then the derivative of $f(\vec{x}(t))$ will tell us how $f$ is changing as well. Notice $f(\vec{x}(t))$ is a single variable function. The value of $t$ determines its value completely. The multivariable chain rule computes its derivative using $\vec{x}'(t)$ and the gradient of $f(\vec{x})$.

Definition 0.17 Given a function $f(\vec{x})$, the gradient vector of $f$ at $\vec{x}$ is
$$\nabla f(\vec{x}) = (f_1(\vec{x}), f_2(\vec{x}), \ldots, f_n(\vec{x}))$$

Theorem 0.18 [The Multivariable Chain Rule] Suppose $f(\vec{x})$ is a continuously differentiable, $n$-variable function. If $\vec{x}(t)$ is differentiable, then the derivative of the composition $f(\vec{x}(t))$ with respect to $t$ is
$$\frac{df}{dt}(t) = \nabla f(\vec{x}(t)) \cdot \vec{x}'(t) \quad \text{or} \quad \frac{df}{dt}(t) = \sum_{i=1}^n f_i(\vec{x}(t))\, x_i'(t)$$

We should avoid using the notations $f'(t)$ and $f_t(t)$ for derivatives of compositions. Instead, we use Leibniz notation. This makes the variable of differentiation clear without implying that we are computing a partial derivative.
0.3.6 Applying the Chain Rule

Suppose $f(x_1, x_2) = \frac{\ln x_1}{x_2}$. If $x_1(t) = t^2$ and $x_2(t) = e^t$, compute $\frac{d}{dt}f(x_1(t), x_2(t))$.

Solution According to the chain rule
$$\begin{aligned}
\frac{d}{dt}f(x_1(t), x_2(t)) &= f_{x_1}(x_1(t), x_2(t))\, x_1'(t) + f_{x_2}(x_1(t), x_2(t))\, x_2'(t) \\
&= \frac{1}{x_1(t)x_2(t)}x_1'(t) - \frac{\ln x_1}{x_2^2}x_2'(t) \\
&= \frac{1}{t^2 e^t}\, 2t - \frac{\ln t^2}{e^{2t}}\, e^t \\
&= \frac{2}{te^t} - \frac{\ln t^2}{e^t} \\
&= \frac{2 - t\ln t^2}{te^t}
\end{aligned}$$
Remark The multivariable chain rule is not useful for direct calculations. Substituting the expressions for $x_1(t)$ and $x_2(t)$ would have given us $f(x_1(t), x_2(t)) = \frac{\ln t^2}{e^t}$. We can differentiate this using single-variable methods to obtain the same answer. The multivariable chain rule will instead serve us well in more abstract situations.

0.3.7 Alternate Notations and the Chain Rule

The multivariable chain rule is easiest to state when we give the component functions names that match the variables of $f$. When this is not the case, we need to take care with our notation.
Example Let $f(x, y)$ be a continuously differentiable function. Let $x$ and $y$ be defined by differentiable functions $g(t)$ and $h(t)$ respectively. The chain rule states that
$$\frac{d}{dt}f(g(t), h(t)) = f_x(g(t), h(t))g'(t) + f_y(g(t), h(t))h'(t).$$
We do not write $f_g$ or $f_h$ in this example. $f$ is defined as a function of $x$ and $y$, so those are the only partial derivatives it has.

Some applications use one of the variables of $f$ as the parameter. The simplest example gives an alternate formulation for the partial derivative.

Example Let $f(x_1, x_2)$ be a continuously differentiable function. Let $x_1$ be the identity function of itself and let $x_2$ be a constant function of $x_1$.
$$x_1 = x_1 \qquad \frac{dx_1}{dx_1} = 1 \qquad\qquad x_2 = c \qquad \frac{dc}{dx_1} = 0$$
The rate of change with this parameterization should reflect that $x_1$ is changing and $x_2$ is not. That is exactly what the chain rule tells us.
$$\begin{aligned}
\frac{d}{dx_1}f(x_1, c) &= f_{x_1}(x_1, c)\frac{dx_1}{dx_1} + f_{x_2}(x_1, c)\frac{dc}{dx_1} \\
&= f_{x_1}(x_1, c)(1) + f_{x_2}(x_1, c)(0) \\
&= f_{x_1}(x_1, c)
\end{aligned}$$
We will build upon this formulation when we compute comparative statics.
Finally, we should note that the chain rule applies when the $x_i$ are multivariable functions of a vector $\vec{t}$. In this case, $f(\vec{x}(\vec{t}))$ is a function of $\vec{t}$ and thus we can compute its partial derivatives.

Generalization of Theorem 0.18 Suppose $f(\vec{x})$ is a continuously differentiable, $n$-variable function. If the $x_i(\vec{t})$ are differentiable functions of the variables $t_j$, then the partial derivative of the composition $f(\vec{x}(\vec{t}))$ with respect to $t_k$ is
$$\frac{\partial f}{\partial t_k}(\vec{t}) = \nabla f(\vec{x}(\vec{t})) \cdot \frac{\partial \vec{x}}{\partial t_k}(\vec{t}) \quad \text{or} \quad \frac{\partial f}{\partial t_k}(\vec{t}) = \sum_{i=1}^n f_i(\vec{x}(\vec{t}))\frac{\partial x_i}{\partial t_k}(\vec{t})$$

This generalization follows immediately from treating each $t_j$ except $t_k$ as a constant.
0.3.8
Proving
the
Multiva
riable
Chain
Rule
The
proof
of
the
multivariable
chain
rule
uses
the
same
to
ols
as
the
single
variable
chain
rule
(and
p
ro
duct
rule).
How
ever,
multivariable
limits
a
re
much
more
difficult
to
verify
than
single
va
riable
limits.
T
o
check
that
lim
x
→
a
f
(
x
)
=
L
,
we
have
to
consider
values
of
x
in
every
direction
from
a
,
not
just
fo
rw
ard
o
r
backwa
rds
along
a
line.
We
will
sk
etch
the
proof
for
the
case
where
f
is
a
tw
o-variable
function.
Even
the
sk
etch
is
quite
technical.
It
contains
no
arguments
that
are
important
enough
to
commit
to
memo
ry
.
Pro
of
W
e
apply
the
definition
of
a
derivative
d
f
dt
=
lim
h
→
0
f
(
x
(
t
+
h
))
−
f
(
x
(
t
))
h
=
lim
h
→
0
f
(
x
1
(
t
+
h
)
,
x
2
(
t
+
h
))
−
f
(
x
1
(
t
)
,
x
2
(
t
))
h
W
e
b
reak
up
this
limit
into
a
sum
of
t
wo
limits
b
y
adding
and
subtracting
a
term
and
regrouping
the
result
(assuming
the
limit
of
each
summand
exists)
=
lim
h
→
0
f
(
x
1
(
t
+
h
)
,
x
2
(
t
+
h
))
−
f
(
x
1
(
t
)
,
x
2
(
t
+
h
))
+
f
(
x
1
(
t
)
,
x
2
(
t
+
h
))
−
f
(
x
1
(
t
)
,
x
2
(
t
))
h
=
lim
h
→
0
f
(
x
1
(
t
+
h
)
,
x
2
(
t
+
h
))
−
f
(
x
1
(
t
)
,
x
2
(
t
+
h
))
h
+
lim
h
→
0
f
(
x
1
(
t
)
,
x
2
(
t
+
h
))
−
f
(
x
1
(
t
)
,
x
2
(
t
))
h
Next we multiply each limit by 1, represented as a quotient of an expression divided by itself:

$$= \lim_{h \to 0} \frac{f(x_1(t+h), x_2(t+h)) - f(x_1(t), x_2(t+h))}{h} \cdot \frac{x_1(t+h) - x_1(t)}{x_1(t+h) - x_1(t)} + \lim_{h \to 0} \frac{f(x_1(t), x_2(t+h)) - f(x_1(t), x_2(t))}{h} \cdot \frac{x_2(t+h) - x_2(t)}{x_2(t+h) - x_2(t)}$$
Naturally \frac{x_i(t+h) - x_i(t)}{x_i(t+h) - x_i(t)} evaluates to 0/0 when h = 0. This is not a problem for a limit. However, if it also evaluates to 0/0 at other values of h, no matter how small a neighborhood of h = 0 we choose, then another approach is needed. In this case, the entire term will limit to 0, but we omit the formal argument from this sketch. Instead we assume all is well, and we reorganize each product by swapping denominators:

$$= \lim_{h \to 0} \frac{f(x_1(t+h), x_2(t+h)) - f(x_1(t), x_2(t+h))}{x_1(t+h) - x_1(t)} \cdot \frac{x_1(t+h) - x_1(t)}{h} + \lim_{h \to 0} \frac{f(x_1(t), x_2(t+h)) - f(x_1(t), x_2(t))}{x_2(t+h) - x_2(t)} \cdot \frac{x_2(t+h) - x_2(t)}{h}$$
We break up the result as a product of limits (assuming the limit of each factor exists):

$$= \lim_{h \to 0} \frac{f(x_1(t+h), x_2(t+h)) - f(x_1(t), x_2(t+h))}{x_1(t+h) - x_1(t)} \cdot \lim_{h \to 0} \frac{x_1(t+h) - x_1(t)}{h} + \lim_{h \to 0} \frac{f(x_1(t), x_2(t+h)) - f(x_1(t), x_2(t))}{x_2(t+h) - x_2(t)} \cdot \lim_{h \to 0} \frac{x_2(t+h) - x_2(t)}{h}$$
The second factor of each product now looks like a derivative. To make the first factors look more like derivatives, we let j = x_1(t+h) − x_1(t) and k = x_2(t+h) − x_2(t). These quantities go to 0 as h → 0. Our limits can be rewritten as

$$= \lim_{(h,j) \to (0,0)} \frac{f(x_1(t)+j,\, x_2(t+h)) - f(x_1(t), x_2(t+h))}{j} \cdot \lim_{h \to 0} \frac{x_1(t+h) - x_1(t)}{h} + \lim_{(h,k) \to (0,0)} \frac{f(x_1(t),\, x_2(t)+k) - f(x_1(t), x_2(t))}{k} \cdot \lim_{h \to 0} \frac{x_2(t+h) - x_2(t)}{h}$$
At this point we have four limits, each of which looks like the definition of a derivative. We can replace each one with its derivative notation. In general, we cannot evaluate a multivariable limit by handling one variable at a time, but the fact that the partial derivatives are continuous allows us to do so here. The details of this kind of argument are covered in an analysis course.

$$= \lim_{h \to 0} f_1(x_1(t), x_2(t+h))\, x_1'(t) + \lim_{h \to 0} f_2(x_1(t), x_2(t))\, x_2'(t)$$

$$= f_1(x_1(t), x_2(t))\, x_1'(t) + f_2(x_1(t), x_2(t))\, x_2'(t)$$

This is the dot product

$$= \nabla f(x(t)) \cdot x'(t)$$
One can adapt this proof to a higher dimension by breaking the limit into more summands. The theorem we gave is even more general, because it applies to an n-variable function. To prove that version, we would use a proof by induction.

Chapter 1 Unconstrained Optimization

1.1 Single-Variable Optimization
Goals:
1. Know the definition of a local or global maximizer.
2. Apply the first- and second-order conditions to calculate maximizers.
3. Distinguish between necessary and sufficient conditions and recognize the role of each in optimization.
4. Understand the role of the derivative in proving the first- and second-order conditions.
Some of the most important methods in calculus are those that identify maximizers and minimizers of functions. This section gives precise theorems to describe those methods. We will also examine the distinct but complementary roles played by necessary conditions and sufficient conditions. Finally, we will give a reasonably compact formal basis to prove the theorems of this section and support the theorems in the sections that follow.
1.1.1 The First-Order Condition

Given a function, we are interested in which inputs of that function will produce the largest or smallest values of that function. These inputs are called maximizers and minimizers. In order to identify maximizers and minimizers, we need to have a rigorous definition that we can verify.
Definition 1.1 Suppose a lies in the domain of a function f(x).
1. a is a maximizer of f if f(a) ≥ f(x) for all other x in the domain of f. In this case f(a) is called the maximum of f.
2. a is a local maximizer of f if f(a) ≥ f(x) for all other x in some neighborhood of a. It is a strict local maximizer if f(a) > f(x) instead. In either case f(a) is called a local maximum of f.
3. a is a minimizer of f if f(a) ≤ f(x) for all other x in the domain of f. In this case f(a) is called the minimum of f.
4. a is a local minimizer of f if f(a) ≤ f(x) for all other x in some neighborhood of a. It is a strict local minimizer if f(a) < f(x) instead. In either case f(a) is called a local minimum of f.
Remark When we are trying to draw a contrast with a local maximizer, sometimes we use global maximizer or absolute maximizer to refer to a maximizer of a function.
The difficulty in finding maximizers is that the domain has infinitely many points. Using only the definition, we would need to evaluate them all, one by one, to find a maximizer. This is obviously impossible. Thankfully, calculus gives us a way to narrow down the search. Derivatives measure the rate of change of a function. Knowing whether the rate is positive or negative should tell us how f(a) compares to nearby values. Here is a way to formally state that relationship.
Lemma 1.2 If f'(a) > 0, then for all x in some neighborhood of a,
1. f(x) > f(a) if x > a
2. f(x) < f(a) if x < a.

Figure 1.1: The graph y = f(x) and a neighborhood where it stays close to its tangent line
Remark A lemma is a statement that is not particularly interesting on its own. It is used as a step toward proving a more important result. We will provide a formal proof for this lemma later.
The main argument is that near a, y = f(x) lies close enough to the tangent line at a to mimic its behavior, attaining larger values for x > a and smaller values for x < a. Notice that this behavior does not need to persist for all x. We cannot say how long y = f(x) will stay close to its tangent line. It may be that the neighborhood where it does is quite small.

We can make the same argument for functions with a negative derivative, except the behavior is backwards. Aside from switching the direction of some inequalities, the proof is identical. Rather than treat this result as its own lemma, we present it as a variant.
Variant of Lemma 1.2 If f'(a) < 0, then for all x in some neighborhood of a,
1. f(x) < f(a) if x > a
2. f(x) > f(a) if x < a.
The existence of nearby x such that f(x) > f(a) is inconsistent with the definition of a local maximizer. Lemma 1.2 and its variant guarantee such x, so:

If f'(a) > 0, then a is not a local maximizer.
If f'(a) < 0, then a is not a local maximizer.

We convert these statements to their contrapositives. If a is a local maximizer, then f'(a) is neither positive nor negative. This gives us the following condition.
Theorem 1.3 [The First-Order Condition (FOC)] Let a be a local maximizer or local minimizer of f(x). Either f'(a) does not exist or f'(a) = 0.

Definition 1.4 The values of x that satisfy the first-order condition are called critical points.
1.1.2 Applying the First-Order Condition

What does the first-order condition tell us about f(x) = 8x^3 − x^4?

Solution The first-order condition tells us that a local maximizer or minimizer only occurs where the derivative is 0 or undefined. The derivative of this function is 24x^2 − 4x^3. It is defined for all x, so we solve for where it is 0.

$$f'(x) = 0$$
$$24x^2 - 4x^3 = 0$$
$$4x^2(6 - x) = 0$$
$$x = 0 \text{ or } x = 6$$

This means that no point except x = 0 or x = 6 can be a local maximizer or minimizer.
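The algebra above can be cross-checked numerically. This short sketch (the grid of test values is an arbitrary illustrative choice) evaluates f'(x) = 24x^2 − 4x^3 on integer inputs and collects the roots.

```python
# Cross-check the critical points of f(x) = 8x^3 - x^4 by evaluating
# f'(x) = 24x^2 - 4x^3 = 4x^2(6 - x) on a coarse integer grid.
def fprime(x):
    return 24 * x**2 - 4 * x**3

roots = [x for x in range(-10, 11) if fprime(x) == 0]
print(roots)  # [0, 6]
```

Only x = 0 and x = 6 survive, matching the factorization.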
Figure 1.2: The graph of f(x) = 8x^3 − x^4
Remark The first-order condition does not tell us that either of x = 0 or x = 6 must be a local maximizer or a local minimizer. In fact, x = 6 is a local maximizer but x = 0 is neither.
1.1.3 The Second-Order Condition

As the previous example shows, the first-order condition is limited in its conclusion. Knowing the value of f'(a) is not enough to give us certainty about the shape of the graph near a. For that we need to know how the first derivative is changing at a. The change in the first derivative is measured by the second derivative. The second derivative function, denoted f''(x), is the derivative of the function f'(x). The sign of the second derivative allows us to classify some critical points.
Theorem 1.5 [The Second-Order Condition (SOC)] If f'(a) = 0 and f''(a) < 0, then a is a strict local maximizer of f.

Figure 1.3: A neighborhood where a = 2 is the maximizer of f(x)
The intuition behind this result relies on the fact that f''(a) is the derivative of f'(x) at a. If f''(a) < 0, then f'(x) takes on larger values (than 0) to the left of a and smaller values (than 0) to the right. Traveling left to right, the function increases until it reaches a. After passing a, it decreases.

Naturally, we have the following variant.

Variant of Theorem 1.5 If f'(a) = 0 and f''(a) > 0, then a is a strict local minimizer of f.
Remark This is sometimes called the local second-order condition, since it gives information about local maximizers. We cannot conclude anything if f''(a) = 0. In that case a may be a maximizer, a minimizer or neither.

1.1.4 Applying the Second-Order Condition

What does the second-order condition tell us about f(x) = 8x^3 − x^4?
Solution The second-order condition requires f'(x) = 0. We've already shown that this only occurs at x = 0 and x = 6. The other part of the condition requires us to compute the second derivative.

$$f'(x) = 24x^2 - 4x^3$$
$$f''(x) = 48x - 12x^2$$
$$f''(0) = 0 \qquad f''(6) = -144 < 0$$

This means that x = 6 is a strict local maximizer. The second-order condition does not tell us anything about x = 0.
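The classification step can be mechanized. A minimal sketch (the function names are illustrative, not from the text) encodes the SOC's three outcomes and applies it to the two critical points found above.

```python
# Classify the critical points of f(x) = 8x^3 - x^4 using the sign of
# f''(x) = 48x - 12x^2, following the second-order condition.
def f_second(x):
    return 48 * x - 12 * x**2

def classify(c):
    s = f_second(c)
    if s < 0:
        return "strict local maximizer"
    if s > 0:
        return "strict local minimizer"
    return "inconclusive"   # the SOC says nothing when f''(c) = 0

print(classify(6), classify(0))  # strict local maximizer inconclusive
```

As in the worked solution, the test settles x = 6 but leaves x = 0 undecided.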
1.1.5 The Global Second-Order Condition

A firm seeking to increase its profits does not have much use for a local maximizer. The excuse "Our strategy was superior to all numerically similar strategies" will not impress a board of directors. Nor would any rational actor settle for a mere local maximizer of their utility function. Utility maximizers and profit maximizers want to find the global maximizer. If we know more about the second derivative of the utility function, we can identify such a value.
Theorem 1.6 [The Global Second-Order Condition (GSOC)] If f'(x*) = 0 and f''(x) < 0 for all x, then x* is the only critical point of f and is the unique global maximizer of f.
Remark Unlike the (local) second-order condition, this theorem requires that the second derivative be negative everywhere, not just at x*. In return for a stronger requirement, we obtain a much stronger conclusion.

Economists traditionally use x* to denote a global maximizer. Thus we will use x* to denote a known maximizer or any point that will imminently be identified as a maximizer.

Figure 1.4: A function and its derivative near a maximizer
Unlike in the local case, we can make some use of a zero second derivative here.

Variants of Theorem 1.6
1. If f'(x*) = 0 and f''(x) > 0 for all x, then x* is the only critical point of f and is the global minimizer of f.
2. If f'(x*) = 0 and f''(x) ≤ 0 for all x, then x* is a global maximizer of f (not necessarily unique).
3. If f'(x*) = 0 and f''(x) ≥ 0 for all x, then x* is a global minimizer of f (not necessarily unique).
1.1.6 Sufficient and Necessary Conditions

The conclusions we draw from the first-order condition are fundamentally different from the conclusions that we draw from the second-order conditions. Neither of them allows us to fully answer the question of which points are local maximizers of f. We can see the difference in their structure:

FOC: If a is a local maximizer, then [condition].
SOC: If [condition], then a is a local maximizer.

Mathematics has the following vocabulary for describing this difference.
Definition 1.7 Suppose we have a condition P that we are using to detect whether a statement Q is true or false.
1. A statement of the form "If Q then P" indicates that P is a necessary condition for Q.
2. A statement of the form "If P then Q" indicates that P is a sufficient condition for Q.
Remark We can find uses for both necessary and sufficient conditions, but we need to be careful to interpret their conclusions correctly. If you want to show that Q is true, you need to use a sufficient condition. A necessary condition will not suffice. If you want to rule out Q being true, one way is to show that a necessary condition is not satisfied.
Going forward, it is important to identify each new result as being necessary or sufficient (or maybe both). We can begin by seeing how these terms apply to the first- and second-order conditions.

The first-order condition is a necessary condition. You cannot have a local maximizer without satisfying it. It is not a sufficient condition. A point can satisfy the first-order condition without being a local maximizer.

Example Let f(x) = x^3. The value a = 0 satisfies the FOC but is not a local maximizer (or minimizer).
This means that the first-order condition can only rule out values that are not maximizers. We cannot conclude that a point is a maximizer just by showing that it satisfies the first-order condition.

The second-order condition is a sufficient condition. If f'(a) = 0 and f''(a) < 0 then a must be a local maximizer. It is not a necessary condition: there may be local maximizers that do not satisfy the second-order condition.
Example Let f(x) = −x^4. The local maximizer a = 0 does not satisfy the SOC, because f''(0) = 0.

The global second-order condition is also a sufficient condition and not a necessary one. For example, a function can have a global maximizer without satisfying it.
Figure 1.5: The graph of f(x) = 2/(x^2 − 2x + 2), which has a global maximizer and unique critical point at x* = 1, but does not satisfy the GSOC.
Remark Abstractly, there is no difference between P and Q. There are two ways to view the statement "If P, then Q":

P is a sufficient condition for Q.
Q is a necessary condition for P.

However, we like to think of conditions as statements that we use to test for what we really care about. So while, abstractly, being a local maximizer is a sufficient condition for f'(a) = 0, this does not reflect the way we use the first-order condition in practice.
Generally we will want to have both necessary and sufficient conditions to test for properties that we care about. With the tools we have so far, we can determine that certain points are local or global maximizers. We can determine that many points are not local maximizers. Still, there may be points that satisfy the first-order condition but not the second-order condition. We cannot tell whether these are local maximizers or not.
The best condition would be one that is both necessary and sufficient. No such tool exists for general optimization, so we are on the lookout for additional conditions to apply when the ones we have are inconclusive. Coming up with new conditions is usually hard work, but we can obtain one easily if we exploit the relationship between maximizers and minimizers. A sufficient condition for a minimizer can be turned into a necessary condition for a maximizer.
If f'(a) = 0 and f''(a) > 0 then by a variant of Theorem 1.5, a is a strict local minimizer. Thus it cannot be a maximizer. On the other hand, if f''(a) exists but f'(a) ≠ 0, then a is still not a local maximizer. This leaves only the following possibilities for the second derivative at a local maximizer a.

Theorem 1.8 If a is a local maximizer of f, then f''(a) ≤ 0 or f''(a) does not exist.
1.1.7 Proving the First-Order Condition

In general, we expect a positive derivative to mean that greater values of x produce greater values of f(x). The lemma at the heart of the first-order condition described what we can conclude when a derivative is positive at one point. The derivative is a limit, so any formal argument needs to start there. Proofs about limits can require extensive computations and creative problem solving. Fortunately, we will only need the following lemma about limits.
Lemma 1.9 If lim_{x→a} f(x) = L and L > 0, then there is a neighborhood of a on which f is positive.
Proof. Since L is positive, L/2 is also positive. By definition of a limit there is some neighborhood of a in which f(x) is within L/2 of L. We can express that distance with an absolute value.

$$|f(x) - L| < L/2$$
$$-L/2 < f(x) - L < L/2$$
$$L/2 < f(x) < 3L/2$$

Since L/2 is positive, so is f(x).
Variant of Lemma 1.9 If lim_{x→a} f(x) = L and L < 0, then there is a neighborhood of a on which f is negative.

We are now in a position to prove Lemma 1.2.
Lemma 1.2 If f'(a) > 0, then for all x in some neighborhood of a,
1. f(x) > f(a) if x > a
2. f(x) < f(a) if x < a.
Proof. Since

$$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} > 0,$$

Lemma 1.9 guarantees that there is a neighborhood of h values near 0 where \frac{f(a+h) - f(a)}{h} > 0. Let x = a + h. In the corresponding neighborhood of x values around a, we can make the following computations.

1. If x > a then h > 0 and

$$\frac{f(a+h) - f(a)}{h} > 0 \implies \frac{f(x) - f(a)}{h} > 0 \implies f(x) - f(a) > 0 \implies f(x) > f(a)$$

2. If x < a then h < 0 and

$$\frac{f(a+h) - f(a)}{h} > 0 \implies \frac{f(x) - f(a)}{h} > 0 \implies f(x) - f(a) < 0 \implies f(x) < f(a)$$
Remark This proof uses an earlier lemma. If you are carefully reading this proof to understand the argument, you may need to look back at the lemma and think about how it is being used here. Visually, the argument of this proof is that the secant lines from a have positive slope in some neighborhood of a.
1.1.8 Proving the Second-Order Conditions

In order to prove the second-order condition, we need a stronger conclusion than Lemma 1.2. Calculus teaches that a function with a positive derivative is increasing, while a function with a negative derivative is decreasing. In order to make a rigorous argument, we should know formal definitions of increasing and decreasing.
Definition 1.10 Let f(x) be a function, and let I be an interval in its domain.

f(x) is increasing on I if for any numbers a < b in I, f(a) < f(b).
f(x) is decreasing on I if for any numbers a < b in I, f(a) > f(b).

Figure 1.6: The graph of an increasing function
f'(a) > 0 is not enough to guarantee that f(x) increases in a neighborhood of a. f(x) may suffer from oscillations that persist arbitrarily close to a (see Figure 1.1). If we want to use a derivative to show that a function is increasing, we need to know the derivative is positive at every point on an interval.

Lemma 1.11 If f'(x) > 0 for all x on an interval I, then f(x) is increasing on I.
To prove this lemma, we need to show that when f'(x) > 0, the function satisfies the formal definition of increasing that we saw above. This is surprisingly arduous to prove using the definition of a derivative as our starting point. Here are two deceptively short proofs.

Proof. Suppose a < b. Apply the fundamental theorem of calculus:

$$f(b) - f(a) = \int_a^b f'(x)\,dx.$$

The integrand is positive over all of [a, b], so the integral is positive too. Thus f(b) > f(a).
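The identity behind this proof can be illustrated numerically. The sketch below uses an arbitrary example, f(x) = x^3 on [1, 2], and a simple midpoint rule (both illustrative choices) to confirm that integrating f' recovers f(b) − f(a).

```python
# Numeric illustration of f(b) - f(a) = integral of f'(x) from a to b,
# for f(x) = x^3, so f'(x) = 3x^2, on the interval [1, 2].
def fprime(x):
    return 3 * x * x

def midpoint_integral(g, a, b, n=1000):
    # Midpoint rule with n equal subintervals.
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

approx = midpoint_integral(fprime, 1.0, 2.0)
print(abs(approx - (2.0**3 - 1.0**3)) < 1e-4)  # True
```

Since the integrand 3x^2 is positive on [1, 2], the integral is positive and f(2) > f(1), exactly as the proof argues.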
Figure 1.7: The fundamental theorem of calculus applied to a positive derivative
Proof. Suppose a < b. Apply the mean value theorem: there is some c between a and b such that

$$\frac{f(b) - f(a)}{b - a} = f'(c).$$

Since c ∈ I, we know that f'(c) > 0. Thus f(b) > f(a).

Figure 1.8: The point c, whose tangent line has the same slope as the secant from a to b
You should not be satisfied by these proofs. Each raises a new question.
1. How do you prove the fundamental theorem of calculus?
2. How do you prove the mean value theorem?

A satisfactory argument would either prove the missing result or find another method that eliminates the need for them. Unfortunately, any of these approaches takes us into concepts beyond the scope of these notes. You can expect to find their formal proofs in an analysis course.

Naturally, this lemma has variants. We can switch the direction of the inequality, but we can also relax the strictness of the inequality.
Variants of Lemma 1.11
1. If f'(x) < 0 on an interval I, then f(x) is decreasing on I.
2. If f'(x) ≥ 0 on an interval I, then f(x) is non-decreasing on I, meaning that if a < b, f(a) ≤ f(b).
3. If f'(x) ≤ 0 on an interval I, then f(x) is non-increasing on I, meaning that if a < b, f(a) ≥ f(b).
Moving forward, we will see strict and non-strict variants so frequently that when we don't, it is worth pondering why not. With these lemmas in hand, we are ready to prove the second-order condition.

Theorem 1.5 If f'(a) = 0 and f''(a) < 0, then a is a strict local maximizer of f.
Arguments about the second derivative usually rely on the fact that f''(x) is the derivative of f'(x). We will apply Lemma 1.9, letting f'(x) play the role of the original function with f''(x) as its derivative. We can thus use the sign of f''(a) to compare the values of f'(x) and f'(a). Here is the formal argument.
Proof. We suppose f'(a) = 0 and f''(a) < 0. f''(x) is the derivative of f'(x), and it is negative at a. A variant of Lemma 1.9 applies: there is some neighborhood I of a where

if x > a, then f'(x) < f'(a) = 0
if x < a, then f'(x) > f'(a) = 0.

The second inequality shows, by Lemma 1.11, that f(x) is increasing to the left of a in I. By definition of increasing this means that f(x) < f(a) for all x < a in I. Similarly, the first inequality shows that f(x) is decreasing to the right of a in I. By definition of decreasing this means that f(x) < f(a) for all x > a in I.

Thus f(a) > f(x) for all other x in I. This satisfies the definition of a strict local maximizer, so we conclude that a is a strict local maximizer of f.
The proof of the global second-order condition follows the same logic. The only difference is that we can apply Lemma 1.11 instead of Lemma 1.9 to get a global statement about the first derivative on either side of x*. Here is the full text for the sake of completeness.
Proof. We suppose f'(x*) = 0 and f''(x) < 0 for all x. A variant of Lemma 1.11 tells us that f'(x) is decreasing for all x. This means:

if x > x*, then f'(x) < f'(x*) = 0
if x < x*, then f'(x) > f'(x*) = 0.

The second inequality shows, by Lemma 1.11, that f(x) is increasing for all x < x*. By definition of increasing this means that f(x) < f(x*) for all x < x*. Similarly, the first inequality shows that f(x) is decreasing for all x > x*. By definition of decreasing this means that f(x) < f(x*) for all x > x*. Thus f(x*) > f(x) for all other x. This satisfies the definition of a unique maximizer, so we conclude that x* is the unique global maximizer of f.

To show that x* is the only critical point, we simply note that if there were another critical point, it would also satisfy this condition and be the unique maximizer. Since there can be only one unique maximizer, no such critical point can exist.
1.1.9 Section Summary

The most important definitions and results from this section were

The first-order condition (Theorem 1.3)
The second-order condition (Theorem 1.5)
The global second-order condition (Theorem 1.6)
Necessary and sufficient conditions (Definition 1.7)

[Diagram: "f''(x*) < 0 and f'(x*) = 0" implies, via the SOC, "x* is a strict local max"; "f''(x) < 0 for all x and f'(x*) = 0" implies, via the GSOC, "x* is the unique global max and only critical point"; being a maximizer implies, via the FOC, "f'(x*) = 0 or DNE" and, via Thm 1.8, "f''(x*) ≤ 0 or DNE".]

Figure 1.9: Relationships between the conditions of single variable optimization
1.2 Concavity

Goals:
1. Recognize convex sets.
2. Recognize concave and convex functions visually.
3. Understand the difference between strict and non-strict convexity/concavity.
4. Verify concavity using inequalities, tangent lines or second derivatives as appropriate.
5. Use concavity to find maximizers of a function.
1.2.1 Convex Sets

Our goal is to understand what the shape of a graph tells us about the maximizer(s) of a function. We would prefer methods that apply to both single-variable and multivariable functions. For this, we need to be able to describe shapes in any dimension. Most students learn how to recognize a convex polygon, but we can define convexity for a large variety of shapes. Convex regions can be angular or smooth. They can exist in any dimension. To see this, we need the formal definition of convexity.
Definition 1.12 A region D ⊆ R^n is convex if the line segment between any two points a and b in D lies entirely in D.

There is no restriction on the dimension of D. We can even apply the definition to regions in the real line.

Example If D is a subset of R^1, then D is convex if and only if it is connected.
For a polygon, this definition should match our previous intuition of convexity. Convince yourself that any two points in the convex polygons you learned about can be connected with a segment. In a nonconvex polygon there is at least one pair of points whose line segment leaves the polygon. You can convince yourself by picking your favorite nonconvex polygon and finding such a segment.

Figure 1.10: A convex polygon and a nonconvex polygon
We cannot make a rigorous case for convexity by drawing segments. There are too many to check. We can instead make convexity more algebraic by parameterizing the line segment from a to b:

$$x(t) = (1-t)a + tb, \qquad 0 \le t \le 1$$

You can check that this is a line by writing it as

$$x(t) = a + t(b - a), \qquad 0 \le t \le 1$$

x(0) = a. As t increases, x(t) travels in the direction of (b − a) until arriving at b when t = 1. To check for convexity of D, we just need to test that the points of x(t) lie in D.

Figure 1.11: The line segment from a to b
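The parameterized segment test translates directly into a numeric sketch. Here the membership functions (a unit disk and an annulus) and the sample count are illustrative choices, not from the text; sampling can only provide evidence of convexity, not a proof.

```python
# Sample points of x(t) = (1-t)a + t*b and check membership in a region D.
def in_disk(p):
    return p[0]**2 + p[1]**2 <= 1.0        # a convex region

def in_annulus(p):
    return 0.25 <= p[0]**2 + p[1]**2 <= 1.0  # a nonconvex region

def segment_in_region(a, b, member, samples=101):
    for i in range(samples):
        t = i / (samples - 1)
        x = tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
        if not member(x):
            return False
    return True

print(segment_in_region((-0.6, 0.0), (0.0, 0.8), in_disk))     # True
print(segment_in_region((-0.6, 0.0), (0.6, 0.0), in_annulus))  # False
```

The second segment passes through the annulus's hole at the origin, witnessing its nonconvexity.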
Furthermore, we do not need to check every a and b. Any a and b whose segment leaves D will leave D at some boundary point c and reenter at some boundary point d (Figure 1.12). Thus if we only checked segments between boundary points, the segment between c and d would be sufficient to indicate that D is not convex. We summarize this argument in the following theorem.

Figure 1.12: A segment that leaves a nonconvex region
Theorem 1.13 A region D is convex if for all a and b in the boundary of D and all t in [0, 1],

$$(1-t)a + tb \in D$$
We visually identify convex polygons as polygons where every corner "points outward." However, the corners of the polygon are the only places that point outward. The rest of the boundary is flat, pointing neither inward nor outward. With non-polygonal regions, it is possible to have a boundary that points outward everywhere, not just at a few corners. This will be a useful property to keep track of, so we have a name for such regions.
Definition 1.14 A region D is strictly convex if the segment, not including the endpoints, between any two points a and b in D lies entirely in the interior of D.

Figure 1.13: Two convex regions, one strictly convex and the other not strictly convex
Remark Every strictly convex region is also convex, but some convex regions are not strictly convex. We say that strict convexity is a stronger condition and convexity is a weaker condition.

We are often interested in the intersection of two regions. For instance, if one region is the set of points satisfying one condition, while a second region is the set of points satisfying a second condition, then their intersection is the set of points satisfying both conditions. Convexity behaves well in these situations.
Theorem 1.15 If D_1 and D_2 are convex regions, then D_1 ∩ D_2 is convex.

The proof is a good exercise.
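The theorem can be illustrated with the sampled segment test from before. The two membership functions below (a disk and a half-plane) are illustrative choices; their intersection is convex, so every sampled segment between points of the intersection stays inside it.

```python
# The intersection of two convex sets passes the sampled segment test.
def in_disk(p):
    return p[0]**2 + p[1]**2 <= 4.0     # convex

def in_halfplane(p):
    return p[0] + p[1] >= 0.0           # convex

def in_intersection(p):
    return in_disk(p) and in_halfplane(p)

def segment_stays(member, a, b, steps=100):
    return all(member(((1 - t) * a[0] + t * b[0],
                       (1 - t) * a[1] + t * b[1]))
               for t in (i / steps for i in range(steps + 1)))

print(segment_stays(in_intersection, (1.0, 0.5), (0.0, 1.5)))  # True
```

This is only a spot check, of course; the exercise asks for the general argument.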
1.2.2 Concave Functions

Our next step is to use convexity to describe the shape of functions or, more precisely, their graphs. Most students encounter convex and concave functions in calculus, though they are sometimes called concave up and concave down. With a rigorous definition of a convex region, we are now in a position to define what it means to be a convex function.
Definition 1.16 Let f(x) be a function whose domain is convex. We say f(x) is convex if y ≥ f(x), the region above its graph, is convex. It is concave if y ≤ f(x), the region below its graph, is convex.

Figure 1.14: A concave function
Remark Be careful when learning these definitions. Students often expect the definition of a concave function to have something to do with a region being nonconvex (which is sometimes called concave), but it does not. Nonconvex regions above and below graphs are unremarkable. It is when one of the regions is convex that the graph is special.

Figure 1.15: The graph of a function that is neither concave nor convex and the nonconvex regions above and below it
We might ask whether the region above or below y = f(x) is strictly convex, rather than merely convex. If it is, we can pass the "strictness" onto our description of f.

Definition 1.17 Let f(x) be a function whose domain is convex. f(x) is strictly convex if the region y ≥ f(x) is strictly convex. It is strictly concave if the region y ≤ f(x) is strictly convex.
Remark Strict concavity is a stronger condition than concavity. Every strictly concave function is also concave, but some concave functions are not strictly concave.

Figure 1.16: A function that is concave but not strictly concave
Notice that the region below y = f(x) is a mirror image of the region above y = −f(x). If one is convex, so is the other. We can make the following connection between a function and its negative.

Lemma 1.18 Let f(x) be a function.
1. f(x) is concave if and only if −f(x) is convex.
2. f(x) is strictly concave if and only if −f(x) is strictly convex.
We will use this lemma as an excuse to ignore convex functions. Any argument about concave functions becomes an argument about convex functions by introducing a negative sign.

1.2.3 The Inequality Test for Concavity
We verify the convexity of the region below y = f(x) by checking line segments between points on the boundary. The boundary in this case is the graph itself. The line segments are called secants.

Figure 1.17: A secant below the graph of y = 5 − x_1^2 − x_2^2
We take two general points (a, f(a)) and (b, f(b)) on y = f(x). We parametrize the secant between them:

x(t) = (1 − t)a + tb
y(t) = (1 − t)f(a) + tf(b)        0 ≤ t ≤ 1
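This parametrization can be checked numerically. The sketch below (a rough illustration; the function f and the endpoints a, b are hypothetical choices, not from the text) confirms that the secant starts at (a, f(a)) when t = 0 and ends at (b, f(b)) when t = 1, and that in between it sits below the graph of a concave function.

```python
# Numerical sketch of the secant parametrization for a sample
# single-variable concave function; f, a, and b are illustrative.
def f(x):
    return 5 - x**2

a, b = -1.0, 2.0

def secant(t):
    """Point on the secant from (a, f(a)) to (b, f(b)) at parameter t."""
    x = (1 - t) * a + t * b
    y = (1 - t) * f(a) + t * f(b)
    return x, y

# At t = 0 the secant starts at (a, f(a)); at t = 1 it ends at (b, f(b)).
print(secant(0.0))  # (-1.0, 4.0)
print(secant(1.0))  # (2.0, 1.0)
# At an interior t, the secant lies below the graph for this concave f.
print(secant(0.5)[1] <= f(secant(0.5)[0]))  # True
```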
We can write an inequality to express the condition that the secant lies below the graph.

Figure 1.18: A secant below the graph of a single-variable function
Theorem 1.19
A function is concave if and only if for all a and b in its domain and any 0 ≤ t ≤ 1 we have

(1 − t)f(a) + tf(b) ≤ f((1 − t)a + tb)

where the left side is the height of the secant and the right side is the height of y = f(x(t)).
Remarks
This is an "if and only if" condition. That means it is both necessary and sufficient for establishing concavity.

Only a function with a convex domain can meet this condition. Otherwise we cannot always evaluate f((1 − t)a + tb), because (1 − t)a + tb may lie outside the domain.

The theorem follows directly from the definitions we have developed so far. Here is a representation of our reasoning.
(1 − t)f(a) + tf(b) ≤ f((1 − t)a + tb) for all a, b, and t
⇕ the secant between any a and b lies under the graph y = f(x)
⇕ the region below y = f(x) is convex
⇕ f is concave

Applying the same reasoning to the definitions of convexity and strict concavity/convexity gives the following variants.
Note that we cannot demand a strict inequality at t = 0 or t = 1, because the secant will always intersect the graph y = f(x) at a and b.
Variants of Theorem 1.19
1. A function is convex if and only if for all a and b in its domain and any 0 ≤ t ≤ 1 we have
   (1 − t)f(a) + tf(b) ≥ f((1 − t)a + tb)
2. A function is strictly concave if and only if for all distinct a and b in its domain and any 0 < t < 1 we have
   (1 − t)f(a) + tf(b) < f((1 − t)a + tb)
3. A function is strictly convex if and only if for all distinct a and b in its domain and any 0 < t < 1 we have
   (1 − t)f(a) + tf(b) > f((1 − t)a + tb)
1.2.4 Verifying Concavity with an Inequality
Verify that f(x) = 5 − x² is concave using the inequality condition.

Solution
Let a and b be any two real numbers. We need to show that for all t in [0, 1]:

(1 − t)f(a) + tf(b) ≤ f((1 − t)a + tb)

To prove this inequality, we first need the AM-GM (arithmetic mean-geometric mean) inequality from algebra. Here it is.
(a − b)² ≥ 0
a² − 2ab + b² ≥ 0
a² + b² ≥ 2ab

We also note that for t in [0, 1], t(1 − t) ≥ 0. We now prove the required inequality, starting with the right side.
f((1 − t)a + tb) = 5 − ((1 − t)a + tb)²                          (definition of f)
= 5 − (1 − t)²a² − 2t(1 − t)ab − t²b²                            (distribute)
≥ 5 − (1 − t)²a² − t(1 − t)(a² + b²) − t²b²                      (AM-GM)
= 5 − (1 − t)²a² − t(1 − t)a² − t(1 − t)b² − t²b²                (distribute)
= 5 − (1 − t)((1 − t) + t)a² − t((1 − t) + t)b²                  (factor)
= 5 − (1 − t)a² − tb²                                            (simplify)
= 5(1 − t) − (1 − t)a² + 5t − tb²                                (break up 5)
= (1 − t)(5 − a²) + t(5 − b²)                                    (factor)
= (1 − t)f(a) + tf(b)                                            (definition of f)
The inequality condition is difficult to verify, even for this simple function. It is much worse for a function involving a square root or an exponential. On the other hand, the inequality condition can be convenient for abstract results.
Theorem 1.20
If f(x) and g(x) are both concave, then so is their sum: h(x) = f(x) + g(x)
Proof
Let a and b be in the domain of h. Since the domain of h is the intersection of the domains of f and g, h has a convex domain (Theorem 1.15). We know from the concavity of f and g that for all t in [0, 1]:

(1 − t)f(a) + tf(b) ≤ f((1 − t)a + tb)
(1 − t)g(a) + tg(b) ≤ g((1 − t)a + tb)

We now verify that the inequality holds for h:

(1 − t)h(a) + th(b) = (1 − t)f(a) + (1 − t)g(a) + tf(b) + tg(b)
≤ f((1 − t)a + tb) + g((1 − t)a + tb)
= h((1 − t)a + tb)

We conclude that h is concave.
Main Idea
The inequality condition is sometimes useful at a theoretical level, but is painful to check for any but the simplest specific functions. We will want a condition that is easier to check.
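As a sanity check on the algebra in the worked example, the inequality condition can be spot-checked numerically over a grid of a, b, and t values. The sketch below is illustrative only: it uses f(x) = 5 − x² from the example above, and a grid search cannot prove concavity, though a single violation would disprove it.

```python
# Spot-check the concavity inequality (1-t)f(a) + tf(b) <= f((1-t)a + tb)
# for f(x) = 5 - x^2 over a grid of sample points. Not a proof --
# only finitely many points are checked.
def f(x):
    return 5 - x**2

samples = [i / 2 for i in range(-6, 7)]   # a, b range over [-3, 3]
ts = [i / 10 for i in range(11)]          # t ranges over [0, 1]

violations = [
    (a, b, t)
    for a in samples for b in samples for t in ts
    if (1 - t) * f(a) + t * f(b) > f((1 - t) * a + t * b) + 1e-12
]
print(violations)  # [] -- no counterexamples on this grid
```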
1.2.5 Concavity and Tangent Lines
We sought out the idea of concavity because it applies to functions of many variables. Some results about concave functions are related to their derivatives. We will develop these tools in the setting that is most comfortable and intuitive: one-variable functions. Once we have presented the theory of optimization for multivariable functions, we will revisit and generalize these results.
Derivatives tell us about tangent lines, and tangent lines have an interesting relationship to convex sets. If you draw a nonconvex set, you will be able to find a tangent line that crosses through the set. A convex set, on the other hand, lies entirely on one side of any of its tangent lines. We will not try to prove this for general sets, but we will prove it for the strictly convex region below the graph of a strictly concave function.
Lemma 1.21
If f(x) is a strictly concave function that is differentiable at a, then the rest of the graph y = f(x) lies below its tangent line at (a, f(a)).

Figure 1.19: The graph of a strictly concave function and some of its tangent lines
This lemma describes the relationship between y = f(x) and a single tangent line. It can apply to a function that is not differentiable at other points in its domain. If the function is differentiable on its entire domain, we can produce a necessary and sufficient condition on the tangent lines.
Theorem 1.22
Let f(x) be a differentiable function on a convex domain. f(x) is strictly concave if and only if the graph y = f(x) lies below all of its tangent lines (except at the point of tangency).

The proofs of these results are somewhat technical. We will provide them after we have discussed applications.
1.2.6 Concavity Conditions for a Maximizer
Our main use of concavity (for now) is to produce sufficient conditions for maximizers. The following corollary follows from Lemma 1.21.
Corollary 1.23
If f(x) is a strictly concave function and f′(x*) = 0, then x* is the unique global maximizer of f.
Proof
The assumption that f′(x*) = 0 tells us that f is differentiable at x*, and the tangent line at x* is horizontal. By Lemma 1.21, the rest of the graph lies below this line. We conclude that x* is the unique global maximizer.
Figure 1.20: The graph of y = f(x), which lies below the tangent line at x*
Like in the global second-order condition, we can also argue that x* is the only critical point. If there were another, then it would also be the unique global maximizer, which is nonsense.
1.2.7 A Second Derivative Test for Concavity
We would still like to have a better way to determine when a function is concave. In calculus, we learn that the sign of the second derivative determines the concavity of a function. We are now in a position to formally argue this result.
Theorem 1.24
If f is a function on a convex domain D ⊆ R and f′′(x) < 0 for all x in D, then f(x) is strictly concave.
This means that a negative second derivative is a sufficient condition for strict concavity. Is it necessary? No; consider the following.
Example
Consider f(x) = −x⁴. This function is strictly concave, but f′′(0) = 0, so the second derivative is not always negative.
Putting this together with Corollary 1.23 gives a string of conditions. Each is sufficient and neither is necessary, so the implications can only be followed from left to right:

f′′(x) < 0 for all x  ⟹  f strictly concave  ⟹  any x* with f′(x*) = 0 is a maximizer
We could claim that Corollary 1.23 and Theorem 1.24 have combined to give an alternate argument for the global second-order condition. However, proving Theorem 1.24 without the global second-order condition would require us to redo much of the work that went into that theorem. Instead we will use the global second-order condition to keep our proof of Theorem 1.24 brief.
Proof
Our goal is to apply Theorem 1.22. Let a be a point in the domain of f. Since f′′(x) exists, f′(a) exists. Let y = ℓ(x) be the tangent line to y = f(x) at a. Let g(x) = f(x) − ℓ(x). We know the following about g:
- g′′(x) = f′′(x) − ℓ′′(x) < 0 for all x, since ℓ′′(x) = 0.
- g′(a) = f′(a) − ℓ′(a) = 0, since the derivative is the slope of the tangent line.
- g(a) = 0, since ℓ(a) = f(a).

g and a satisfy the conditions of the global second-order condition. Thus a is the unique maximizer of g, meaning g(x) < g(a) = 0 for all x ≠ a. Thus y = f(x) lies below y = ℓ(x). This reasoning holds for the tangent line at any value a. Theorem 1.22 tells us that f is strictly concave.
Figure 1.21: The graph of f(x), the tangent line at a, and their difference
1.2.8 Verifying Concavity with the Second Derivative
Verify that f(x) = 5 − eˣ is strictly concave.
Solution
f′′(x) = −eˣ < 0 for all x. Therefore by Theorem 1.24, f(x) is strictly concave.
Main Idea
The second derivative is an easier test of strict concavity than the inequality we used earlier, but it is only a sufficient condition. It does not detect every strictly concave function.
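The second derivative test also lends itself to a quick numerical check. The sketch below approximates f′′ for the example f(x) = 5 − eˣ by central differences and confirms it is negative at a handful of sample points (an illustration, not a proof, since only finitely many points are checked).

```python
import math

# Central-difference approximation of the second derivative of
# f(x) = 5 - e^x; the exact value is -e^x, negative everywhere.
def f(x):
    return 5 - math.exp(x)

def second_derivative(f, x, h=1e-5):
    """Approximate f''(x) by the central-difference formula."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    approx = second_derivative(f, x)
    # The approximation should track the exact value -e^x and stay negative.
    print(x, approx, -math.exp(x))
```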
1.2.9 Proving the Relationship Between Concavity and Tangent Lines
We will now present the formal reasoning of Lemma 1.21 and Theorem 1.22. We prove Lemma 1.21 by examining the difference between f(x) and its secants.
Proof
Pick any b ≠ a on the graph y = f(x). The tangent line to y = f(x) at (a, f(a)) has equation y = f(a) + f′(a)(x − a). We will first show that (b, f(b)) does not lie above the tangent line by showing that f(b) ≤ f(a) + f′(a)(b − a).

Consider the case that b > a. Denote the secant from (a, f(a)) to (b, f(b)) by the equation y = s(x) and let g(x) = s(x) − f(x). The region below y = f(x) is convex, so y = s(x) is below y = f(x) for any x between a and b. Their difference, g(x), is less than or equal to 0 on this interval. By the contrapositive of Lemma 1.2 we know g′(a) cannot be greater than 0. s′(a) is the slope of the secant: (f(b) − f(a))/(b − a). We substitute this into g′(a) ≤ 0 and solve.

g′(a) ≤ 0
s′(a) − f′(a) ≤ 0                                  (sum rule)
(f(b) − f(a))/(b − a) − f′(a) ≤ 0                  (substitute)
f(b) − f(a) − f′(a)(b − a) ≤ 0                     (b − a > 0)
f(b) ≤ f(a) + f′(a)(b − a)

We conclude that (b, f(b)) does not lie above the tangent line.

Now we must also show it does not lie on the tangent line. This is easiest with a contradiction argument. Suppose some (b, f(b)) lies on the tangent line at a. Then consider any c between a and b. (c, f(c)) also does not lie above the tangent line, by the argument we gave for b. However, the secant from (a, f(a)) to (b, f(b)) is part of the tangent line, and (c, f(c)) lies above this by the definition of strict concavity. This is a contradiction. Thus there is no (b, f(b)) that lies on the tangent line.

The case where b < a can be proved with a similar argument. We conclude that (b, f(b)) lies below the tangent line to y = f(x) at (a, f(a)).
Figure 1.22: A tangent line and a secant to a concave function
Theorem 1.22 is an "if and only if" statement, so it requires two arguments to prove. We must show that if f is strictly concave, then its graph lies below its tangent lines. We must also show that if its graph lies below its tangent lines, then it is strictly concave. Fortunately, Lemma 1.21 has already shown that if f(x) is strictly concave, then y = f(x) lies below the tangent line at a.
Proof
Suppose we know f is strictly concave. For each a in the domain of f, Lemma 1.21 states that y = f(x) lies below its tangent line at a. Since this holds for all a, y = f(x) lies below all of its tangent lines.

Now suppose that we know y = f(x) lies below all its tangent lines. We will verify that the region under y = f(x) is strictly convex. Let a and b be any points in the domain of f. Let c be any point between them. We know that (a, f(a)) and (b, f(b)) both lie below the tangent line to y = f(x) at (c, f(c)). Thus the tangent line lies above the entire secant between (a, f(a)) and (b, f(b)). (c, f(c)) is on this tangent line, so (c, f(c)) is above the secant as well. Since this argument holds for any such c, we conclude that the entire graph between a and b lies above the secant. Since this is true for any choice of a and b, we conclude that the region below y = f(x) is strictly convex and f(x) is a strictly concave function.
Figure 1.23: A secant with both endpoints below the tangent line at (c, f(c))
1.2.10 Non-Strict Concavity and Convexity
We have primarily worked with strictly concave functions, because they give the strongest results about maximizers. The properties of non-strict concavity and convexity give similar results. The results for convexity are obtained by flipping the directions of inequalities. They are probably not worth memorizing. On the other hand, economists frequently need to work with non-strictly concave functions. It is worth understanding how utility maximization works for these.
Graphs of non-strictly concave functions can intersect their own tangent lines. For instance, a linear function is concave, and its graph is identical to its tangent line. While the graph of a concave function can intersect the tangent lines, it cannot go above them.
Variants of Lemma 1.21
Let f(x) be a function on a convex domain.
1. If f(x) is a concave function that is differentiable at a, then the graph y = f(x) has no points above its tangent line at (a, f(a)).
2. If f(x) is a strictly convex function that is differentiable at a, then the rest of the graph y = f(x) lies above its tangent line at (a, f(a)).
3. If f(x) is a convex function that is differentiable at a, then the graph y = f(x) has no points below its tangent line at (a, f(a)).
The proof for non-strict concavity is identical to the proof for strict concavity, with the contradiction argument omitted. The necessary and sufficient conditions are unsurprising.
Variants of Theorem 1.22
Let f(x) be a differentiable function on a convex domain.
1. f(x) is concave if and only if the graph y = f(x) has no points above any of its tangent lines.
2. f(x) is strictly convex if and only if the graph y = f(x) lies above each of its tangent lines (except at the point of tangency).
3. f(x) is convex if and only if the graph y = f(x) has no points below any of its tangent lines.
These results can also be extended to well-behaved non-differentiable functions. For instance, whatever "tangent lines" you might reasonably add to the function in Figure 1.16 will not stray below the graph. Rigorously defining tangent lines in these situations would require a detailed argument involving limits, so we will not pursue it here.
The variants of Lemma 1.21 give rise to their own corollaries about maximizers and minimizers.
Variants of Corollary 1.23
1. If f(x) is a concave function and f′(x*) = 0, then x* is a global maximizer (but other points may be as well).
2. If f(x) is a strictly convex function and f′(x*) = 0, then x* is the unique global minimizer of f.
3. If f(x) is a convex function and f′(x*) = 0, then x* is a global minimizer (but other points may be as well).
Figure 1.24: A global but non-unique maximizer of a not strictly concave function
Finally, we can use the variants of Theorem 1.22 to produce second-derivative tests for the other forms of concavity and convexity.
Variants of Theorem 1.24
1. If f is a function on a convex domain D ⊆ R and f′′(x) ≤ 0 for all x in D, then f(x) is concave.
2. If f is a function on a convex domain D ⊆ R and f′′(x) > 0 for all x in D, then f(x) is strictly convex.
3. If f is a function on a convex domain D ⊆ R and f′′(x) ≥ 0 for all x in D, then f(x) is convex.
In the non-strict case, the condition is necessary as well as sufficient.
Corollary 1.25
A twice differentiable function f(x) on a convex domain D is concave if and only if f′′(x) ≤ 0 for all x in D.
The sufficiency of f′′(x) ≤ 0 is found in the preceding variant. We establish necessity with a contrapositive argument. It should look familiar to anyone who read our proof of Theorem 1.24. Notice that it is making a local rather than a global argument.
Proof
Suppose f′′(x) ≤ 0 for all x. A variant of Theorem 1.24 tells us that f is concave.

On the other hand, suppose f′′(a) > 0 for some a. First note that since f′′(x) exists, f is a differentiable function. Let y = ℓ(x) be the tangent line to y = f(x) at a. Consider g(x) = f(x) − ℓ(x). We know the following about g:
- g′′(a) = f′′(a) − ℓ′′(a) > 0, since ℓ′′(x) = 0.
- g′(a) = f′(a) − ℓ′(a) = 0, since the derivative is the slope of the tangent line.
- g(a) = 0, since f(a) = ℓ(a).

Thus by a variant of the (local) second-order condition, Theorem 1.5, a is a strict local minimizer of g, meaning g(x) > g(a) = 0 for all x ≠ a in a neighborhood of a. Thus y = f(x) lies above y = ℓ(x) in this neighborhood. By a variant of Theorem 1.22, f is not concave.

The contrapositive of this argument is that if f is concave, then there is no a such that f′′(a) > 0. Equivalently, f′′(x) ≤ 0 for all x.
This result means that we not only have a way to show that f(x) is concave, but also that it is not concave. If f′′(a) > 0 for any a, then f is not a concave function.
1.2.11 Section Summary
The most important definitions and results from this section were:
- The definition of a (strictly) convex region (Definitions 1.12 and 1.14)
- The definition of a (strictly) concave function (Definitions 1.16 and 1.17)
- The inequality condition for concave functions (Theorem 1.19)
- The tangent line condition for strictly concave functions (Theorem 1.22)
- The sufficient condition for a maximizer using strict concavity (Corollary 1.23)
- The second derivative tests for strictly concave and concave functions (Theorem 1.24 and Corollary 1.25)
Here is a summary of which conditions and statements from this section imply which others. Some of these conditions only apply to differentiable, or even twice differentiable, functions. For f differentiable (twice if relevant):

f′′(x) < 0  ⟹  f strictly concave  ⟺  (1 − t)f(a) + tf(b) < f((1 − t)a + tb)  ⟺  y = f(x) below tangent lines  ⟹  critical points are unique maximizers

f′′(x) ≤ 0  ⟺  f concave  ⟺  (1 − t)f(a) + tf(b) ≤ f((1 − t)a + tb)  ⟺  y = f(x) not above tangent lines  ⟹  critical points are maximizers

Each condition in the first row implies the condition below it in the second.

Figure 1.25: Conditions and applications of concave functions
1.3 Multivariable Optimization
Goals:
1. Apply the first- and second-order conditions to multivariable functions
2. Calculate the Hessian of a function
3. Understand the definition and significance of positive or negative definite matrices
4. Test whether a matrix is positive or negative definite
5. Show that a multivariable function is strictly concave
1.3.1 Parametrizations and Compositions
In this section we develop conditions to find maximizers of multivariable functions. Our optimization methods so far rely on using rates of change to compare a potential maximizer to the points around it. This is a more daunting task in the multivariable situation, because the comparison points can lie in any direction. The function can be changing differently depending on which direction we travel.
We avoid the need to develop new tools from scratch by considering paths through a potential maximizer a. A path through a is a function x(t) such that for some t₀, x(t₀) = a. We can think of the variable t as representing time. Then x(t) represents the position on our path at time t.
A path breaks down into coordinate functions:

x(t) = (x₁(t), x₂(t), . . . , xₙ(t))

If these are differentiable, the derivative

x′(t) = (x₁′(t), x₂′(t), . . . , xₙ′(t))

represents the instantaneous rate of change in position of x with respect to t. It is tangent to the path. In physics it is called the velocity vector.
Figure 1.26: A path and its tangent vector
We will study a function f(x) by studying compositions of the form f(x(t)). At each value of t, we compute the corresponding position on the path and then evaluate f at that position. Geometrically, this is the height of the graph y = f(x) above the curve x(t).
Figure 1.27: The graph of the composition f(x(t)) and the corresponding points in the original graph y = f(x)
The derivative of f(x(t)) with respect to t is computed using the multivariable chain rule.
Theorem 0.18
Suppose f(x) is a continuously differentiable n-variable function. If x(t) is a differentiable path, then the derivative of the composition f(x(t)) with respect to t is

df/dt(t) = ∇f(x(t)) · x′(t)    or    df/dt(t) = Σᵢ₌₁ⁿ fᵢ(x(t)) xᵢ′(t)
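The chain rule formula can be verified numerically. The sketch below compares a finite-difference derivative of f(x(t)) with the gradient formula ∇f(x(t)) · x′(t); the function f(x₁, x₂) = 5 − x₁² − x₂² and the path x(t) = (t, t²) are illustrative choices, not from the text.

```python
# Verify the multivariable chain rule df/dt = grad f(x(t)) . x'(t)
# for f(x1, x2) = 5 - x1^2 - x2^2 along the path x(t) = (t, t^2).
def f(x1, x2):
    return 5 - x1**2 - x2**2

def path(t):
    return (t, t**2)

def chain_rule_derivative(t):
    x1, x2 = path(t)
    grad = (-2 * x1, -2 * x2)    # gradient of f at x(t)
    vel = (1.0, 2 * t)           # x'(t), the velocity vector
    return grad[0] * vel[0] + grad[1] * vel[1]

def numeric_derivative(t, h=1e-6):
    """Central-difference derivative of the composition f(x(t))."""
    g = lambda s: f(*path(s))
    return (g(t + h) - g(t - h)) / (2 * h)

t = 0.7
print(chain_rule_derivative(t))   # -2.772
print(numeric_derivative(t))      # approximately -2.772
```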
As with single-variable optimization, the sign of a derivative is important to our arguments. We can obtain the sign of df/dt visually, using the following fact about dot products.
The Sign of the Dot Product
Given nonzero vectors u and v, the sign of u · v is
- positive if u and v make an acute angle
- zero if u and v are orthogonal
- negative if u and v make an obtuse angle
The angle between ∇f and x′(t) determines the sign of df/dt. This indicates whether f(x) increases or decreases as x travels along the path x(t).
Figure 1.28: The tangent line to the graph y = f(x(t)) and the angle between ∇f(x(t)) and x′(t)
The value of df/dt is realized geometrically as the slope of the tangent line to the graph of the composition function: y = f(x(t)). Using compositions to find maximizers has two main consequences. The first is helpful. The second creates difficulties.
1. f(x(t)) is a one-variable function. No matter the dimension of x, we can graph y = f(x(t)) in the ty-plane and look for maximizers there.
2. One path will not cover the whole domain of f(x). We have to study many different paths x(t) if we want to compare a potential maximizer to every other point in the domain.
We could study all paths in the domain of f, but that would be needlessly difficult. Instead we restrict ourselves to paths that are lines. We will express these lines using the following notation.
Notation
Every line has a direction vector v = (v₁, v₂, . . . , vₙ). A line through a in the direction of v has the equation

x(t) = a + tv
In a convex domain, any two values f(a) and f(b) can be attained as values of the composition f(a + tv), where v = b − a. If we want to compare these values (for instance to argue that a is or is not a maximizer), we compare f(a + 0v) and f(a + 1v).
Lemma 1.26
Suppose f(x) has a convex domain. A point a is a (local or global) maximizer if and only if 0 is a (local or global) maximizer of f(a + tv) for all direction vectors v.
This lemma tells us that a necessary condition for 0 to be a maximizer of f(a + tv) for all v is a necessary condition for a to be a maximizer of f(x). Similarly, a sufficient condition for 0 to be a maximizer of f(a + tv) for all v is a sufficient condition for a to be a maximizer of f(x).
The compositions f(a + tv) are realized as cross sections of the graph y = f(x) above the line x(t) = a + tv.
Figure 1.29: The graph of a two-variable function and its cross section over a line
1.3.2 The Multivariable First-Order Condition
Our starting point for multivariable optimization is Lemma 1.26: a is a local maximizer or minimizer of f(x) if and only if 0 is a local maximizer or minimizer of f(a + tv) for all vectors v. A condition on the compositions f(a + tv) at t = 0 becomes a condition on a.
The first-order condition on f(a + tv) at t = 0 is

df/dt(0) = 0
∇f(a + 0v) · v = 0
∇f(a) · v = 0
If a is a local maximizer, this must hold for all directions v. There is only one value of ∇f(a) that satisfies this requirement.
Theorem 1.27 [The Multivariable First-Order Condition]
Suppose a lies in the domain of f(x). If a is a local maximizer or minimizer of f, then either ∇f(a) = 0 or ∇f(a) does not exist.
Like in the single variable case, we call points where ∇f(x) is 0 or undefined critical points. Finding critical points is often the first step in a multivariable optimization.
Remark
If a lies at the boundary of the domain, then it may not be possible to travel in all directions v from a and still evaluate f(a + tv). Fortunately, the first-order condition still holds. If ∇f(a) exists, then the partial derivatives fᵢ(a) exist. Thus f(a + heᵢ) exists for h near 0. That means we can at least travel in the standard basis directions from a without immediately leaving the domain of f(x). At a local maximizer, the derivatives in these directions must be 0.
1.3.3 Computing Critical Points
What does the multivariable first-order condition say about f(x₁, x₂) = x₁⁴ − 4x₁x₂ + x₂⁴?
Solution
We compute the gradient vector by taking the partial derivatives with respect to x₁ and x₂.

∇f(x) = (f₁(x), f₂(x)) = (4x₁³ − 4x₂, −4x₁ + 4x₂³)

This is never undefined, so we solve for where it is (0, 0).

4x₁³ − 4x₂ = 0        −4x₁ + 4x₂³ = 0
x₁³ = x₂                          (solve for x₂)
−4x₁ + 4x₁⁹ = 0                   (substitute)
4x₁(x₁⁸ − 1) = 0                  (factor)
x₁ = 0 or ±1

Substituting back into x₁³ = x₂ lets us solve for the corresponding values of x₂. We obtain the critical points (0, 0), (1, 1), and (−1, −1). We conclude that no point other than (0, 0), (1, 1), and (−1, −1) can be a local maximizer or a local minimizer. We cannot conclude whether each of these points is a local maximizer, a local minimizer, or neither.
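The three critical points can be confirmed by plugging them back into the gradient; a short sketch:

```python
# Confirm that the gradient of f(x1, x2) = x1^4 - 4*x1*x2 + x2^4
# vanishes at the three critical points found above.
def grad_f(x1, x2):
    return (4 * x1**3 - 4 * x2, -4 * x1 + 4 * x2**3)

for point in [(0, 0), (1, 1), (-1, -1)]:
    print(point, grad_f(*point))  # gradient is (0, 0) at each point
```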
1.3.4 The Second Derivative of a Composition
We would also like to apply the local or global second-order condition along each line. In order to do this, we need a way to compute d²/dt² f(a + tv). The chain rule allows us to write the first derivative as a dot product, or a sum. We will write it as a sum.
d/dt f(a + tv) = Σᵢ₌₁ⁿ fᵢ(a + tv) vᵢ
To obtain the second derivative we differentiate both sides. We differentiate each term of the sum. We can treat each vᵢ as a constant multiple, but fᵢ(a + tv) is a composition of functions. The derivative of each term will need the chain rule. We again write this as a sum, using a different index variable to avoid ambiguity. The partial derivatives of fᵢ are fᵢⱼ.
d²/dt² f(a + tv) = d/dt Σᵢ₌₁ⁿ fᵢ(a + tv) vᵢ
= Σᵢ₌₁ⁿ d/dt (fᵢ(a + tv)) vᵢ
= Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ(a + tv) vⱼ vᵢ
We could use this formula to compute the second derivative of a particular composition f(a + tv). We could test whether it is positive or negative. What we need, though, is to test the sign of the second derivative for all direction vectors v. While this is possible in small dimensions, leaving the vᵢ as variables, the algebra quickly becomes daunting as n increases.
1.3.5 The Hessian Matrix
A theorem in linear algebra provides a shortcut. To do linear algebra, we need to write our vectors like matrices.
Point Form and Column Form
When we want to emphasize that a vector x represents a point in Rⁿ, we write it as x = (x₁, x₂, . . . , xₙ). If we want to do matrix multiplication, we can write it as a column vector:

x = ⎡ x₁ ⎤
    ⎢ x₂ ⎥
    ⎢ ⋮  ⎥
    ⎣ xₙ ⎦
The expression

d²/dt² f(a + tv) = Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ fᵢⱼ(a + tv) vⱼ vᵢ

that we obtained can be written as a matrix product:

[v₁ v₂ ⋯ vₙ] ⎡ f₁₁(a + tv)  f₁₂(a + tv)  ⋯  f₁ₙ(a + tv) ⎤ ⎡ v₁ ⎤
             ⎢ f₂₁(a + tv)  f₂₂(a + tv)  ⋯  f₂ₙ(a + tv) ⎥ ⎢ v₂ ⎥
             ⎢      ⋮             ⋮       ⋱       ⋮      ⎥ ⎢ ⋮  ⎥
             ⎣ fₙ₁(a + tv)  fₙ₂(a + tv)  ⋯  fₙₙ(a + tv) ⎦ ⎣ vₙ ⎦
You may want to convince yourself that this algebra is correct with a 2- or 3-dimensional example. What do we make of the individual factors in this product? The vectors on either end of this expression are just v and its transpose (v flipped sideways). The matrix in the middle seems important. We should have a name for it.
Definition 1.28
Given a function f(x) of n variables, the Hessian is an n × n matrix function of x whose entries are the second partial derivatives of f at x. Its formula is

Hf(x) = ⎡ f₁₁(x)  f₁₂(x)  ⋯  f₁ₙ(x) ⎤
        ⎢ f₂₁(x)  f₂₂(x)  ⋯  f₂ₙ(x) ⎥
        ⎢   ⋮       ⋮      ⋱    ⋮    ⎥
        ⎣ fₙ₁(x)  fₙ₂(x)  ⋯  fₙₙ(x) ⎦
Notice that for well-behaved functions, fᵢⱼ = fⱼᵢ. This means the Hessian will be a symmetric matrix.
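The Hessian of a specific function can be assembled numerically by finite differences. The sketch below uses the earlier example f(x₁, x₂) = x₁⁴ − 4x₁x₂ + x₂⁴ (the finite-difference helper is an illustration, not part of the text) and checks the symmetry fᵢⱼ = fⱼᵢ at a sample point.

```python
# Approximate the Hessian of f(x1, x2) = x1^4 - 4*x1*x2 + x2^4 at a
# point by central differences, and check that it is symmetric.
def f(x):
    x1, x2 = x
    return x1**4 - 4 * x1 * x2 + x2**4

def hessian(f, x, h=1e-4):
    n = len(x)
    def shift(x, i, d):
        y = list(x)
        y[i] += d
        return tuple(y)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four-point central-difference stencil for f_ij.
            H[i][j] = (
                f(shift(shift(x, i, h), j, h))
                - f(shift(shift(x, i, h), j, -h))
                - f(shift(shift(x, i, -h), j, h))
                + f(shift(shift(x, i, -h), j, -h))
            ) / (4 * h**2)
    return H

H = hessian(f, (1.0, 1.0))
print(H)  # approximately [[12, -4], [-4, 12]]
```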
1.3.6 Negative Definite Matrices
Our Hessian notation allows us to write d²/dt² f(a + tv) = vᵀ Hf(a + tv) v. The following vocabulary captures exactly which Hessian matrices will produce a negative (or positive) second derivative in all directions.
Definition 1.29
A symmetric n × n matrix M is negative definite if for all nonzero n-vectors v, vᵀMv < 0. It is positive definite if for all nonzero n-vectors v, vᵀMv > 0.
Remark
We may think, based on positive and negative numbers, that most matrices are either positive definite or negative definite. In fact, a randomly chosen matrix is most likely to be neither. For most matrices, vᵀMv is positive for some v and negative for others.
Connecting this back to our original goal, we can apply the definition of negative definite to describe the sign of the second derivatives of the compositions f(a + tv). For a critical point a, we can draw the following conclusions.

If Hf(a) is negative definite, then d²f/dt²(0) < 0 for all v. Since df/dt(0) = 0, t = 0 satisfies the second-order condition for all v. We conclude a is a strict local maximizer of f.

If Hf(x) is negative definite for all x, then d²f/dt²(t) < 0 for all t and all v. Since df/dt(0) = 0, t = 0 satisfies the global second-order condition for all v. We conclude a is the unique global maximizer of f.
Without an efficient way to test whether a matrix is negative definite, we have only managed to restate the problem. We can say that we are checking whether Hf(a) is negative definite, or we can say that we are checking whether

$$\sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}(a)\, v_j v_i < 0$$

for all v_1, v_2, ..., v_n. The computations are the same, and they are daunting. Fortunately, there is an easier way to show that a matrix is negative or positive definite.
Theorem 1.30. A symmetric n × n matrix M is negative definite if the upper left square minors M_i have determinants that satisfy the alternating condition:

$$(-1)^i |M_i| > 0 \quad \text{for all } 1 \le i \le n$$
Example. For a 4 × 4 matrix, the minors M_i are the upper-left 1 × 1, 2 × 2, 3 × 3, and 4 × 4 submatrices, and to apply Theorem 1.30 we would check

$$|M_1| < 0 \qquad |M_2| > 0 \qquad |M_3| < 0 \qquad |M_4| > 0$$
Variant of Theorem 1.30. A symmetric n × n matrix M is positive definite if the upper left square minors have positive determinants.
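Theorem 1.30 and its variant translate directly into a short computation. Here is a sketch in Python with NumPy (the helper names are illustrative, not from the text):

```python
import numpy as np

def leading_minors(M):
    """Determinants of the upper-left square minors M_1, ..., M_n."""
    n = M.shape[0]
    return [np.linalg.det(M[:i, :i]) for i in range(1, n + 1)]

def is_negative_definite(M):
    # Alternating condition of Theorem 1.30: (-1)^i |M_i| > 0 for all i.
    return all((-1) ** i * d > 0 for i, d in enumerate(leading_minors(M), start=1))

def is_positive_definite(M):
    # Variant: all leading minors positive.
    return all(d > 0 for d in leading_minors(M))

print(is_negative_definite(-2 * np.eye(3)))  # True: minors are -2, 4, -8
print(is_positive_definite(np.eye(3)))       # True: minors are 1, 1, 1
```

With floating-point determinants, minors very close to zero should be treated with care; the symbolic computations in the worked example below avoid that issue entirely.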
1.3.7 The Multivariable Second-Order Conditions

Now that we have a check for negative definite and positive definite matrices, it is worth stating the results of our investigation as theorems. These are our most important sufficient conditions for a maximizer of a multivariable function.
Theorem 1.31 [The Multivariable Second-Order Condition]. Given a function f(x) with a in its domain. If

1. ∇f(a) = 0
2. Hf(a) is negative definite

then a is a strict local maximizer of f.
Theorem 1.32 [The Multivariable Global Second-Order Condition]. Suppose f is a function with a convex domain D and x* is in D. If

1. ∇f(x*) = 0
2. Hf(x) is negative definite for all x ∈ D

then x* is the only critical point of f and the unique global maximizer.
We can summarize the reasoning behind the global second-order condition as follows:

- Hf(x) is negative definite for all x ⟹ d²/dt² f(x* + tv) < 0 for all directions v.
- ∇f(x*) = 0 ⟹ df/dt(0) = 0 along all lines x* + tv.
- Together, these give that 0 is the global maximizer of f(x* + tv) for all directions v, so x* is the global maximizer of f(x).

These theorems have variants for minimizers.
Variants of Theorems 1.31 and 1.32. Suppose f is a function with domain D and a is in D.

1. If ∇f(a) = 0 and Hf(a) is positive definite, then a is a strict local minimizer of f.
2. If D is convex, ∇f(a) = 0, and Hf(x) is positive definite for all x in D, then a is the only critical point of f and the unique global minimizer.
Minor determinants with the wrong sign produce the wrong sign of v^T M v for some v. This in turn means the wrong second derivative in the direction of v. This relationship is formalized in the following necessary condition.
Corollary 1.33. Let f(x) be a twice-differentiable function on a convex domain D. Let M = Hf(a).

1. If a is a local maximizer, then (−1)^i |M_i| ≥ 0 for all 1 ≤ i ≤ n.
2. If a is a local minimizer, then |M_i| ≥ 0 for all 1 ≤ i ≤ n.
1.3.8 Applying the Multivariable Second-Order Condition

Let f(x_1, x_2, x_3) = 3\sqrt[3]{x_1 x_3} + \ln x_2 - \frac{x_1}{4} - \frac{x_2}{5} - 2x_3 + 15 on the domain D = {(x_1, x_2, x_3) : x_1, x_2, x_3 > 0}. Find the critical point of f. What does the Global Second-Order Condition say about this point?
Solution. We take all three partial derivatives of f. Since they are always defined on D, the gradient vector exists, so the critical points must be where they all equal 0.

$$\frac{\sqrt[3]{x_3}}{\sqrt[3]{x_1^2}} - \frac{1}{4} = 0 \qquad \frac{1}{x_2} - \frac{1}{5} = 0 \qquad \frac{\sqrt[3]{x_1}}{\sqrt[3]{x_3^2}} - 2 = 0$$

From the second equation, x_2 = 5. From the first, \sqrt[3]{x_3} = \frac{\sqrt[3]{x_1^2}}{4}. Substituting into the third equation gives \frac{16\sqrt[3]{x_1}}{\sqrt[3]{x_1^4}} = 2, so \frac{16}{x_1} = 2 and x_1 = 8. Then \sqrt[3]{x_3} = \frac{\sqrt[3]{8^2}}{4} = 1, so x_3 = 1.

The critical point is (8, 5, 1).
We compute the Hessian by taking the second partial derivatives of f.

$$Hf(x_1, x_2, x_3) = \begin{pmatrix} -\dfrac{2\sqrt[3]{x_3}}{3\sqrt[3]{x_1^5}} & 0 & \dfrac{1}{3\sqrt[3]{x_1^2 x_3^2}} \\ 0 & -\dfrac{1}{x_2^2} & 0 \\ \dfrac{1}{3\sqrt[3]{x_1^2 x_3^2}} & 0 & -\dfrac{2\sqrt[3]{x_1}}{3\sqrt[3]{x_3^5}} \end{pmatrix}$$

To check that Hf(x) is negative definite for all x, we apply Theorem 1.30. There are three determinants to calculate.
$$(-1)^1 |M_1| = (-1)\left(-\frac{2\sqrt[3]{x_3}}{3\sqrt[3]{x_1^5}}\right) = \frac{2\sqrt[3]{x_3}}{3\sqrt[3]{x_1^5}}$$

$$(-1)^2 |M_2| = (1)\begin{vmatrix} -\frac{2\sqrt[3]{x_3}}{3\sqrt[3]{x_1^5}} & 0 \\ 0 & -\frac{1}{x_2^2} \end{vmatrix} = \frac{2\sqrt[3]{x_3}}{3 x_2^2 \sqrt[3]{x_1^5}}$$

$$(-1)^3 |M_3| = -|Hf(x_1, x_2, x_3)| = \frac{4}{9 x_2^2 \sqrt[3]{x_1^4 x_3^4}} - \frac{1}{9 x_2^2 \sqrt[3]{x_1^4 x_3^4}} = \frac{1}{3 x_2^2 \sqrt[3]{x_1^4 x_3^4}}$$
Since x_1, x_2 and x_3 are all positive in this domain, so are these three quantities. We conclude that Hf(x_1, x_2, x_3) is negative definite. By the global second-order condition, we conclude that (8, 5, 1) is the only critical point (which we already knew from the FOC) and that it is the unique maximizer of f.
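The computation above can be checked symbolically. This sketch (assuming SymPy is available; the tooling is my own, not the text's) verifies that (8, 5, 1) is a critical point and that the alternating minor signs hold there:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
f = 3 * sp.cbrt(x1 * x3) + sp.log(x2) - x1 / 4 - x2 / 5 - 2 * x3 + 15

point = {x1: 8, x2: 5, x3: 1}
grad = [sp.simplify(sp.diff(f, v).subs(point)) for v in (x1, x2, x3)]
print(grad)  # [0, 0, 0] -> (8, 5, 1) is a critical point

H = sp.hessian(f, (x1, x2, x3))
# Theorem 1.30: (-1)^i |M_i| > 0 for the leading minors M_i.
minors = [sp.simplify((-1) ** i * H[:i, :i].det()) for i in (1, 2, 3)]
print([m.subs(point) for m in minors])  # all positive at the critical point
```

Because the symbols are declared positive, the simplified minors match the three positive quantities computed by hand above.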
1.3.9 The Hessian and Concavity

Linear compositions play well with secants. The height of the secant above each t is the same whether we are looking in the ty-plane or in R^{n+1}. We can state this relationship formally as a lemma.
Lemma 1.34. If y = s(t) is a secant of y = f(a + tv), then

x(t) = a + tv, y(t) = s(t)

is a secant of y = f(x).
Click to Load Applet
Figure 1.30: A secant of y = f(a + tv) and the corresponding secant of y = f(x) over x(t) = a + tv
We determine whether a function f(x) is concave by checking whether its graph lies above its secant lines. Every such secant line lies above some line a + tv. The secant stays below the graph y = f(x) if and only if the corresponding secant stays below y = f(a + tv).
Lemma 1.35. A function f(x) on a convex domain is concave if and only if f(a + tv) is concave for every a in the domain of f and every direction vector v.

- f(a + tv) is concave for all lines a + tv ⟺ (Def 1.16) y = f(a + tv) lies above its secants (t, s(t)) for all lines a + tv
- ⟺ (Lem 1.34) y = f(x) lies above the secants x(t) = a + tv, y(t) = s(t) for all lines a + tv
- ⟺ (Def 1.16) f(x) is concave
Variants of Lemma 1.35. Suppose f(x) is a function on a convex domain.

1. f(x) is strictly concave if and only if f(x(t)) is strictly concave for every line x(t) in the domain of f.
2. f(x) is convex if and only if f(x(t)) is convex for every line x(t) in the domain of f.
3. f(x) is strictly convex if and only if f(x(t)) is strictly convex for every line x(t) in the domain of f.
The result of this lemma is that any test for concavity or convexity of the single-variable compositions f(a + tv) becomes a test for the concavity of f(x) itself. If every composition passes, then f does. If at least one composition fails, then so does f.

Our natural next step is to take our theorems from single-variable concavity and apply them to these compositions. Theorem 1.22 showed that the graphs of single-variable strictly concave functions lie below their tangent lines. Tangent lines to a composition have the same heights as the tangent lines of the multivariable graph. We can formalize this relationship with a lemma.
Lemma 1.36. If the line y = ℓ(t) is tangent to y = f(a + tv), then

x(t) = a + tv, y(t) = ℓ(t)

is a tangent line to y = f(x).
Click to Load Applet
Figure 1.31: A tangent line of y = f(a + tv) and the corresponding tangent line of y = f(x) over x(t) = a + tv
We can check whether tangent lines lie above or below a graph y = f(x) by checking whether tangent lines lie above or below the compositions y = f(a + tv).
Theorem 1.37 [Multivariable version of Theorem 1.22]. A differentiable function f on a convex domain is strictly concave if and only if the graph y = f(x) lies below each of its tangent lines (except at the point of tangency).

- y = f(x) lies below its tangent lines ⟺ (Lem 1.36) y = f(a + tv) lies below all its tangent lines for any a + tv
- ⟺ (Thm 1.22) f(a + tv) is strictly concave for any a + tv
- ⟺ (Lem 1.35) f(x) is strictly concave
This theorem allows us to update our corollary on critical points of strictly concave functions.

Corollary 1.38 [Multivariable Version of Corollary 1.23]. If x* is a critical point of a differentiable, strictly concave function f(x), then it is the only critical point and is the unique global maximizer.
We can also use compositions to update our theorems about second derivatives and concavity.
Theorem 1.39 [Multivariable Version of Theorem 1.24]. If f is a function on a convex domain D ⊂ R^n and \frac{d^2}{dt^2} f(a + tv) < 0 for all lines a + tv in D, then f(x) is strictly concave.
Corollary 1.40 [Multivariable Version of Corollary 1.25]. A twice differentiable function f(x) on a convex domain D is concave if and only if \frac{d^2}{dt^2} f(a + tv) \le 0 on all lines a + tv in D.
As we have seen, the sign of these second derivatives depends on the Hessian matrix. This gives us our most useful computational test for the concavity of a multivariable function.
Theorem 1.41. Let f(x) be a twice-differentiable function on a convex domain D. If Hf(x) is negative definite for all x in D, then f is strictly concave.
This means that concavity plays the same role in multivariable optimization as in single-variable optimization. For some functions, we can identify a maximizer by concavity even though they do not satisfy the second-order condition. On the other hand, the most convenient way to identify concave functions is still the second derivatives.
1.3.10 Negative Semi-Definite Matrices

Changing the condition f''(x) < 0 to f''(x) ≤ 0 was a natural approach to generalizing Theorem 1.24. As a bonus, we obtained Corollary 1.25, which is both a necessary and sufficient condition for concavity. The same is possible for multivariable functions, but the condition on the Hessian is surprisingly complicated.

We might expect the multivariable analogue to be that (−1)^i |M_i| ≥ 0 for each i. Consider a function like f(x_1, x_2, x_3) = x_3^2. In the x_3 direction this function is a parabola, which is strictly convex. Unfortunately, its Hessian passes our expected test for concavity.
$$Hf(x_1, x_2, x_3) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

This has |M_i| = 0 for each i. The determinant cannot detect the curvature of later variables if the earlier variables produce a row of zeroes in Hf(x). Our expected test is not a correct test.
To avoid this problem, we must consider different orders of variables. Which orders? All of them.
Theorem 1.42. Let f(x) be a twice-differentiable function on a convex domain D ⊆ R^n. Let σ be any reordering of the coordinates of R^n. Let σM = Hf(σx).

1. f is concave if and only if (−1)^i |σM_i| ≥ 0 for all σ and all 1 ≤ i ≤ n. If so, then v^T Hf(x) v ≤ 0 for all nonzero v. We say Hf(x) is negative semidefinite.
2. f is convex if and only if |σM_i| ≥ 0 for all σ and all 1 ≤ i ≤ n. If so, then v^T Hf(x) v ≥ 0 for all nonzero v. We say Hf(x) is positive semidefinite.
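The all-reorderings condition can be sketched mechanically. The helper below (illustrative names, not from the text; a small tolerance stands in for exact zero minors) permutes the coordinates and checks the alternating condition for each ordering:

```python
import numpy as np
from itertools import permutations

def is_negative_semidefinite(M):
    """Theorem 1.42-style check: for every reordering sigma of the
    coordinates, the leading minors of the permuted matrix satisfy
    (-1)^i |sigma M_i| >= 0."""
    n = M.shape[0]
    for perm in permutations(range(n)):
        P = M[np.ix_(perm, perm)]  # same reordering of rows and columns
        for i in range(1, n + 1):
            if (-1) ** i * np.linalg.det(P[:i, :i]) < -1e-12:
                return False
    return True

# The Hessian of f = x3^2 is diag(0, 0, 2): not negative semidefinite,
# even though the unpermuted minors are all zero.
print(is_negative_semidefinite(np.diag([0.0, 0.0, 2.0])))   # False
print(is_negative_semidefinite(np.diag([0.0, 0.0, -2.0])))  # True
```

The first call catches exactly the failure described above: a reordering that moves x_3 first exposes the positive curvature as a positive 1 × 1 minor.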
To give a sense of the amount of computation required, we will consider an example. Suppose

$$Hf(x) = \begin{pmatrix} 0 & 2 & 5 \\ 2 & -6 & 0 \\ 5 & 0 & -3 \end{pmatrix}$$

We check the determinants:

$$|M_1| = |0| \le 0 \qquad |M_2| = \begin{vmatrix} 0 & 2 \\ 2 & -6 \end{vmatrix} \ge 0 \qquad |M_3| = \begin{vmatrix} 0 & 2 & 5 \\ 2 & -6 & 0 \\ 5 & 0 & -3 \end{vmatrix} \le 0$$
But then we must do the same for all possible reorderings of x_1, x_2 and x_3, meaning we must check the same three minors for each of:

- $\begin{pmatrix} 0 & 2 & 5 \\ 2 & -6 & 0 \\ 5 & 0 & -3 \end{pmatrix}$ — original Hf(x)
- $\begin{pmatrix} -6 & 2 & 0 \\ 2 & 0 & 5 \\ 0 & 5 & -3 \end{pmatrix}$ — σ switches x_1 and x_2
- $\begin{pmatrix} -3 & 0 & 5 \\ 0 & -6 & 2 \\ 5 & 2 & 0 \end{pmatrix}$ — σ switches x_1 and x_3
- $\begin{pmatrix} 0 & 5 & 2 \\ 5 & -3 & 0 \\ 2 & 0 & -6 \end{pmatrix}$ — σ switches x_2 and x_3
- $\begin{pmatrix} -6 & 0 & 2 \\ 0 & -3 & 5 \\ 2 & 5 & 0 \end{pmatrix}$ — σ : (x_1, x_2, x_3) ↦ (x_2, x_3, x_1)
- $\begin{pmatrix} -3 & 5 & 0 \\ 5 & 0 & 2 \\ 0 & 2 & -6 \end{pmatrix}$ — σ : (x_1, x_2, x_3) ↦ (x_3, x_1, x_2)
This is a significant amount of work, though linear algebra knowledge lets us avoid some. For instance, all the |σM_3| are equal. As the dimension increases, there are more symmetries to exploit. With these shortcuts, the main driver in complexity is computing the n × n determinant, not the number of re-ordered smaller determinants. Checking whether a matrix is negative semidefinite takes about twice as many operations as checking that it is negative definite.
1.3.11 Section Summary

The most important definitions and results from this section were

- The multivariable first-order condition (Theorem 1.27)
- The Hessian matrix (Definition 1.28)
- Positive and negative definite (Definition 1.29)
- The determinant test for a negative definite matrix (Theorem 1.30)
- The multivariable global second-order condition (Theorem 1.32)
- The Hessian test for strict concavity (Theorem 1.41)

Here is a summary of which conditions and statements from this section imply which others.
86
[Diagram: Hf(x) negative definite for all x ⟹ f strictly concave (Thm 1.41); Hf(x) negative semidefinite for all x ⟹ f concave (Thm 1.42); Hf(a) negative definite with ∇f(a) = 0 ⟹ a is a strict local max (SOC); with ∇f(a) = 0, strict concavity gives that a is the unique global max and only critical point, and concavity gives that a is a global max (GSOC); a local max ⟹ ∇f(a) = 0 (FOC) and Hf(a) negative semidefinite (Cor 1.33).]

Figure 1.32: Relationships between the conditions of multivariable optimization
Chapter 2: Constrained Optimization

2.1 Equality Constraints

Goals:
1. Use logic to reason about the effects of a constraint on optimization problems.
2. Visually identify maximizers and minimizers using level sets.
3. Compute the maximizer subject to an equality constraint using the Lagrangian.
4. Use paths and the chain rule to understand the reasoning behind the Lagrangian.
2.1.1 Constrained Optimization

Given a function f(x) with domain D, we have so far performed unconstrained optimization, trying to compute

$$\max_{x \in D} f(x) \quad \text{or} \quad \min_{x \in D} f(x)$$

In a constrained optimization problem, we restrict our attention to a subset of the domain. For some subset S ⊆ D, we want to know

$$\max_{x \in S} f(x) \quad \text{or} \quad \min_{x \in S} f(x)$$
A maximizer in S is a vector a such that f(a) ≥ f(x) for all x ∈ S. Geometrically, it is the highest point on the part of the graph y = f(x) that lies above S. In constrained optimization, the function we are maximizing is called an objective function. The set S is called the feasible set. The notation above is flexible enough to apply to many different kinds of constraints, producing many different kinds of subset S.
Click to Load Applet
Figure 2.1: The graph y = f(x) over a two-dimensional subregion S
Click to Load Applet
Figure 2.2: The graph y = f(x) over a curve S
Click to Load Applet
Figure 2.3: The graph y = f(x) over a finite set S
Constraints arise naturally in economics. Here are two familiar examples recast in the abstract form that we just introduced.
Example. What is the maximum utility one can attain given a budget constraint?

$$\max_{x \in S} u(x)$$

where S is the set of purchases one can afford.
Example. How cheaply can a firm produce 500 units?

$$\min_{x \in S} c(x)$$

Here, S is the set of inputs that produce 500 units.
Without knowing the specific nature of the constraint, we can still deduce some basic facts about constrained optimization using logic and set theory. Tightening a constraint can only reduce the value of the maximum.
Lemma 2.1. For a function f(x) with domain D and two subsets T ⊆ S ⊆ D,

$$\max_{x \in T} f(x) \le \max_{x \in S} f(x)$$
Sometimes finding a maximum over S is computationally easier than finding a maximum over T. If we are lucky, the maximizer of f(x) in S might also lie in T. We can apply the following reasoning.
Corollary 2.2. Suppose x* is a maximizer of f(x) over a set S. If T ⊆ S and x* ∈ T, then x* is also a maximizer of f(x) over T, and

$$\max_{x \in T} f(x) = \max_{x \in S} f(x)$$
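Lemma 2.1 and Corollary 2.2 can be illustrated by brute force on a finite feasible set. In this sketch the objective f and the sets S and T are my own illustrative choices, not from the text:

```python
# A brute-force illustration of Lemma 2.1 and Corollary 2.2 on a finite grid.
def f(x1, x2):
    return -(x1 - 1) ** 2 - (x2 - 2) ** 2  # peak at (1, 2)

S = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
T = [(a, b) for (a, b) in S if b >= 0]  # T is a subset of S

max_S = max(f(*p) for p in S)
max_T = max(f(*p) for p in T)
print(max_T <= max_S)  # True: tightening the constraint cannot raise the max
print(max_T == max_S)  # True: the maximizer (1, 2) of f over S also lies in T
```

Because the maximizer over S happens to satisfy the extra constraint defining T, the two maxima agree, exactly as Corollary 2.2 predicts.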
Remark. Why do we make the distinction between the domain and a subset? If we want to constrain x to values in S, we could redefine f to have a domain of S. This is generally not a good idea. We will want to use the derivatives of f to solve constrained optimization problems. For example, we may need to know

$$\lim_{h \to 0} \frac{f(a + h e_i) - f(a)}{h}$$

If a is on the boundary of S and we declare everything outside S to no longer be in the domain of f, then we may not be able to evaluate f(a + h e_i) in a neighborhood of h = 0.
2.1.2 Equality Constraints and Level Sets

The first type of constraint we will examine is an equality constraint.

Definition 2.3. An equality constraint is a constraint of the form g(x) = 0. With such a constraint we are solving

$$\max_{x \in D} f(x) \text{ subject to } g(x) = 0 \qquad \text{or} \qquad \max_{x \in S} f(x) \text{ where } S = \{x \in D : g(x) = 0\}$$
To contrast g with the objective function f, we call it a constraint function.
Remark. We can rewrite any equation in the variables of x to have the form g(x) = 0. For example:

$$x_2 = \frac{16}{x_1} \implies 16 - x_1 x_2 = 0$$
An equality constraint is an equation, a condition that a point can satisfy or fail to satisfy. In order to maximize f among such points, we want to be able to visualize the entire set of points that satisfy the constraint. Mathematics has a term for this set.
Definition 2.4. Given a function g(x), a level set of g is the set of points that, for some number c, satisfy g(x) = c.
Example. Given the function g(x_1, x_2) = x_1 x_2, here are four different level sets of g:

{(x_1, x_2) : x_1 x_2 = 9}    {(x_1, x_2) : x_1 x_2 = 17.32}
{(x_1, x_2) : x_1 x_2 = 0}    {(x_1, x_2) : x_1 x_2 = −4}
Remark. Commonly, people refer to a level set by its defining equation, rather than using set notation. Thus "the level set x_1^2 + x_2^2 = 16" refers to the set {(x_1, x_2) : x_1^2 + x_2^2 = 16}.
Level sets have several applications in economics you may be familiar with:
Example. If we are choosing to buy quantities x_1 and x_2 of goods priced at p_1 and p_2, then the bundles we can buy with an income I form the level set (often called a budget line):

$$p_1 x_1 + p_2 x_2 = I$$
Example. Given a utility function u(x_1, x_2), the level set u(x_1, x_2) = c is called an indifference curve. The chooser will be equally happy with any point on the curve.
For a two-variable function, a level set is the intersection of the graph y = g(x_1, x_2) with a horizontal plane y = c. These are often curves, so we call them level curves.
Click to Load Applet
Figure 2.4: The level curves x_1 x_2 = c
Notice that x_1 x_2 = c is not always a curve. When c = 0, the level set is the x_1- and x_2-axes. At the origin, this is two curves (lines) intersecting.
For a three-variable function, the level sets are called level surfaces because for most values of c, they are 2-dimensional.
Click to Load Applet
Figure 2.5: The level surfaces x_1^2 + x_2^2 + x_3^2 = c
Notice x_1^2 + x_2^2 + x_3^2 = 0 is not a surface; it is a point. x_1^2 + x_2^2 + x_3^2 = −5 is the empty set.
It can be difficult to determine the shape and structure of a level set. A powerful mathematical theorem can tell us when to expect good behavior.
Corollary 2.5 [Corollary to the Implicit Function Theorem]. Let g(x) be a continuously differentiable function at a. If a lies on the level set g(x) = c and ∇g(a) ≠ 0, then the level set g(x) = c is an (n − 1)-dimensional shape in some neighborhood of a. Specifically, it is the graph of a differentiable function of n − 1 of the variables of R^n.
This means that for smooth functions we can generally expect the level set to be a space of one dimension lower than the domain, except at the occasional points where ∇g(x) = 0. We can see this in our previous examples. For g(x_1, x_2) = x_1 x_2, the nonempty level sets are curves, except for x_1 x_2 = 0. Even x_1 x_2 = 0 looks like a curve in any neighborhood that does not include (0, 0).
For g(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2, the nonempty level sets are surfaces (spheres), except for x_1^2 + x_2^2 + x_3^2 = 0.

We will give an argument for this corollary once we have seen the implicit function theorem in chapter 4.
2.1.3
Finding
the
Maximizer
and
Minimizer
Graphically
The level set g(x) = 0 is obviously important to a constrained optimization problem. We can also use level sets to understand the shape of the objective function. Level sets have the advantage that they live in R^n, not R^{n+1}, making them easier to draw than the graph y = f(x).
Consider the objective function f(x_1, x_2) = x_1^2 + x_2^2 subject to x_1^2 − x_2 − 4 = 0.

a. What are the level curves of f(x_1, x_2)?
b. How can we use a diagram of the level curves of f to argue that (1, −3) is not a local maximizer or minimizer?
c. Where do the local maximizer(s) and local minimizer(s) of f on x_1^2 − x_2 − 4 = 0 appear to lie?
d. Are any of the local maximizers and minimizers also global maximizers or minimizers?
Solution.

a. The level sets have the form x_1^2 + x_2^2 = c. These are circles centered at the origin, with higher values the farther from the origin we go.

b. (1, −3) lies on the level set x_1^2 + x_2^2 = 10. We can see from a sketch that the curve x_1^2 − x_2 − 4 = 0 passes through this level set from higher values to lower values. Thus (1, −3) is not a local maximizer or minimizer.

c. (0, −4) appears to be a local maximizer, because x_1^2 − x_2 − 4 = 0 meets the level set x_1^2 + x_2^2 = 16 and travels back to smaller-valued level sets in both directions. There also appear to be local minimizers in the third and fourth quadrants, where x_1^2 − x_2 − 4 = 0 touches a smallest level set and then travels back to larger-valued level sets.

d. There is no global maximizer. As we travel upwards in x_1^2 − x_2 − 4 = 0, we cross larger and larger valued level sets. There is no upper bound to the values of f(x_1, x_2) that we can attain on the constraint. On the other hand, the local minimizers are global minimizers. The only smaller-valued level curves are inside that circle, and the constraint does not enter that region.
Figure 2.6: Level curves of x_1^2 + x_2^2 and the level curve x_1^2 − x_2 − 4 = 0
Click to Load Applet
Figure 2.7: The graph of y = x_1^2 + x_2^2 over the constraint x_1^2 − x_2 − 4 = 0
When the level set g(x) = 0 crosses a level set f(x) = c at a, this usually means that it intersects both larger-valued and smaller-valued level sets on either side. If it does, then a cannot be a local maximizer or minimizer of f on the constraint.
Main Idea. We expect to find local maximizers and minimizers only where the level set g(x) = 0 is tangent to a level set of f.
We can base our intuition on the following narrative: g(x) = 0 crosses higher and higher-valued level sets of f as it approaches f(x) = c. When it reaches f(x) = c, rather than crossing it to even higher values, it brushes against it (tangency) and backs away toward lower-valued level sets.
This does not describe every possible version of tangency. The curve x_2 − x_1^3 = 0 is tangent to the level set x_2 = 0, but still crosses onto both sides. Furthermore, level sets can behave unpredictably when their gradient vector is 0. A rigorous argument for a maximizer using tangency generally needs to appeal to the algebraic properties of f(x) and g(x) in some way.
2.1.4 The Lagrangian

Looking for tangency is a valuable method for approximating the maximizer, but we cannot expect it to be precise enough to determine exact coordinates. For an exact position, we need a set of equations to solve. These equations take the form of a necessary condition for a maximizer subject to an equality constraint. We first form a new function from both the objective function and the constraint.
Definition 2.6. Given an n-variable objective function f(x) and a constraint g(x) = 0, the Lagrangian is an (n + 1)-variable function of the x_i and λ (lambda). Its equation is

$$L(x, \lambda) = f(x) + \lambda g(x)$$
Theorem 2.7. Let f(x) and g(x) be continuously differentiable functions. If a is a local maximizer or minimizer of f(x) subject to the constraint g(x) = 0, then either

1. there is some number λ such that (a, λ) satisfies the first-order condition of L, or
2. ∇g(a) = 0.

We will call the points that satisfy these conditions critical points or stationary points of the constrained optimization problem. The value of λ is called a Lagrange multiplier.

Remark. Notice that this theorem is a necessary condition for a maximizer or minimizer, not a sufficient one.
2.1.5 Using the Lagrangian

We expect that the objective function f(x_1, x_2) = x_1^2 + x_2^2 has a minimizer on the constraint x_1^2 − x_2 − 4 = 0 somewhere in the fourth quadrant. Find it.
Solution. Letting g(x_1, x_2) = x_1^2 − x_2 − 4, we have the Lagrangian:

$$L(x_1, x_2, \lambda) = x_1^2 + x_2^2 + \lambda(x_1^2 - x_2 - 4)$$

∇g(x_1, x_2) = (2x_1, −1). This is never 0, so any minimizer must satisfy the first-order condition of L.
We set the partial derivatives equal to 0 and solve.

$$\frac{\partial L}{\partial x_1} = 2x_1 + 2x_1\lambda = 0 \qquad \frac{\partial L}{\partial x_2} = 2x_2 - \lambda = 0 \qquad \frac{\partial L}{\partial \lambda} = x_1^2 - x_2 - 4 = 0$$

The first equation factors as 2x_1(1 + λ) = 0, so either x_1 = 0 or λ = −1.

If x_1 = 0, then 0^2 − x_2 − 4 = 0, so x_2 = −4.

If λ = −1, then 2x_2 + 1 = 0, so x_2 = −\frac{1}{2}, and x_1^2 + \frac{1}{2} - 4 = 0 gives x_1 = \pm\sqrt{\frac{7}{2}}.

The only possible local maximizers and minimizers are (0, −4), \left(\sqrt{\frac{7}{2}}, -\frac{1}{2}\right) and \left(-\sqrt{\frac{7}{2}}, -\frac{1}{2}\right). Since we are looking for a minimizer in the fourth quadrant, it must be \left(\sqrt{\frac{7}{2}}, -\frac{1}{2}\right).
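The same first-order condition can be solved symbolically. A sketch with SymPy (my own tooling choice, not the text's) recovers all three stationary points:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + x2**2
g = x1**2 - x2 - 4
L = f + lam * g                      # the Lagrangian of Definition 2.6

# First-order condition: all partials of L vanish.
foc = [sp.diff(L, v) for v in (x1, x2, lam)]
sols = sp.solve(foc, [x1, x2, lam], dict=True)
for s in sols:
    print(s[x1], s[x2], s[lam])
```

The solver returns (0, −4) with λ = −8 and (±√(7/2), −1/2) with λ = −1, matching the hand computation above.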
2.1.6 Proving the First-Order Condition of the Lagrangian

The partial derivatives of the Lagrangian are

$$\frac{\partial L}{\partial x_i} = f_i(x) + \lambda g_i(x) \qquad \frac{\partial L}{\partial \lambda} = g(x)$$

Any solution (a, λ) to the first-order condition on L satisfies

1. f_i(a) = −λ g_i(a) for all i, so ∇f(a) = −λ∇g(a).
2. g(a) = 0, so a lies on the constraint.
The necessity of 2 is easy to explain. If a does not lie on the constraint, then it cannot be a maximizer or minimizer subject to that constraint. In order to understand 1, we need to investigate the relationship between gradient vectors and level sets. Like with multivariable unconstrained optimization, we use compositions over paths to study maximizers on constraints. We need to alter our approach in three important ways.
- We are only comparing f(a) to f(b) for other b in the level set g(x) = 0. Thus we only consider paths x(t) that stay in the level set.
- Unless the level set is linear, we cannot assume the paths x(t) are straight lines. They likely need to curve to stay in the level set.
- The tangent vectors x′(t) generally cannot point in all directions. Most directions of travel would cause the path to leave the level set.
Now we examine the composition f(x(t)) where x(t) is a path in the level set g(x) = 0.
Click to Load Applet
Figure 2.8: The composition f(x_1(t), x_2(t)) for a path (x_1(t), x_2(t)) in the level set g(x_1, x_2) = 0 and its realization as the heights of y = f(x_1, x_2)
If a is a local maximizer or minimizer of f(x) among the points on g(x) = 0, then it must be a local maximizer or minimizer of the composition f(x(t)) for all paths x(t) in the level set g(x) = 0. This means the t-value corresponding to a (we will call it t_0) must satisfy the first-order condition of f(x(t)). To compute the derivative of the composition, we apply the chain rule.

$$\frac{df}{dt}(t_0) = 0 \quad \text{(first-order condition)}$$
$$\nabla f(x(t_0)) \cdot x'(t_0) = 0 \quad \text{(chain rule)}$$
$$\nabla f(a) \cdot x'(t_0) = 0$$
In the unconstrained case, ∇f(a) · v = 0 for all v allowed us to conclude that ∇f(a) = 0. Our conclusion cannot be so restrictive here. x(t) must be a path in the level set. We are thus only considering vectors x′(t_0) that are tangent to the level set at a. To make a conclusion about ∇f(a), we appeal to the geometric properties of the dot product. ∇f(a) and x′(t_0) must be orthogonal for all paths in the level set that pass through a at t_0. We introduce the following vocabulary to describe this relationship between ∇f(a) and the level set.
Definition 2.8. A vector v is normal to a set at a if v is orthogonal to all tangent vectors of that set at a. To avoid having to mention it as a separate case, we consider 0 to be orthogonal to every vector.
Lemma 2.9. If a is a local maximizer or local minimizer of a continuously differentiable function f(x) subject to g(x) = 0, then ∇f(a) is normal to the level set g(x) = 0.
This gives us a good characterization of the gradient of the objective function at a local maximizer or minimizer. We can now visually rule out a potential maximizer if its gradient does not point in the right direction. We now need to connect this to our computational tool. The Lagrangian compares ∇f to ∇g. What can we say about ∇g?

Again, we consider a path x(t) in the level set g(x) = 0. This time we study the composition g(x(t)). If you think carefully about what we have done, you will notice that this is a silly composition. We can evaluate this composition at any value of t, and we will always get 0 (make sure you see why).
We can now apply the chain rule to g(x(t)), the zero function. This time the derivative is zero because we are differentiating a constant function.

$$\frac{dg}{dt}(t_0) = \nabla g(x(t_0)) \cdot x'(t_0)$$
$$0 = \nabla g(a) \cdot x'(t_0)$$

We conclude that ∇g(x(t)) is orthogonal to x′(t) for all paths x(t) in the level set g(x) = 0.
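This orthogonality can be checked symbolically on a concrete level set. The sketch below (my own example, assuming SymPy; the circle and its parametrization are not from the text) differentiates a path that stays in a level set and confirms the dot product vanishes identically:

```python
import sympy as sp

t = sp.symbols('t', real=True)
x1, x2 = sp.symbols('x1 x2', real=True)
g = x1**2 + x2**2 - 16          # the level set g = 0 is the circle of radius 4

# A path that stays in the level set: x(t) = (4 cos t, 4 sin t).
path = (4 * sp.cos(t), 4 * sp.sin(t))
grad_g = sp.Matrix([sp.diff(g, v) for v in (x1, x2)]).subs(
    {x1: path[0], x2: path[1]})
tangent = sp.Matrix([sp.diff(c, t) for c in path])

# The dot product is identically zero: grad g is normal to the level set.
print(sp.simplify(grad_g.dot(tangent)))  # 0
```

Here grad g = (8 cos t, 8 sin t) and the tangent is (−4 sin t, 4 cos t), so the cancellation holds for every t, not just at a special point, matching the remark that follows.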
Remark. This argument did not rely on a being any special point on the level set. Nor did it matter that the level set had value 0. Any level set g(x) = c will have this relationship to the gradient, since the composition will still be a constant function.

The result of this argument is one of the most important geometric facts about gradient vectors. We will use it over and over again.
Theorem 2.10. If g is a continuously differentiable function and g(a) = c, then ∇g(a) is normal to the level set g(x) = c.
In R^2, this means that ∇g(a) points either 90° clockwise or 90° counterclockwise from x′(t_0).
Click to Load Applet
Figure 2.9: Two possible ∇g(a) orthogonal to x′(t_0)
In the case of a level surface, there are many paths through a. In order to be normal to the level set, there are only two possible directions for ∇g(a).
Click to Load Applet
Figure 2.10: Two possible ∇g(a) orthogonal to all possible tangent vectors x′(t_0) in the level set g(x) = 0

We can use Corollary 2.5 to generalize this to an n-variable function. The level set is an (n − 1)-dimensional graph in n-space (as long as the gradient is nonzero). It has two normal directions, pointing in opposite directions. We can summarize what we have learned about the gradients of f and g as follows.
- ∇f(a) is normal to the level set g(x) = 0 at a local maximizer a.
- ∇g(x) is normal to the level set g(x) = 0 at all points on the level set.
- If ∇g(a) ≠ 0, then there are only two normal directions to the level set g(x) = 0, pointing in opposite directions.

Vectors in the same or opposite directions are parallel. Algebraically, they are scalar multiples of each other. We conclude that at any local maximizer or local minimizer, ∇f(a) = −λ∇g(a) for some λ. This is exactly what the first-order condition of the Lagrangian requires.
Click to Load Applet
Figure 2.11: Parallel vectors ∇f(x) and ∇g(x) at a minimizer of f(x) subject to g(x) = 0
This line of reasoning is complex. It is also important. We will revisit and generalize it later. Here is a reminder of the steps of our argument.
a is a local maximizer of f(x) on g(x) = 0
⇒ t₀ is a local maximizer of f(x(t)) on any path x(t) through a in g(x) = 0
⇒ (FOC) d/dt f(x(t)) = 0 at t₀
⇒ (chain rule) ∇f(a) · x′(t₀) = 0
⇒ ∇f(a) is normal to g(x) = 0.
Since ∇g(x) is always normal to g(x) = 0, and there are only two normal directions to g(x) = 0,
⇒ ∇f(a) and ∇g(a) are parallel
⇒ ∇f(a) = −λ∇g(a) for some λ
⇒ (λ, a) satisfies the FOC of the Lagrangian.
2.1.7 The Tangency Method

This argument also validates the tangency method of locating a maximizer. Theorem 2.10 applies to f as well as to g.
∇f(a) is normal to the level set f(x) = c where c = f(a). At a local maximizer, where ∇f(a) and ∇g(a) are parallel, it must follow that the level sets g(x) = 0 and f(x) = c have the same normal directions. This implies that g(x) = 0 and f(x) = c are tangent to each other.
Corollary 2.11  Suppose both ∇f(a) and ∇g(a) exist and are not 0. If a is a maximizer of f(x) subject to g(x) = 0, then the level sets of f and g that contain a are tangent to each other at a.
Figure 2.12: Parallel ∇f and ∇g and their tangent level curves at a local maximizer
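Corollary 2.11 is easy to check numerically at a candidate point. The following sketch (assuming sympy is available; the objective, constraint, and candidate point are our own illustration, not an example from the text) tests whether ∇f and ∇g are parallel at a candidate by checking that the 2×2 determinant of the two gradients vanishes.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1 * x2          # hypothetical objective
g = x1 + x2 - 2      # hypothetical constraint g = 0

grad_f = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
grad_g = sp.Matrix([sp.diff(g, v) for v in (x1, x2)])

a = {x1: 1, x2: 1}   # candidate maximizer of f subject to g = 0
gf, gg = grad_f.subs(a), grad_g.subs(a)
# In R^2, two vectors are parallel exactly when this determinant is 0
parallel = sp.Matrix.hstack(gf, gg).det() == 0
print(gf.T, gg.T, parallel)
```

Here both gradients evaluate to (1, 1), so the determinant test reports that the level curves of f and g are tangent at (1, 1), consistent with the corollary.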
2.1.8 Section Summary

The important definitions and results from this section were

- The effect of strengthening a constraint (Lemma 2.1)
- The definition of a level set (Definition 2.4)
- The Lagrangian (Definition 2.6)
- The first-order condition of the Lagrangian (Theorem 2.7)
- The direction of the gradient (Theorem 2.10)
- The tangency of level curves (Corollary 2.11)
2.2 Inequality Constraints

Goals:

1 Identify maximizers subject to inequality constraints.
2 Use the Lagrange multiplier to identify points that cannot be maximizers or minimizers.
3 Recognize several notations for complementary slackness.
2.2.1 Inequality Constraints and Upper Level Sets

Definition 2.12  An inequality constraint is a constraint of the form g(x) ≥ 0.
With such a constraint we are solving

max over x ∈ D of f(x) subject to g(x) ≥ 0,  or  max over x ∈ S of f(x), where S = {x ∈ D : g(x) ≥ 0}
Remark  Like with an equality constraint, just about any non-strict inequality can be rewritten in the form g(x) ≥ 0. Even an expression g(x) ≤ k is equivalent to k − g(x) ≥ 0.
Here are two prototypical applications of inequality constraints in economics.
Example  If you have a budget to buy two types of goods, but you are not required to spend all of it, then your constraint is

I − p₁x₁ − p₂x₂ ≥ 0
Example  If x₁ denotes a physical quantity then it is often appropriate to assume x₁ ≥ 0. Inequalities like these define the familiar shape of the budget set.
Figure 2.13: The region constrained by I − p₁x₁ − p₂x₂ ≥ 0, x₁ ≥ 0 and x₂ ≥ 0 (axes x₁ and x₂, with intercepts I/p₁ and I/p₂)
The set of points that satisfies an inequality constraint generally lies on one side of the level set g(x) = 0. Graphically, it is the subset of the domain over which the graph y = g(x) is above the plane y = 0. We therefore adopt the following terminology.
Definition 2.13  Given a multivariate function f(x), an upper level set of f is the set of points that, for some number c, satisfy f(x) ≥ c. The points that satisfy f(x) ≤ c are a lower level set.
Figure 2.14: The upper level set where y = f(x) lies above y = c
We can use the chain rule to determine whether a path travels into an upper level set or lower level set.
Lemma 2.14  Suppose a lies in the level set f(x) = c, ∇f(a) ≠ 0, and x(t) is a path that passes through a at t₀.

If x′(t₀) makes an acute angle with ∇f(a), then immediately after t₀, x(t) travels into the upper level set f(x) ≥ c.

If x′(t₀) makes an obtuse angle with ∇f(a), then immediately after t₀, x(t) travels into the lower level set f(x) ≤ c.
Figure 2.15: The gradient vector, the upper level set and the lower level set of f
The gradient vector makes an acute angle with itself, so we obtain the following characterization immediately.
Corollary 2.15  Suppose f(a) = c. If ∇f(a) ≠ 0, then ∇f(a) points into the upper level set f(x) ≥ c.
We can visualize the reasoning behind Lemma 2.14. Here is the case where the angle is acute.

x′(t₀) makes an acute angle with ∇f(a)
⇒ ∇f(a) · x′(t₀) > 0
⇒ (chain rule) df/dt(t₀) > 0
⇒ (Lem 1.2) In some nbhd of t₀, f(x(t)) > f(x(t₀)) for t > t₀
⇒ In some nbhd of t₀, x(t) is in the upper level set f(x) ≥ c for t > t₀.
2.2.2 Binding and Nonbinding Constraints

We already have the tools to impose necessary conditions on a maximizer subject to an inequality constraint. Our strategy is to consider two cases and to apply the appropriate first-order conditions to each one.
Suppose a is a local maximizer of f(x) subject to g(x) ≥ 0.

1 If g(a) = 0 we say that the constraint is binding.
Figure 2.16: A maximizer a on g(x) = 0
If a is a maximizer subject to g(x) ≥ 0, then it also maximizes f subject to g(x) = 0. It must satisfy the FOC of the Lagrangian.
2 If g(a) > 0 then the constraint is non-binding.
Figure 2.17: A maximizer a in g(x) > 0 and some directions of travel x′(t) through a
We can travel in all directions x′(t) through a without immediately leaving the upper level set. To ensure ∇f(a) · x′(t) = 0, ∇f(a) must be the zero vector. At an interior point, the boundary is not relevant to local measurements like the derivative. The conditions of optimization are those of an unconstrained optimization, as if the constraint did not exist.
2.2.3 The Sign of λ at a Maximizer

If the inequality constraint g(x) ≥ 0 binds at a local maximizer a, then a will satisfy the first-order condition of the Lagrangian

L(x, λ) = f(x) + λg(x)
What can we say about the λ that goes with a? Any path in the direction of ∇f(a) points toward higher values of f. If ∇f(a) points into the region g(x) ≥ 0, then a cannot be a local maximizer subject to g(x) ≥ 0. This can be tested by the sign of λ.
Figure 2.18: A non-maximizing critical point a on g(x) = 0, with ∇f and ∇g drawn at a
The first-order condition of the Lagrangian requires that ∇f(a) = −λ∇g(a). ∇g(a) points into the upper level set g(x) ≥ 0. If λ is negative then ∇f(a) points into this set as well.
Lemma 2.16  Suppose that (a, λ) satisfies the first-order condition of the Lagrangian of f subject to g(x) = 0. If a is a local maximizer of f(x) subject to g(x) ≥ 0, then either

1 λ ≥ 0 or
2 ∇g(a) = 0
Understanding the intuition behind this result is probably more important than knowing a formal proof. Still, here is a proof. Like the argument we have already given, it is a proof of the contrapositive.
Proof  If ∇g(a) = 0 then the lemma is satisfied. We will consider the case where ∇g(a) ≠ 0 and λ < 0. We will show that a is not a local maximizer.
Consider the line through a in the direction of ∇g(a). Its equation is x(t) = a + t∇g(a) and x′(0) = ∇g(a). By Lemma 2.14, since ∇g(a) · x′(0) = ∇g(a) · ∇g(a) > 0, this line travels into the upper level set g(x) ≥ 0 at t = 0.
What happens to values of f along this line? Consider the composition f(a + t∇g(a)). Its derivative at t = 0 is

df/dt(0) = ∇f(a) · ∇g(a) = (−λ∇g(a)) · ∇g(a) = −λ(∇g(a) · ∇g(a)) > 0
By Lemma 1.2, there is a neighborhood of 0 where for t > 0, f(x(t)) > f(x(0)) = f(a). We have seen that these points lie in the upper level set, so a cannot be a maximizer of f(x) subject to g(x) ≥ 0. Since a cannot be a maximizer when λ < 0, we conclude that if a is a maximizer, then λ ≥ 0.
Remark  Showing that a is not a maximizer required us to compare it to points in the upper level set g(x) ≥ 0. This reasoning does not work when maximizing subject to an equality constraint. The line we used does not lie in g(x) = 0. We can thus put no restriction on λ when maximizing subject to g(x) = 0. λ can be positive, negative, or zero at a maximizer.
2.2.4 Complementary Slackness

Whenever we have an inequality constraint, there are two cases to check: it can be binding or non-binding. We could keep track of this manually, but there is an easier way to organize our search. We can use the Lagrangian

L(x, λ) = f(x) + λg(x)

to test for both the binding and non-binding case.
g(x) ≥ 0 binding: A maximizer a satisfies the first-order condition of the Lagrangian. This automatically checks g(a) = 0. We can check that λ ≥ 0.

g(x) ≥ 0 nonbinding: We can set λ = 0. Now a maximizer a satisfies the first-order condition of L(x, λ) = f(x). We can check that g(a) ≥ 0.
There is an interesting algebraic symmetry between the role of λ and g(a). The following definition allows us to combine both these cases into one statement.

Definition 2.17  A point satisfies two non-strict inequalities with complementary slackness, if at least one of them holds with equality.
We can cover both cases of an inequality constraint by demanding that g(x) ≥ 0 and λ ≥ 0 hold with complementary slackness. Here are a few different ways to denote complementary slackness.
Notations for Complementary Slackness

1 g(x) ≥ 0 and λ ≥ 0 hold with complementary slackness
2 g(x) ≥ 0, λ ≥ 0 and λg(x) = 0
3 λ ≥ 0 and g(x) ≥ 0, with g(x) = 0 if λ > 0
112
2.2.5
Conditions
fo
r
a
Maximizer
Subject
to
an
Inequality
Constraint
With
complementary
s
lackness
in
hand,
w
e
can
write
a
necessary
condition
for
a
maximizer
on
an
inequalit
y
constraint.
This
theorem
is
written
using
notation
2
,
but
we
could
rewrite
it
with
one
of
the
other
notations.
Theo
rem
2.18
Let
f
(
x
)
and
g
(
x
)
b
e
continuously
differentiable
functions.
If
a
is
a
lo
cal
maximizer
of
f
(
x
)
subject
to
the
constraint
g
(
x
)
≥
0
,
then
one
of
the
following
must
b
e
true
1
there
is
some
numb
er
λ
such
that
(
a,
λ
)
satisfies
∂
L
∂
x
i
(
a,
λ
)
=
0
for
all
i
g
(
a
)
≥
0
,
λ
≥
0
,
and
λg
(
a
)
=
0
.
2
g
(
a
)
=
0
and
∇
g
(
a
)
=
0
.
2.2.6
Opimization
Subject
to
an
Inequalit
y
Constraint
Find
the
maximizer
of
f
(
x
1
,
x
2
)
=
10
−
x
2
1
−
x
2
2
subject
to
3
x
1
+
4
x
2
≥
25
.
Solution
First
we
write
our
constraint
in
the
correct
form.
g
(
x
1
,
x
2
)
=
3
x
1
+
4
x
2
−
25
≥
0
.
Note
that
∇
g
(
x
1
,
x
2
)
is
never
0
,
so
any
critical
p
oint
must
satisfy
our
complementary
slackness
conditions.
The
Lagrangian
is
L
(
x
1
,
x
2
,
λ
)
=
10
−
x
2
1
+
x
2
2
+
λ
(3
x
1
+
4
x
2
−
25)
The
conditions
a
re
∂
L
∂
x
1
=
0
∂
L
∂
x
2
=
0
λ
≥
0
3
x
1
+
4
x
2
−
25
≥
0
λ
(3
x
1
+
4
x
2
−
25)
=
0
113
2.2.6
Opimization
Subject
to
an
Inequalit
y
Constraint
First
consider
the
λ
=
0
case.
The
constraint
is
nonbinding.
W
e
solve
the
equations
then
check
the
inequalit
y
.
∂
L
∂
x
1
=
0
∂
L
∂
x
2
=
0
λ
=
0
−
2
x
1
+
3
λ
=
0
−
2
x
2
+
4
λ
=
0
−
2
x
1
=
0
−
2
x
2
=
0
x
1
=
0
x
2
=
0
check
3(0)
+
4(0)
−
25
≥
0
The
only
critical
point
w
e
found
failed
the
check.
There
is
no
maximizer
where
the
constraint
is
nonbinding.
Now
consider
the
g
(
x
)
=
0
case.
∂
L
∂
x
1
=
0
∂
L
∂
x
2
=
0
g
(
x
)
=
0
−
2
x
1
+
3
λ
=
0
−
2
x
2
+
4
λ
=
0
3
x
1
+
4
x
2
−
25
=
0
x
1
=
3
2
λ
x
2
=
2
λ
9
2
λ
+
8
λ
−
25
=
0
25
2
λ
=
25
λ
=
2
x
1
=
3
2
(2)
x
2
=
2(2)
x
1
=
3
x
2
=
4
check
λ
≥
0
2
≥
0
The
p
oint
(3
,
4
,
2)
passes
all
of
these
conditions.
We
conclude
that
(3
,
4)
is
the
only
p
ossible
maximizer.
Since
this
is
not
a
sufficient
condition,
we
cannot
conclude
(3
,
4)
is
a
maximizer.
There
may
be
no
maximizer
at
all.
Main
Idea
T
o
solve
fo
r
the
maximizer
over
an
inequality
constraint
1
Pick
a
case
from
complementary
slackness
2
Set
up
and
solve
the
system
of
equations
3
Check
whether
the
solutions
satisfy
the
inequalities
4
Rep
eat
for
the
other
case
from
complementary
slackness
114
2.2.7
Section
Summa
ry
The
imp
o
rtant
definitions
and
results
from
this
section
were
The
notation
of
an
inequalit
y
constraint
(Definition
2.12)
The
definition
of
an
upp
er
o
r
lo
wer
level
set
(Definition
2.13)
The
sign
of
λ
at
a
maximizer
subject
to
an
inequality
constraint
(Lemma
2.16)
Complementa
ry
slackness
(Definition
2.17)
Conditions
fo
r
a
maximizer
on
an
inequality
constraint
(Theorem
2.18)
115
2.3
The
Kuhn-T
uck
er
Conditions
Goals:
1
Solve
for
maximizers
on
multiple
constraints
using
the
Kuhn-T
ucker
conditions
2
Recognize
what
each
case
of
the
Kuhn-T
ucker
conditions
is
checking
2.3.1
Multiple
Equalit
y
Constraints
In
economics,
it
is
easy
to
imagine
a
firm
is
constrained
by
more
than
one
equation:
A
budget
fo
r
capital
and
labor
Availabilit
y
of
lab
or
o
r
other
inputs
A
government
regulation
Y
ou
can’t
purchase
o
r
produce
a
negative
amount
What
if
w
e
w
ant
to
maximize
an
n
-va
riable
function
f
(
x
)
subject
to
more
than
one
equalit
y
con-
straint?
g
1
(
x
)
=
g
2
(
x
)
=
·
·
·
=
g
m
(
x
)
=
0
The
feasible
set,
is
the
intersection
of
the
level
sets
g
j
(
x
)
=
0
.
We
have
seen
that
a
single
level
set
is
usually
one
dimension
less
than
the
space
it
lies
in.
The
intersection
of
m
level
sets
will
usually
b
e
an
(
n
−
m
)
-dimensional
shap
e.
Notation
The
g
i
a
re
different
functions,
not
partial
derivatives
of
g
.
Whenever
we
have
multiple
g
’s
a
round,
we
will
need
to
write
our
pa
rtial
derivatives
using
∂
notation,
for
instance
∂
∂
x
2
g
3
(
x
)
not
g
3
2
(
x
)
116
Click to Load Applet
Figure
2.19:
Two
level
sets
intersecting
at
a
pair
of
(one-dimensional)
curves
Click to Load Applet
Figure
2.20:
Three
level
sets
intersecting
at
a
pair
of
(zero-dimensional)
p
oints
Lik
e
with
a
single
constraint,
there
are
exceptions.
Also
like
a
single
constraint,
these
exceptions
rely
on
the
inadequacy
of
the
gradient
vecto
r(s).
The
following
definition
come
from
linear
algebra.
A
co
rollary
of
the
multiva
riable
implicit
function
theorem
characterizes
the
intersection
of
the
level
sets.
Definition
2.19
V
ectors
v
1
,
v
2
,
.
.
.
v
m
a
re
linearly
dependent
if
there
a
re
constants
k
j
,
not
all
0
,
such
that
m
X
j
=1
k
j
v
j
=
0
If
no
such
k
j
exist
then
they
a
re
linea
rly
indep
endent
117
2.3.1
Multiple
Equality
Constraints
Equivalently
,
vecto
rs
are
linea
rly
dep
endent,
if
one
of
them
is
a
linear
combination
of
the
others.
Co
rollary
2.20
Let
g
1
(
x
)
,
g
2
(
x
)
,
.
.
.
g
m
(
x
)
b
e
a
continuously
differentiable
functions
at
a
.
If
a
lies
on
the
level
sets
g
j
(
x
)
=
c
j
and
the
∇
g
j
(
a
)
a
re
linearly
indep
endent,
then
the
intersection
of
the
level
sets
g
j
(
x
)
=
c
j
is
a
(
n
−
m
)
-dimensional
shap
e
in
some
neighb
o
rho
o
d
of
a
.
Sp
ecifically
,
it
is
the
graph
of
m
differentiable
functions
on
R
n
−
m
.
Rema
rk
Any
set
containing
the
zero
vector
is
linearly
dependent.
Thus
this
co
rollary
agrees
with
our
criterion
fo
r
a
single
constraint.
If
the
vector
∇
g
(
a
)
=
0
,
then
it
is
linea
rly
dep
endent.
Tw
o
vecto
rs
are
linearly
dep
endent
if
they
are
parallel.
With
pa
rallel
gradients,
the
intersection
of
m
level
sets
can
have
few
er
than
(
n
−
m
)
dimensions.
Click to Load Applet
Figure
2.21:
Two
level
sets
with
parallel
gradients
intersecting
at
a
p
oint
P
arallel
gradients
can
also
occur
when
the
level
sets
overlap,
in
this
case
the
intersection
can
have
mo
re
than
(
n
−
m
)
dimensions.
Click to Load Applet
Figure
2.22:
Two
(overlapping)
level
sets
whose
intersection
is
a
plane
118
Three
vectors
are
linea
rly
dep
endent
if
they
are
coplanar.
Coplanar
gradient
vecto
rs
can
also
lead
to
intersections
of
unexp
ected
dimension.
Click to Load Applet
Figure
2.23:
Three
level
sets
with
coplanar
gradients
intersecting
on
a
line
Main
Idea
Each
additional
equality
constraint
reduces
the
dimension
of
the
feasible
set
b
y
1
,
unless
its
gradient
vecto
r
is
a
linear
combination
of
the
existing
gradient
vecto
rs.
2.3.2
The
Lagrangian
fo
r
Multiple
Equality
Constraints
F
or
multiple
constraints,
we
write
a
Lagrangian
that
uses
all
of
the
constraints.
We
need
a
different
va
riable
λ
for
each
one.
The
first-order
condition
of
this
Lagrangian
is
a
necessary
condition
for
a
lo
cal
maximizer
o
r
lo
cal
minimizer.
Definition
2.21
Given
the
p
roblem
of
maximizing
an
n
-variable
objective
function
f
(
x
)
subject
to
constraints
g
j
(
x
)
=
0
fo
r
1
≤
j
≤
m
,
the
Lagrangian
is
a
function
of
the
n
comp
onents
of
x
and
m
va
riables
λ
=
(
λ
1
,
.
.
.
,
λ
m
)
.
L
(
x,
λ
)
=
f
(
x
)
+
m
X
j
=1
λ
j
g
j
(
x
)
119
2.3.2
The
Lagrangian
for
Multiple
Equality
Constraints
Theo
rem
2.22
If
a
is
a
lo
cal
maximizer
or
lo
cal
minimizer
of
f
(
x
)
subject
to
constraints
g
j
(
x
)
=
0
,
all
continuously
differentiable
at
a
,
then
either
1
(
a,
λ
)
satisfies
the
first-o
rder
condition
of
L
fo
r
some
λ
o
r
2
The
∇
g
j
(
a
)
a
re
linearly
dep
endent
Notation
When
developing
the
theory
,
we
write
our
constraint
functions
and
our
Lagrange
multipliers
using
a
common
letter
and
an
index
va
riable
g
1
(
x
)
,
g
2
(
x
)
,
.
.
.
λ
1
,
λ
2
,
.
.
.
This
makes
it
easier
to
generalize
to
any
numb
er
of
constraints,
because
can
use
Σ
notation
in
the
Lagrangian.
F
or
a
small
number
of
constraints,
using
an
index
is
inconvenient.
Economists
often
use
different
letters
fo
r
each
constraint
and
multiplier.
g
(
x
)
,
h
(
x
)
,
.
.
.
λ,
µ,
.
.
.
2.3.3
A
Multi-Constraint
Optimization
Find
the
maximizer
of
f
(
x
1
,
x
2
,
x
3
)
=
3
x
2
on
the
constraints
x
2
1
+
x
2
3
−
50
=
0
and
x
1
+
x
2
+
x
3
=
0
.
Solution
The
Lagrangian
is
L
(
x
1
,
x
2
,
x
3
,
λ
1
,
λ
2
)
=
3
x
2
+
λ
1
(
x
2
1
+
x
2
3
−
50)
+
λ
2
(
x
1
+
x
2
+
x
3
)
The
first-o
rder
condition
is
120
∂
L
∂
x
1
=
0
∂
L
∂
x
2
=
0
∂
L
∂
x
3
=
0
∂
L
∂
λ
1
=
0
∂
L
∂
λ
2
=
0
2
λ
1
x
1
+
λ
2
=
0
3
+
λ
2
=
0
2
λ
1
x
3
+
λ
2
=
0
x
2
1
+
x
2
3
−
50
=
0
x
1
+
x
2
+
x
3
=
0
λ
2
=
−
3
x
1
=
3
2
λ
1
x
3
=
3
2
λ
1
x
1
=
x
3
x
2
1
+
x
2
1
−
50
=
0
x
1
=
±
5
±
5
=
x
3
±
5
+
x
2
±
5
=
0
x
2
=
∓
10
W
e
conclude
that
(
±
5
,
∓
10
,
±
5)
.
The
value
f
(
x
1
,
x
2
,
x
3
)
=
3
x
2
is
larger
when
x
2
=
10
.
W
e
conclude
that
no
p
oint
b
esides
(
−
5
,
10
,
5)
can
b
e
the
maximizer
of
f
on
these
constraints.
Because
this
is
not
a
sufficient
condition,
w
e
cannot
b
e
sure
that
this
is
the
maximizer.
It
may
b
e
that
no
maximizer
exists
at
all.
W
e
can
visualize
the
maximizer
of
f
(
x
1
,
x
2
,
x
3
)
=
3
x
2
b
y
lo
oking
for
the
p
oint
in
the
level
sets
that
is
fa
rthest
in
the
x
2
direction.
It
app
ears
that
such
a
p
oint
exists
at
(
−
5
,
10
,
−
5)
.
Click to Load Applet
Figure
2.24:
The
gradients
of
x
2
1
+
x
2
3
−
50
,
x
1
+
x
2
+
x
3
=
0
,
and
f
(
x
)
=
3
x
2
at
the
maximizer
121
2.3.4
The
Reasoning
fo
r
the
Multi-Constraint
Lagrangian
The
pa
rtial
derivatives
of
the
Lagrangian
are
L
x
i
=
∂
f
∂
x
i
(
x
)
+
m
X
j
=1
λ
j
∂
g
j
∂
x
i
(
x
)
L
λ
j
=
g
j
(
x
)
Any
solution
(
a,
λ
)
to
the
first-o
rder
condition
on
L
satisfies
f
x
i
(
a
)
=
−
m
X
j
=1
λ
j
∂
g
j
∂
x
i
(
a
)
fo
r
all
i
,
so
∇
f
(
a
)
=
−
P
λ
j
∇
g
j
(
a
)
.
g
j
(
a
)
=
0
fo
r
all
j
so
a
lies
on
all
the
constraints.
W
e
can
examine
the
relationship
b
etw
een
the
gradients
using
the
same
procedure
we
used
fo
r
a
single
constraint.
We
take
a
path
x
(
t
)
through
a
at
t
0
.
We
assume
that
x
(
t
)
lies
in
the
feasible
set.
This
means
it
lies
in
all
of
the
level
sets
g
j
(
x
)
=
0
.
If
a
is
a
lo
cal
maximizer
or
minimizer
of
f
(
x
)
on
the
feasible
set
then
it
must
b
e
a
lo
cal
maximizer
o
r
minimizer
of
the
composition
f
(
x
(
t
))
for
all
paths
x
(
t
)
in
the
feasible
set.
This
means
the
t
-value
co
rresp
onding
to
a
(w
e
will
call
it
t
0
)
must
satisfy
the
first-order
condition
of
f
(
x
(
t
))
.
The
derivative
calculation
should
b
e
familia
r:
0
=
d
f
dt
(
t
0
)
0
=
∇
f
(
x
(
t
0
))
·
x
′
(
t
0
)
0
=
∇
f
(
a
)
·
x
′
(
t
0
)
Thus
∇
f
(
a
)
is
normal
to
the
feasible
set.
When
we
had
a
single
constraint,
w
e
argued
that
there
were
t
w
o
opp
osite
no
rmal
directions
to
the
feasible
set,
and
∇
g
(
a
)
p
ointed
in
one
of
them.
This
relied
on
the
fact
that
our
feasible
set
was
a
level
set
and
had
dimension
(
n
−
1)
.
If
w
e
assume
that
the
∇
g
j
(
a
)
are
linearly
indep
endent,
then
the
feasible
set
has
dimension
(
n
−
m
)
at
a
.
There
is
an
entire
m
-dimensional
space
of
vectors
normal
to
the
feasible
set
at
a
.
If
a
is
a
maximizer,
∇
f
(
a
)
must
b
e
a
vector
in
this
space.
122
Click to Load Applet
Figure
2.25:
The
gradient
vecto
r
of
the
objective
function
and
the
normal
plane
of
the
feasible
set
at
a
lo
cal
maximizer
of
the
comp
osition:
f
(
x
(
t
))
The
feasible
set
lies
in
each
level
set
g
j
(
x
)
=
0
.
The
gradient
of
each
g
j
is
no
rmal
its
level
set.
W
e
can
put
together
what
w
e
kno
w
ab
out
the
gradients
and
the
shape
of
the
feasible
set.
f
(
a
)
is
normal
to
the
feasible
set
at
a
lo
cal
maximizer
a
.
Each
∇
g
j
(
x
)
is
normal
to
the
feasible
set
at
all
p
oints
in
the
level
set.
If
the
∇
g
j
(
a
)
a
re
linearly
indep
endent,
then
the
normal
space
is
m
-dimensional.
m
indep
endent
vectors
in
a
m
-dimensional
space
must
span
the
space.
W
e
conclude
that
∇
f
(
a
)
is
a
linea
r
combination
of
the
∇
g
j
(
a
)
.
This
is
what
the
first-order
condition
of
the
Lagrangian
requires.
Example
If
a
is
a
maximizer
of
a
three-variable
function
subject
to
t
wo
equality
constraints,
then
∇
f
(
a
)
must
lie
in
the
no
rmal
plane
,
which
is
spanned
by
∇
g
1
(
a
)
and
∇
g
2
(
a
)
.
123
2.3.4
The
Reasoning
for
the
Multi-Constraint
Lagrangian
Click to Load Applet
Figure
2.26:
Two
gradients
spanning
the
normal
plane
of
the
feasible
set
2.3.5
An
Optimization
with
Dep
endent
Gradients
When
the
gradients
of
the
g
j
(
x
)
are
linearly
dep
endent,
the
feasible
set
may
not
have
the
exp
ected
dimension
of
(
n
−
m
)
.
Even
if
it
do
es,
the
gradient
vecto
rs
need
not
span
the
no
rmal
space.
In
this
case,
calculus
arguments
cannot
rule
out
a
p
oint
from
b
eing
a
maximizer.
Here
is
an
example
where
insisting
up
on
the
first-o
rder
condition
of
the
Lagrangian
would
incorrectly
rule
out
a
maximizer.
Consider
the
smo
oth
function
f
(
x
1
,
x
2
)
=
x
2
1
+
x
2
2
and
the
constraints
x
2
+
2
=
0
and
x
2
−
(
x
1
−
3)
2
+
2
=
0
.
If
we
attempt
to
apply
the
first-order
condition
we
obtain
the
following.
The
constraints
intersect
only
at
(3
,
−
2)
.
A
t
(3
,
−
2)
the
gradients
are
∇
g
1
(3
,
−
2)
=
(0
,
1)
∇
g
2
(3
,
−
2)
=
(0
,
1)
∇
f
(3
,
−
2)
=
(6
,
−
4)
.
∇
f
cannot
b
e
written
as
a
linear
combination
of
∇
g
1
and
∇
g
2
Even
though
it
do
esn’t
satisfy
the
first-order
condition,
(3
,
−
2)
must
b
e
the
maximizer
(and
mini-
mizer),
since
it
is
the
only
p
oint
that
satisfies
b
oth
constraints.
124
Figure
2.27:
(3
,
2)
is
a
maximizer,
but
∇
f
cannot
b
e
written
λ
1
∇
g
1
+
λ
2
∇
g
2
.
If
we
count
dimensions,
the
feasible
set
is
a
0
-dimensional
subspace
of
R
n
.
Its
no
rmal
space
is
2
−
0
=
2
-dimensional,
which
suggests
that
∇
f
(3
,
−
2)
can
b
e
any
vector
in
R
2
.
This
sounds
biza
rre
but
actually
mak
es
sense.
(3
,
−
2)
is
the
entire
feasible
set.
It
must
b
e
a
maximizer,
no
matter
what
∇
f
(3
,
−
2)
is.
The
exception
for
dep
endent
gradients
in
Theorem
2.22
exists
to
avoid
ruling
out
a
maximizer
in
a
situation
lik
e
this.
2.3.6
The
Kuhn-T
uck
er
Conditions
The
Kuhn-T
uck
er
conditions
are
a
robust
set
of
necessa
ry
conditions
for
constrained
optimizations
with
inequality
constraints,
p
otentially
in
addition
to
some
number
of
equality
constraints.
They
combine
all
of
the
ideas
w
e
have
develop
ed
in
this
chapter.
We
will
use
the
same
Lagrangian
that
we
w
ould
fo
r
equalit
y
constraints.
A
Lagrangian
F
or
Multiple
Equalit
y
and
Inequality
Constraints
Given
an
objective
function
f
(
x
)
and
constraints
of
the
fo
rms
g
j
(
x
)
≥
0
or
g
j
(
x
)
=
0
the
Lagrangian
is
L
(
x,
λ
)
=
f
(
x
)
+
m
X
j
=1
λ
j
g
j
(
x
)
.
Lik
e
with
a
single
inequality
constraint,
complementa
ry
slackness
will
remove
the
λ
j
g
j
(
x
)
when
g
j
(
x
)
do
es
not
bind.
125
2.3.6
The
Kuhn-T
ucker
Conditions
Theo
rem
2.23
[The
Kuhn-T
ucker
Conditions]
Given
the
objective
function
f
(
x
)
and
constraints
of
the
fo
rms
g
j
(
x
)
≥
0
o
r
g
j
(
x
)
=
0
,
then
at
any
lo
cal
maximizer
a
one
of
the
following
must
b
e
true
1
There
is
some
vector
λ
such
that
(
a,
λ
)
satisfies
the
Kuhn-T
uck
er
conditions
:
F
or
each
va
riable
x
i
,
L
x
i
(
a,
λ
)
=
0
F
or
each
equalit
y
constraint
function
g
j
,
g
j
(
a
)
=
0
F
or
each
inequalit
y
constraint
function
g
j
,
g
j
(
a
)
≥
0
and
λ
j
≥
0
and
λ
j
g
j
(
a
)
=
0
2
The
binding
∇
g
j
(
a
)
a
re
linearly
dep
endent
2.3.7
Solving
the
Kuhn-T
uck
er
Conditions
Acco
rding
to
Kuhn-T
ucker,
what
p
oints
(
x
1
,
x
2
)
could
b
e
maximizers
of
f
(
x
1
,
x
2
)
=
x
1
x
2
2
given
the
constraints
12
−
x
1
−
4
x
2
≥
0
9
−
x
1
−
x
2
≥
0
Solution
The
Lagrangian
is
L
(
x
1
,
x
2
,
λ
1
,
λ
2
)
=
x
1
x
2
2
+
λ
1
(12
−
x
1
−
4
x
2
)
+
λ
2
(9
−
x
1
−
x
2
)
.
The
Kuhn-T
uck
er
conditions
a
re
∂
L
∂
x
1
=
x
2
2
−
λ
1
−
λ
2
=
0
∂
L
∂
x
2
=
2
x
1
x
2
−
4
λ
1
−
λ
2
=
0
12
−
x
1
−
4
x
2
≥
0
9
−
x
1
−
x
2
≥
0
λ
1
≥
0
λ
2
≥
0
λ
1
(12
−
x
1
−
4
x
2
)
=
0
λ
2
(9
−
x
1
−
x
2
)
=
0
The
final
equations
carry
our
complementa
ry
slackness
conditions.
There
are
t
wo
w
ays
to
cho
ose
which
facto
r
is
0
for
each,
giving
us
2
×
2
=
4
cases
to
check.
126
1
λ
1
=
0
,
λ
2
=
0
x
2
2
−
λ
1
−
λ
2
=
0
2
x
1
x
2
−
4
λ
1
−
λ
2
=
0
x
2
2
=
0
2
x
1
x
2
=
0
x
2
=
0
check
12
−
x
1
−
4
x
2
≥
0
6
−
x
1
−
x
2
≥
0
12
−
x
1
≥
0
6
−
x
1
≥
0
x
1
≤
12
x
1
≤
9
So
(
k
,
0
,
0
,
0)
satisfies
the
Kuhn-T
ucker
conditions
for
k
≤
9
.
2
λ
1
=
0
,
9
−
x
1
−
x
2
=
0
x
2
2
−
λ
1
−
λ
2
=
0
2
x
1
x
2
−
4
λ
1
−
λ
2
=
0
9
−
x
1
−
x
2
=
0
x
2
2
−
λ
2
=
0
2
x
1
x
2
−
λ
2
=
0
x
2
2
=
2
x
1
x
2
x
2
2
−
2
x
1
x
2
=
0
x
2
(
x
2
−
2
x
1
)
=
0
x
2
=
0
covered
in
case
1
o
r
x
2
=
2
x
1
9
−
x
1
−
2
x
1
=
0
x
1
=
3
x
2
=
6
6
2
−
λ
2
=
0
λ
2
=
36
check
12
−
x
1
−
4
x
2
≥
0
λ
2
≥
0
12
−
3
−
4(6)
≥
0
36
≥
0
−
15
≥
0
(3
,
6
,
0
,
36)
do
es
not
satisfy
the
inequalities.
127
2.3.7
Solving
the
Kuhn-T
uck
er
Conditions
3
12
−
x
1
−
4
x
2
=
0
,
λ
2
=
0
x
2
2
−
λ
1
−
λ
2
=
0
2
x
1
x
2
−
4
λ
1
−
λ
2
=
0
12
−
x
1
−
4
x
2
=
0
x
2
2
−
λ
1
=
0
2
x
1
x
2
−
4
λ
1
=
0
4
x
2
2
=
4
λ
1
2
x
1
x
2
=
4
λ
1
4
x
2
2
=
2
x
1
x
2
4
x
2
2
−
2
x
1
x
2
=
0
2
x
2
(2
x
2
−
x
1
)
=
0
x
2
=
0
covered
in
case
1
o
r
2
x
2
=
x
1
12
−
2
x
2
−
4
x
2
=
0
x
2
=
2
x
1
=
4
2
2
−
λ
1
=
0
λ
2
=
4
check
9
−
x
1
−
x
2
≥
0
λ
1
≥
0
9
−
4
−
2
≥
0
4
≥
0
(4
,
2
,
4
,
0)
satisfies
the
Kuhn-T
ucker
conditions
4
12
−
x
1
−
4
x
2
=
0
,
9
−
x
1
−
x
2
=
0
x
2
2
−
λ
1
−
λ
2
=
0
2
x
1
x
2
−
4
λ
1
−
λ
2
=
0
12
−
x
1
−
4
x
2
=
0
9
−
x
1
−
x
2
=
0
12
−
x
1
−
4
x
2
=
0
−
(9
−
x
1
−
x
2
)
=
−
0
3
−
3
x
2
=
0
x
2
=
1
9
−
x
1
−
1
=
0
x
1
=
8
1
−
λ
1
−
λ
2
=
0
16
−
4
λ
1
−
λ
2
=
0
−
(16
−
4
λ
1
−
λ
2
)
=
−
0
−
15
+
3
λ
1
=
0
λ
1
=
5
16
−
4(5)
−
λ
2
=
0
λ
2
=
−
4
check
λ
1
≥
0
λ
2
≥
0
5
≥
0
−
4
≥
0
(8
,
1
,
5
,
−
4)
do
es
not
satisfy
the
Kuhn-T
ucker
conditions.
The
Kuhn-T
ucker
conditions
are
necessa
ry
.
No
p
oint
except
(4
,
2)
o
r
(
k
,
0)
fo
r
k
≤
9
can
b
e
a
maximizer.
128
By
evaluating
w
e
see
f
(4
,
2)
=
16
f
(
k
,
0)
=
0
So
only
(4
,
2)
could
b
e
a
maximizer.
We
cannot
conclude
that
it
is
a
maximizer,
until
we
learn
a
relevant
sufficient
condition.
Strategy
In
general
our
metho
d
fo
r
s
olving
the
Kuhn
T
uck
er
conditions
is
Pick
an
equalit
y
from
a
facto
r
of
each
complementary
slackness
condition.
Solve
the
resulting
system
of
equations.
Disca
rd
any
solutions
that
violate
the
remaining
inequalities.
Rep
eat
fo
r
a
different
choice
of
equalities.
With
m
inequalit
y
c
onstraints,
w
e
will
need
to
rep
eat
for
all
2
m
combinations
of
equalities.
2.3.8
Visualizing
the
Kuhn-T
uck
er
Conditions
Each
choice
of
equalities
mutates
the
Kuhn-T
ucker
conditions
into
the
necessary
conditions
from
Theo
rem
2.22,
where
the
binding
g
j
(
x
)
≥
0
are
treated
as
equalit
y
constraints.
The
following
diagram
sho
ws
ho
w
each
choice
co
rresp
onds
to
a
different
piece
of
the
feasible
set,
where
different
constraints
bind.
∇
g
1
∇
g
2
x
1
x
2
g
1
(
x
)
≥
0
and
λ
1
=
0
g
2
(
x
)
=
0
and
λ
2
≥
0
L
(
x,
λ
)
=
f
(
x
)
+
λ
2
g
2
(
x
)
∇
f
(
x
)
=
−
λ
2
∇
g
2
(
x
)
g
1
(
x
)
≥
0
and
λ
1
=
0
g
2
(
x
)
≥
0
and
λ
2
=
0
L
(
x,
λ
)
=
f
(
x
)
∇
f
(
x
)
=
0
g
1
(
x
)
=
0
and
λ
1
≥
0
g
2
(
x
)
=
0
and
λ
2
≥
0
L
(
x,
λ
)
=
f
(
x
)
+
λ
1
g
1
(
x
)
+
λ
2
g
2
(
x
)
∇
f
(
x
)
=
−
λ
1
∇
g
1
(
x
)
−
λ
2
∇
g
2
(
x
)
g
1
(
x
)
=
0
and
λ
1
≥
0
g
2
(
x
)
≥
0
and
λ
2
=
0
L
(
x,
λ
)
=
f
(
x
)
+
λ
1
g
1
(
x
)
∇
f
(
x
)
=
−
λ
1
∇
g
1
(
x
)
Figure
2.28:
The
four
regions
covered
by
tw
o
pairs
of
inequalities
with
complementary
slackness
129
2.3.8
Visualizing
the
Kuhn-T
ucker
Conditions
In
the
case
of
multiple
binding
inequality
constraints,
the
condition
that
λ
j
≥
0
forces
∇
f
(
a
)
to
lie
in
the
cone
made
b
y
−∇
g
j
(
x
)
.
∇
g
1
∇
g
2
−∇
g
1
−∇
g
2
∇
f
x
1
x
2
Figure
2.29:
The
cone
where
∇
f
must
lie
if
b
oth
constraints
bind
at
a
maximizer
2.3.9
Proving
the
Necessit
y
of
the
Kuhn-T
ucker
Conditions
T
o
justify
the
necessity
of
the
Kuhn-T
ucker
conditions,
we
can
list
each
condition
and
describ
e
why
it
must
apply
.
1
A
given
a
will
satisfy
some
binding
equations
g
j
(
x
)
=
0
.
2
Fo
r
the
constraints
that
do
not
bind,
setting
λ
j
=
0
turns
L
(
x,
λ
)
into
the
Lagrangian
for
f
with
only
the
binding
equality
constraints.
If
a
is
a
lo
cal
maximizer,
it
must
satisfy
Theorem
2.22,
sp
ecifically
,
∂
L
∂
x
i
(
a
)
=
0
.
3
Fo
r
each
inequality
constraint,
a
maximizer
must
satisfy
g
j
(
a
)
≥
0
to
b
e
feasible.
4
Finally
,
if
every
path
from
a
into
the
feasible
region
decreases
f
,
then
all
λ
j
≥
0
.
W
e
have
not
given
a
convincing
argument
for
the
last
statement
y
et.
A
formal
proof
is
just
b
elo
w,
but
it
ma
y
b
e
more
illuminating
to
convince
yourself
graphically
.
T
ry
drawing
a
few
∇
f
that
lie
outside
the
cone
b
etw
een
−∇
g
1
and
−∇
g
2
in
the
previous
figure.
Fo
r
each
one,
y
ou
should
b
e
able
to
identify
a
vecto
r
x
′
(
t
)
that
p
oints
into
the
feasible
region
but
mak
es
an
acute
angle
with
∇
f
.
Here
is
a
fo
rmal
proof that $\lambda_j \ge 0$ at a maximizer.

Proof

Pick any inequality constraint $g_k(x) \ge 0$. If $g_k(x) \ge 0$ is not binding at $a$, then $\lambda_k = 0$ and we are done.

We will therefore consider the case where $g_k(a) \ge 0$ is binding and show $\lambda_k \ge 0$. Let $S$ be the intersection of all the binding constraints except $g_k(x) = 0$. Since the gradients of the binding constraints are linearly independent, we can conclude:

- The gradients of the binding constraints other than $\nabla g_k(a)$ span the normal space of $S$ at $a$.
- $\nabla g_k(a)$ is not a linear combination of these gradients, so it does not lie in the normal space of $S$ at $a$.
- There must be a path $x(t)$ in $S$ through $a$ such that $x'(t_0)$ is not orthogonal to $\nabla g_k(a)$.

We can pick such a path so that $\nabla g_k(a) \cdot x'(t_0) > 0$. If the first path we try produces a negative dot product, just traverse $x(t)$ backwards instead.

First we show that, in some neighborhood of $t_0$, $x(t)$ lies in the feasible region for all $t > t_0$. We check that it satisfies each constraint.

- $a$ lies in the interior of each upper level set $g_j(x) \ge 0$ for each nonbinding $g_j$. $x(t)$ will travel for some distance before leaving the upper level set.
- Since $x(t)$ was chosen to lie in $S$, it lies in the level set $g_j(x) = 0$ for all binding $g_j$ except $g_k$.
- Since $\nabla g_k(a) \cdot x'(t_0) > 0$, $x(t)$ must travel into the upper level set $g_k(x) \ge 0$.

Thus in some neighborhood of $t_0$, $x(t)$ lies in the feasible set for $t > t_0$. Since $a$ is a local maximizer, $f(x(t_0)) \ge f(x(t))$ for $t > t_0$ in this neighborhood. Thus $\frac{df}{dt}(t_0) \le 0$. We use the chain rule to examine this inequality.

$$\frac{df}{dt}(t_0) \le 0$$
$$\nabla f(a) \cdot x'(t_0) \le 0$$
$$-\sum_{j=1}^{m} \lambda_j \nabla g_j(a) \cdot x'(t_0) \le 0$$

We pause to examine the terms of this summation. Most are $0$. Here is the reasoning.

- For nonbinding $g_j(x)$, $\lambda_j = 0$.
- For binding $g_j(x)$ except $j = k$, we have $\nabla g_j(a) \cdot x'(t_0) = 0$, since $\nabla g_j(a)$ is a normal vector of $S$ and $x(t)$ lies in $S$.
- Finally, $\nabla g_k(a) \cdot x'(t_0) > 0$ by our choice of $x(t)$.

We apply these to our inequality.

$$-\sum_{j=1}^{m} \lambda_j \nabla g_j(a) \cdot x'(t_0) \le 0$$
$$-\lambda_k \nabla g_k(a) \cdot x'(t_0) \le 0$$
$$\lambda_k \ge 0$$
2.3.10 Kuhn-Tucker with Non-Negativity Constraints

If we demand that $x_i \ge 0$ for each $i$, then we have added $n$ new constraints to our Lagrangian. We will use $\mu_i$ for the Lagrange multipliers of the $x_i$.

$$L(x, \lambda, \mu) = f(x) + \sum_{j=1}^{m} \lambda_j g_j(x) + \sum_{i=1}^{n} \mu_i x_i$$

This is unwieldy. If we are clever, we can do better. For $1 \le k \le n$, the inequality conditions for $\mu_k$ are

$$\frac{\partial L}{\partial \mu_k} = x_k \ge 0 \quad \text{and} \quad \mu_k \ge 0$$

If the constraint $x_k \ge 0$ is not binding, then $\mu_k = 0$. The $\mu_k x_k$ term is $0$ in $L$ and its partial derivatives. We can remove it, but we will still need to verify that $x_k \ge 0$ is satisfied.

If the constraint is binding, then we can still remove the $\mu_k x_k$ term from $L$. The term goes to $0$ anyway in $L_{x_i}$ for $i \ne k$. In $L_{x_k}$, there is a single $+\mu_k$ term. $L_{x_k}$ is supposed to be $0$, whereas $\mu_k \ge 0$. We can replicate this effect by removing $\mu_k x_k$ from $L$, but requiring that the remaining terms of $L_{x_k}$ have a sum less than or equal to $0$.

For each $k$, we can remove the variable $\mu_k$ and the $\mu_k x_k$ term from our Lagrangian in exchange for some new conditions.
Corollary 2.24 [The Kuhn-Tucker Conditions with Non-Negativity]

Given the objective function $f(x)$ and constraints of the forms $g_j(x) \ge 0$ or $g_j(x) = 0$, along with $x_i \ge 0$ for each $i$, at any local maximizer $a$ one of the following must be true.

1. There is some vector $\lambda$ such that $(a, \lambda)$ satisfies the Kuhn-Tucker Conditions with Non-Negativity Constraints:
   - For each $i$, $a_i \ge 0$ and $\frac{\partial L}{\partial x_i}(a, \lambda) \le 0$ and $a_i \frac{\partial L}{\partial x_i}(a, \lambda) = 0$
   - For each equality constraint function $g_j$, $g_j(a) = 0$
   - For each inequality constraint function $g_j$, $g_j(a) \ge 0$ and $\lambda_j \ge 0$ and $\lambda_j g_j(a) = 0$

2. The binding constraints (including any $x_i = 0$) have linearly dependent gradients at $a$.
2.3.11 Section Summary

The most important definitions and results from this section were

- The shape of an intersection of level sets (Corollary 2.20)
- The Lagrangian for multiple constraints (Definition 2.21)
- First-order condition for multiple equality constraints (Theorem 2.22)
- The Kuhn-Tucker conditions (Theorem 2.23)
- The Kuhn-Tucker conditions with non-negativity (Corollary 2.24)
Chapter 3

Comparative Statics

3.1 The Implicit Function Theorem

Goals:

1. Identify when one variable of an implicit equation can be written as a function of the other(s)
2. Compute the derivative of one variable with respect to another
3. Determine how the optimal choice responds to changes in an exogenous parameter

3.1.1 Comparative Statics

Comparative statics studies situations with one or more choice variables $x_i$ and one or more exogenous parameters $\alpha_j$.
Example

A consumer maximizing their utility has the choice variable

- $q$: the quantity of a product they buy

and the exogenous parameter

- $p$: the price of that product

Example

From the point of view of a producer, the choice variables might be

- $q$: the quantity of a product they produce
- $p$: the price they sell for

The exogenous parameters may include

- $w$, $r$: the price of labor or capital
- $t$, $s$: a tax or subsidy imposed by the government
- some aspect of the demand function

Remark

The choice variables are the variables whose value is chosen by the agent who wants to maximize the function. In economics, we assume that choosers are rational and well-informed. We assume they will learn the value of the parameters. After that, they will pick the value of the choice variables that maximizes their objective function.
Comparative statics asks how the outcome changes as the value of the parameter changes. We will present the tools to compute two types of comparative statics:

1. How does the optimal value of a choice variable change as a parameter changes?
2. How does the value of the objective function change as a parameter changes?

Notation

Given an objective function $f(x, \alpha)$, with choice variable $x$ and parameter $\alpha$, we expect that different values of $\alpha$ will lead the chooser to pick different values of $x$. The chooser's optimal choice is a function of $\alpha$ that we write $x^*(\alpha)$. For each $\alpha$, $x^*(\alpha)$ must satisfy the first-order condition:

$$f_x(x^*(\alpha), \alpha) = 0$$

Without an expression for $x^*(\alpha)$, trying to understand its rate of change raises a sequence of questions.

1. What is $\frac{dx^*(\alpha)}{d\alpha}$?
2. How do we know that $x^*(\alpha)$ is differentiable?
3. How do we know that $x^*(\alpha)$ is even a function?

Given a specific $f$, we can solve the first-order condition algebraically. If we can tell which solution is the maximizer (assuming one exists), then we can write an expression for $x^*$ as a function of $\alpha$ and answer all these questions. In the case of a general or abstract $f$, this may not be possible. Fortunately, mathematics has the vocabulary to describe these questions in the abstract. It also has a powerful tool to answer them.
3.1.2 Explicit Functions and Implicit Equations

Definition 3.1

An explicit function equates a function or dependent variable to an expression entirely in terms of the independent variables.

Example

The following equations describe explicit functions.

$$f(x) = x^3 - \sqrt{x} + 7 \qquad y = \sin(x^2)$$
$$z = 3xy + x^2 - 2 \qquad g(x_1, x_2, x_3) = e^{x_1}\cos(x_2 x_3)$$

Since each input has at most one output, graphs of explicit functions pass the vertical line test. We also have a variety of tools for differentiating explicit functions (though not all are differentiable).

We do not have the luxury of always working with explicit functions. We use the following vocabulary when we wish to draw a contrast with explicit functions.

Definition 3.2

An implicit equation in two or more variables does not necessarily have any dependent variable which is equated to an expression in the others. Sometimes we can solve an implicit equation to obtain an explicit function, but other times we cannot.

Example

$$2x + 3y = 12 \implies y = 4 - \tfrac{2}{3}x$$
$$x_1^2 + x_2^2 + x_3^2 = 25 \implies x_3 = \pm\sqrt{25 - x_1^2 - x_2^2} \quad (\text{multiple outputs for each input, not a function})$$
$$13 - 3y^3 + xy^4 = y^5 \implies x = \frac{y^5 - 13 + 3y^3}{y^4} \quad (\text{can solve for } x \text{ but not for } y)$$
$$x^3 + y^3 = 6xy \implies y = \,??? \quad (\text{requires the cubic formula, not a function})$$

The solutions to any implicit equation form a level set of some function. We obtain the function by rewriting the equation in a form like:

$$F(x, y) = c \quad \text{or} \quad F(x) = c$$
3.1.3 An Implicit Equation

Consider the implicit equation

$$x^3 + y^3 = 6xy$$

a. Write the equation in the form $F(x, y) = c$.
b. Does this equation have a graph?
c. Can we solve it to obtain an explicit function?

Solution

a. We can write $x^3 + y^3 - 6xy = 0$.

b. Yes. The graph is the set of points $(x, y)$ that satisfy the equation. Every equation has a graph, though some graphs are the empty set.

c. We could look up the cubic formula and solve this. The cubic formula, like the quadratic formula, returns multiple values for the variable. The result is not a function. We can see this in the graph, which fails the vertical line test.

Figure 3.1: The graph of $x^3 + y^3 - 6xy = 0$

Sometimes we cannot write $y$ as an explicit function of $x$ in a way that describes the whole graph. If we are only interested in rates of change, we can restrict our attention to a small neighborhood of the graph. At most points, this neighborhood does look like the graph of some function $y = f(x)$.

Figure 3.2: A neighborhood in which the graph $x^3 + y^3 - 6xy = 0$ is the graph of an explicit function

This is not always possible. Consider the part of $x^3 + y^3 - 6xy = 0$ near $(0, 0)$. In any neighborhood we choose, there are three branches of the graph that extend to the right. No matter how small a neighborhood we choose around $(0, 0)$, the graph will fail the vertical line test and cannot be written as $y = f(x)$.
3.1.4 The Implicit Function Theorem

The implicit function theorem tells us when a point on the graph of an implicit equation has a neighborhood that is identical to the graph of an explicit function. The basic version takes an implicit equation in two variables and writes a function that expresses one (the dependent variable) in terms of the other (the independent variable).

Theorem 3.3 [The Implicit Function Theorem]

Suppose $F(x, y)$ is a continuously differentiable function and $(a, b)$ is a point on $F(x, y) = c$. If $F_y(a, b) \ne 0$, then there exists a differentiable function $f(x)$ such that $y = f(x)$ and $F(x, y) = c$ describe the same graph in some neighborhood of $(a, b)$.

The theorem does not tell us what the function $f(x)$ is, only that it exists. Even so, we can express its derivatives in terms of $F$.

Corollary 3.4

The derivative of $f$ with respect to $x$ at $a$ is given by

$$f'(a) = -\frac{F_x(a, b)}{F_y(a, b)}$$

The derivation of this formula is famous and not too difficult. To compute the derivative of $f$, we parameterize a path in the graph $y = f(x)$. Unlike our previous parametrizations, we will use $x$ as the parameter. Differentiating $x$ with respect to $x$ is most palatable with Leibniz notation.

$$x = x \qquad \frac{dx}{dx} = 1$$
$$y = f(x) \qquad \frac{dy}{dx} = f'(x)$$

The points $(x, f(x))$ lie in $F(x, y) = c$ near $(a, b)$. Thus the composition $F(x, f(x))$ is the constant function $c$. Differentiating $F(x, f(x)) = c$ with respect to the parameter $x$ produces an equation that contains $f'(x)$. We solve this equation to obtain an expression for $f'(x)$.

$$F(x, f(x)) = c \qquad (y = f(x) \text{ lies in } F(x, y) = c)$$
$$\frac{dF(x, f(x))}{dx} = 0 \qquad (\text{derivative of a constant is } 0)$$
$$F_x(x, f(x))\frac{dx}{dx} + F_y(x, f(x))\frac{df(x)}{dx} = 0 \qquad (\text{chain rule})$$
$$F_x(x, f(x))(1) + F_y(x, f(x))f'(x) = 0 \qquad (\text{evaluate derivative of } x)$$
$$f'(x) = -\frac{F_x(x, f(x))}{F_y(x, f(x))} \qquad (\text{solve for } f'(x))$$
$$f'(a) = -\frac{F_x(a, b)}{F_y(a, b)} \qquad (\text{evaluate at } x = a)$$
The implicit function theorem guarantees that $f$ exists and is differentiable in a neighborhood of $a$. Since we don't know how big this neighborhood is, $x = a$ is the only point at which we can be sure $f'(x)$ exists.

We often apply this formula before checking whether the implicit function theorem applies. Assuming $F$ is continuously differentiable, the theorem will only fail when $F_y(a, b) = 0$. Conveniently, this formula will be undefined in that case.

We can also determine the derivative of $f$ geometrically. Near $(a, b)$, the graph $y = f(x)$ is the level set $F(x, y) = c$. We know that $\nabla F(a, b) = (F_x(a, b), F_y(a, b))$ is normal to the level set. The gradient has a slope of $\frac{F_y(a, b)}{F_x(a, b)}$. The tangent line, which is perpendicular, has a negative reciprocal slope: $-\frac{F_x(a, b)}{F_y(a, b)}$. The slope of the tangent line is also the derivative $f'(a)$.

Figure 3.3: The gradient of $F$ and the tangent line whose slope is $f'(a)$
3.1.5 Applying the Implicit Function Theorem

If $x$ and $y$ satisfy $x^3 - 2xy^2 + 3y = -19$, show that $y$ can be written as a function $f(x)$ of $x$ near $(2, 3)$ and compute $f'(2)$.

Solution

We may first want to check that $(2, 3)$ satisfies the implicit equation.

$$(2)^3 - 2(2)(3)^2 + 3(3) = -19$$

In order to apply the implicit function theorem, we need to know that $F(x, y) = x^3 - 2xy^2 + 3y$ has continuous partial derivatives. It is a polynomial, so it does. Finally, we need to show $F_y(2, 3) \ne 0$.

$$F_y(x, y) = -4xy + 3$$
$$F_y(2, 3) = -4(2)(3) + 3 = -21 \ne 0$$

Therefore, by the implicit function theorem, there is a function $f(x)$ such that $y = f(x)$ and $x^3 - 2xy^2 + 3y = -19$ describe the same graph near $(2, 3)$. The derivative of $f$ at $x = 2$ is given by Corollary 3.4.

$$f'(2) = -\frac{F_x(2, 3)}{F_y(2, 3)} = -\frac{3(2)^2 - 2(3)^2}{-4(2)(3) + 3} = -\frac{-6}{-21} = -\frac{2}{7}$$

Figure 3.4: The graph of $x^3 - 2xy^2 + 3y = -19$ and its tangent line at $(2, 3)$
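The conclusion $f'(2) = -2/7$ can also be checked numerically. The following sketch (an illustration, not part of the text) traces the implicit function near $x = 2$ with Newton's method, taking the level $c$ of the curve from the point $(2, 3)$ itself, and estimates the slope by a finite difference.

```python
# Numerical check of f'(2) = -2/7 for F(x, y) = x^3 - 2xy^2 + 3y.
# The level c is computed by substituting the point (2, 3) into F.

def F(x, y):
    return x**3 - 2*x*y**2 + 3*y

def F_y(x, y):
    return -4*x*y + 3

c = F(2.0, 3.0)  # the level of the curve through (2, 3)

def solve_y(x, y0=3.0):
    """Newton's method: find y near y0 with F(x, y) = c."""
    y = y0
    for _ in range(50):
        y -= (F(x, y) - c) / F_y(x, y)
    return y

h = 1e-6
slope = (solve_y(2 + h) - solve_y(2 - h)) / (2 * h)  # finite difference of f
print(round(slope, 4))  # -0.2857, i.e. -2/7
```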
3.1.6 The Derivative of the Optimal Choice

In comparative statics, we are interested in the function $x^*(\alpha)$, which is a solution to the equation $f_x(x, \alpha) = 0$. We apply the implicit function theorem where:

- $\alpha$ takes the role of the independent variable "$x$".
- $x$ takes the role of the dependent variable "$y$".
- $f_x(x, \alpha)$ takes the role of the two-variable function "$F$".
- $(b, a)$ is a point on the graph $f_x(x, \alpha) = 0$.

The derivatives of $F$ are second derivatives of $f$. The implicit function theorem requires that $F_x(b, a) = f_{xx}(b, a) \ne 0$. It concludes there is a differentiable function $x^*(\alpha)$ such that $x = x^*(\alpha)$ matches the graph of $f_x(x, \alpha) = 0$ in a neighborhood of $(b, a)$.

Corollary 3.5

Given a function $f(x, \alpha)$, suppose that

1. $a$ is a value of $\alpha$ and $b = x^*(a)$
2. $x^*(\alpha)$ satisfies $f_x(x^*(\alpha), \alpha) = 0$ near $(b, a)$
3. $f(x, \alpha)$ has continuous second derivatives near $(b, a)$
4. $f_{xx}(b, a) \ne 0$

Then

$$\frac{dx^*(a)}{d\alpha} = -\frac{f_{x\alpha}(b, a)}{f_{xx}(b, a)}$$

This computes the derivative at a point. If, in some interval of $\alpha$ values, every point $(x^*(a), a)$ satisfies these conditions, then we can extend this to a derivative function for $x^*(\alpha)$.

$$\frac{dx^*(\alpha)}{d\alpha} = -\frac{f_{x\alpha}(x^*(\alpha), \alpha)}{f_{xx}(x^*(\alpha), \alpha)}$$

Remark

When working with comparative statics, we can reasonably assume the requirements of the implicit function theorem.

1. If an optimal choice exists, then it must satisfy the first-order condition.
2. It makes sense to work with smooth functions when modeling empirical data.
3. At a maximizer, $f_{xx}$ is likely (though not required) to be negative. Alternately, assuming $f_{xx} < 0$ on the entire domain guarantees that $x^*(\alpha)$ is actually the global maximizer for each $\alpha$.
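As a sanity check of Corollary 3.5, consider an assumed objective (not from the text), $f(x, \alpha) = \alpha x - x^2$, whose maximizer $x^*(\alpha) = \alpha/2$ is known explicitly; the formula can be compared against the exact derivative $dx^*/d\alpha = 1/2$.

```python
# Checking Corollary 3.5 on an assumed objective (not from the text):
# f(x, a) = a*x - x**2, with known maximizer x*(a) = a/2.

def f_xx(x, a):
    return -2.0          # second derivative of f in x

def f_xa(x, a):
    return 1.0           # mixed partial of f

a = 3.0
b = a / 2                # x*(a), the solution of f_x = a - 2x = 0
formula = -f_xa(b, a) / f_xx(b, a)   # Corollary 3.5 prediction

exact = 0.5              # exact derivative of x*(a) = a/2
print(formula, exact)    # 0.5 0.5
```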
3.1.7 The Construction of $f(x)$

Here we will give a construction of the function $y = f(x)$ that matches the graph of $F(x, y) = c$ near a point $(a, b)$. The same argument works for more variables, but the pictures are harder to draw.

Constructing $f(x)$ requires the following tools:

1. Lemma 1.11: If $f'(x) > 0$ then $f$ is increasing.
2. Definition of continuity: The values of $F$ can be kept arbitrarily close to $F(a, b)$ by restricting to points sufficiently close to $(a, b)$.
3. The intermediate value theorem: If $f(x)$ is continuous and $f(a_-) < c < f(a_+)$ then there is a value $k$ between $a_-$ and $a_+$ such that $f(k) = c$.

The implicit function theorem requires that $F_y(a, b) \ne 0$. There are two cases to consider, but the arguments are analogous. We will consider the case $F_y(a, b) > 0$.

1. $F$ is continuously differentiable, so there is a neighborhood of $(a, b)$ where $F_y(x, y) > 0$.
2. Within this neighborhood, we can travel $h$ units in the $y$-direction from $(a, b)$. Let $F(a, b + h) = c_+$ and $F(a, b - h) = c_-$.
3. Since $F_y(x, y) > 0$, we have $c_+ > c$ and $c_- < c$.
4. Since $F$ is continuous, there is a neighborhood of $(a, b + h)$ where $F(x, y) > c$. There is a neighborhood of $(a, b - h)$ where $F(x, y) < c$.
5. We consider segments from $(x, b + h)$ to $(x, b - h)$, with one endpoint in each neighborhood.
6. Apply the intermediate value theorem: since $F(x, b - h) < c < F(x, b + h)$, there is a $k$ between $b - h$ and $b + h$ such that $F(x, k) = c$.
7. Since $F_y(x, y) > 0$, $F$ is increasing along the segment, so it cannot take the value $c$ more than once. The $k$ in the previous step is unique.
8. Repeat this for every segment of the form $(x, b + h)$ to $(x, b - h)$, and define $f(x) = k$.

[Figures: the neighborhood of $(a, b)$ where $F_y(x, y) > 0$; the points $(a, b + h)$ and $(a, b - h)$ with $F > c$ above and $F < c$ below; the segments meeting the level set at the unique points $(x, k)$, which form the graph $y = f(x)$]
3.1.8
The
Implicit
F
unction
Theorem
with
Mo
re
Va
riables
W
e
can
also
apply
the
implicit
function
theorem
to
an
implicit
equation
of
n
>
2
va
riables.
In
this
case,
one
dep
endent
va
riable
can
be
expressed
as
a
function
of
n
−
1
indep
endent
variables.
Theo
rem
3.6
[The
Multivariable
Implicit
F
unction
Theorem]
Supp
ose
F
(
x,
y
)
is
a
continuously
differentiable
function
and
F
(
a,
b
)
=
c
.
If
F
y
(
a,
b
)
=
0
,
then
there
exists
a
differentiable
function
f
(
x
)
such
that
y
=
f
(
x
)
and
F
(
x,
y
)
=
c
describe
the
same
graph
in
some
neighb
o
rho
o
d
of
(
a,
b
)
.
Since
f
(
x
)
is
an
n
−
1
variable
function,
the
derivatives
we
can
compute
a
re
the
partial
derivatives.
The
fo
rmula
for
these
derivatives
is
analogous
to
the
single-va
riable
version.
Co
rollary
3.7
F
or
each
va
riable
x
k
,
the
pa
rtial
derivative
of
f
with
resp
ect
to
x
k
at
a
is
given
by
f
x
k
(
a
)
=
−
F
x
k
(
a,
b
)
F
y
(
a,
b
)
W
e
can
now
justify
our
earlier
characterization
of
level
sets.
At
the
time
we
made
no
mention
of
dep
endent
and
independent
va
riables.
This
lack
of
distinction
actually
mak
es
the
implicit
function
theo
rem
easier
to
apply
.
Rema
rk
There
is
nothing
special
ab
out
the
letter
y
,
nor
the
fact
that
it
is
the
last
variable
of
F
.
The
variable
“
y
”
in
the
implicit
function
theo
rem
can
apply
to
any
va
riable
of
an
implicit
equation,
so
long
as
the
pa
rtial
derivative
with
resp
ect
to
that
variable
is
not
zero.
147
3.1.8
The
Implicit
Function
Theorem
with
More
Va
riables
Click to Load Applet
Figure
3.5:
A
p
oint
where
x
3
+
y
3
−
6
xy
=
0
cannot
b
e
written
in
the
form
y
=
f
(
x
)
b
ecause
it
fails
the
vertical
line
test
but
can
b
e
rewritten
as
x
=
f
(
y
)
If
we
a
re
not
picky
ab
out
which
variable
is
written
as
a
function
of
the
others,
then
the
implicit
function
theo
rem
only
fails
when
all
the
partial
derivatives
a
re
0
.
As
long
as
the
gradient
of
the
function
is
not
the
zero
vecto
r,
one
comp
onent
can
play
the
role
of
the
dep
endent
variable.
This
is
exactly
what
our
co
rollary
requires.
Co
rollary
2.5
Let
g
(
x
)
b
e
a
continuously
differentiable
function
at
a
.
If
a
lies
on
the
level
set
g
(
x
)
=
c
and
∇
g
(
a
)
=
0
,
then
the
level
set
g
(
x
)
=
c
is
a
(
n
−
1)
-dimensional
shap
e
in
some
neighb
o
rho
o
d
of
a
.
Sp
ecifically
,
it
is
the
graph
of
a
differentiable
function
of
n
−
1
of
the
variables
of
R
n
.
The
“differentiable
function”
is
the
function
f
produced
by
the
implicit
function
theorem.
W
e
can
also
apply
this
version
of
the
implicit
function
theorem
and
its
corolla
ry
to
compa
rative
statics.
Consider
a
function
of
one
choice
variable
and
multiple
parameters.
We
write
this
objective
function
as
f
(
x,
α
)
.
The
implicit
function
theorem
and
its
corolla
ry
can
compute
the
partial
derivatives
of
x
∗
(
α
)
.
148
Co
rollary
3.8
Given
a
function
f
(
x,
α
)
,
supp
ose
that
1
a
is
a
value
of
α
and
b
=
x
∗
(
a
)
.
2
x
∗
(
α
)
satisfies
f
x
(
x
∗
(
α
)
,
α
)
=
0
near
(
b,
a
)
3
f
(
x,
α
)
has
continuous
second
derivatives
near
(
b,
a
)
4
f
xx
(
b,
a
)
=
0
Then
∂
x
∗
(
a
)
∂
α
k
=
−
f
xα
k
(
b,
a
)
f
xx
(
b,
a
)
.
W
e
can
justify
the
multivariable
implicit
function
and
Co
rollary
3.7
using
arguments
similar
to
the
single-va
riable
versions.
The
construction
of
f
is
the
same
for
b
oth
versions,
except
that
x
replaces
x
.
The
computation
of
the
pa
rtial
derivatives
of
f
requires
more
adaptation.
Since
f
k
(
x
)
is
a
partial
derivative,
we
treat
x
k
as
a
pa
rameter.
The
other
x
i
a
re
constants,
held
equal
to
the
co
rresp
onding
comp
onents
of
a
.
x
i
=
(
a
i
if
i
=
k
x
k
if
i
=
k
dx
i
dx
k
=
(
0
if
i
=
k
1
if
i
=
k
y
=
f
(
x
)
dy
dx
k
=
f
x
k
(
x
)
The
strategy
is
the
same.
We
differentiate
F
(
x,
f
(
x
))
with
resp
ect
to
x
k
,
solve
for
f
x
k
(
x
)
,
and
evaluate
at
x
k
=
a
k
.
F
(
x,
f
(
x
))
=
c
((
x,
f
(
x
))
lies
in
F
(
x,
y
)
=
c
)
dF
(
x,
f
(
x
))
dx
k
=
0
(
derivative
of
a
constant
is
0)
n
X
i
=1
F
x
i
(
x,
f
(
x
))
dx
i
dx
k
+
F
y
(
x,
f
(
x
))
f
x
k
(
x
)
=
0
(
chain
rule
)
F
x
k
(
x,
f
(
x
))
dx
k
dx
k
+
F
y
(
x,
f
(
x
))
f
x
k
(
x
)
=
0
(
dx
i
dx
k
=
0
fo
r
i
=
k
)
F
x
k
(
x,
f
(
x
))(1)
+
F
y
(
x,
f
(
x
))
f
x
k
(
x
)
=
0
(
evaluate
derivative
of
x
k
)
f
x
k
(
x
)
=
−
F
x
k
(
x,
f
(
x
))
F
y
(
x,
f
(
x
))
(
solve
fo
r
f
x
k
(
x
))
f
x
k
(
a
)
=
−
F
x
k
(
a,
b
)
F
y
(
a,
b
)
(
evaluate
at
x
=
a
)
149
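Corollary 3.7 can be illustrated numerically. The sketch below uses an assumed example (not from the text): the sphere $x_1^2 + x_2^2 + y^2 = 25$, which near $(3, 0, 4)$ is the graph of $y = \sqrt{25 - x_1^2 - x_2^2}$. The corollary predicts $f_{x_1} = -F_{x_1}/F_y = -6/8 = -3/4$ at that point.

```python
# Checking Corollary 3.7 on an assumed example (not from the text):
# the level set F(x1, x2, y) = x1**2 + x2**2 + y**2 = 25 near (3, 0, 4).
import math

def f(x1, x2):
    # explicit function describing the sphere near (3, 0, 4)
    return math.sqrt(25 - x1**2 - x2**2)

h = 1e-6
fd = (f(3 + h, 0) - f(3 - h, 0)) / (2 * h)  # finite-difference partial f_x1
formula = -(2 * 3) / (2 * 4)                # -F_x1 / F_y at (3, 0, 4)
print(round(fd, 6), formula)  # -0.75 -0.75
```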
3.1.9 The Implicit Function Theorem for Multiple Equations

We have seen two instances previously where the solution to multiple implicit equations was relevant.

1. The feasible set of multiple equality constraints is the intersection of multiple level sets.
2. Critical points of a multivariable function satisfy $f_{x_i}(x) = 0$ for each $i$.

A graph of the form $y = f(x)$ in $\mathbb{R}^{n+1}$ will have dimension $n$. In general, each equation we wish to satisfy lowers the dimension of our space of solutions by $1$. If we want to express an intersection of level sets, $y = f(x)$ will not have the right dimension. The way to handle this loss of dimension in an explicit function is to increase the number of dependent variables. Specifically, if we have $n$ variables $x$ and $m$ variables $y$, then the graph of a family of functions $y_j = f_j(x)$ will have dimension $n$ in $\mathbb{R}^{n+m}$.

The most general version of the implicit function theorem states when a family of implicit equations can be expressed as a family of explicit functions instead.

Notation

Given a family of functions $F(x, y) = (F_1(x, y), F_2(x, y), \ldots, F_m(x, y))$, the derivative of $F$ with respect to $y_j$ is the family of functions

$$\frac{\partial F}{\partial y_j}(x, y) = \left(\frac{\partial F_1}{\partial y_j}(x, y), \frac{\partial F_2}{\partial y_j}(x, y), \ldots, \frac{\partial F_m}{\partial y_j}(x, y)\right)$$

Note that we are using subscripts here to indicate different components of the vector $F$, not as partial derivatives.

Theorem 3.9 [The Implicit Function Theorem for Multiple Dependent Variables]

Suppose $y$ is an $m$-vector and $F(x, y)$ is a family of $m$ continuously differentiable functions such that $F(a, b) = (c_1, \ldots, c_m)$. If the vectors $\frac{\partial F}{\partial y_j}(a, b)$ are linearly independent, then there exists a family of differentiable functions $(f_j(x))$ such that the equations $y_j = f_j(x)$ describe the same graph as $F(x, y) = (c_1, \ldots, c_m)$ in some neighborhood of $(a, b)$.

We can use the chain rule to solve for the partial derivatives $\frac{\partial f_j(a, b)}{\partial x_k}$, but the derivative of any equation $F_j(x, y) = c_j$ with respect to $x_k$ will contain the derivatives of all the $f_j$. To compute the derivative we want, we need to differentiate all of the implicit equations and solve a system of equations. Here is the simplest example.
150
Example
Consider
t
wo
implicit
equations
F
1
(
x,
y
1
,
y
2
)
=
c
1
and
F
2
(
x,
y
1
,
y
2
)
=
c
2
.
Assuming
F
y
1
and
F
y
2
a
re
linea
rly
indep
endent,
the
implicit
function
gua
rantees
differentiable
explicit
functions
y
1
=
f
1
(
x
)
and
y
2
=
f
2
(
x
)
.
Differentiating
the
original
implicit
equations
with
resp
ect
to
x
gives
∂
F
1
∂
x
dx
dx
+
∂
F
1
∂
y
1
f
′
1
(
x
)
+
∂
F
1
∂
y
2
f
′
2
(
x
)
=
0
∂
F
2
∂
x
dx
dx
+
∂
F
2
∂
y
1
f
′
1
(
x
)
+
∂
F
2
∂
y
2
f
′
2
(
x
)
=
0
W
e
can
use
this
system
to
solve
for
f
′
1
(
x
)
and
f
′
2
(
x
)
.
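Here is a numerical sketch of that procedure on an assumed concrete pair of equations (not from the text): $F_1 = x + y_1 y_2 = 5$ and $F_2 = y_1 - x y_2 = 0$, which pass through $(x, y_1, y_2) = (1, 2, 2)$. Differentiating each equation with respect to $x$ gives a $2 \times 2$ linear system in $f_1'(x)$ and $f_2'(x)$, solved below by Cramer's rule.

```python
# An assumed concrete system (not from the text):
#   F1 = x + y1*y2 = 5,   F2 = y1 - x*y2 = 0,   through (x, y1, y2) = (1, 2, 2).
# Differentiating with respect to x:
#   F1_x + F1_y1 * f1' + F1_y2 * f2' = 0
#   F2_x + F2_y1 * f1' + F2_y2 * f2' = 0
x, y1, y2 = 1.0, 2.0, 2.0
F1_x, F1_y1, F1_y2 = 1.0, y2, y1     # partials of x + y1*y2
F2_x, F2_y1, F2_y2 = -y2, 1.0, -x    # partials of y1 - x*y2

# Solve the 2x2 system by Cramer's rule (right side is (-F1_x, -F2_x)).
det = F1_y1 * F2_y2 - F1_y2 * F2_y1
f1p = ((-F1_x) * F2_y2 - F1_y2 * (-F2_x)) / det
f2p = (F1_y1 * (-F2_x) - (-F1_x) * F2_y1) / det
print(f1p, f2p)  # 0.75 -1.25
```

Substituting back confirms the solution: $1 + 2(0.75) + 2(-1.25) = 0$ in the first equation.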
3.1.10 The Derivative of Optimal Choice with Multiple Choice Variables

Suppose your utility is a function of two choice variables and one exogenous parameter, $u(x_1, x_2, \alpha)$. Your optimal choices $x_1^*(\alpha)$ and $x_2^*(\alpha)$ satisfy the following implicit equations:

$$u_1(x_1^*(\alpha), x_2^*(\alpha), \alpha) = 0$$
$$u_2(x_1^*(\alpha), x_2^*(\alpha), \alpha) = 0$$

By the implicit function theorem, we can write $x_1^*(\alpha)$ and $x_2^*(\alpha)$ as differentiable explicit functions of $\alpha$ if their derivatives with respect to $x_1$ and $x_2$ are linearly independent. Here are those derivatives, where $(\cdot)$ abbreviates the argument $(x_1^*(\alpha), x_2^*(\alpha), \alpha)$:

$$\frac{\partial(u_1, u_2)}{\partial x_1}(\cdot) = (u_{11}(\cdot), u_{21}(\cdot))$$
$$\frac{\partial(u_1, u_2)}{\partial x_2}(\cdot) = (u_{12}(\cdot), u_{22}(\cdot))$$

These are the columns of the second upper left square minor of the Hessian of $u$. Columns of a matrix are independent if the matrix has a nonzero determinant.

$$|Hu(\cdot)_2| = \begin{vmatrix} u_{11}(\cdot) & u_{12}(\cdot) \\ u_{21}(\cdot) & u_{22}(\cdot) \end{vmatrix} \ne 0$$

Assuming this holds, we can use the chain rule to differentiate both implicit equations with respect to $\alpha$. We obtain

$$u_{11}(\cdot)\frac{dx_1^*(\alpha)}{d\alpha} + u_{12}(\cdot)\frac{dx_2^*(\alpha)}{d\alpha} + u_{1\alpha}(\cdot)\frac{d\alpha}{d\alpha} = 0$$
$$u_{21}(\cdot)\frac{dx_1^*(\alpha)}{d\alpha} + u_{22}(\cdot)\frac{dx_2^*(\alpha)}{d\alpha} + u_{2\alpha}(\cdot)\frac{d\alpha}{d\alpha} = 0$$

In a concrete example we could solve for $\frac{dx_1^*(\alpha)}{d\alpha}$ and $\frac{dx_2^*(\alpha)}{d\alpha}$ directly using algebra. If that seems difficult or the problem is abstract, we can borrow an approach from linear algebra. We write this system of equations as a matrix product.

$$\begin{pmatrix} u_{11}(\cdot) & u_{12}(\cdot) \\ u_{21}(\cdot) & u_{22}(\cdot) \end{pmatrix} \begin{pmatrix} \frac{dx_1^*(\alpha)}{d\alpha} \\ \frac{dx_2^*(\alpha)}{d\alpha} \end{pmatrix} = \begin{pmatrix} -u_{1\alpha}(\cdot) \\ -u_{2\alpha}(\cdot) \end{pmatrix}$$

Cramer's rule writes the solution to a matrix equation as a ratio of determinants.

$$\frac{dx_1^*(\alpha)}{d\alpha} = \frac{\begin{vmatrix} -u_{1\alpha}(\cdot) & u_{12}(\cdot) \\ -u_{2\alpha}(\cdot) & u_{22}(\cdot) \end{vmatrix}}{\begin{vmatrix} u_{11}(\cdot) & u_{12}(\cdot) \\ u_{21}(\cdot) & u_{22}(\cdot) \end{vmatrix}} \qquad \frac{dx_2^*(\alpha)}{d\alpha} = \frac{\begin{vmatrix} u_{11}(\cdot) & -u_{1\alpha}(\cdot) \\ u_{21}(\cdot) & -u_{2\alpha}(\cdot) \end{vmatrix}}{\begin{vmatrix} u_{11}(\cdot) & u_{12}(\cdot) \\ u_{21}(\cdot) & u_{22}(\cdot) \end{vmatrix}}$$

Notice the denominator in each formula is $|Hu(\cdot)_2|$. We replace the corresponding column with the right side of the equation to obtain the numerator.

Cramer's rule can apply to any number of variables, so we can extend this procedure to more than two choice variables. We can also handle multiple parameters. This method would compute partial derivatives of $x_i^*(\alpha)$ with respect to some $\alpha_j$.

Main Ideas

- The optimal choice of multiple variables satisfies the multivariable first-order condition.
- The implicit function theorem applies to the optimal choice variables if the Hessian minor corresponding to the choice variables has a nonzero determinant. If the optimal choice satisfies the second-order condition, this is automatic.
- We can differentiate the multivariable first-order condition with respect to a parameter and obtain a linear system of equations in the derivatives of the choice variables. Cramer's rule is a reliable way to solve such systems.
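The procedure can be carried out end to end on an assumed utility (not from the text): $u = -x_1^2 - x_2^2 + x_1 x_2 + \alpha x_1$, whose first-order conditions solve explicitly to $x_1^*(\alpha) = 2\alpha/3$ and $x_2^*(\alpha) = \alpha/3$. The Cramer's-rule formulas reproduce those derivatives.

```python
# The Cramer's rule procedure on an assumed utility (not from the text):
# u = -x1**2 - x2**2 + x1*x2 + a*x1, with explicit maximizers
# x1*(a) = 2a/3 and x2*(a) = a/3.
u11, u12, u21, u22 = -2.0, 1.0, 1.0, -2.0  # second-order partials (constant here)
u1a, u2a = 1.0, 0.0                        # partials of u1, u2 in the parameter

H2 = u11 * u22 - u12 * u21                 # Hessian minor determinant, = 3
dx1 = ((-u1a) * u22 - u12 * (-u2a)) / H2   # column 1 replaced by the right side
dx2 = (u11 * (-u2a) - (-u1a) * u21) / H2   # column 2 replaced by the right side
print(dx1, dx2)  # 2/3 and 1/3, matching the explicit solution
```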
3.1.11 Section Summary

The key definitions and results from this section were

- Explicit functions and implicit equations (Definitions 3.1 and 3.2)
- The implicit function theorem (Theorem 3.3)
- The derivative of the function guaranteed by the implicit function theorem (Corollary 3.4)
- The derivative of the function $x^*(\alpha)$ (Corollary 3.5)
3.2 The Envelope Theorem

Goals:

1. Use the envelope theorem to compute the derivative of value functions.

3.2.1 Value Functions

Definition 3.10

Suppose we have $f(x, \alpha)$, a differentiable function.

- $x$ is a choice variable
- $\alpha$ is a parameter
- $x^*(\alpha)$ is the $x$ that maximizes $f(x, \alpha)$ for a given $\alpha$.

Then the outcome that will occur for each $\alpha$ is the value function

$$V(\alpha) = f(x^*(\alpha), \alpha)$$

Vocabulary

In the case that $f$ has an economic meaning, we sometimes use the term indirect, along with the $*$ notation, to refer to the value function of $f$.

- If $u(x, \alpha)$ is a utility function then $u^*(\alpha) = u(x^*(\alpha), \alpha)$ is the indirect utility function.
- If $\pi(x, \alpha)$ is a profit function then $\pi^*(\alpha) = \pi(x^*(\alpha), \alpha)$ is the indirect profit function.

$x^*(\alpha)$ is a solution to $f_x(x, \alpha) = 0$. Assuming $f$ is continuously differentiable, the implicit function theorem guarantees that $x^*(\alpha)$ is a differentiable function. This means that $V$ is a differentiable function. If we want to understand how a change in $\alpha$ affects the value of $V$, we want to compute $V'(\alpha)$.
$V'(\alpha)$ computes how a change in a parameter affects an outcome, assuming the agents involved make the optimal choices. This derivative can answer a variety of questions in economics.

Example

- How will increasing a tax impact company profits?
- Will increasing a subsidy increase consumer well-being?

In the case that we have an expression for $f$, computing $V'(\alpha)$ is not difficult:

1. Solve for $x^*(\alpha)$
2. Substitute $x^*(\alpha)$ into $f(x, \alpha)$ to obtain an expression for $V(\alpha)$
3. Differentiate $V(\alpha)$

But is there a better way?

3.2.2 The Envelope Theorem

The envelope theorem gives us an alternative method. We compute $V'(\alpha)$ by parametrizing the values of $\alpha$ and $x^*$ in a neighborhood where $x^*(\alpha)$ is differentiable. We use $\alpha$ as the parameter.

$$\alpha = \alpha \qquad \frac{d\alpha}{d\alpha} = 1$$
$$x = x^*(\alpha)$$

We can apply the chain rule to $V(\alpha) = f(x^*(\alpha), \alpha)$. We use the fact that $x^*(\alpha)$ satisfies the first-order condition for all $\alpha$. Specifically, we have $f_x(x^*(\alpha), \alpha) = 0$.

$$V'(\alpha) = \frac{df(x, \alpha)}{d\alpha} = f_x(x^*(\alpha), \alpha)\frac{dx^*(\alpha)}{d\alpha} + f_\alpha(x^*(\alpha), \alpha)\frac{d\alpha}{d\alpha} \qquad (\text{chain rule})$$
$$= (0)\frac{dx^*(\alpha)}{d\alpha} + f_\alpha(x^*(\alpha), \alpha)(1) \qquad (\text{FOC})$$
$$= f_\alpha(x^*(\alpha), \alpha)$$
Remark

This computation requires us to know that $x^*(\alpha)$ is a differentiable function. Here are two ways to verify this.

1. For a concrete function, compute $x^*(\alpha)$. Verify directly that it is differentiable.
2. In more abstract settings, apply the implicit function theorem. Check that $f$ has continuous second derivatives and $f_{xx}(x^*(\alpha), \alpha) \ne 0$ for all $\alpha$.

Theorem 3.11 [The Envelope Theorem, Single-Variable]

Suppose $f(x, \alpha)$ is a differentiable function, and the $x^*(\alpha)$ that maximizes $f$ for each $\alpha$ is a differentiable function. The following two derivatives are equal:

$$\underbrace{V'(\alpha)}_{\text{derivative of value function}} = \underbrace{f_\alpha(x^*(\alpha), \alpha)}_{\text{partial derivative of original function}}$$

The envelope theorem allows us to compute a partial derivative of $f$ instead of the total derivative of $V$. In some sense, the envelope theorem is saying that the change in $x^*$ does not matter. This makes sense, because at a maximizer, we cannot increase the value of the function by changing $x$.

This is an interesting insight into the behavior of value functions, but all we have done is traded one derivative for another. We still need to compute $x^*(\alpha)$ to evaluate the partial derivative. It is natural to ask: does the envelope theorem save us any work in practice? Compare the following methods of computing $V'(\alpha)$:

Without the envelope theorem

1. Compute $x^*(\alpha)$
2. Substitute into $f(x, \alpha)$ to get $V(\alpha)$
3. Differentiate $V(\alpha)$

With the envelope theorem

1. Compute $x^*(\alpha)$
2. Partially differentiate $f(x, \alpha)$
3. Substitute $x^*(\alpha)$
Remark

In concrete situations, the first method gives us a more complicated function to differentiate. In abstract situations, the first method may be impossible or give us an answer in a less useful form.
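Both methods can also be compared numerically. The sketch below uses an assumed example (not from the text), $f(x, \alpha) = \alpha x - x^2$, so $x^*(\alpha) = \alpha/2$ and $V(\alpha) = \alpha^2/4$. The total derivative of $V$ and the partial derivative $f_\alpha$ with $x$ held at $x^*(\alpha)$ agree, as the envelope theorem predicts.

```python
# Envelope theorem check on an assumed objective (not from the text):
# f(x, a) = a*x - x**2, with x*(a) = a/2 and V(a) = a**2/4.

def f(x, a):
    return a * x - x**2

def x_star(a):
    return a / 2          # solves f_x = a - 2x = 0

def V(a):
    return f(x_star(a), a)

a, h = 3.0, 1e-6
total = (V(a + h) - V(a - h)) / (2 * h)                          # V'(a)
partial = (f(x_star(a), a + h) - f(x_star(a), a - h)) / (2 * h)  # f_alpha, x fixed
print(round(total, 6), round(partial, 6))  # 1.5 1.5
```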
The envelope theorem can also be justified visually. If we pick a specific a, we can compare two functions:

1  The value function: V(α) = f(x*(α), α), which uses the best x for each α

2  The stubborn function: V_0(α) = f(x*(a), α), which sticks with the best x for a, even if α changes.
The stubborn function has the following properties:

  Since its x-coordinate is constant, the derivative of V_0 is equal to the partial derivative f_α.

  V_0(a) = V(a)

  For any other α, x*(a) cannot be a better choice than x*(α). Thus V_0(α) ≤ V(α).

The graph y = V_0(α) meets y = V(α) at a but does not go above it. They must be tangent, which means their derivatives are equal:

  V′(a) = V_0′(a) = f_α(x*(a), a)
Figure 3.6: The graphs of the value function and stubborn function and their tangent line

The envelope theorem gets its name from the fact that y = V(α) envelopes all of the stubborn functions y = V_0(α) generated by choosing different values of a.
3.2.3 Generalizations of the Envelope Theorem

There are several ways to generalize the envelope theorem. First we can consider a function of more choice variables or more parameters.

1  If f is a function of n choice variables, then each choice variable has an optimal value that depends on α. We obtain a family of functions x*_i(α). We can write them as a vector x*(α). The value function is

     V(α) = f(x*(α), α)
2  If f is a function of m parameters α_j, we can express the parameters as a vector α. The optimal choice of each choice variable is a function of all the α_j. The value function is also a function of all the α_j:

     V(α) = f(x*(α), α)
Since the x*_i(α) satisfy the family of equations f_{x_i}(x, α) = 0, we need the multi-equation version of the implicit function theorem to justify their existence and differentiability. The value function is a function of multiple parameters, so the envelope theorem computes its partial derivatives.
Theorem 3.12 [The Envelope Theorem, Multivariable]
Suppose

  f(x, α) is a differentiable function, and

  x*(α), the choice that maximizes f for each α, is a differentiable function.

For any region of R^m where x*(α) is differentiable and any coordinate α_k of α we have:

  V_{α_k}(α) = f_{α_k}(x*(α), α)
We can again show this with a parametrization. Since we need the partial derivative, we treat α_k as a parameter. The other α_j are constants, held equal to the corresponding components of some fixed a. This parametrization will not cover the entire domain of α, only those points that lie in the α_k-direction from a.

  α_j = { a_j  if j ≠ k          dα_j/dα_k = { 0  if j ≠ k          x = x*(α)
        { α_k  if j = k                      { 1  if j = k
In general, x*_i is a multivariable function of α. In this parametrization, x_i is a function only of the parameter α_k. By the first-order condition, x*(α) satisfies f_{x_i}(x*(α), α) = 0 for each x_i.
We can apply the chain rule to V(α) = f(x*(α), α) to compute V_{α_k}(α).

  V_{α_k}(α) = d f(x*(α), α)/dα_k
             = Σ_{i=1}^n f_{x_i}(x*(α), α) · dx*_i(α)/dα_k + Σ_{j=1}^m f_{α_j}(x*(α), α) · dα_j/dα_k
             = Σ_{i=1}^n (0) · dx*_i(α)/dα_k + Σ_{j≠k} f_{α_j}(x*(α), α)(0) + f_{α_k}(x*(α), α)(1)
             = f_{α_k}(x*(α), α)

This is valid for any α that lies on our path through a in the α_k-direction. We can apply this reasoning to any a though, so the computation holds for the entire domain of x*(α).
Our final generalization of the envelope theorem assumes that x*(α) is the optimal choice given an equality constraint, g(x, α) = 0.
Theorem 3.13 [The Envelope Theorem, Constrained]
Suppose

  f(x, α) is a differentiable objective function,

  g(x, α) is a differentiable constraint function,

  x*(α), the choice that maximizes f subject to g(x, α) = 0 for each α, is a differentiable function, and

  λ*(α) is the value of λ that solves the first-order conditions of L along with x*(α) and α.

For any region of R^m where x*(α) and λ*(α) are differentiable and any coordinate α_k of α we have:

  V_{α_k}(α) = L_{α_k}(x*(α), α, λ*(α))
Proving this version requires us to use the fact that g(x*(α), α) = 0 for all α. This means that

  L(x*(α), α, λ*(α)) = f(x*(α), α) + λ*(α) g(x*(α), α)
                     = f(x*(α), α) + λ*(α)(0)
                     = V(α)
x*(α) does not necessarily satisfy the first-order condition of f, but for a fixed α it does satisfy the first-order conditions of L(x, λ). Specifically:

  L_{x_i}(x*(α), λ*(α)) = 0        L_λ(x*(α), λ*(α)) = 0
We can use the same parametrization as the unconstrained case, with the understanding that x*(α) now describes the maximizer of the constrained optimization and with λ*(α) the corresponding λ value.
  V_{α_k}(α) = d f(x*(α), α)/dα_k = d L(x*(α), α, λ*(α))/dα_k
             = Σ_{i=1}^n L_{x_i}(x*(α), α, λ*(α)) · dx*_i(α)/dα_k
               + Σ_{j=1}^m L_{α_j}(x*(α), α, λ*(α)) · dα_j/dα_k
               + L_λ(x*(α), α, λ*(α)) · dλ*(α)/dα_k
             = Σ_{i=1}^n (0) · dx*_i(α)/dα_k + Σ_{j≠k} L_{α_j}(x*(α), α, λ*(α))(0)
               + L_{α_k}(x*(α), α, λ*(α))(1) + (0) · dλ*(α)/dα_k
             = L_{α_k}(x*(α), α, λ*(α))
Further Generalizations
We can extend this reasoning to multiple equality constraints without any trouble. We can also extend the envelope theorem to inequality constraints, where we set λ*(α) = 0 for any α such that the constraint is not binding at the maximizer x*(α).
If x* and λ* are differentiable functions of α in this setting, we can still conclude that

  V_{α_k}(α) = L_{α_k}(x*(α), α, λ*(α))

However, we expect x*(α) will not be differentiable at the transition between the binding and nonbinding cases.
Proving the inequality case is identical to the equality case, except that the λ term vanishes for a different reason in each case:

  L_λ(x(t), α(t), λ(t)) · λ′(t) = 0

since the first factor is 0 if the constraint is binding and the second factor is 0 if it is nonbinding.
Remark
The generalizations of the envelope theorem require that x*(α) is a differentiable function. Like in the single-variable case, it usually makes sense to solve for x*(α) and directly verify that it is differentiable. If we need to use the implicit function theorem instead, we need the multivariable version. The requirement for that is

  |H f_x(x, α)| ≠ 0    or    |H L_{x,λ}(x, α)| ≠ 0

depending on whether there is a constraint.
3.2.4 Section Summary

The most important definitions and results from this section were

  The definition of a value function (Definition 3.10)

  Three versions of the envelope theorem (Theorems 3.11, 3.12 and 3.13)
Chapter 4

Sufficient Conditions

4.1 The Extreme Value Theorem

Goals:
1  Recognize when the extreme value theorem applies.
2  Apply the extreme value theorem to identify a maximizer.
4.1.1 The Extreme Value Theorem

In previous examples, we often found that only one point satisfied the necessary conditions for a maximizer. Still we did not conclude that this point was a maximizer. We would have found it very useful to know that a maximizer existed in these circumstances. We then could have identified our point as that maximizer. Even if we only narrowed our search down to two or three potential maximizers, the information that one of them is in fact the maximizer would have been helpful. This is what the extreme value theorem does for us.
Theorem 4.1 [The Extreme Value Theorem]
If f(x) is a continuous function and S is a closed and bounded subset of the domain of f, then there exists an x* that maximizes f(x) subject to x ∈ S.

In order to apply this theorem, we need to be able to identify when a region S is closed and bounded. Here are the definitions of those terms.
Definition 4.2
Let S be a subset of R^n.

  S is closed if it contains all of the points on its boundary.

  S is bounded if there is some upper limit to how far its points get from the origin (or any other fixed point). If there are points of S arbitrarily far from the origin, then S is unbounded.
For one-variable functions, we can use the fact that a union of finitely many closed intervals (or isolated points) is closed. If the intervals are each finite length, then the union is also bounded.

Figure 4.1: A finite union of closed intervals
The same holds for curves. A curve must include its endpoints, or have no endpoints, to be closed.

Figure 4.2: A closed curve

Figure 4.3: The graph x_1² + x_2² = 9 is closed.
Generally, a region defined by a strict inequality will not contain its boundary points and thus will not be closed.

Figure 4.4: x_1² + x_2² ≤ 9 is closed.

Figure 4.5: x_1² + x_2² < 9 is not closed.
If multiple inequalities are involved and relevant, they must all be nonstrict in order to avoid removing boundary points.

An interesting case is the removal of a single interior point. If we exclude that point from S, then that point becomes a boundary point. Any neighborhood of it contains points in S and a point not in S.
Figure 4.6: −2 ≤ x_1 ≤ 2 and −3 < x_2 < 3 is not closed.

Figure 4.7: −2 ≤ x_1 ≤ 2 and −3 ≤ x_2 ≤ 3 and (x_1, x_2) ≠ (1, 2) is not closed.
Boundedness is a simpler concept and easy to check. If you can draw a circle around S, it is bounded. If no circle is big enough, it is unbounded.

Figure 4.8: −2 ≤ x_1 ≤ 2 and −3 ≤ x_2 ≤ 3 is bounded.

Figure 4.9: −2 ≤ x_1 ≤ 2 is unbounded.
The examples above are a good way to visually recognize a closed and bounded set. What if we have an equation or inequality instead of a graph? The following theorems answer some of these questions.
Theorem 4.3
If f(x) is a continuous function, then any level set or upper level set of f is closed.

Theorem 4.4
1  The intersection of any number of closed sets is closed.
2  The union of any finite number of closed sets is closed.
The second theorem is specifically useful for feasible sets defined by multiple constraints. Under our formulation of constrained optimization, the feasible set is an intersection of level sets and upper level sets. As long as the constraint functions are continuous, the feasible set will be closed.

The extreme value theorem is a standard result in analysis. While we will not prove it, we can at least demonstrate that each hypothesis is necessary.
Example
Consider f(x) = x² on the region S = {x : x ≥ 0}. f(x) is continuous and S is closed but not bounded. f(x) grows without bound in S and has no maximum.

Figure 4.10: The graph of y = x² over [0, ∞)
Example
Consider

  f(x) = { x²  if x < 2
         { 0   if x ≥ 2

on the region S = {x : 0 ≤ x ≤ 2}. S is closed and bounded, but f(x) is not continuous. f(x) approaches a value of 4 but never reaches it. There is no maximizer a. For any a < 2, there is a b closer to 2 with f(b) > f(a).

Figure 4.11: The graph of y = f(x) over [0, 2]
Example
Consider f(x) = x² on the region S = {x : 0 ≤ x < 2}. f(x) is continuous and S is bounded but not closed. Again f(x) approaches a value of 4 but never reaches it. There is no maximizer a. For any a in S, there is a b closer to 2 with f(b) > f(a).

Figure 4.12: The graph of y = x² over [0, 2)
4.1.2 Applying the Extreme Value Theorem

Find the maximizer(s) of f(x_1, x_2) = x_1² + x_1²x_2 + 2x_2² subject to 8 − x_1² − x_2² ≥ 0.

Figure 4.13: The feasible set
Solution
Apply the necessary conditions for maximizers subject to an inequality. The Lagrangian is

  L(x_1, x_2, λ) = x_1² + x_1²x_2 + 2x_2² + λ(8 − x_1² − x_2²)

The conditions are

  L_{x_1}(x_1, x_2, λ) = 2x_1 + 2x_1x_2 − 2λx_1 = 0
  L_{x_2}(x_1, x_2, λ) = x_1² + 4x_2 − 2λx_2 = 0
  8 − x_1² − x_2² ≥ 0,  λ ≥ 0  and  λ(8 − x_1² − x_2²) = 0
We solve each case of the complementary slackness.

1  Set λ = 0:

     2x_1 + 2x_1x_2 = 0        x_1² + 4x_2 = 0
     2x_1(1 + x_2) = 0

   If x_1 = 0: (0)² + 4x_2 = 0, so x_2 = 0.
   If x_2 = −1: x_1² + 4(−1) = 0, so x_1 = ±2.

   Check that (0, 0, 0) and (±2, −1, 0) satisfy 8 − x_1² − x_2² ≥ 0. They do.
2  Set 8 − x_1² − x_2² = 0. We need to solve

     2x_1 + 2x_1x_2 − 2λx_1 = 0
     x_1² + 4x_2 − 2λx_2 = 0
     8 − x_1² − x_2² = 0

   One good approach is to factor:

     2x_1 + 2x_1x_2 − 2λx_1 = 0
     2x_1(1 + x_2 − λ) = 0
     x_1 = 0  or  λ = 1 + x_2

   We treat these two cases separately.
   a  If x_1 = 0, then

        8 − (0)² − x_2² = 0        x_2 = ±2√2
        (0)² + 4(±2√2) − 2λ(±2√2) = 0
        ±8√2 = ±4√2 λ
        2 = λ
   b  If λ = 1 + x_2, then

        x_1² + 4x_2 − 2(1 + x_2)x_2 = 0
        x_1² = 2x_2² − 2x_2
        8 − (2x_2² − 2x_2) − x_2² = 0
        −3x_2² + 2x_2 + 8 = 0
        −(3x_2 + 4)(x_2 − 2) = 0

      If x_2 = 2:                 If x_2 = −4/3:
        λ = 1 + 2 = 3               λ = 1 − 4/3 = −1/3
        x_1² = 8 − (2)² = 4         x_1² = 8 − (4/3)² = 56/9
        x_1 = ±2                    x_1 = ±2√14/3

We verify that (0, ±2√2, 2), (±2, 2, 3) and (±2√14/3, −4/3, −1/3) satisfy λ ≥ 0. The third one does not.
Now we can apply the extreme value theorem.

  The function f(x_1, x_2) is continuous.

  The feasible set is the upper level set of a continuous function: 8 − x_1² − x_2² ≥ 0, so it is closed.

  The feasible set is a disk, so it is bounded.

The extreme value theorem tells us a maximizer must exist. Only a point that satisfies our necessary condition can be that maximizer. To determine which one, we evaluate f(x_1, x_2) at each.
  f(0, 0) = (0)² + (0)²(0) + 2(0)² = 0
  f(±2, −1) = (±2)² + (±2)²(−1) + 2(−1)² = 2
  f(0, ±2√2) = (0)² + (0)²(±2√2) + 2(±2√2)² = 16
  f(±2, 2) = (±2)² + (±2)²(2) + 2(2)² = 20

Because they produce the greatest values among the candidates, the maximizers are (2, 2) and (−2, 2).
Figure 4.14: The maximizers of y = x_1² + x_1²x_2 + 2x_2² subject to 8 − x_1² − x_2² ≥ 0
Main Ideas

  The algebraic expressions tell us when the objective function is continuous and the feasible set is closed.

  Draw the feasible set to decide whether it is bounded.

  If the EVT applies, we can evaluate f at all of the points that passed our necessary conditions. The one that attains the greatest value is the maximizer.
4.1.3 Section Summary

The most important definitions and results from this section were

  The extreme value theorem (Theorem 4.1)

  The meaning of closed and bounded (Definition 4.2)

  Level sets, upper level sets, and their intersections are closed (Theorems 4.3 and 4.4)
4.2 The Bordered Hessian

Goals:
1  Use the bordered Hessian to identify local maximizers of a function subject to a constraint.

4.2.1 The Bordered Hessian
We used the Hessian matrix to recognize maximizers and minimizers in unconstrained optimization. This worked because the Hessian computed the second derivative over a straight line a + tv. An equality constraint does not usually contain straight lines. The test for the second derivative will need to take account of the shape of the level set g(x) = 0.
Definition 4.5
Given a constrained optimization problem with Lagrangian L, the matrix H_L(λ, x) is called the bordered Hessian of the constrained optimization problem.
Example
The 2-variable bordered Hessian has the form

  H_L(λ, x_1, x_2) = [ L_λλ   L_λ1   L_λ2 ]
                     [ L_λ1   L_11   L_12 ]
                     [ L_λ2   L_12   L_22 ]

                   = [ 0     g_1             g_2           ]
                     [ g_1   f_11 + λg_11    f_12 + λg_12  ]
                     [ g_2   f_21 + λg_21    f_22 + λg_22  ]
Why is this called "bordered?" The bottom right 2 × 2 minor looks like a Hessian. It is bordered to the left and above by ∇g.

Notice that we have switched the order of variables in our Lagrangian. This is common when writing the bordered Hessian. Placing the border on the top allows us to write our condition for a local maximizer in a familiar way. If we instead prioritized consistency, we could keep the λ last and modify our condition.
Theorem 4.6
Let f(x) be an n-variable function and g(x) = 0 be a constraint. If (ℓ, a) satisfies the first-order condition of the Lagrangian and the upper left square minors of H_L(ℓ, a) satisfy

  (−1)^i |M_i| < 0    for 2 ≤ i ≤ n + 1,

then a is a strict local maximizer of f among points on the constraint.

Remark
We do not test |M_1|, since M_1 = [0]. We generally do not need to worry about M_2 either, since

  |M_2| = | 0     g_1           |
          | g_1   f_11 + λg_11  |  = −(g_1)²

Notice the inequality is reversed from the unconstrained second-order condition.
Variant of Theorem 4.6
Let f(x) be an n-variable function and g(x) = 0 be a constraint. If (ℓ, a) satisfies the first-order condition of the Lagrangian and the upper left square minors of H_L(ℓ, a) satisfy

  |M_i| < 0    for 2 ≤ i ≤ n + 1,

then a is a strict local minimizer of f among points on the constraint.
Remark
We might hope to quickly extend this to a global condition, but unfortunately, H_L only tells us about the second derivatives at a critical point. Deriving a condition that works for all points on the constraint is possible but more complicated.
4.2.2 Using the Bordered Hessian

Let f(x_1, x_2) = x_1² + x_2² on the domain D = {(x_1, x_2) : x_1 > 0, x_2 > 0}. Find the critical point of f on the constraint x_1⁴ + x_2⁴ = 2. Classify it as a local maximizer or local minimizer of f (or neither) on the constraint.
Solution
The Lagrangian is

  L(x_1, x_2, λ) = x_1² + x_2² + λ(2 − x_1⁴ − x_2⁴)

Here are the first-order conditions. We can solve them, using the fact that x_1 > 0 and x_2 > 0.

  2x_1 − 4λx_1³ = 0        2x_2 − 4λx_2³ = 0        2 − x_1⁴ − x_2⁴ = 0

  x_1² = 1/(2λ)            x_2² = 1/(2λ)

  x_1² = x_2²,  so  x_1 = x_2

  2 − x_1⁴ − x_1⁴ = 0
  2 = 2x_1⁴
  1 = x_1 = x_2

  1 = x_1² = 1/(2λ),  so  λ = 1/2

The critical point is (1, 1, 1/2).
Switching the order of the variables, we compute

  H_L(λ, x_1, x_2) = [ 0        −4x_1³          −4x_2³         ]
                     [ −4x_1³   2 − 12λx_1²     0              ]
                     [ −4x_2³   0               2 − 12λx_2²    ]

  H_L(1/2, 1, 1) = [  0   −4   −4 ]
                   [ −4   −4    0 ]
                   [ −4    0   −4 ]
The determinants of the upper left square minors are

  |M_2| = |  0  −4 |
          | −4  −4 |  = −16

  |M_3| = |  0  −4  −4 |
          | −4  −4   0 |
          | −4   0  −4 |

        = 0 + 4 | −4   0 |  − 4 | −4  −4 |
                | −4  −4 |      | −4   0 |  = 128
(−1)²(−16) < 0 and (−1)³(128) < 0. According to Theorem 4.6, (1, 1) is a strict local maximizer.
Figure 4.15: The level sets of x_1² + x_2² and the constraint x_1⁴ + x_2⁴ = 2
4.2.3 The Multi-Constraint Bordered Hessian

For more variables, the test requires more determinants.
Theorem 4.7
Let f(x) be an n-variable function and {g_j(x) = 0} be a set of m constraint equations. If (ℓ, a) satisfies the first-order condition of the Lagrangian and the upper left square minors of H_L(ℓ, a) satisfy

  (−1)^i |M_i| < 0    for 2m ≤ i ≤ n + m,

then a is a strict local maximizer of f(x) among the feasible points.
4.2.4 The Bordered Hessian and Inequality Constraints

If we combine the determinant of the bordered Hessian with the requirement that ℓ > 0, then we can guarantee that f(x) < f(a) on the equality constraint and in the direction of ∇g(a) from the equality constraint. This guarantees a neighborhood of some size in which a is a maximizer among feasible points.
4.2.5 Section Summary

The most important definitions and results from this section were

  The definition of the bordered Hessian (Definition 4.5)

  The bordered Hessian determinant test for a maximizer (Theorem 4.6)
4.3 Separation

Goals:
1  Use normal vectors to describe half-spaces of a hyperplane.
2  Apply optimization by separation arguments.

4.3.1 Upper Level Sets and Optimization
Recall our necessary conditions for a maximizer of f(x) subject to g(x) ≥ 0. If the constraint is binding, we learned to check that ∇f(a) is parallel to ∇g(a), meaning the level sets are parallel. We also check that λ ≥ 0, meaning that ∇f(a) points away from the feasible set. However, these checks are not sufficient. We will construct an example that passes these conditions but isn't a maximizer. Recall the following definition.
Recall
the
follo
wing
definition
Definition
2.13
The
upp
er
level
sets
of
a
function
f
(
x
)
with
domain
D
a
re
the
sets
{
x
∈
D
|
f
(
x
)
≥
c
}
for
some
numb
er
c
The
lo
wer
level
sets
are
{
x
∈
D
|
f
(
x
)
≤
c
}
for
some
numb
er
c
The
follo
wing
characterization
will
be
important
to
our
arguments.
Lemma 2.14
Suppose ∇f(x) ≠ 0, and x(t) is a path that passes through a in the level set f(x) = c at t_0.

  If x′(t_0) makes an acute angle with ∇f(a), then x(t) travels into the upper level set f(x) ≥ c.

  If x′(t_0) makes an obtuse angle with ∇f(a), then x(t) travels into the lower level set f(x) ≤ c.
Consider the following diagram of a feasible set and an upper level set. The point a satisfies ∇f(a) = −λ∇g(a) and λ ≥ 0, but we can see that there are also feasible points in the interior of the upper level set, like b. f(b) may be greater than f(a).

Figure 4.16: An upper level set and feasible set for which a is not a maximizer
We would like a condition to rule out this behavior. Suppose that a line separates the upper level set f(x) ≥ c from the feasible set. This prevents any higher values of f from appearing in the feasible set.

Figure 4.17: The upper level set of f, a feasible set, and a separating line
4.3.2 Hyperplanes and Half-Spaces

To formalize this reasoning, we first present the notation for a separating line and generalize it to higher dimensions. Every line h in R² has a normal vector v. h divides R² into two half-planes:

  h⁺ on the side of v

  h⁻ on the other side.

Figure 4.18: A line, its normal vector v, and its half spaces
For any point x, the vector that points from a to x is x − a. The angle of this vector with v tells us which half-plane contains x. The dot product tells us whether this angle is acute, obtuse or right.
Lemma 4.8
Suppose h is a line in R² with normal vector v, and a is a point on h. For any point x,

  v · (x − a)  > 0  if x lies in h⁺
               = 0  if x lies on h
               < 0  if x lies in h⁻

Figure 4.19: The angle between a vector and the normal vector of a line
The analogous object in R³ is a plane h. It has a normal vector and divides R³ into two half-spaces h⁺ and h⁻. The sign of v · (x − a) tests which half-space x lies in. Specifically, x lies on the plane if v · (x − a) = 0.

Figure 4.20: A plane, its normal vector v, and a vector on it
This reasoning works in any dimension to define a set of points whose displacement from a known point a is orthogonal to a normal vector v.

Definition 4.9

  In R², v · (x − a) = 0 defines a line.

  In R³, v · (x − a) = 0 defines a plane.

  In R^n, v · (x − a) = 0 defines a hyperplane, a linear (n − 1)-dimensional subspace.

When dimension is general or ambiguous, we use the term hyperplane as a catch-all term.
We can rewrite our dot product lemma to reflect the n-dimensional case.

Variant of Lemma 4.8
Suppose h is a hyperplane with normal vector v and a is a point on h. For any point x,

  v · (x − a)  > 0  if x lies in h⁺
               = 0  if x lies on h
               < 0  if x lies in h⁻
For a more concise test, we can let k = v · a. We can rewrite our dot product

  v · (x − a) = v · x − v · a = v · x − k

and the lemma

  v · x  > k  if x lies in h⁺
         = k  if x lies on h
         < k  if x lies in h⁻
4.3.3 Optimization by Separation

In Figure 4.17, a line separates the upper level set of f and the feasible set. The point where they meet appears to be a maximizer of f. We now have the notation to formalize this argument.
Theorem 4.10
Suppose we have a continuous objective function f(x) and some constraint(s). Suppose x* is in the feasible set and let f(x*) = c. If there is a hyperplane h such that

1  the upper level set f(x) ≥ c has no points in h⁻

2  the feasible set has no points in h⁺

then x* is a maximizer of f(x) subject to the constraint(s).
This method is called optimization by separation and h is a separating hyperplane. It is a sufficient condition, but it requires us to know the proper hyperplane h.

Notice both the upper level set and the feasible set may contain points on h. In fact, x* is one of them. What if they share an additional point a in h? a must lie on the boundary of the upper level set, arbitrarily close to points for which f(x) < c. Since f(x) is continuous, f(a) = c. We can still conclude that x* is a maximizer, but not a unique maximizer.
Figure 4.21: An upper level set of f and the feasible set intersecting multiple times along their separating line
We can modify this theorem by stipulating that x* is either

1  the only point of the upper level set on h, or

2  the only feasible point on h.

In this case we can conclude that x* is the unique maximizer.
If we want to avoid talking directly about the hyperplane h, we can think of v · x as a function of x. Its level sets are the hyperplanes v · x = k. The value of v · x increases as we travel in the direction of v.

Figure 4.22: Some level sets of v · x that intersect an upper level set of f and a feasible set
Our separation argument can be rephrased to require that the upper level set meets only higher values of v · x while the feasible set meets only lower values of v · x. We can verify this algebraically. Let x be a point in the upper level set of f(x). Lemma 4.8 states that x does not lie in h⁻ if

  v · (x − x*) ≥ 0
  v · x ≥ v · x*
This inequality indicates that the value of v · x at x* is less than or equal to the value at any other point in the upper level set. In other words, x* minimizes the function v · x subject to the constraint f(x) ≥ c.
To show that the feasible set has no points in h⁺ we can check that for each feasible x:

  v · (x − x*) ≤ 0
  v · x ≤ v · x*

This means that the value of v · x at x* is greater than or equal to the value at any other x in the feasible set. In other words, x* maximizes the function v · x subject to the constraints of the original optimization problem.
We can restate Theorem 4.10 in terms of this new vocabulary.

Alternate Formulation of Theorem 4.10
Suppose we have a continuous objective function f and some constraints. If f(x*) = c and for some v ≠ 0 we have

1  x* minimizes v · x subject to f(x) ≥ c

2  x* maximizes v · x subject to the constraints

then x* maximizes f(x) given the constraints.
We can see that this is a condition for a maximizer without knowing much about hyperplanes. If the feasible points all have values of v · x less than or equal to k and the upper level sets all have values greater than or equal to k, then the feasible set and the upper level set can only meet where v · x = k.
4.3.4 The Tangent Hyperplane

An observant reader will have noticed that our separating lines have always been tangent to the level curve f(x) = c. This is not a coincidence. It occurs in higher dimensions as well, where the tangent lines become tangent hyperplanes.
Definition 4.11
Suppose a point a lies on the level set f(x) = c, and ∇f(a) ≠ 0. The tangent hyperplane to f(x) = c at a is the hyperplane containing all the tangent lines to f(x) = c at a. Its normal vector is ∇f(a). Its equation is

  ∇f(a) · (x − a) = 0
If ∇f(a) ≠ 0, the only candidate for a separating hyperplane is the tangent hyperplane to f(x) = c. Any other hyperplane would contain some vector w such that ∇f(a) · w > 0, meaning it would cut into the upper level set of f at a. On the other hand, we have seen examples where a tangent line fails to keep the upper level set out of h⁻.

Figure 4.23: An upper level set that crosses its own tangent line
For this reason, optimization by separation is only realistic for functions with nicely shaped level sets. We will discover a class of such functions in the next section.
4.3.5 Section Summary

The most important definitions and results from this section were

  Hyperplanes and the dot product test (Definition 4.9 and Lemma 4.8)

  Optimization by separation, both formulations (Theorem 4.10 and its variant)

  Equation of a tangent hyperplane to a level set (Definition 4.11)
4.4 Concave Programming

Goals:
1  Understand the geometric consequences of concavity on constrained optimizations.
2  Apply sufficient conditions for maximizers of concave functions with concave constraints.
4.4.1 Kuhn-Tucker Sufficiency

When is a separation argument possible? When does the tangent hyperplane separate the upper level set from the feasible set? Concavity is one way to guarantee this. There is a rich theory and toolset for optimization that is specific to concave functions. This is the field of concave programming. Here is our main result.
Theorem 4.12
Given an objective function f(x) and constraints g_j(x) ≥ 0, suppose (x*, λ*) satisfies the Kuhn-Tucker conditions:

  L_{x_i}(x*, λ*) = 0 for all i

  g_j(x*) ≥ 0 and λ*_j ≥ 0 with complementary slackness for each j

If f(x) and the g_j(x) are all concave functions, then x* maximizes f(x), subject to the constraints.
It is worth understanding the full argument of this theorem that follows, but the essential ideas are:

1  The upper level sets of a concave function are convex sets. This applies not only to f(x) ≥ c but also to the feasible set, which is an intersection of the upper level sets g_j(x) ≥ 0.

2  Using this convexity, we can show that the tangent hyperplane to f(x) = c separates the upper level set f(x) ≥ c from the feasible set.
Figure 4.24: A feasible region defined by multiple concave inequalities separated from the upper level set of f by a tangent line
Remark
This sufficient condition is especially powerful because once we find a point that satisfies Kuhn-Tucker, we can stop looking. If, by luck or cleverness, we find a solution in the first case of complementary slackness we try, then we have found the maximizer. We need not examine the other cases at all.
4.4.2 Applying the Kuhn-Tucker Sufficient Condition

Find the maximum value of f(x_1, x_2) = 4x_1 + x_2 on the region

  S = {(x_1, x_2) : 25 − 7x_1 − x_2 ≥ 0, x_2 ≥ 0, x_2 + x_1² − 5 ≥ 0, and 25 − x_1² − x_2² ≥ 0}
Figure 4.25: The feasible set S, bounded by the curves x_2 + x_1² − 5 = 0, 25 − x_1² − x_2² = 0, and 25 − 7x_1 − x_2 = 0
Solution
Increasing x_1 seems to be the most important factor in increasing f, but larger x_2 helps too. We should draw and examine the region S.

The set x_2 + x_1² − 5 ≥ 0 appears to be nonconvex. On the other hand, based on our diagram, that inequality appears not to bind. We will instead maximize f over the region

  T = {(x_1, x_2) : 25 − 7x_1 − x_2 ≥ 0, x_2 ≥ 0, and 25 − x_1² − x_2² ≥ 0}

S is a subset of T, so a maximizer in T that lies in S is a maximizer in S.
This modified problem has the following Lagrangian:

  L(x_1, x_2, λ_1, λ_2, λ_3) = 4x_1 + x_2 + λ_1(25 − 7x_1 − x_2) + λ_2 x_2 + λ_3(25 − x_1² − x_2²)
The Kuhn-Tucker conditions are

  L_{x_1} = 4 − 7λ_1 − 2λ_3 x_1 = 0
  L_{x_2} = 1 − λ_1 + λ_2 − 2λ_3 x_2 = 0

  25 − 7x_1 − x_2 ≥ 0     λ_1 ≥ 0     λ_1(25 − 7x_1 − x_2) = 0
  x_2 ≥ 0                 λ_2 ≥ 0     λ_2 x_2 = 0
  25 − x_1² − x_2² ≥ 0    λ_3 ≥ 0     λ_3(25 − x_1² − x_2²) = 0
We might guess that 25 − 7x_1 − x_2 = 0 and 25 − x_1² − x_2² = 0 are binding at the maximizer and x_2 ≥ 0 is not. Based on that guess, we first consider the case where λ_2 = 0. Next we use the binding constraints to solve for x_1 and x_2.
.
25 − 7x1 − x2 = 0                25 − x1² − x2² = 0
25 − 7x1 = x2                    25 − x1² − (25 − 7x1)² = 0
                                 25 − x1² − 49x1² + 350x1 − 625 = 0
                                 −50x1² + 350x1 − 600 = 0
                                 −50(x1 − 3)(x1 − 4) = 0
                                 x1 = 3 or 4
25 − 7(3) = 4 = x2    or    25 − 7(4) = −3 = x2
We can look ahead and see that x2 = −3 will not satisfy our inequalities, so we set (x1, x2) = (3, 4). We use the remaining equations to solve for λ1 and λ3.
4 − 7λ1 − 2λ3 x1 = 0             1 − λ1 − 2λ3 x2 = 0
4 − 7λ1 − 6λ3 = 0                1 − λ1 − 8λ3 = 0
                                 1 − 8λ3 = λ1
4 − 7 + 56λ3 − 6λ3 = 0
50λ3 = 3
λ3 = 3/50
1 − 24/50 = λ1
13/25 = λ1
Our solution is (3, 4, 13/25, 0, 3/50). We check, and it satisfies the remaining inequalities.
λ1 = 13/25 ≥ 0        x2 = 4 ≥ 0        λ3 = 3/50 ≥ 0
There are 7 other cases of complementary slackness to check, but we can avoid them. Our sufficiency theorem applies to (3, 4). We only need to check for concavity of the relevant functions.
4x1 + x2 is concave because it is linear.
25 − 7x1 − x2 is concave because it is linear.
x2 is concave because it is linear.
25 − x1² − x2² is strictly concave because its Hessian is
[ −2  0 ]
[  0 −2 ],
which is negative definite for all (x1, x2).
We ignored the constraint x2 + x1² − 5 ≥ 0.
Based on these checks, the theorem applies, and (3, 4) must be a maximizer of f on T. Since (3, 4) satisfies x2 + x1² − 5 = 4 + 9 − 5 ≥ 0, it lies in S as well. Since S ⊆ T, we conclude (3, 4) is also a maximizer of f in S. The maximum value is f(3, 4) = 4(3) + (4) = 16.
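As a sanity check on the arithmetic, we can verify numerically that (3, 4) with (λ1, λ2, λ3) = (13/25, 0, 3/50) satisfies every Kuhn-Tucker condition. This is only a sketch of a check, not part of the method itself:

```python
# Numeric check of the Kuhn-Tucker conditions for the worked example.
x1, x2 = 3, 4
l1, l2, l3 = 13/25, 0, 3/50

# Stationarity: the partial derivatives of the Lagrangian vanish at (3, 4).
Lx1 = 4 - 7*l1 - 2*l3*x1
Lx2 = 1 - l1 + l2 - 2*l3*x2
assert abs(Lx1) < 1e-12 and abs(Lx2) < 1e-12

# Feasibility, nonnegative multipliers, and complementary slackness.
g = [25 - 7*x1 - x2, x2, 25 - x1**2 - x2**2]
lam = [l1, l2, l3]
for gj, lj in zip(g, lam):
    assert gj >= 0 and lj >= 0 and abs(lj * gj) < 1e-12

print(4*x1 + x2)  # the maximum value, 16
```

All three constraints of T are verified, so the point is a Kuhn-Tucker point of the modified problem.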
Main Idea
The most common methods to check concavity are:
- linear functions are concave
- functions with negative definite Hessians are strictly concave
If the functions are concave, and we guess the right combination of binding constraints, then we only need to check the Kuhn-Tucker conditions for that case.
Remark
The underlying separation argument was between the convex upper level set f(x1, x2) ≥ 16 and the convex set
T = {(x1, x2) : 25 − 7x1 − x2 ≥ 0, x2 ≥ 0, and 25 − x1² − x2² ≥ 0}
Since (3, 4) is a maximizer on T, it is a maximizer on S.
[Figure 4.26: A convex set T containing S, an upper level set, and a separating line]
The reasoning of this example also suggests the following variant.
Variant of Theorem 4.12
Given an objective function f(x) and constraints gj(x) ≥ 0, suppose (x∗, λ∗) satisfies the Kuhn-Tucker conditions. If f(x) and the binding gj(x) are all concave, then x∗ maximizes f(x), subject to the constraints.
4.4.3 Proving Kuhn-Tucker Sufficiency
This will be an extensive argument with many parts, but there are two reasons to give it our attention.
1. Some of the lemmas along the way are useful or interesting in their own right.
2. There are many ways to modify this argument for different circumstances. If you understand the original argument, you can understand or even generate these variants.
Our first step is to understand the upper level sets of concave functions.
Lemma 4.13
If f(x) is a concave function, then the upper level set f(x) ≥ c is a convex set.
We can argue this directly from the definition of an upper level set, the definition of a convex set, and the following inequality for concave functions:
f((1 − t)a + tb) ≥ (1 − t)f(a) + tf(b).
[Figure 4.27: The segment from a to b in the upper level set of f(x)]
Proof
Let a and b be points in the upper level set f(x) ≥ c. We will show that the segment between a and b also lies in this upper level set. The points on the segment from a to b are parametrized by
(1 − t)a + tb    0 ≤ t ≤ 1
If we evaluate f along these points we get
f((1 − t)a + tb) ≥ (1 − t)f(a) + tf(b)    (f is concave)
                 ≥ (1 − t)c + tc          (a and b lie in the upper level set)
                 ≥ c
Since f((1 − t)a + tb) ≥ c, (1 − t)a + tb lies in the upper level set. Since this is true for every t between 0 and 1, the entire segment from a to b lies in the upper level set. Since this is true for all a and b in the upper level set, we conclude that the upper level set is convex.
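Lemma 4.13 is easy to probe numerically: pick a concave f, two points in an upper level set, and confirm the whole segment stays in the set. A minimal sketch, using the strictly concave function from the worked example:

```python
# Check that the segment between two points of an upper level set
# of the concave function f(x1, x2) = 25 - x1**2 - x2**2 stays in the set.
def f(x1, x2):
    return 25 - x1**2 - x2**2

c = 9.0
a, b = (2.0, 0.0), (0.0, -3.0)           # both satisfy f >= c
assert f(*a) >= c and f(*b) >= c

for k in range(101):                      # sample t in [0, 1]
    t = k / 100
    p = ((1 - t)*a[0] + t*b[0], (1 - t)*a[1] + t*b[1])
    assert f(*p) >= c                     # every segment point stays in f >= c
print("segment lies in the upper level set")
```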
We now know that if f is concave, then its upper level sets are convex. For a separation argument, we also would like to know they do not cross their tangent hyperplane. Fortunately, this is the case.
Lemma 4.14
Let
- f(x) be a continuously differentiable function
- a be a point such that f(a) = c and ∇f(a) ≠ 0
- h be the tangent hyperplane to f(x) = c at a.
If the upper level set f(x) ≥ c is convex, then it does not intersect h−.
Proof
Suppose b is any point in the upper level set f(x) ≥ c. We want to show that b does not lie in h−. Since the upper level set is convex, the entire segment
x(t) = (1 − t)a + tb = a + t(b − a)    0 ≤ t ≤ 1
must lie within it. This specifically requires that the direction vector x′(0) = b − a points into the upper level set at a. Lemma 2.14 tells us that if ∇f(a) · x′(0) < 0, then x(t) travels into the lower level set. Thus ∇f · (b − a) ≥ 0. By Lemma 4.8, b does not lie in h−.
[Figure 4.28: The vector b − a in a convex upper level set making an acute angle with ∇f(a)]
[Figure 4.29: The vector b − a leaving an upper level set and making an obtuse angle with ∇f(a)]
Now we turn our attention to the constraint functions. Lemma 4.13 applies as well to each gj as it does to f. If each function gj(x) is concave, then each upper level set gj(x) ≥ 0 is convex. With some very familiar looking conditions on ∇gj(a), we can ensure that the feasible set stays on one side of the tangent hyperplane to f(x) = c.
Corollary 4.15
Let f(a) = c and ∇f(a) ≠ 0. Let h be the tangent hyperplane to f(x) = c at a. If
1. The upper level sets gj(x) ≥ 0 are convex
2. ∇f(a) = −Σj λj∇gj(a) for some numbers λj
3. For each j, λj ≥ 0 if gj(a) = 0 and λj = 0 otherwise
then the intersection of the upper level sets gj(x) ≥ 0 does not intersect h+.
The geometry is easiest to visualize if we consider a single constraint g(x) ≥ 0. In this case, ∇f(a) = −λ∇g(a). For any b in g(x) ≥ 0, the vector b − a makes an acute angle with ∇g(a). Thus it makes an obtuse angle with ∇f(a), meaning b lies in h−.
[Figure 4.30: The vector b − a in a convex feasible set making an obtuse angle with ∇f(a)]
For multiple constraints, angles are too difficult to discern. The dot product provides a cleaner argument.
Proof
Let b be a point in the intersection of the upper level sets. For each gj(x) that is binding at a, we can apply Lemma 4.14 to the convex upper level set gj(x) ≥ 0. As that proof argued, we can conclude that ∇gj · (b − a) ≥ 0. By Lemma 4.8, the sign of ∇f(a) · (b − a) will tell us whether b lies in h+. We can substitute as follows:
∇f(a) · (b − a) = (−Σj λj∇gj(a)) · (b − a) = −Σj λj (∇gj(a) · (b − a))
We can determine the sign of each term of the summation.
1. For nonbinding gj, λj = 0.
2. For binding gj, λj ≥ 0 and ∇gj(a) · (b − a) ≥ 0.
We conclude that ∇f(a) · (b − a) ≤ 0. Thus b does not lie in h+.
We now have the ingredients to prove Theorem 4.12. There are two cases of complementary slackness to consider when proving this theorem.
1. If all λ∗j = 0 then ∇f(x∗) = 0. Since f is concave, a variant of Corollary 1.23 tells us that x∗ is an unconstrained maximizer of f. Corollary 2.2 tells us that since it lies in the feasible set, it is also a maximizer there.
2. If some λ∗j > 0 then we can put together the results of this section to conclude the upper level set of f and the feasible set lie on opposite sides of h. This is the requirement for optimization by separation established in Theorem 4.10, so x∗ is a maximizer.
Here is a diagram containing the reasoning for each case.
[Diagram of the two cases:
Case all λ∗j = 0: ∇f(x∗) = 0 → (Cor 1.38) x∗ is a max → (Cor 2.2) x∗ is a max in the feasible set.
Case some binding λ∗j > 0:
- f(x) is concave → (Lem 4.13) f(x) ≥ c is convex → (Lem 4.14) f(x) ≥ c does not intersect h−
- gj(x) are concave → (Lem 4.13) gj(x) ≥ 0 are convex → with ∇f(x∗) = −Σ λ∗j∇gj(x∗) → (Cor 4.15) the feasible set does not intersect h+
- → (Thm 4.10) x∗ is a max in the feasible set.]
4.4.4 Variants of Kuhn-Tucker Sufficiency
In a separation argument, the upper level set and the feasible set may meet at many points in h. For example, we could have an entire line segment of intersection, and every point on that segment would satisfy Kuhn-Tucker.
[Figure 4.31: An upper level set of f and the feasible set intersecting along a separating line]
This pathology only exists if both sets contain multiple points on the separating hyperplane. If one of the sets is strictly convex, this will not happen. We can achieve this with strict concavity. Each lemma we used has a variant for strict concavity.
Variant of Lemma 4.13
If f(x) is a strictly concave function then the upper level set f(x) ≥ c is strictly convex.
Variant of Lemma 4.14
Suppose we have
- a continuously differentiable function f(x)
- a point a such that f(a) = c and ∇f(a) ≠ 0
- the tangent hyperplane to f(x) = c at a, denoted h.
If the upper level set f(x) ≥ c is strictly convex, then it lies entirely within h+, except for the point a.
Variant of Corollary 4.15
Let f(a) = c and ∇f(a) ≠ 0. Let h be the tangent hyperplane to f(x) = c at a. If
1. The upper level sets gj(x) ≥ 0 are strictly convex
2. ∇f(a) = −Σj λj∇gj(a) for some numbers λj
3. For each j, λj ≥ 0 if gj(a) = 0 and λj = 0 otherwise
then the intersection of the upper level sets gj(x) ≥ 0 lies entirely within h− except the point a.
We can use these lemmas to guarantee a unique maximizer. We can either keep the upper level set or the feasible set from having multiple points on h.
Theorem 4.12 for a Unique Maximizer
Given an objective function f(x) and constraints gj(x) ≥ 0, suppose (x∗, λ∗) satisfies the Kuhn-Tucker conditions. If f(x) and the binding gj(x) are concave, and additionally either
1. f(x) is strictly concave, or
2. at least one constraint binds and the binding gj(x) are strictly concave,
then x∗ is the unique maximizer of f(x), subject to the constraints.
Another avenue of modification is to include equality constraints. One method is to treat the equality constraint as an inequality constraint. The level set is a subset of the upper level set. By Corollary 2.2, a maximizer over an inequality constraint that happens to bind is also a maximizer over the equality constraint. This requires the constraint function to be concave and its λj to be positive.
Alternately, if the equality constraint is linear, then its level set is a hyperplane, which is convex. Thus the feasible set is still convex. Corollary 4.15 still holds regardless of the sign of λj, because ∇gj(a) · (b − a) = 0. We can formalize this possibility with the following variant.
Theorem 4.12 with Equality Constraints
Given an objective function f(x) and constraints of the forms gj(x) ≥ 0 and gj(x) = 0, suppose (x∗, λ∗) satisfies the Kuhn-Tucker conditions. If f(x) and the binding gj(x) are all concave, and the equality constraints are linear, then x∗ maximizes f(x), given the constraints.
This section contains just a few possibilities. There are other ways to modify our sufficiency theorems that allow them to apply in more situations or to draw stronger conclusions.
4.4.5 Slater's Condition
Concavity has produced a convenient sufficient condition for constrained optimization. It can also help simplify our necessary conditions. Recall that the Kuhn-Tucker conditions are a necessary condition, but they have an exception. Theorem 2.23 states that a maximizer a must either satisfy the Kuhn-Tucker conditions or have binding ∇gj(a) that are linearly dependent. Checking for linearly dependent gradient vectors is difficult, especially if we do not know where to look for them. Slater's condition allows us to skip that check in some situations.
Slater's condition requires that the objective and constraint functions are concave. It also requires the feasible set to have an interior, rather than collapsing to a lower dimensional set. Formally, it demands the existence of a point b in the interior. This point is not special. If the feasible region has an interior, then you can identify infinitely many points inside it by looking at a diagram.
Theorem 4.16 [Slater's Condition]
Suppose that the functions f(x) and gj(x) satisfy Slater's condition:
1. f(x) and the gj(x) are all concave functions
2. there is at least one point b in the interior of the feasible set, meaning gj(b) > 0 for all j
If a is a local maximizer of f(x) subject to gj(x) ≥ 0, then a satisfies the Kuhn-Tucker conditions.
Remark
Slater's condition checks for concavity, just like Theorem 4.12 (our sufficient condition). This means that if f(x) and gj(x) satisfy Slater's condition, then the Kuhn-Tucker conditions are both necessary and sufficient for a maximizer.
Slater's condition doesn't prevent the gradient vectors of the binding gj from being linearly dependent. Instead it argues that even if the gradient vectors are linearly dependent at a, the Kuhn-Tucker conditions must still be satisfied in order for a to be a maximizer.
Slater's condition can be strengthened to handle equality constraints in the case that those are linear.
Variant of Theorem 4.16
Suppose that we are maximizing f(x) subject to constraints of the form gj(x) ≥ 0 or gj(x) = 0. Suppose further that
1. f(x) and the inequality constraint functions gj(x) are concave
2. the equality constraint functions gj(x) are linear
3. there is at least one point b in the relative interior of the feasible set, meaning
   - gj(b) > 0 for all inequality constraints
   - gj(b) = 0 for all equality constraints
If a is a local maximizer, then it satisfies the Kuhn-Tucker conditions.
4.4.6 The Separating Hyperplane Theorem
Throughout this section, we have used tangent hyperplanes to separate convex sets. It is possible to make these arguments without separation, however. Here is a short alternative proof of Theorem 4.12.
Proof
x∗ satisfies the first-order condition of a stubborn Lagrangian, in which λ is held constant at λ∗.
L∗(x) = f(x) + Σj λ∗j gj(x)
Since λ∗j ≥ 0, L∗ is a sum of concave functions. By Theorem 1.20, L∗ is concave. Thus x∗ is a maximizer of L∗. x∗ also satisfies λ∗j gj(x∗) = 0 by complementary slackness. If we compare x∗ and any other feasible point a, we have
L∗(x∗) ≥ L∗(a)
f(x∗) + Σj λ∗j gj(x∗) ≥ f(a) + Σj λ∗j gj(a)
f(x∗) + 0 ≥ f(a) + Σj λ∗j gj(a)    (each λ∗j ≥ 0 and each gj(a) ≥ 0)
f(x∗) ≥ f(a)
Since this holds for any feasible a, we can conclude that x∗ is a maximizer of f(x) subject to gj(x) ≥ 0.
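This chain of inequalities can be watched in action on the earlier worked example, where the stubborn Lagrangian is L∗(x) = 4x1 + x2 + (13/25)(25 − 7x1 − x2) + (3/50)(25 − x1² − x2²). A minimal numeric sketch:

```python
import random

# Stubborn Lagrangian for the worked example: lambda held at (13/25, 0, 3/50).
def Lstar(x1, x2):
    return (4*x1 + x2 + (13/25)*(25 - 7*x1 - x2)
            + (3/50)*(25 - x1**2 - x2**2))

def f(x1, x2):
    return 4*x1 + x2

def feasible(x1, x2):
    return 25 - 7*x1 - x2 >= 0 and x2 >= 0 and 25 - x1**2 - x2**2 >= 0

random.seed(0)
for _ in range(1000):
    x1, x2 = random.uniform(-5, 5), random.uniform(0, 5)
    if feasible(x1, x2):
        assert Lstar(3, 4) >= Lstar(x1, x2) - 1e-9   # x* maximizes L*
        assert f(3, 4) >= f(x1, x2) - 1e-9           # hence x* maximizes f
print("f(x*) >= f(a) for all sampled feasible a")
```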
This argument does not establish a separating hyperplane, even though we know one exists from our longer proof. It turns out that in any successful maximization argument for a concave function over a convex feasible set, the hyperplane is there, whether we use it or not. The following famous theorem guarantees that.
Theorem 4.17 [The Separating Hyperplane Theorem]
If S and U are two convex sets, at least one has a non-empty interior, and they share no interior points in common, then there is a vector v and a number k such that
1. For all x in U, v · x ≥ k
2. For all x in S, v · x ≤ k
Remark
This theorem tells us that if a convex upper level set U and a convex feasible set S meet only at their boundaries, then the hyperplane v · x = k separates them. It does not tell us what v is or how to construct it.
As you might infer from this remark, the applications of the separating hyperplane theorem tend to be abstract. The proof requires ideas from analysis.
4.4.7 Section Summary
The most important definitions and results from this section were
- Upper level sets of concave functions are convex (Lemma 4.13)
- Convex upper level sets lie in one half space of their tangent hyperplanes (Lemma 4.14)
- Kuhn-Tucker sufficiency for concave functions (Theorem 4.12)
- Slater's condition (Theorem 4.16)
This is also a good time to summarize the sufficient conditions we have covered. They differ in when they can be applied and what conclusion they draw.

Sufficient Conditions for Constrained Optimization
Condition        | Limited to          | Conclusion
EVT              | bounded sets        | one of the critical points is a maximizer
Bordered Hessian | binding constraint  | this critical point is a local maximizer
KT Sufficiency   | concave functions   | this critical point is a maximizer
4.5 Quasiconcavity
Goals:
1. Recognize a quasiconcave function from its graph or level sets
2. Apply quasiconcavity in sufficient conditions for a maximizer
3. Use concavity and nondecreasing compositions to test for quasiconcavity
4. Use a determinant test to verify quasiconcavity

4.5.1 Limitations of Concave Programming
Concave programming provided sufficient conditions for a maximizer on one or more constraints. Our argument excluded functions with badly behaved level sets in order to guarantee separation, but did it exclude too many? The function f(x) = 3/(x² − 2x + 2) has convex upper level sets, but it is not a concave function. Could we make a separation argument for it?
[Figure 4.32: The graph of y = 3/(x² − 2x + 2) and a convex upper level set]
Another example: f(x1, x2) = x1 x2 (restricted to x1, x2 > 0) has strictly convex upper level sets. It is not a concave function, because
Hf(x1, x2) =
[ 0 1 ]
[ 1 0 ].
[Figure 4.33: Several level sets and one upper level set of f(x1, x2) = x1 x2]
4.5.2 Quasiconcave Functions
Theorem 4.12 requires that the objective function and constraint functions are concave, but the only part of the proof that used their concavity was the first step: that concave functions have convex upper level sets. We have now seen examples of other functions that have this property. These examples belong to a broader class of functions, on which the main ideas of concave programming still apply. We call them quasiconcave.
Definition 4.18
A function f(x) is quasiconcave if for any points a and b in the domain of f(x),
f((1 − t)a + tb) ≥ min{f(a), f(b)}    for all 0 ≤ t ≤ 1
It is quasiconvex if
f((1 − t)a + tb) ≤ max{f(a), f(b)}    for all 0 ≤ t ≤ 1
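The defining inequality can be tested pointwise. Here is a small sketch checking it for f(x1, x2) = x1 x2 on the positive quadrant at randomly sampled pairs of points; this illustrates the definition but of course does not prove quasiconcavity:

```python
import random

# Sample the quasiconcavity inequality f((1-t)a + tb) >= min{f(a), f(b)}
# for f(x1, x2) = x1*x2 with x1, x2 > 0.
def f(x1, x2):
    return x1 * x2

random.seed(1)
for _ in range(500):
    a = (random.uniform(0.1, 5), random.uniform(0.1, 5))
    b = (random.uniform(0.1, 5), random.uniform(0.1, 5))
    for k in range(51):
        t = k / 50
        p = ((1 - t)*a[0] + t*b[0], (1 - t)*a[1] + t*b[1])
        assert f(*p) >= min(f(*a), f(*b)) - 1e-9
print("quasiconcavity inequality holds at all samples")
```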
Quasiconcavity is a statement about the height of the graph over a path that goes across and then up.
f((1 − t)a + tb) ≥ min{f(a), f(b)}    for all 0 ≤ t ≤ 1
Here the left side is evaluated along the line from a to b, and the right side is a horizontal line.
[Figure 4.34: The graph of a quasiconcave function and the across and up path]
Remark
"Across and up" is designed so that the path has a height that is the minimum of f(a) and f(b) for each x between them. If it lies below the graph then f((1 − t)a + tb) ≥ min{f(a), f(b)} for all 0 ≤ t ≤ 1. Do not make an "across and down" or "up and across" path by mistake. Those will test a different inequality.
The name quasiconcavity suggests that these functions should be similar to concave functions. In fact, concavity is a stronger condition than quasiconcavity. A visual argument is best here. Theorem 1.19 showed that a function is concave if and only if its graph lies above its secants. The across and up path lies below the secant from (a, f(a)) to (b, f(b)). If y = f(x) lies above the secant, then it also lies above the across and up path.
Theorem 4.19
If f(x) is a concave function, then it is also a quasiconcave function.
[Figure 4.35: The secant and the across and up path below y = f(x)]
Quasiconcavity is easiest to recognize in single-variable functions. We can verify that across and up paths stay below the graph y = f(x) for functions with the following shapes.
Lemma 4.20
Suppose f(x) is a one-variable function. f(x) is quasiconcave if one of the following is true about f(x):
1. f(x) is non-decreasing.
2. f(x) is non-increasing.
3. For some a, f(x) is non-decreasing before a and non-increasing after a.
Functions that satisfy 1 or 2 are called monotone.
Remark
These conclusions only make sense for single-variable functions. Increasing and decreasing are not words that describe multivariable functions.
[Figure 4.36: The across and up path below a non-increasing y = f(x)]
Our goal in defining quasiconcavity was to produce a large class of functions whose upper level sets allowed for separation arguments. In fact, the quasiconcave functions are the broadest possible such class. Every function with convex upper level sets is quasiconcave, and every quasiconcave function has convex upper level sets.
Lemma 4.21
f(x) is a quasiconcave function if and only if every upper level set f(x) ≥ c is a convex set.
[Figure 4.37: The segment from a to b in the upper level set of f(x)]
The proof relies on combining the definitions of quasiconcavity, upper level sets, and convex sets.
4.5.2
Quasiconcave
F
unctions
Proof
There are two directions of argument that we need to make.
1. Suppose that f(x) is quasiconcave. Let a and b be in the upper level set f(x) ≥ c. By definition, we know that f(a) ≥ c and f(b) ≥ c. The points on the segment from a to b are parametrized by
(1 − t)a + tb    0 ≤ t ≤ 1
If we evaluate f along these points we get
f((1 − t)a + tb) ≥ min{f(a), f(b)}    (f is quasiconcave)
                 ≥ c                  (a and b lie in the upper level set)
Thus the segment lies in the upper level set f(x) ≥ c. Since this holds for any a and b in any upper level set, we conclude that every upper level set of f(x) is convex.
2. Now suppose that every upper level set of f(x) is convex. Let a and b be any points in the domain. Let c = min{f(a), f(b)}, so a and b both lie in the upper level set f(x) ≥ c. Since the upper level set is convex, the segment between them
(1 − t)a + tb    0 ≤ t ≤ 1
lies in this set. Thus for 0 ≤ t ≤ 1,
f((1 − t)a + tb) ≥ c = min{f(a), f(b)}
Since this holds for all a and b, we conclude f(x) is quasiconcave.
4.5.3 Quasiconcave Programming
Quasiconcave programming describes the methods we use to solve constrained optimization when the objective functions and constraint functions are quasiconcave. Most methods are analogs of results for concavity, but because quasiconcavity is a weaker condition, they often need additional conditions or draw weaker conclusions. We begin with a result for unconstrained optimization.
206
Theo
rem
4.22
If
x
∗
is
a
strict
lo
cal
maximizer
of
a
quasiconcave
function
f
(
x
)
,
then
x
∗
is
the
unique
global
maximizer
of
f
(
x
)
.
[Figure 4.38: A segment, a neighborhood in which x∗ is a maximizer, and a point a that lies in both]
Proof
Let b be any other point in the domain of f(x). x∗ is the unique maximizer in some neighborhood, and the segment from x∗ to b travels through this neighborhood. Let a ≠ x∗ be a point that lies both on the segment and in the neighborhood.
f(a) ≥ min{f(x∗), f(b)}     (f(x) is quasiconcave)
f(x∗) > min{f(x∗), f(b)}    (f(x∗) > f(a))
f(x∗) > f(b)                (f(x∗) is not less than itself)
Since this holds for any b, we conclude that x∗ is the unique maximizer of f(x).
Remark
For a quasiconcave function, knowing a is a critical point is not enough to conclude that a is a maximizer. For instance f(x) = x³ is increasing, so it is quasiconcave. 0 is a critical point but not a maximizer. This means that our Kuhn-Tucker sufficiency theorem cannot translate directly to quasiconcavity, because there is no way to cover the ∇f(x) = 0 case. The simplest workaround is to write a theorem that only applies when ∇f(x∗) ≠ 0.
Theorem 4.23
Given an objective function f(x) and constraints gj(x) ≥ 0, suppose ∇f(x∗) ≠ 0 and (x∗, λ∗) satisfies the Kuhn-Tucker conditions. If f(x) and the gj(x) are all quasiconcave, then x∗ maximizes f(x), subject to the constraints.
We can use almost the same argument that we used for the λj > 0 case with concave functions. Only the first steps need to change. Once we establish that the upper level sets are convex, the rest of the proof is identical to Theorem 4.12.
[Diagram of the argument:
- f(x) is quasiconcave → (Lem 4.21) f(x) ≥ c is convex → (Lem 4.14) f(x) ≥ c does not intersect h−
- gj(x) are quasiconcave → (Lem 4.21) gj(x) ≥ 0 are convex → with ∇f(x∗) = −Σ λ∗j∇gj(x∗) → (Cor 4.15) the feasible set does not intersect h+
- → (Thm 4.10) x∗ is a max in the feasible set.]
4.5.4 Positive Transformations and Quasiconcavity
The condition of quasiconcavity is ordinal rather than cardinal in nature. We care that certain points attain greater values than others, but not how much greater. Contrast this to concavity, where f((1 − t)a + tb) has to be at least t(f(b) − f(a)) greater than f(a). For this reason, quasiconcavity is a relevant property of utility functions. The values of a utility function reflect no inherent information beyond relative preferences. We can rescale a quasiconcave function f(x) without affecting whether f(a) > f(b). The following definition and theorem formalize this flexibility.
.
Definition
4.24
A
function
f
(
x
)
is
a
p
ositive
transformation
of
a
function
g
(
x
)
if
there
is
an
increasing
function
p
(
x
)
such
that
g
(
x
)
=
p
(
f
(
x
))
Example
Consider the function f(x1, x2) = x1 x2 on the domain R²₊ = {(x1, x2) : x1 > 0, x2 > 0}. p(x) = ln x is an increasing function, so g(x1, x2) = p(f(x1, x2)) = ln x1 + ln x2 is a positive transformation of f.
Positive transformation is a symmetric relation. If p(x) is increasing, then so is p⁻¹(x). That means that if g(x) = p(f(x)) is a positive transformation of f(x), then f(x) = p⁻¹(g(x)) is also a positive transformation of g(x).
Theorem 4.25
Let f(x) be a function, and let g(x) be a positive transformation of f(x). f(x) is quasiconcave if and only if g(x) is quasiconcave.
Proof
Since p(x) is an increasing function, larger values of f(x) correspond to larger values of g(x). This means that
f((1 − t)a + tb) ≥ f(a), if and only if g((1 − t)a + tb) ≥ g(a), and
f((1 − t)a + tb) ≥ f(b), if and only if g((1 − t)a + tb) ≥ g(b).
Putting these together gives
f((1 − t)a + tb) ≥ min{f(a), f(b)}, if and only if g((1 − t)a + tb) ≥ min{g(a), g(b)}.
If f(x) is quasiconcave, then this inequality is satisfied for all a and b in the domain and t in [0, 1]. This means that g(x) is also quasiconcave. If f(x) is not quasiconcave, then this inequality is not satisfied for some a, b and t. That means that g(x) is also not quasiconcave.
We will use this theorem to verify that positive transformations of concave functions are quasiconcave. Here is an example.
Example
Consider the function f(x1, x2) = x1 x2 on the domain R²₊ = {(x1, x2) : x1 > 0, x2 > 0}. We can reason that f(x1, x2) is quasiconcave as follows:
1. g(x1, x2) = ln x1 + ln x2 is a positive transformation of f(x1, x2).
2. Apply Theorem 1.41. g(x1, x2) is concave, because its Hessian is negative definite.
   Hg(x1, x2) =
   [ −1/x1²   0     ]
   [  0      −1/x2² ]
3. Apply Theorem 4.19. Since g(x1, x2) is concave, it is also quasiconcave.
4. Apply Theorem 4.25. Since g(x1, x2) is quasiconcave, f(x1, x2) is quasiconcave.
Remark
Notice this reasoning does not apply to concavity.
- g(x1, x2) = ln x1 + ln x2 is concave
- f(x1, x2) = x1 x2 is not concave
To use this method, we must find a function g(x) that is a positive transformation of f(x) and is also concave. If we do not have an obvious candidate, random guessing is not practical. We would like a more straightforward procedure.
4.5.5 The Bordered Hessian and Quasiconcavity
We can verify quasiconcavity by direct computation. We will derive that method here.
Recall Lemma 4.14. It showed that convex upper level sets lie in the positive half-space of their tangent hyperplanes. Another way to say this is that a is the maximizer of f(x) among points on the tangent hyperplane to its level set.
[Figure 4.39: A maximizer of f(x) on the tangent line to its level set]
[Figure 4.40: A local minimizer of f(x) on the tangent line to its level set]
We can prove a reverse version of this lemma with some modifications.
1. We generalize the tangent hyperplane to "points that satisfy ∇f(a) · (x − a) = 0."
2. We require that each point is a strict local maximizer subject to that constraint.
If this holds for all a, we can conclude that f(x) is quasiconcave.
Lemma 4.26
Suppose f(x) is a continuously differentiable function on a convex domain. If every a in the domain of f(x) is a strict local maximizer of f(x) subject to ∇f(a) · (x − a) = 0, then f(x) is quasiconcave.
The proof shows that f(x) satisfies the inequality that defines quasiconcavity.
Proof
Let a and b be points in the domain of f(x). In order to show f(x) is quasiconcave, we need to evaluate f(x) along the segment
x(t) = (1 − t)a + tb    0 ≤ t ≤ 1
Consider the composition f(x(t)). By the extreme value theorem, this has a minimizer on [0, 1]. Let 0 < t0 < 1. We claim that t0 is not a minimizer of f(x(t)). There are two cases:
1. If df(x(t0))/dt ≠ 0 then t0 does not satisfy the first-order condition, so it is not a minimizer.
2. If df(x(t0))/dt = 0 then by the chain rule
∇f(x(t0)) · x′(t0) = 0
∇f(x(t0)) · (b − a) = 0
For any t, the vector x(t) − x(t0) is parallel to b − a. Thus every x(t) satisfies ∇f(x(t0)) · (x(t) − x(t0)) = 0. By the hypothesis of this lemma, x(t0) is a strict local maximizer subject to ∇f(x(t0)) · (x − x(t0)) = 0. We can conclude that t0 is not a minimizer of f(x(t)).
Since there is a minimizer of f(x(t)) on [0, 1], and it cannot occur for 0 < t < 1, the minimizer must be t = 0 or t = 1. That means that
f((1 − t)a + tb) ≥ min{f(a), f(b)}    for all 0 ≤ t ≤ 1
Since this holds for any a and b, we conclude f(x) is quasiconcave.
[Figure 4.41: A critical point of the composition f(x(t)), its gradient, and its upper level set]
According to this lemma, we now have a new way to verify quasiconcavity. We must check that every point a is a strict local maximizer of f(x) subject to ∇f(a) · (x − a) = 0. We use different methods at different points a. Which method we use depends on ∇f(a).
1. If ∇f(a) = 0 then every point x satisfies the constraint, making it meaningless. We can check that a is an unconstrained strict local maximizer by checking the determinants of the minors of Hf(a) (Theorem 1.31).
2. If ∇f(a) ≠ 0 then the Lagrangian of f(x) subject to ∇f(a) · (x − a) = 0 is

    L(λ, x) = f(x) + λ(∇f(a) · (x − a)) = f(x) + λ ∑ᵢ₌₁ⁿ fᵢ(a)(xᵢ − aᵢ)

We can check the determinants of the minors of HL(λ, a) (Theorem 4.6).
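The second case can be illustrated symbolically. Here is a minimal sketch (using sympy, with f(x₁, x₂) = x₁x₂ as an assumed example, not from the text) that builds this Lagrangian and confirms its Hessian in (λ, x₁, x₂) has a zero corner with ∇f(a) along the border:

```python
import sympy as sp

x1, x2, a1, a2, lam = sp.symbols('x1 x2 a1 a2 lambda')
f = x1 * x2                                # assumed example objective
# gradient of f evaluated at the point a = (a1, a2)
grad_a = [f.diff(v).subs({x1: a1, x2: a2}) for v in (x1, x2)]
# Lagrangian of f subject to the constraint grad f(a) . (x - a) = 0
L = f + lam * (grad_a[0] * (x1 - a1) + grad_a[1] * (x2 - a2))
H = sp.hessian(L, (lam, x1, x2))           # Hessian in the variables (lambda, x1, x2)
print(H)
# corner entry is 0 and the border is grad f(a); no entry depends on lambda
assert H[0, 0] == 0
assert [H[0, 1], H[0, 2]] == grad_a
```

The constraint is linear in x, so λ only multiplies first-degree terms; that is why the Hessian is independent of λ, as noted below.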
The Hessian of L(λ, x) = f(x) + λ ∑ᵢ₌₁ⁿ fᵢ(a)(xᵢ − aᵢ) has a memorable form.
Example: For n = 2, the bordered Hessian is

    HL(λ, x) = | 0      f₁(a)   f₂(a)  |
               | f₁(a)  f₁₁(x)  f₁₂(x) |
               | f₂(a)  f₂₁(x)  f₂₂(x) |

Evaluating at x = a gives

    HL(λ, a) = | 0      f₁(a)   f₂(a)  |
               | f₁(a)  f₁₁(a)  f₁₂(a) |
               | f₂(a)  f₂₁(a)  f₂₂(a) |
Due to its importance in this test, we call HL(λ, x) the bordered Hessian of f. We denote it BHf(x), since it does not depend on λ.
Theorem 4.27: Suppose f(x) is a continuously differentiable function on a convex domain. If for each x in the domain either

1. Hf(x) satisfies (−1)ⁱ|Mᵢ| > 0 for 1 ≤ i ≤ n, or
2. BHf(x) satisfies (−1)ⁱ|Mᵢ| < 0 for 2 ≤ i ≤ n + 1,

then f(x) is a quasiconcave function.
Here is a diagram of the steps of the proof.

    Hf(a) satisfies the alternating condition  ─(Local SOC)─┐
                                                   (either) ├─→ a is a strict local maximizer subject to ∇f(a) · (x − a) = 0 ─(Lem 4.26)─→ f(x) is quasiconcave
    BHf(a) satisfies the alternating condition ─(Thm 4.6)───┘
The requirement that a be a strict local maximizer of f(x) subject to ∇f(a) · (x − a) = 0 is actually stronger than we need. We could make a valid argument requiring only a nonstrict local maximizer. This would add some extra complexity to the proof of Lemma 4.26. In exchange we would gain the ability to use the test for a negative semidefinite matrix instead of a negative definite one (Theorem 1.42). That test is unpleasant to perform, so such a result may not be worth the effort.
4.5.6 Verifying Quasiconcavity Using the Bordered Hessian
Let f(x₁, x₂) = x₁x₂ on the domain ℝ²₊ = {(x₁, x₂) : x₁ > 0, x₂ > 0}. Show that f(x₁, x₂) is quasiconcave.
Solution: We need to check the Hessian where ∇f(x₁, x₂) = (x₂, x₁) = (0, 0). This only occurs at (0, 0), which is not in the domain. Everywhere else we can check the bordered Hessian.

    BHf(x) = | 0      f₁(x)   f₂(x)  |   | 0   x₂  x₁ |
             | f₁(x)  f₁₁(x)  f₁₂(x) | = | x₂  0   1  |
             | f₂(x)  f₂₁(x)  f₂₂(x) |   | x₁  1   0  |

The minor determinants we need to check are

    |M₂| = | 0   x₂ | = −x₂² < 0
           | x₂  0  |

    |M₃| = | 0   x₂  x₁ |
           | x₂  0   1  | = 0 − x₂ | x₂  1 | + x₁ | x₂  0 | = 2x₁x₂ > 0
           | x₁  1   0  |          | x₁  0 |      | x₁  1 |

For all x₁, x₂ > 0, this satisfies (−1)ⁱ|Mᵢ| < 0 for 2 ≤ i ≤ 3. We conclude that f(x₁, x₂) = x₁x₂ is quasiconcave on ℝ²₊.
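The minor determinants in this solution can be double-checked symbolically; a minimal sketch with sympy (not part of the text's solution):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = x1 * x2
# bordered Hessian: zero corner, gradient along the border, Hessian inside
BH = sp.Matrix([
    [0,          f.diff(x1),     f.diff(x2)],
    [f.diff(x1), f.diff(x1, 2),  f.diff(x1, x2)],
    [f.diff(x2), f.diff(x2, x1), f.diff(x2, 2)],
])
M2 = BH[:2, :2].det()   # leading 2x2 minor determinant
M3 = BH.det()           # leading 3x3 minor determinant
print(M2, M3)
assert M2 == -x2**2
assert sp.expand(M3) == 2*x1*x2
```

Building BHf(x) directly from partial derivatives this way generalizes to any n and avoids sign errors in cofactor expansions done by hand.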
4.5.7 Strict Quasiconcavity
Like with strict concavity, strict quasiconcavity is defined by taking the inequality condition of quasiconcavity and making the inequalities strict.
Definition 4.28: A function f(x) is strictly quasiconcave if for any distinct points a and b in the domain of f(x),

    f((1 − t)a + tb) > min{f(a), f(b)}    for all 0 < t < 1

It is strictly quasiconvex if

    f((1 − t)a + tb) < max{f(a), f(b)}    for all 0 < t < 1
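As a numerical illustration (an assumed example, not from the text): f(x₁, x₂) = x₁x₂ is in fact strictly quasiconcave on the positive orthant, and we can sample interior points of random segments to watch the strict inequality hold:

```python
import numpy as np

def f(x):
    # f(x1, x2) = x1 * x2, strictly quasiconcave on x1, x2 > 0
    return x[0] * x[1]

rng = np.random.default_rng(1)
strict_holds = True
for _ in range(1000):
    a = rng.uniform(0.1, 10.0, size=2)   # random points; distinct with probability 1
    b = rng.uniform(0.1, 10.0, size=2)
    t = rng.uniform(0.05, 0.95)          # interior of the segment only
    if not f((1 - t) * a + t * b) > min(f(a), f(b)):
        strict_holds = False
print(strict_holds)
```

A function with a thick level set would fail this check: any segment inside the thick set is constant, so the strict inequality breaks there.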
We can see from this definition that it is stronger than quasiconcavity. Every strictly quasiconcave function is quasiconcave. The condition on the upper level sets is more complicated.
Variant of Lemma 4.21: f(x) is a strictly quasiconcave function if and only if

1. every upper level set f(x) ≥ c is a strictly convex set, and
2. there are no neighborhoods on which f(x) is constant.

A function with a thick level set will fail condition 2.
Figure 4.42: A thick level set of f(x)
As we might expect, strict quasiconcavity is a weaker condition than strict concavity.
Variant of Theorem 4.19: If a function f(x) is strictly concave, then it is also strictly quasiconcave.
We can summarize the relationships between the different forms of (quasi)concavity in the following diagram.

    f(x) is strictly concave      ──→  f(x) is concave
              │                              │
              ↓                              ↓
    f(x) is strictly quasiconcave ──→  f(x) is quasiconcave
Like with strict concavity, strict quasiconcavity can ensure that the Kuhn-Tucker conditions generate a unique maximizer.
Variant of Theorem 4.23: Given an objective function f(x) and constraints gⱼ(x) ≥ 0, suppose ∇f(x∗) ≠ 0 and (x∗, λ∗) satisfies the Kuhn-Tucker conditions. If f(x) and the binding gⱼ(x) are quasiconcave, and additionally either

1. f(x) is strictly quasiconcave, or
2. the binding gⱼ(x) are strictly quasiconcave,

then x∗ is the unique maximizer of f(x), subject to the constraints.
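To illustrate the uniqueness claim with a concrete (assumed) example: maximize the strictly quasiconcave f(x₁, x₂) = x₁x₂ subject to the single constraint g(x) = 2 − x₁ − x₂ ≥ 0 on x₁, x₂ > 0. The Kuhn-Tucker conditions give x∗ = (1, 1) with λ∗ = 1, and ∇f(x∗) = (1, 1) ≠ 0, so the variant above predicts (1, 1) is the unique maximizer. A brute-force grid search agrees:

```python
import numpy as np

def f(x1, x2):
    # strictly quasiconcave objective on x1, x2 > 0
    return x1 * x2

# brute-force search over the feasible set g(x) = 2 - x1 - x2 >= 0
xs = np.linspace(0.01, 2.0, 400)
X1, X2 = np.meshgrid(xs, xs)
feasible = (X1 + X2) <= 2.0
vals = np.where(feasible, f(X1, X2), -np.inf)  # mask out infeasible points

i, j = np.unravel_index(np.argmax(vals), vals.shape)
best = (X1[i, j], X2[i, j])
print(best)   # close to the Kuhn-Tucker point (1, 1), up to grid resolution
assert abs(best[0] - 1.0) < 0.01 and abs(best[1] - 1.0) < 0.01
```

Grid search only locates the maximizer to grid resolution, of course; the point is that no second, distant candidate appears, matching the uniqueness guaranteed by strict quasiconcavity.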
Like regular quasiconcavity, strict quasiconcavity is preserved under positive transformation.
Variant of Theorem 4.25: Let f(x) be a function and g(x) be a positive transformation of f(x). f(x) is strictly quasiconcave if and only if g(x) is strictly quasiconcave.
The proof of Lemma 4.26 could instead produce a strict inequality with no additional reasoning. As a result we can strengthen our Hessian/bordered Hessian test to guarantee strict quasiconcavity with no modifications.
Improvement on Theorem 4.27: Suppose f(x) is a continuously differentiable function on a convex domain. If for each a in the domain either

1. Hf(a) satisfies (−1)ⁱ|Mᵢ| > 0 for 1 ≤ i ≤ n, or
2. BHf(a) satisfies (−1)ⁱ|Mᵢ| < 0 for 2 ≤ i ≤ n + 1,

then f(x) is strictly quasiconcave.
4.5.8 Section Summary
The most important definitions and results from this section were

- The definition of quasiconcavity (Definition 4.18)
- Concave functions are quasiconcave (Theorem 4.19)
- The upper level sets of a quasiconcave function are convex (Lemma 4.21)
- Sufficient conditions for a maximizer of quasiconcave functions (Theorems 4.22 and 4.23)
- Verifying quasiconcavity via composition (Theorem 4.25)
- Verifying quasiconcavity with the bordered Hessian (Theorem 4.27)
- Definition and variants for strict quasiconcavity (Definition 4.28 etc.)