Chapter 2
Advanced Integration and
Applications
This chapter covers a variety of methods and applications for single-variable integrals. The first two
sections lay the groundwork for multivariable integration by exploring the connections between integration
and geometry. One section touches on approximation methods for integrals. Other sections prepare us
for our goal: applying integration to probability and statistics.
Contents
2.1 Area Between Curves
2.2 Volumes
2.3 Integration by Parts
2.4 Approximate Integration
2.5 Improper Integrals
2.6 Probability
2.7 Functions of Random Variables
Example 2.1.5
The Area Enclosed by Two Curves that Intersect More than Twice
Solution
To find the intersections we set $f(x) = g(x)$ and solve:
\[
\begin{aligned}
x^3 - 10x &= 3x^2 \\
x^3 - 3x^2 - 10x &= 0 \\
x(x - 5)(x + 2) &= 0 \\
x &= 0,\ 5,\ \text{or } -2
\end{aligned}
\]
Our region is bounded between x = −2 and x = 5, but one graph does not need to be above the other
for the entire region. The graphs intersect at x = 0 so one graph might be on top for [−2, 0], while the
other is on top for [0, 5]. To find out which is which we could evaluate at test points (we would need
two). Alternately, since we’ve already factored f(x) − g(x) = x(x − 5)(x + 2) we can perform a sign
analysis:
Interval       x < −2    −2 < x < 0    0 < x < 5    x > 5
x                −           −             +           +
(x − 5)          −           −             −           +
(x + 2)          −           +             +           +
f(x) − g(x)      −           +             −           +
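Thus $x^3 - 10x \ge 3x^2$ on $[-2, 0]$ and $x^3 - 10x \le 3x^2$ on $[0, 5]$. The enclosed area is computed by:
\[
\begin{aligned}
\text{Area} &= \int_{-2}^{0} \left(x^3 - 10x - 3x^2\right) dx + \int_{0}^{5} \left(3x^2 - x^3 + 10x\right) dx \\
&= \left[\frac{x^4}{4} - 5x^2 - x^3\right]_{-2}^{0} + \left[x^3 - \frac{x^4}{4} + 5x^2\right]_{0}^{5} \\
&= (0 - 0 - 0 - 4 + 20 - 8) + \left(125 - \frac{625}{4} + 125 - 0 + 0 - 0\right) \\
&= \frac{407}{4}
\end{aligned}
\]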
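As a check on the arithmetic, here is a minimal Python sketch of the same procedure: integrate $f(x) - g(x)$ between each pair of consecutive intersections and add the absolute values. It assumes $f(x) = x^3 - 10x$ and $g(x) = 3x^2$, as used in the solution. The function names are illustrative only, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def diff(x):
    # f(x) - g(x), with f(x) = x^3 - 10x and g(x) = 3x^2
    return x**3 - 3 * x**2 - 10 * x

def diff_antiderivative(x):
    # One antiderivative of f(x) - g(x): x^4/4 - x^3 - 5x^2
    x = Fraction(x)
    return x**4 / 4 - x**3 - 5 * x**2

intersections = [-2, 0, 5]
area = Fraction(0)
for left, right in zip(intersections, intersections[1:]):
    # Signed integral of f - g over [left, right]
    signed = diff_antiderivative(right) - diff_antiderivative(left)
    # A test point in the interior tells us which graph is on top
    midpoint = Fraction(left + right, 2)
    top = "f" if diff(midpoint) > 0 else "g"
    print(f"[{left}, {right}]: {top} is on top, signed integral = {signed}")
    area += abs(signed)

print(area)  # 407/4
```

Taking the absolute value of each signed integral has the same effect as putting the top graph first in each piece.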
Main Ideas
With more intersections, we must check the region between each pair of intersections to see which
graph is on top.
It can be more efficient to make a sign analysis chart.
Sketching the graphs may be more difficult. If you can do it, it will corroborate (or correct) your
calculations.
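\[
\begin{aligned}
y_i^* &= 2\sqrt{x} & \qquad y_i^* &= \frac{16}{x} \\
\frac{y_i^*}{2} &= \sqrt{x} & \qquad x\,y_i^* &= 16 \\
\frac{(y_i^*)^2}{4} &= x & \qquad x &= \frac{16}{y_i^*}
\end{aligned}
\]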
These computations should be familiar. Finding x in terms of y is called finding the inverse function.
These inverse functions give the left and right bounds of our region. To find the area, we take a sum
of the areas of these rectangles of different widths. Then we take a limit. Notice that to make the
width positive we subtract the smaller $x$ value from the larger $x$ value. Geometrically, this is the right
endpoint $\frac{16}{y_i^*}$ minus the left endpoint $\frac{(y_i^*)^2}{4}$.
\[
\lim_{\Delta y \to 0} \sum_i \underbrace{\left(\frac{16}{y_i^*} - \frac{(y_i^*)^2}{4}\right)}_{\text{width}} \underbrace{\Delta y}_{\text{height}} = \int_1^4 \left(\frac{16}{y} - \frac{y^2}{4}\right) dy
\]
This limit is an integral, but the variable of integration is y, not x. The bounds of integration are
the set of y values in the region. The lowest point in the region is at y = 1. The highest is at y = 4.
We evaluate the integral using the Fundamental Theorem of Calculus, but with y instead of x.
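\[
\begin{aligned}
\text{Area Enclosed} &= \int_1^4 \left(\frac{16}{y} - \frac{y^2}{4}\right) dy \\
&= \left[16 \ln |y| - \frac{y^3}{12}\right]_1^4 \\
&= \left(16 \ln 4 - \frac{64}{12}\right) - \left(16 \ln 1 - \frac{1}{12}\right) \\
&= 16 \ln 4 - \frac{63}{12}
\end{aligned}
\]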
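The sum of horizontal rectangles can also be evaluated numerically. The following Python sketch is only an illustration: it assumes midpoints as the test points $y_i^*$ and uses hypothetical function names. It compares the sum with the exact value $16 \ln 4 - \frac{63}{12}$.

```python
import math

def width(y):
    # Right boundary x = 16/y minus left boundary x = y^2/4
    return 16 / y - y**2 / 4

def midpoint_sum_in_y(n, low=1.0, high=4.0):
    # Sum of n rectangles, each of height dy and width measured at the midpoint
    dy = (high - low) / n
    return sum(width(low + (i + 0.5) * dy) * dy for i in range(n))

exact = 16 * math.log(4) - 63 / 12
approx = midpoint_sum_in_y(10_000)
print(approx, exact)
```

With 10,000 rectangles the two printed values should agree to several decimal places.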
Main Idea
The area to the right of $x = f^{-1}(y)$ and to the left of $x = g^{-1}(y)$ for $y$ from $a$ to $b$ can be computed as
\[
\int_a^b \left(g^{-1}(y) - f^{-1}(y)\right) dy.
\]
Strategy
Changing an integral to dy may be more work than breaking it into two or more parts. When solving
an area problem, consider both methods and use the one that seems more promising. If you run into
problems with your chosen approach, give the other method a try.
Q9
The expressions $\int_a^b |f(x)|\,dx$ and $\int_a^b f(x)\,dx$ are not equivalent. Explain why, and draw the graph of a function on which these expressions disagree.
Q10
Given a differentiable function $f(x)$, the signed area between the graph $y = f'(x)$ and the $x$-axis from $x = a$ to $x = b$ is denoted $\int_a^b f'(x)\,dx$ and is equal to the change in $f(x)$ from $x = a$ to $x = b$. In what sense does the geometric area between the graph of $y = f'(x)$ and the $x$-axis represent a change in $f(x)$?
2.1.2
Q11
Suppose $y = f(x)$ and $y = g(x)$ are below the $x$-axis. What integral computes the geometric
area between them? How does this compare to the situation when they are above the $x$-axis?
Q12
Here is another way to derive the formula for the area between curves. Consider the functions
graphed here:
2.1.5
Q19
Compute the area between y = sin x and y = cos x over the interval [0, 2π].
Q20
Erica and Carter were asked to compute the area enclosed by $y = 4x$ and $y = x^3$. They agree that $4x = x^3$ when $x = -2$ and when $x = 2$. Erica thinks the area is
\[
\int_{-2}^{2} \left(4x - x^3\right) dx
\]
Carter thinks it is
\[
\int_{-2}^{2} \left(x^3 - 4x\right) dx
\]
(a) Who is correct?
(b) How do you think the mistake could reasonably have happened, and how can you avoid it?
Q21
Compute the area enclosed by $y = xe^{x^2}$ and $y = ex$.
Q22
Set up an integral or integrals to compute the area of the region enclosed by the curves $f(x) = x^2(x^2 - 4)$ and $g(x) = x^4(x^2 - 4)$.
Q23
Often the top curve of an enclosed region alternates between $f(x)$ and $g(x)$ at each intersection. Can you explain what about the previous problem caused this pattern to fail?
Q24
Suppose $y = f(x)$ and $y = g(x)$ intersect multiple times, with $x = a$ their leftmost intersection and $x = b$ their rightmost. We can express the area enclosed between them by $\int_a^b |g(x) - f(x)|\,dx$.
(a) Explain why this formula works.
(b) Explain why this formula isn’t particularly helpful.
Section 2.1
Exercises
2.1.6
Q25
Compute the area enclosed by $y = 6$, $y = \sqrt{x}$ and $y = -2x$.
Q26
Compute the area enclosed by $y = e^x$, $y = e^{4-x}$, and $y = 1$.
Q27
You have been taught at least three ways to set up an expression that will compute the area
enclosed by (all of) y = 3, y = 3x, y = 9 and x + y = −5. Set up all the methods you know
that will do this. You do not need to evaluate them.
Q28
Write the area in the first quadrant enclosed by y =
√
3x, y = 0, and x
2
+ y
2
= 4 as a single
integral.
Q29
Write the area enclosed by $y = \sqrt{x}$ and $y = x^2$ as
(a) an integral in $x$
(b) an integral in $y$
Q30
Write the area in the first quadrant enclosed by $y = x^2$, $y = 3x^2$, and $y = 18 - 3x$ as
(a) a sum of integrals in $x$
(b) a sum of integrals in $y$
Extension and Synthesis
Q31
Suppose you’ve found that $y = f(x)$ and $y = g(x)$ intersect at $x = a$ (along with perhaps other places). What could knowing the values of $f'(a)$ and $g'(a)$ tell you about where each graph is above the other? Be as specific as possible.
Q32
Suppose you are given that for all $x$:
\[
f'(x) > 0 \qquad g'(x) < 0
\]
We approximate the area between $y = f(x)$ and $y = g(x)$ from $x = a$ to $x = b$ by rectangles, letting the $x_i^*$ be the right endpoints of each subinterval. What can we say about whether the approximation will overestimate or underestimate the true area?
Section 2.2
Goals:
1 Recognize cross sections of a solid object.
2 Write the area of each cross section as a function.
3 Compute the volume of a solid.
4 Visualize and compute the volume of a solid of revolution.
The motivation for the definite integral was computing an area. However, the definition turns out
to be more useful than that. With the correct setup, we can express a volume as an integral as well.
Question 2.2.1
Dimension
In mathematics, we define the dimension of an object. Dimension measures the number of degrees of
freedom available to a point traveling in the object.
The definition may not match your intuition for dimension. For example, you only encounter a
parabola in two (or more)-dimensional space. However, the parabola itself is one-dimensional. If you
imagine that you are an insect crawling on the parabola, you can only travel forward or backward, not
side to side. If you were small enough, the parabola would seem indistinguishable from a line.
Example
1 A plane is two dimensional. You can travel left/right or up/down.
2 A circle is one dimensional. You can only travel clockwise/counterclockwise.
3 A point is zero dimensional. There is nowhere to travel within it.
We measure objects of different dimensions differently. In all cases, measuring is counting how many
units of measurement fit inside the object. A 6 unit by 3 unit rectangle has area 18 square units, because
18 unit squares can fit inside it. For less regular objects we need to consider parts of square units. This
requires a lot of work to do formally, but the intuition should be straightforward.
Question 2.2.1
What Is Volume?
Figure: Objects of several dimensions and their units of measurement
We use different names to describe objects and their measurements in different dimensions:
Dimension   Names                                     Measurement
0           point                                     none
1           line, circle, curve                       length
2           square, polygon, disc, sphere, surface    area
3           cube, polyhedron, ball, solid             volume
Vocabulary Check
It doesn’t make sense to talk about the volume of a surface. No unit cubes will fit inside it.
Similarly it doesn’t make sense to talk about the area of a solid. Infinitely many unit squares will fit
in any solid. However, solids have boundary surfaces, and we do sometimes measure their areas.
The simplest solid to measure is a (right) prism. If a prism has height h, we can see that each unit
square (or part thereof) in the base has h unit cubes stacked above it. Thus we have
Formula for Volume of a Prism
volume = area of base × height
Figure: A prism divided into unit cubes and its base divided into unit squares.
Here we see the base of the prism and the square units (or parts thereof) that it contains. The prism
has height 3.5. We can see there are 3.5 cubic units above each square unit in the base.
You may be questioning the relevance of studying areas and volumes in the 21st century. Few people
need to compute geometric measurements in their careers. However, geometry is not the end goal of
this investigation.
Remark
Our motivation for studying solids is not to solve geometry problems. Recall that the definite integral
allowed us to express total change as an area:
\[
\text{total change} = \text{rate of change} \times \text{time}
\]
\[
f(b) - f(a) = \int_a^b f'(t)\,dt
\]
This allowed us to use our geometric intuition of areas to better understand rates of change. Similarly,
volume will allow us to use geometry to understand different types of rates later on.
2.2.3
Q13
Suppose I’m trying to approximate the volume of a solid S of height 12 using four prisms of equal
height. Suppose those prisms have volumes 5.1, 6, 7.2 and 9.6.
(a) What is the approximate volume of S?
(b) What are the areas of the cross sections I used to produce each prism?
Q14
Suppose I’m trying to approximate the volume of the half-ball below by prisms. I subdivide the
height into n subheights and use the cross section at the left hand side of each as the base of each
prism. Will I overestimate or underestimate the volume? Explain how you know in a sentence or
two.
Q15
Produce an approximation of the volume of a pyramid with height 9 and square base of side
length 6 using 3 prisms. There are multiple correct answers to this, corresponding to different
choices of where to take the cross sections.
Q16
Suppose a solid S has height 16. Suppose all of its cross-sections perpendicular to the height
have a different shape, but all of those shapes have area 5.
(a) What is the volume of S?
(b) Do you really need calculus to solve (a)? Discuss.
2.2.4
Q17
Compute the volume of the solid between $x = 0$ and $x = 3$ whose cross sections at each $x$ are squares of side length $e^x$.
Q18
Compute the volume of the solid between $x = 0$ and $x = 2$ whose cross sections at each $x$ are trapezoids of bases $x + 1$ and $x + 3$ and height $x^2$.
Q19
Compute the volume of the solid whose cross sections, perpendicular to the $x$-axis, are triangles whose bases lie between $y = 3x$ and $y = x^2$ from $x = 0$ to $x = 3$ and whose heights are equal to the length of their bases.
2.2.6
Q25
Compute the volume of a solid whose base is the triangle under $y = -\frac{1}{2}x + 3$ in the first quadrant and whose cross sections, perpendicular to the $x$-axis, are triangles of height 8.
Q26
Compute the volume of a solid whose base is the region enclosed by $y = \sqrt{x}$ and $y = \frac{x}{2}$ and whose cross sections, perpendicular to the $x$-axis, are squares.
Q27
Compute the volume of a solid whose base is a right triangle with legs 4 and 3 and whose cross
sections, perpendicular to the leg of length 4, are semicircles with their diameter in the base.
Q28
Compute the volume of a solid S whose base is the unit disc and whose cross sections perpendicular
to the x-axis are isosceles right triangles, with one leg in the base.
Extension and Synthesis
Q29
Let D be the region enclosed by $y = x^2 - 6x$ and the $x$-axis.
(a) Set up an integral that will compute the geometric area of D. You do not need to evaluate it.
(b) Let S be a solid whose base is D and whose cross sections perpendicular to the $x$-axis are semicircles with their diameter in D. Set up an integral that will compute the volume of S. You do not need to evaluate it.
Q30
Consider the solid obtained by rotating the triangle below around the $x$-axis.
(a) Describe the shape of the cross sections. Which measurements of this shape depend on $x$?
(b) Compute a formula for $A(x)$, the area of the cross section at each value of $x$.
(c) Compute the volume of the solid.
I.L.A.T.E.
When deciding which factor of a product should be u and which should be dv, put them into the chart
below.
Inverse functions | Logarithms | Algebraic expressions (polynomials) | Trig functions | Exponential functions
better u’s <----------------------------------------------------------------> better dv’s
Let’s apply I.L.A.T.E. to the following products:
1 $\int x^5 \ln x\,dx$
$x^5$ is algebraic. $\ln x$ is a logarithm. We should let $u = \ln x$ and $dv = x^5\,dx$.
2 $\int x \sin x\,dx$
$x$ is algebraic. $\sin x$ is trigonometric. We should let $u = x$ and $dv = \sin x\,dx$.
3 $\int x^2 \tan^{-1}(x)\,dx$
$x^2$ is algebraic. $\tan^{-1}(x)$ is an inverse function. We should let $u = \tan^{-1}(x)$ and $dv = x^2\,dx$.
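By parts: $u = \tan^{-1}(x)$, $dv = x^2\,dx$, $du = \frac{1}{1+x^2}\,dx$, $v = \frac{1}{3}x^3$.
$u$-substitution: $u = 1 + x^2$, $du = 2x\,dx$.
\[
\begin{aligned}
\int x^2 \tan^{-1}(x)\,dx &= \frac{1}{3}x^3 \tan^{-1}(x) - \int \frac{1}{3}x^3 \cdot \frac{1}{1 + x^2}\,dx && \text{(by parts)} \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \int \frac{1}{6} \cdot \frac{x^2}{1 + x^2}\,2x\,dx \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \int \frac{1}{6} \cdot \frac{u - 1}{u}\,du && \text{($u$-substitution)} \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \frac{1}{6}\int \left(1 - \frac{1}{u}\right) du \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \frac{1}{6}\left(u - \ln |u|\right) + c \\
&= \frac{1}{3}x^3 \tan^{-1}(x) - \frac{1}{6}\left(1 + x^2 - \ln |1 + x^2|\right) + c
\end{aligned}
\]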
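One way to gain confidence in an antiderivative found by parts is to differentiate it numerically and compare with the integrand. The sketch below is only a spot check under stated assumptions, not a proof: it takes $c = 0$, uses a central difference quotient, and tests a handful of arbitrarily chosen points. All function names are illustrative.

```python
import math

def integrand(x):
    return x**2 * math.atan(x)

def antiderivative(x):
    # (1/3) x^3 arctan(x) - (1/6)(1 + x^2 - ln(1 + x^2)), taking c = 0
    return x**3 * math.atan(x) / 3 - (1 + x**2 - math.log(1 + x**2)) / 6

def central_difference(func, x, h=1e-5):
    # Numerical estimate of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

sample_points = [-2.0, -0.5, 0.0, 1.0, 3.0]
max_gap = max(
    abs(central_difference(antiderivative, x) - integrand(x))
    for x in sample_points
)
print(max_gap)
```

A very small printed gap is consistent with the derivative of the result matching $x^2 \tan^{-1}(x)$ at the sampled points.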
2.3.1
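Q5
Compute $\int \left(\frac{\sin x}{1 + x^2} + \cos x \tan^{-1} x\right) dx$
Q6
Which of the following can be integrated using $u$-substitution?
$\int e^{x}\,dx$    $\int xe^{x}\,dx$    $\int x^2 e^{x}\,dx$    $\int x^3 e^{x}\,dx$
$\int e^{x^2}\,dx$    $\int xe^{x^2}\,dx$    $\int x^2 e^{x^2}\,dx$    $\int x^3 e^{x^2}\,dx$
$\int e^{x^3}\,dx$    $\int xe^{x^3}\,dx$    $\int x^2 e^{x^3}\,dx$    $\int x^3 e^{x^3}\,dx$
$\int e^{x^4}\,dx$    $\int xe^{x^4}\,dx$    $\int x^2 e^{x^4}\,dx$    $\int x^3 e^{x^4}\,dx$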
2.3.3
Q7
Evaluate $\int \frac{\ln x}{x^3}\,dx$.
Q8
Evaluate $\int x \sin x\,dx$.
Q9
Use integration by parts to compute $\int \tan^{-1} x\,dx$. Note that $\frac{d}{dx} \tan^{-1} x = \frac{1}{1+x^2}$.
Q10
We can write $\int \ln x\,dx$ as a product: $\int (1)(\ln x)\,dx$.
(a) How does I.L.A.T.E. suggest we proceed?
(b) Use integration by parts to compute the antiderivative.
Q11
Compute $\int \sin^{-1} x\,dx$.
Q12
Compute $\int_0^{\pi/4} \tan^{-1} x\,dx$.
Section 2.4
Goals:
1 Use several methods to approximate definite integrals.
2 Assess the accuracy of an approximation.
3 Approximate integrals given incomplete information.
One of the first applications of integration is to measure total change. If $f(t)$ is our velocity, $\int_a^b f(t)\,dt$
computes the total displacement between the times $a$ and $b$. In practice, to evaluate such an integral,
we need to know the antiderivative of f. Can we realistically expect to do this? Except in theoretical
situations (say a physics experiment), we cannot. A person driving a car will not produce a velocity
function that can be expressed in terms of algebra or trigonometry. While every continuous function has
an antiderivative, it doesn’t help us if we don’t know what it is or how to evaluate it.
Our best option in these situations is to approximate the integral. For instance, if we measure
velocity once per second, we could multiply each velocity by one second to approximate the distance
traveled in that second. Adding these up would approximate the total displacement. What we’ve done
is approximated the integral by rectangles of width 1. The natural question to ask is: how accurate is
such an approximation? How can we make it more accurate? These are the questions we’ll need to
address whenever we want to apply calculus to data sets instead of abstract functions.
Question 2.4.1
What $x_i^*$ Can We Use when Approximating an Integral?
Recall the following
Definition
The definite integral is given by the formula
\[
\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x
\]
where $\Delta x$ are the lengths of the subintervals of $[a, b]$, and $x_i^*$ is a number in the $i$th subinterval.
Without the limit (which is difficult or impossible to compute anyway) the sums on the right are
approximations of the integral. Once we choose an $x_i^*$ for each $i$, we can evaluate this approximation.
The simplest idea is to just use the left endpoint of each subinterval as $x_i^*$.
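A minimal Python sketch of the left endpoint idea follows. It assumes $n$ subintervals of equal length; the function name `left_sum` and the sample integrand $x^2$ are illustrative choices, not taken from the text.

```python
def left_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n left-endpoint rectangles."""
    dx = (b - a) / n
    # The left endpoint of the i-th subinterval (counting from 0) is a + i*dx
    return sum(f(a + i * dx) for i in range(n)) * dx

# Four rectangles for the integral of x^2 over [0, 1]:
# (0 + 1/16 + 4/16 + 9/16) * (1/4) = 0.21875
print(left_sum(lambda x: x**2, 0, 1, 4))  # 0.21875
```

The true value of this sample integral is $\frac{1}{3}$, so the left endpoint sum underestimates here, as we might expect for an increasing function.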
Solution
(a) A decreasing function will be overestimated by $L_n$.
(b) An increasing function will be underestimated by $L_n$.
(c) If $L_n$ is always exact, then $f(x)$ is a constant function.
(d) Functions can be classified as increasing, decreasing or constant by their first derivative. $f'(x)$ seems to determine the sign (and maybe size) of the error.
Figure: The error of an $L_n$ approximation
Let’s use the results of the exercise to formulate an error bound for $L_n$.
Larger values of the derivative seem to produce more negative errors. If we allow for steeper and steeper slopes,
there is no limit to how large the error could be. So let’s put a bound on how big the derivative is.
Suppose we know that $f'(x) \le S$ on $[a, b]$. Over each interval $[x_i, x_{i+1}]$ we know that $f(x)$ lies below
the line of slope $S$ through $(x_i, f(x_i))$:
\[
f(x) \le S(x - x_i) + f(x_i)
\]
Remark
The argument that the line of slope S is the “worst case” scenario is a useful heuristic, but you may be
unsatisfied with its lack of rigor. A formal argument relies on the following ideas:
Larger functions have larger integrals. If $f(x) \le g(x)$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ as long as $a \le b$.
The Fundamental Theorem of Calculus tells us we can write $f(x) = f(x_i) + \int_{x_i}^{x} f'(t)\,dt$.
The line of slope $S$ would be $L(x) = f(x_i) + \int_{x_i}^{x} S\,dt$. Over the interval $[x_i, x_{i+1}]$, comparing these
integrals shows that $f(x) \le L(x)$. Thus $\int_{x_i}^{x_{i+1}} f(x)\,dx \le \int_{x_i}^{x_{i+1}} L(x)\,dx$. This tells us that there is
more error, and thus a larger underestimate, in the left hand approximation of $L(x)$ than there is in the
left hand approximation of $f(x)$.
Example 2.4.4
Suppose we want to understand the error of an $L_n$ approximation of $\int_1^{16} \sqrt{x}\,dx$.
(a) What bounds can we put on $|f'(x)|$ for our error calculation?
(b) What bound can we put on the error of the $L_5$ approximation?
(c) What $n$ would we need in order to guarantee that the $L_n$ approximation has error at most $\frac{1}{100}$?
(d) What problem would result if we tried to bound the error of an $L_n$ approximation of $\int_0^{16} \sqrt{x}\,dx$? How might you resolve this?
Solution
(a) $f'(x) = \frac{1}{2\sqrt{x}}$. This is always positive, and it decreases as $x$ increases. The largest value of $f'(x)$
on $[1, 16]$ occurs when $x = 1$. If we let $S = f'(1) = \frac{1}{2}$, we are guaranteed that for all $x$ in $[1, 16]$,
$|f'(x)| \le \frac{1}{2}$.
Example 2.4.4
Computing an $E_L$ Bound
(b) By our theorem
\[
|E_L| \le \frac{S(b - a)^2}{2n} = \frac{\frac{1}{2}(16 - 1)^2}{2(5)} = \frac{45}{4}
\]
So the error lies between $-\frac{45}{4}$ and $\frac{45}{4}$.
(c) We can set our error bound (with $n$ as a variable) to be at most $\frac{1}{100}$ and solve for $n$.
\[
\begin{aligned}
|E_L| \le \frac{\frac{1}{2}(16 - 1)^2}{2n} &\le \frac{1}{100} \\
\frac{225}{4n} &\le \frac{1}{100} \\
(225)(100) &\le 4n \\
(225)(25) &\le n \\
5625 &\le n
\end{aligned}
\]
We conclude that the error will be at most $\frac{1}{100}$ as long as $n$ is at least 5625. Note that since this
is an error bound, the actual error may shrink below $\frac{1}{100}$ with fewer rectangles. We would need a
different method to verify that, though.
(d) If we want to apply our theorem to $\int_0^{16} \sqrt{x}\,dx$, we need an $S$ such that $|f'(x)| \le S$. This derivative
is $f'(x) = \frac{1}{2\sqrt{x}}$, which increases without bound as $x \to 0^+$. Thus there is no $S$, and we cannot
apply the error bound theorem.
To get around this problem we could break the interval into two parts and bound them by different
methods. We can bound the error on rectangles 2 through $n$ over the interval $[\Delta x, 16]$ using the
theorem as above. In this case $S = \frac{1}{2\sqrt{\Delta x}}$ will work. To bound the error over the first rectangle
$[0, \Delta x]$, note that $f(x)$ is increasing. The first rectangle of $L_n$ will underestimate the integral,
while the first rectangle of $R_n$ will overestimate it. Thus the actual error can be no bigger than
the difference between them, which is $\sqrt{\Delta x}\,\Delta x - 0\,\Delta x$. The total error can be no larger than the
sum of the error bound over $[0, \Delta x]$ and the error bound over $[\Delta x, 16]$.
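Parts (b) and (c) can be checked with a short Python sketch. It assumes the bound $|E_L| \le \frac{S(b-a)^2}{2n}$ quoted in the solution, takes the error to mean the approximation minus the exact value (so an underestimate is negative), and uses the exact value 42 obtained from the antiderivative $\frac{2}{3}x^{3/2}$. The function names are illustrative.

```python
import math
from fractions import Fraction

def left_sum(f, a, b, n):
    # Left-endpoint approximation with n equal subintervals
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def left_error_bound(S, a, b, n):
    # |E_L| <= S (b - a)^2 / (2n), where |f'(x)| <= S on [a, b]
    return S * (b - a) ** 2 / (2 * n)

a, b, S = 1, 16, Fraction(1, 2)
exact = 42  # (2/3)(16^(3/2) - 1^(3/2))

error_5 = left_sum(math.sqrt, a, b, 5) - exact  # negative: L_5 underestimates
bound_5 = left_error_bound(S, a, b, 5)          # Fraction(45, 4)
print(error_5, float(bound_5))

# Smallest n for which the bound S (b - a)^2 / (2n) is at most 1/100
tolerance = Fraction(1, 100)
n_needed = math.ceil(S * (b - a) ** 2 / (2 * tolerance))
print(n_needed)  # 5625
```

The actual error of $L_5$ is well inside the bound, which illustrates the remark that an error bound can be pessimistic.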
2.4.3
Q11
Compute the theoretical error bound on the $L_{14}$ approximation of $\int_1^8 \sqrt[3]{x}\,dx$.
Q12
Compute the theoretical error bound on the $R_5$ approximation of $\int_0^{15} \frac{1}{x^2 + 1}\,dx$.
Q13
How large would $n$ need to be to guarantee that the $L_n$ approximation of $\int_2^8 \log_2 x\,dx$ is within $\frac{1}{10000}$ of the actual value?
Q14
How large would $n$ need to be to guarantee that the $R_n$ approximation of $\int_{-1}^{2} x^3\,dx$ is within $\frac{1}{1000}$ of the actual value?
2.4.4
Q15
Suppose we make the following approximations of $\int_{15}^{30} (4x + 7)\,dx$. Without computing them, put them in order from least to greatest (some may be equal).
$L_4$    $L_8$    $R_4$    $R_8$    $M_4$    $M_8$    The actual value
Q16
Yiming has a great idea. He approximates $\int_a^b f(x)\,dx$ by 12 rectangles. In order to mitigate the
error of left and right hand approximations, he takes the right endpoint of the first subinterval as
a test point, but the left endpoint of the second subinterval. He continues to alternate for all 12
subintervals. What is another name for the approximation Yiming has produced?
2.4.7
Q23
Let $f(x) = \frac{1}{x^3}$. If you wanted to use a midpoint approximation with $n$ rectangles to approximate $\int_3^5 f(x)\,dx$, how large must $n$ be to guarantee your approximation had an error of no more than $\frac{1}{10000}$? Your answer should have the form $n \ge \ldots$, but you do not need to simplify any arithmetic.
Q24
Suppose we want to approximate $\int_1^9 \sqrt{x}\,dx$.
(a) Produce the $T_4$ approximation. Don’t bother simplifying the arithmetic.
(b) Solve for a value $n$ such that $T_n$ has an error of at most $\frac{1}{1000000}$. Don’t simplify the arithmetic.
Q25
Consider the following data about an unknown function $g(x)$.
x      0   2   4   6   8   10   12   14
g(x)   3   5   8   9   7   4    3    1
(a) Compute an $M_3$ approximation of $\int_0^{12} g(x)\,dx$.
(b) If you are given that $|g''(x)| < \frac{1}{4}$, what bound can you put on the error of the previous approximation?
Q26
Sasha is trying to bound the error of her $M_{10}$ approximation of $\int_0^{\pi} \sin x\,dx$. She computes $f''(0) = 0$ and $f''(\pi) = 0$ and so decides to use $K = 0$.
(a) What does her choice of $K$ imply about the accuracy of her approximation?
(b) Explain what is wrong with Sasha’s reasoning.
(c) Compute the actual error bound for the $M_{10}$ approximation.
(a) What formula that we learned would give a bound on the error of this approximation? Fill in
all the information you can, and indicate the information that you would need to complete
the calculation. Be as specific as possible.
(b) Suppose that, instead of the information you need for the formula, you were only given that
$f$ is an increasing function on $[-7, 13]$. How could you compute an error bound in this case?
Justify your answer.
Exercise
Evaluate the following limits:
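(a) $\lim_{x \to \infty} \frac{1}{x^2}$
(b) $\lim_{x \to \infty} \sqrt{x}$
(c) $\lim_{t \to -\infty} e^t$
(d) $\lim_{y \to \infty} \sin y$
(e) $\lim_{w \to \infty} \ln w$
(f) $\lim_{x \to -\infty} \frac{3x^2 + 7}{x^2 - 5x}$
Solution
(a) $\lim_{x \to \infty} \frac{1}{x^2} = 0$.
(b) $\lim_{x \to \infty} \sqrt{x} = \infty$.
(c) $\lim_{t \to -\infty} e^t = 0$.
(d) $\lim_{y \to \infty} \sin y$ does not exist.
(e) $\lim_{w \to \infty} \ln w = \infty$.
(f) $\lim_{x \to -\infty} \frac{3x^2 + 7}{x^2 - 5x} = 3$.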
Remarks
We might worry that the approximations are so bad that the limit $\lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$ does not
exist. Fortunately, it does, as long as there are only finitely many discontinuities.
$f(x)$ almost has an antiderivative function. $F(x) = \int_0^x f(t)\,dt$ has derivative $f(x)$ at all $x$,
except perhaps at the points of discontinuity.
While it may be comforting to know that an antiderivative function exists, it doesn’t help us evaluate
the integral. We don’t know what number to assign to $F(x)$ for many values of $x$. So how do we compute
$\int_0^5 f(x)\,dx$? Instead of dealing with a function whose antiderivative we don’t know, we break this
into two integrals that we do know.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 f(x)\,dx + \int_2^5 f(x)\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^5 f(x)\,dx
\end{aligned}
\]
Why can’t we replace $\int_2^5 f(x)\,dx$ with $\int_2^5 (10 - 2x)\,dx$? At $x = 2$, $f(x) = 3x^2$, not $10 - 2x$. This is
unfortunate, because for any number $t > 2$ we could replace $\int_t^5 f(x)\,dx$ with $\int_t^5 (10 - 2x)\,dx$. We will
need to break our integral down further.
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 f(x)\,dx + \int_2^t f(x)\,dx + \int_t^5 f(x)\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 (10 - 2x)\,dx
\end{aligned}
\]
We still don’t know the value of the middle integral, but we know that as $t$ approaches 2, the domain
of integration shrinks to 0. We can take advantage of this by taking a limit.
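\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \lim_{t \to 2^+} \left( \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 (10 - 2x)\,dx \right) \\
&= \lim_{t \to 2^+} \left( \left[x^3\right]_0^2 + \int_2^t f(x)\,dx + \left[10x - x^2\right]_t^5 \right) \\
&= \lim_{t \to 2^+} \left( 8 - 0 + \int_2^t f(x)\,dx + (50 - 25) - (10t - t^2) \right) \\
&= \lim_{t \to 2^+} \left( 33 - 10t + t^2 + \int_2^t f(x)\,dx \right) \\
&= 33 - 10(2) + 2^2 + \int_2^2 f(x)\,dx \\
&= 17
\end{aligned}
\]
Question 2.5.2
How Do We Integrate a Discontinuous Function?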
Notice that we had to evaluate an integral with the variable t as a bound. Once we had applied the
Fundamental Theorem of Calculus and plugged in t, this integral became a continuous function and we
could evaluate the limit.
Notice also the strange role the limit played in this computation. Usually we take limits to see what
value a changing function approaches. Our function has the same value for any choice of t (make sure
you see why), so technically we were taking the limit of a constant function. The limit was a purely
computational tool.
Remark
The discontinuity at $x = 2$ meant that we were stuck with an integral $\int_2^t f(x)\,dx$. With a less well-behaved
function we might have also needed an integral on the left side of 2, like $\int_s^2 f(x)\,dx$. However,
these two integrals can always be sent to zero by a limit, so when solving integrals of discontinuous
functions, we can leave these out of our calculations.
We can summarize the method as follows:
Integrating discontinuous functions
If $f(x)$ is discontinuous at $x = c$ and $a \le c \le b$, then
\[
\int_a^b f(x)\,dx = \lim_{t \to c^-} \int_a^t f(x)\,dx + \lim_{s \to c^+} \int_s^b f(x)\,dx
\]
provided that both of these limits exist.
A removable discontinuity should not slow us down even this much. The area under a single point
of discontinuity is zero. We can use the following theorem for a function with any finite number of
removable discontinuities.
Theorem
If $f(x)$ and $g(x)$ are equal on $[a, b]$ except at a finite number of points, then
\[
\int_a^b f(x)\,dx = \int_a^b g(x)\,dx.
\]
This theorem eliminates the need to use limits in our example:
\[
\begin{aligned}
\int_0^5 f(x)\,dx &= \int_0^2 \underbrace{f(x)}_{= 3x^2}\,dx + \int_2^5 \underbrace{f(x)}_{\substack{= 10 - 2x \\ \text{except at } x = 2}}\,dx \\
&= \int_0^2 3x^2\,dx + \int_2^5 (10 - 2x)\,dx
\end{aligned}
\]
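A numerical approximation gives an independent check that the value 17 is reasonable. The sketch below assumes the piecewise definition implied by the discussion ($f(x) = 3x^2$ for $x \le 2$ and $f(x) = 10 - 2x$ for $x > 2$) and uses a midpoint sum; the function names are illustrative.

```python
def f(x):
    # Piecewise function inferred from the example: 3x^2 up to and including x = 2
    return 3 * x**2 if x <= 2 else 10 - 2 * x

def midpoint_sum(func, a, b, n):
    # Midpoint approximation with n equal subintervals
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx

approx = midpoint_sum(f, 0, 5, 100_000)
print(approx)
```

The single point of discontinuity affects at most one rectangle, and that rectangle's area shrinks with $\Delta x$, so the printed value should be very close to 17.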
Most discontinuities can be handled this way, but there is one type that will still require limits.
Q14
Evaluate $\int_0^1 \ln x\,dx$.
Q15
Evaluate $\int_0^4 \left(\frac{1}{\sqrt{x}} + \frac{1}{\sqrt{4 - x}}\right) dx$.
Q16
Evaluate $\int_0^3 \frac{2}{w^2}\,dw$.
2.5.4
Q17
How large will the base ($\Delta x$) of each rectangle be, if we want to approximate:
(a) The area over the interval $[4, 16]$ with 3 rectangles?
(b) The area over the interval $[a, b]$ with $n$ rectangles?
(c) The area over the interval $[a, \infty)$ with $n$ rectangles?
Q18
Compute $\int_3^{\infty} \frac{2}{x}\,dx$.
Q19
Compute $\int_{-\infty}^{0} e^x\,dx$.
Q20
Evaluate $\int_0^{\infty} e^{-2x}\,dx$.
Q21
Evaluate $\int_0^1 \ln x\,dx$. You may need l’Hôpital’s rule.
Q22
Compute $\int_3^{\infty} \frac{1}{x^3}\,dx$, showing all necessary steps.
Q29
Consider the region R below $y = \frac{1}{x}$, above $y = 0$ and to the right of $x = 1$.
(a) Try to compute the area of R using an integral.
(b) Suppose R is rotated around the $x$-axis to create a solid S. Compute the volume of S.
(c) How annoying are the conclusions of (a) and (b)?
Q30
Consider the region in the first quadrant whose boundary is the curves $y = \frac{3}{x}$, $y = 2x - 1$ and $y = 0$.
(a) Write the area of this region as an integral in the variable $y$. Do not evaluate.
(b) Suppose this region is rotated around the $x$-axis. Write the resulting volume using one or more integrals. Do not evaluate.
Section 2.6
Goals:
1 Test the properties of a probability density function.
2 Use a probability density function to describe the underlying random variable.
3 Use the uniform, exponential, and normal distributions.
4 Compute probabilities and expected values.
The main problem facing every planner is uncertainty. When will the next epidemic strike? Will the
stock market go up or down? How many rare particles will flow through a detection device? These
outcomes cannot be known ahead of time, but they can be modeled as probabilities. Knowing when the
epidemic is likely to happen can guide our decision of how much to invest in mitigation. Knowing how
many particles are likely to pass through an area can inform us how sensitive our detection device needs
to be.
On the other hand, probabilities can also help us understand what has already happened. Probabilities
tell us whether the results of an experiment are likely to be a coincidence. Is an apparent pattern just
the variation inherent in random sampling, or is it likely to be present if the procedure is repeated? This
is in fact the basic model for statistical reasoning:
1 Assume that the type of pattern you’re looking for does not exist (a null hypothesis).
2 Collect observations.
3 Compute the probability of seeing those observations, given your assumption.
4 If the probability is very low, then the assumption is probably false.
Such reasoning allows us to conclude that a survey is representative of the population as a whole. It
allows us to understand what outcome will occur on average, or how much outcomes are likely to vary.
Such statistics help us understand the way the world works. We can design our next experiment or plan
our future behavior around that understanding. For example, on average, the stock market goes up.
This is one of the most powerful financial facts available to long-term investors, and it can be grounded
in a probabilistic study of past performance.
Question 2.6.1
What Is a Continuous Probability Distribution?
Definition
A random variable encodes the possible outcomes of a random selection. We use the notation
$P(\text{outcome})$ to denote the probability that a particular outcome occurs. If an outcome is impossible,
we write $P(\text{outcome}) = 0$. If it is certain we write $P(\text{outcome}) = 1$.
Example
Our outcome can be any expression concerning the random variable, for instance:
If $S$ is the sum of the rolls of two six-sided dice, then $P(S = 8) = \frac{5}{36}$.
If $T$ is the number of tails when two coins are flipped then $P(T \ge 1) = \frac{3}{4}$.
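Both probabilities can be confirmed by listing every equally likely outcome and counting the favorable ones. The Python sketch below does this under the assumption of fair dice and fair coins; the helper name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

def probability(outcomes, event):
    # Fraction of equally likely outcomes for which the event holds
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

# All 36 ordered rolls of two six-sided dice
dice = product(range(1, 7), repeat=2)
p_sum_8 = probability(dice, lambda roll: sum(roll) == 8)

# All 4 ordered results of flipping two coins
coins = product("HT", repeat=2)
p_tail = probability(coins, lambda flip: flip.count("T") >= 1)

print(p_sum_8, p_tail)  # 5/36 3/4
```

Counting ordered outcomes matters here: the rolls (2, 6) and (6, 2) are different outcomes, and both contribute to $S = 8$.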
We can encode these probabilities with a distribution function. The value of the function at each
number a is the probability that the outcome is a.
Example
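If $T$ is the number of tails obtained from two fair coins then
\[
f_T(t) =
\begin{cases}
\frac{1}{4} & \text{if } t = 0 \\
\frac{1}{2} & \text{if } t = 1 \\
\frac{1}{4} & \text{if } t = 2 \\
0 & \text{if } t = \text{anything else}
\end{cases}
\]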
Notice
The sum of the probabilities adds to 1.
There are only finitely many values of T that are possible.
What if we wanted to model height with a random variable? No one is exactly 68 inches tall. Even
people who say they are “five feet eight inches” are slightly taller or shorter. A distribution function
like we made for coins is unsuitable. It would have the property $f_H(h) = 0$ for all $h$. To handle this
situation, we need to define a different kind of random variable with a different relationship to a defining
function.
Question 2.6.3
What Density Functions Arise Naturally?
Figure: The density function of a uniform distribution
An intuitive but imprecise way to describe a random variable with a uniform distribution is to say that
all outcomes in [a, b] are equally likely. Since every outcome of a continuous random variable occurs with
probability 0, this is unhelpful. X is remarkable, because all outcomes in [a, b] have equal probability
density. To connect this to actual probabilities, we might say that all subintervals of [a, b] are equally
likely to contain the outcome of X, but this is incorrect. X is 3 times as likely to have an outcome in
an interval of length 6 as an interval of length 2. A precise statement would be: the likelihood of the
outcome of X occurring in each subinterval of [a, b] is proportional to the length of the subinterval.
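The proportionality statement can be made concrete with a short sketch. It assumes X is uniform on [a, b] and computes the probability of a subinterval as its length divided by b − a. The function name `uniform_probability` and the interval [0, 12] are illustrative assumptions.

```python
from fractions import Fraction

def uniform_probability(a, b, c, d):
    """P(c <= X <= d) for X uniform on [a, b]: overlap length over total length."""
    if b <= a:
        raise ValueError("need a < b")
    lo, hi = max(a, c), min(b, d)          # clip [c, d] to [a, b]
    return Fraction(max(0, hi - lo)) / Fraction(b - a)

# Hypothetical uniform random variable on [0, 12]
p_long = uniform_probability(0, 12, 1, 7)     # subinterval of length 6
p_short = uniform_probability(0, 12, 9, 11)   # subinterval of length 2

print(p_long, p_short, p_long / p_short)  # 1/2 1/6 3
```

The ratio of 3 matches the ratio of the two lengths, regardless of where the subintervals sit inside [a, b].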
Our second family of random variables naturally measures waiting time. It answers questions like:
when will the next customer come in? When will this device next detect a certain type of ambient
particle? Here is the formal definition.
Definition
Suppose an event happens randomly and uniformly at an average rate of λ times per unit of time (x).
Then the amount of time until it next occurs is given by the exponential distribution:
\[
f_X(x) =
\begin{cases}
\lambda e^{-\lambda x} & \text{if } 0 \le x \\
0 & \text{if } x < 0
\end{cases}
\]
Observe the following
1 Higher λ means that X is likely to be smaller, as the event occurs sooner.
2 The probability of the event occurring in a given interval, given that it did not occur before that
interval, depends only on the length of the interval.
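The second observation can be checked numerically. Integrating the density gives $P(a \le X \le b) = e^{-\lambda a} - e^{-\lambda b}$ for $0 \le a \le b$, and the sketch below divides by $P(X \ge \text{start}) = e^{-\lambda \cdot \text{start}}$ to condition on the event not having occurred yet. The function names and the rate λ = 3 are illustrative assumptions.

```python
import math

def exp_prob(lam, a, b):
    """P(a <= X <= b) for an exponential X with rate lam, where 0 <= a <= b."""
    return math.exp(-lam * a) - math.exp(-lam * b)

def conditional_prob(lam, start, length):
    """P(X lies in [start, start + length], given that X >= start)."""
    survived = math.exp(-lam * start)  # P(X >= start)
    return exp_prob(lam, start, start + length) / survived

lam = 3.0  # hypothetical rate: 3 events per unit of time
for start in (0.0, 0.5, 2.0):
    # Same interval length each time, different starting points
    print(start, conditional_prob(lam, start, 0.25))
```

Each printed probability should agree with $1 - e^{-0.75}$ up to rounding, since only the interval length 0.25 matters.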
Example
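Suppose we average our rolls of a six-sided die. As the number of rolls n gets large, we’ll roll each
number close to $\frac{n}{6}$ times. The sum of the rolls will be approximately
\[
1 \cdot \frac{n}{6} + 2 \cdot \frac{n}{6} + 3 \cdot \frac{n}{6} + 4 \cdot \frac{n}{6} + 5 \cdot \frac{n}{6} + 6 \cdot \frac{n}{6}
\]
To compute the average, we divide by n. Fortunately, every term already has an n.
\[
\mu = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5
\]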
In general, the number of occurrences of the result a in n evaluations of X will be approximately $n f_X(a)$.
When we divide out n, we obtain the following weighted average:
Formula
The expected value of a (discrete) random variable X with probability distribution function $f_X$ is
\[
E[X] = \sum_{x} x f_X(x)
\]
where x is summed over all possible outcomes of X.
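As a quick check of this formula, the sketch below stores a distribution function as a dictionary from outcomes to probabilities and computes the weighted sum directly. The function name `expected_value` and the dictionary representation are illustrative assumptions. The two distributions are the fair die and the two-coin tails count from earlier in the section.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of x * f_X(x) over a dict mapping each outcome x to its probability."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())

# Fair six-sided die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Number of tails in two fair coin flips
tails = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

print(expected_value(die), expected_value(tails))  # 7/2 1
```

The die result 7/2 agrees with the value µ = 3.5 computed above.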
To produce the corresponding formula for a continuous random variable, instead of multiplying
each outcome by its probability and summing, we multiply each outcome by its density and integrate.
Formula
The expected value of a continuous random variable X with probability density function $f_X$ is
\[
E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx
\]
Solution
a
We will use the formula. Even after removing the region of 0 density, we are left with an improper
integral. We therefore will compute a limit.
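\[
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\, dx \\
&= \int_{-\infty}^{0} x(0)\, dx + \int_{0}^{\infty} x \lambda e^{-\lambda x}\, dx \\
&= \lim_{t \to \infty} \int_{0}^{t} x \lambda e^{-\lambda x}\, dx \\
&= \lim_{t \to \infty} \left( -x e^{-\lambda x} \Big|_{0}^{t} - \int_{0}^{t} -e^{-\lambda x}\, dx \right) && \text{(by parts)} \\
&= \lim_{t \to \infty} \left( -x e^{-\lambda x} - \frac{1}{\lambda} e^{-\lambda x} \right) \Big|_{0}^{t} \\
&= \lim_{t \to \infty} \left( -t e^{-\lambda t} - \frac{1}{\lambda} e^{-\lambda t} + 0 e^{0} + \frac{1}{\lambda} e^{0} \right) \\
&= \lim_{t \to \infty} \left( -t e^{-\lambda t} - 0 + 0 + \frac{1}{\lambda} \right) \\
&= \frac{1}{\lambda} + \lim_{t \to \infty} -\frac{t}{e^{\lambda t}} && \left( \tfrac{\infty}{\infty} \text{ form} \right) \\
&= \frac{1}{\lambda} + \lim_{t \to \infty} -\frac{1}{\lambda e^{\lambda t}} && \text{(l'Hôpital's rule)} \\
&= \frac{1}{\lambda} + 0
\end{aligned}
\]
The integration by parts uses
\[
u = x, \quad dv = \lambda e^{-\lambda x}\, dx, \quad du = dx, \quad v = -e^{-\lambda x}.
\]
Our final answer is
\[
E[X] = \frac{1}{\lambda}
\]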
b
X measures the time until an event with average frequency λ occurs. Thus on average, we expect
to wait $\frac{1}{\lambda}$ for it. For example, if an event occurs three times per hour, we would expect to wait
about 20 minutes for it to occur.
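The result E[X] = 1/λ can also be sanity-checked with approximate integration. The sketch below applies a midpoint rule to $x \lambda e^{-\lambda x}$ on $[0, 50/\lambda]$. Truncating the improper integral there is an assumption, justified only because the remaining tail is negligibly small. The function names and the choice of 100,000 subintervals are illustrative.

```python
import math

def midpoint_rule(f, a, b, n):
    """Approximate the integral of f over [a, b] using n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def exponential_mean_estimate(lam, n=100_000):
    """Estimate E[X] for an exponential X with rate lam > 0."""
    upper = 50.0 / lam  # assumed cutoff; the tail beyond it is on the order of e^(-50)
    return midpoint_rule(lambda x: x * lam * math.exp(-lam * x), 0.0, upper, n)

lam = 3.0  # three events per hour, as in the example above
print(exponential_mean_estimate(lam), 1 / lam)
```

With λ = 3 per hour, both printed values should be close to 1/3 of an hour, which is the 20 minutes found above.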
iii. If you pick a random person, what is the probability that her height is exactly 68 inches?
iv. If I spin a wheel of names, what is the probability that it takes exactly 7 spins to land on my
own name?
Q7
Let X be a continuous random variable. Compute P (X = 13).
Q8
Another book might teach you that $P(a < X < b) = \int_{a}^{b} f_X(x)\, dx$, instead of
$P(a \le X \le b) = \int_{a}^{b} f_X(x)\, dx$. Why shouldn’t this bother you?
Q9
Let $f_T(t)$ be a probability density function of a random variable T. What quantity is represented
by $\int_{-\infty}^{5} f_T(t)\, dt$?
Q10
Let $f_X(x)$ be a probability density function of a random variable X. What quantity is represented
by $\int_{2}^{\infty} f_X(x)\, dx$?
Q11
Given a density function $f_U(u)$ for a random variable U, write an integral or integrals to compute
$P(4 \le U^2 \le 9)$.
Q12
Suppose the height of a mature sunflower is given by the random variable H with density function
$f_H(h)$. If your friend tells you that her sunflower is in the top quintile in height, explain how you
could use $f_H$ to determine a range that the height of her sunflower must lie in.
2.6.2
Q13
Let W be a random variable with density function
\[
f_W(w) =
\begin{cases}
\frac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\
0 & \text{otherwise}
\end{cases}
\]
Compute $P(2 \le W \le 9)$.
Q14
Let T be a random variable with density function
\[
f_T(t) =
\begin{cases}
\frac{3\sqrt{t}}{2} & \text{if } 0 \le t \le 1 \\
0 & \text{otherwise}
\end{cases}
\]
Compute $P\left(0 \le T \le \frac{1}{4}\right)$.
Section 2.6
Exercises
2.6.3
Q15
If U is a uniform random variable on [4, 7.5], compute the probability that U ≤ 5.5.
Q16
If X is a uniform random variable on [2, c] and P (0 ≤ X ≤ 4) = 0.25, what is c?
Q17
If W is an exponential random variable such that $P(W \ge 1) = \frac{2}{7}$, then compute the value of the
parameter λ in its density function $f_W$.
Q18
Juan looks at the density function of an exponential random variable X and says “X is more
likely to have the value 1 than 5.” “That’s silly,” replies Neha, “X has exactly zero probability
of being either of those. They are equally likely.” What do you think of their argument?
2.6.4
Q19
Let
\[
f(x) =
\begin{cases}
b x^{-3} & x \ge 2 \\
0 & x < 2
\end{cases}
\]
a
Compute a number b so that f is a probability density function.
b
If f is the density function for some random variable Z, compute E[Z].
Q20
Suppose X is a random variable with density function $f_X(x)$. Suppose $f_X(x)$ is 0 outside [3, 11]
and decreasing on [3, 11]. Is E[X] greater or less than 7? Explain.
Q21
Suppose X is a continuous random variable with probability density function
\[
f_X(x) =
\begin{cases}
\frac{3\sqrt{x}}{16} & \text{if } 0 \le x \le 4 \\
0 & \text{if } x > 4 \text{ or } x < 0
\end{cases}
\]
a
In a sentence or two, state what you would need to check to ensure that $f_X(x)$ is a valid
probability density function. You do not need to actually perform the calculations.
b
Compute E[X].
Q22
Explain how you can use the graph of a normal random variable to identify the expected value.
Then compute that value using the expected value formula.
2.6.5
Q23
Give the expected value of a uniform random variable on [5.2, 9.4].
Q24
If the uniform random variable on [a, b] has expected value 7, and a = 3, what is b?
Q25
In this example, we divided by (b − a). What would happen if b − a = 0?
Q26
If you know the expected value µ of a uniform random variable X, what is the probability that
X ≥ µ? Is this problem answerable without the assumption that X is uniform? Explain.
2.6.6
Q27
Suppose X and Y are two different exponential random variables modeling events that occur on
average p and 2p times per day respectively. How are their expected values related?
Q28
Does our expected value formula make sense if λ < 0? Why should this not bother us?
Q29
On bus route 70, 3 buses come per hour, on average.
a
Write a probability density function for X, the amount of time until the next bus arrives.
b
What is the expected amount of time until the next bus comes?
c
How likely is it that you will wait more than an hour for the bus?
Q30
If X is an exponential random variable, what is the probability that X ≤ E[X]?
d
What is the average value of W ?
e
Can you compute the median value of W ? This might be easier with geometry than with
calculus.
Q37
Suppose that g(x) is a probability distribution for a random variable X and g(x) = 0 for all
x ≥ 0.
a
What is the value of $\int_{-\infty}^{0} g(x)\, dx$? Justify your answer with a sentence or computation.
b
Give a formula for E[X]. Is it positive or negative? Justify your answer in a sentence or two.
Q38
Recall that an even function f (x) has the property that f(x) = f(−x) for all x. If the density
function of a random variable is even, what does that say about the expected value and median
of X? Explain your answer.
2.7.4
Q17
Suppose that you are told that the average value of f(x) from x = a to x = b is 0.
a
What geometric information does this give you about the graph y = f(x)? Be specific.
b
Suppose you are told that f(x) is non-negative for all x. How does that affect your answer
to part a?
Q18
Suppose you know that $f(x) = \sqrt[3]{x}$ has a positive average value over [a, b]. What does this tell
you about a and b?
2.7.5
Q19
Compute the average value of $f(x) = x^2$ over [0, 3].
Q20
Compute the average value of g(x) = x sin x over [0, π].
Q21
Compute the average value of $f(x) = x^2 e^{3x}$ over [0, 2].
Q22
What happens if we try to compute the average value of $h(x) = \frac{1}{x^2}$ over [−2, 2]?
2.7.6
Q23
Compute the variance of an exponential random variable X. Note that you may already know
some components of this computation from earlier examples and exercises.
Q24
Compute the variance of a uniform random variable on [2, 7].
>