Advanced Calculus for Data Science

Chapter 2

Advanced Integration and

Applications

This chapter covers a variety of methods and applications for single-variable integrals. The ﬁrst two

sections lay the groundwork for multivariable integration by exploring the connections between integration

and geometry. One section touches on approximation methods for integrals. Other sections prepare us

for our goal: applying integration to probability and statistics.

Contents

2.1 Area Between Curves
2.2 Volumes
2.3 Integration by Parts
2.4 Approximate Integration
2.5 Improper Integrals
2.6 Probability
2.7 Functions of Random Variables

Section 2.1

Area Between Curves

Goals:

1 Use integrals to calculate the geometric area of a region.

The Fundamental Theorem of Calculus relates the change in a function to the area under a curve.

Modern scientists have seized upon integration as a way to study change, whether they are measuring

a chemical reaction, the position of a particle, or economic activity. The geometric applications are

irrelevant to most consumers of calculus.

Historically, these methods were exciting to scholars who had been limited to area formulas for circles

and triangles. Now any shape that was deﬁned by an algebraic function was fair game. In this section

we push integration beyond areas under a curve to areas bounded by two or more curves. This gives us

the ability to measure a wide variety of shapes, but geometry is not our end goal. Instead the goal is

to study how integration works on these oddly shaped regions. We will ﬁnd that the methods of this

section return to relevance when it is time to integrate functions of more than one variable.

Question 2.1.1

How Is the Integral Related to Geometric Area?

When we deﬁned the deﬁnite integral, we were attempting to compute the area under a curve.

However, our methods introduced a glitch. Consider the following example.

This region has a positive geometric area, but the integral $\int f(x)\,dx$ over the same interval is the negative of that area.

Figure: A region below the x-axis and above y = f (x)

We were taught that the integral does not measure geometric area, but instead signed area. Area

below the x-axis counts as negative.

Why does this happen? Recall the deﬁnition of the deﬁnite integral.

Deﬁnition

The integral is computed by the following limit:

$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$$

This limit takes better and better approximations of the area. The approximation is a sum of

rectangles, whose area is height × width. All the rectangles have width ∆x, but their heights vary, and

we used the height of the graph y = f(x) to measure them. This works fine when f(x) is positive.

When f(x) < 0, the product $f(x_i^*)\,\Delta x$ computes a negative “area” for each rectangle.

Figure: An approximation by rectangles of negative height

In this example the resolution of this glitch is straightforward. Eliminating the negative sign, we

obtain the correct area. However, we can imagine a region that requires a more sophisticated approach.
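The difference between signed area and geometric area can be checked numerically. The sketch below is only an illustration: the curve $f(x) = x^2 - 4$ on [−2, 2] and the helper name `riemann_sum` are assumptions chosen for the example, not the function pictured in the figure.

```python
def riemann_sum(f, a, b, n=100_000):
    """Left-endpoint Riemann sum of f on [a, b] with n rectangles."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx


# Hypothetical curve that lies below the x-axis on (-2, 2).
def f(x):
    return x**2 - 4


# Every rectangle has negative height, so the sum is negative.
signed_area = riemann_sum(f, -2, 2)

# Taking |f(x)| as the height makes every rectangle count positively.
geometric_area = riemann_sum(lambda x: abs(f(x)), -2, 2)

print(signed_area, geometric_area)
```

For this curve the two results should differ only in sign, with magnitude close to 32/3.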

Question 2.1.2

What Integral Computes the Geometric Area Between Two Graphs?

Suppose we want to know the area between the graphs y = f (x) and y = g(x) for some interval

a ≤ x ≤ b. We can approximate this by rectangles. As the number of rectangles increases, the

approximation becomes more accurate.


Figure: The region between y = f(x) and y = g(x), approximated by rectangles

Let’s derive a formula for this rectangle approximation.

We let $x_i^*$ denote the left endpoint of each subinterval. The rectangles have width ∆x and height $g(x_i^*) - f(x_i^*)$. We compute:

$$\text{Area} = \lim_{\Delta x \to 0} \sum_{i=1}^{n} \bigl(g(x_i^*) - f(x_i^*)\bigr)\,\Delta x$$

This limit exactly matches the definition of a definite integral. The function being integrated is g(x) − f(x). Thus we can compute the area below y = g(x) and above y = f(x) by integrating g(x) − f(x) from a to b.

Main Idea

The area above y = f(x) and below y = g(x) from x = a to x = b is computed by

$$\int_a^b g(x) - f(x)\,dx.$$
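The rectangle approximation behind this formula can be written out directly. This is a minimal sketch under stated assumptions: the function name `area_between` and the pair of curves used to try it out (y = sin x below y = 2 on [0, π]) are hypothetical choices, not taken from the text.

```python
import math


def area_between(f, g, a, b, n=100_000):
    """Approximate the area above y = f(x) and below y = g(x) on [a, b].

    Uses n rectangles of width dx whose heights g(x*) - f(x*) are
    measured at the left endpoint x* of each subinterval.
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + i * dx
        total += (g(x_star) - f(x_star)) * dx
    return total


# Hypothetical curves: y = sin(x) lies below y = 2 on [0, pi].
approx = area_between(math.sin, lambda x: 2.0, 0.0, math.pi)
exact = 2 * math.pi - 2  # integral of 2 - sin(x) from 0 to pi
print(approx, exact)
```

As n grows, the sum should approach the value of the integral of g(x) − f(x).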

Example 2.1.3

The Area Between Two Curves

Suppose we want to compute the area between $y = \sqrt{x}$ and $y = x - \sqrt{x}$ from x = 6 to x = 12.

How do we know which graph is on top and which is on the bottom?

The height of a graph is the value of the function. We can evaluate the function at some x in the interval [6, 12]. The most convenient x is x = 9.

$$\sqrt{9} = 3 \qquad 9 - \sqrt{9} = 6$$

So at x = 9, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$.

Exercise

We’ve established that at x = 9, $y = x - \sqrt{x}$ is above $y = \sqrt{x}$. Unfortunately there are infinitely many points between x = 6 and x = 12. How can we decide which graph is on top at each of them?

1 Does the graph of $y = \sqrt{x}$ intersect the graph of $y = x - \sqrt{x}$ between x = 6 and x = 12?

2 What theorem could we use to argue that if $y = \sqrt{x}$ is ever above $y = x - \sqrt{x}$ then the graphs must have intersected?

Solution

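1 To test where the graphs intersect, we set the functions equal to each other.

$$\begin{aligned}
\sqrt{x} &= x - \sqrt{x} \\
0 &= x - 2\sqrt{x} \\
0 &= \sqrt{x}\,(\sqrt{x} - 2) && \text{(factor)} \\
\sqrt{x} = 0 \quad &\text{or} \quad \sqrt{x} - 2 = 0 \\
x &= 0 \text{ or } 4
\end{aligned}$$

Neither of these is in [6, 12].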

2 The Intermediate Value Theorem tells us that these functions cannot switch places without intersecting. Switching places means that the difference $(x - \sqrt{x}) - \sqrt{x}$ would change from positive to negative. As this is a continuous function, the Intermediate Value Theorem says there must be some point along the way where $(x - \sqrt{x}) - \sqrt{x} = 0$. We’ve already shown that all those points lie outside the interval, so we can conclude that $y = x - \sqrt{x}$ is above $y = \sqrt{x}$ over the entire interval [6, 12].

The figure below confirms that $y = x - \sqrt{x}$ is on top for all x in [6, 12].


Figure: An approximation of the area between $y = x - \sqrt{x}$ and $y = \sqrt{x}$

Main Ideas

Plugging a test point into f(x) and g(x) tells us which graph is above the other.

If the functions are continuous, then solving f (x) = g(x) computes the only points where the

graphs can change positions.
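The test-point check from Example 2.1.3 can be reproduced in a few lines. The sketch below also evaluates the resulting integral of $x - 2\sqrt{x}$ from 6 to 12, which the example itself does not carry out; the antiderivative `F` and the sampling grid are additions for illustration only.

```python
import math


def top(x):  # y = x - sqrt(x)
    return x - math.sqrt(x)


def bottom(x):  # y = sqrt(x)
    return math.sqrt(x)


# Test point from the example: at x = 9 the values are 6 and 3.
assert top(9) - bottom(9) == 3

# The difference x - 2*sqrt(x) is zero only at x = 0 and x = 4, so it
# cannot change sign on [6, 12]. Sampling is a sanity check, not a proof.
samples = [6 + 6 * k / 1000 for k in range(1001)]
stays_on_top = all(top(x) > bottom(x) for x in samples)


# Antiderivative of (x - sqrt(x)) - sqrt(x) = x - 2*sqrt(x).
def F(x):
    return x**2 / 2 - (4 / 3) * x**1.5


area = F(12) - F(6)
print(stays_on_top, area)
```

The sampled check only corroborates the Intermediate Value Theorem argument; the argument itself is what guarantees the ordering at every point of the interval.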

Example 2.1.4

The Area Enclosed by Two Curves

Set up an integral that computes the area enclosed between the curves $y = x^2$ and $y = 3 - x - x^2$.

Figure: The area enclosed by two parabolas

Solution

These are parabolas. If they enclose any area, the downward facing parabola must lie above the upward facing parabola. This tells us we are integrating

$$3 - x - x^2 - x^2.$$

But what are the bounds of integration? To know this we must find the points where the graphs intersect.

$$\begin{aligned}
3 - x - x^2 &= x^2 \\
0 &= 2x^2 + x - 3 \\
0 &= (2x + 3)(x - 1) \\
x &= -\tfrac{3}{2} \text{ or } 1
\end{aligned}$$

The area is computed

$$\text{Area} = \int_{-3/2}^{1} 3 - x - x^2 - x^2\,dx$$

Main Ideas

To determine the range of x values that deﬁne an enclosed region, solve for the intersection points

between the graphs.

Sketching the graphs can be a time-saver and a reality check for your answer.
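Example 2.1.4 only asks for the setup, but evaluating the integral is a useful reality check. The sketch below assumes the curves are $y = x^2$ and $y = 3 - x - x^2$ (the reading consistent with the factorization (2x + 3)(x − 1)) and uses exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction


# Top minus bottom: (3 - x - x^2) - x^2 = 3 - x - 2x^2.
def integrand(x):
    return 3 - x - 2 * x**2


# One antiderivative of the integrand: 3x - x^2/2 - (2/3)x^3.
def F(x):
    return 3 * x - x**2 / 2 - Fraction(2, 3) * x**3


left, right = Fraction(-3, 2), Fraction(1)

# The bounds are intersection points, so top minus bottom vanishes there.
assert integrand(left) == 0 and integrand(right) == 0

area = F(right) - F(left)
print(area)
```

A positive result is what we expect when the downward facing parabola really is the top curve on the whole interval.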

Example 2.1.5

The Area Enclosed by Two Curves that Intersect More than Twice

Compute the area enclosed by $f(x) = x^3 - 10x$ and $g(x) = 3x^2$.


Solution

To find the intersections we set f(x) = g(x) and solve:

$$\begin{aligned}
x^3 - 10x &= 3x^2 \\
x^3 - 3x^2 - 10x &= 0 \\
x(x - 5)(x + 2) &= 0 \\
x &= 0,\ 5, \text{ or } -2
\end{aligned}$$

Our region is bounded between x = −2 and x = 5, but one graph does not need to be above the other

for the entire region. The graphs intersect at x = 0 so one graph might be on top for [−2, 0], while the

other is on top for [0, 5]. To ﬁnd out which is which we could evaluate at test points (we would need

two). Alternately, since we’ve already factored f(x) − g(x) = x(x − 5)(x + 2) we can perform a sign

analysis:

Interval        x < −2    −2 < x < 0    0 < x < 5    x > 5
x                 −           −             +           +
(x − 5)           −           −             −           +
(x + 2)           −           +             +           +
f(x) − g(x)       −           +             −           +

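Thus $x^3 - 10x > 3x^2$ on [−2, 0] and $x^3 - 10x < 3x^2$ on [0, 5]. The enclosed area is computed by:

$$\begin{aligned}
\text{Area} &= \int_{-2}^{0} x^3 - 10x - 3x^2\,dx + \int_{0}^{5} 3x^2 - x^3 + 10x\,dx \\
&= \left[\frac{x^4}{4} - 5x^2 - x^3\right]_{-2}^{0} + \left[x^3 - \frac{x^4}{4} + 5x^2\right]_{0}^{5} \\
&= (0 - 0 - 0 - 4 + 20 - 8) + \left(125 - \frac{625}{4} + 125 - 0 + 0 - 0\right) \\
&= \frac{407}{4}
\end{aligned}$$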

Main Ideas

With more intersections, we must check the region between each pair of intersections to see which

graph is on top.

It can be more eﬃcient to make a sign analysis chart.

Sketching the graphs may be more diﬃcult. If you can do it, it will corroborate (or correct) your

calculations.
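The sign analysis and the final value in Example 2.1.5 can be cross-checked with exact arithmetic. This sketch assumes $f(x) = x^3 - 10x$ and $g(x) = 3x^2$, the reading consistent with the factorization x(x − 5)(x + 2); the helper `H` is a hypothetical name for an antiderivative of f − g.

```python
from fractions import Fraction


def f(x):
    return x**3 - 10 * x


def g(x):
    return 3 * x**2


# One test point inside each interval between consecutive intersections.
signs = {}
for lo, hi in [(-2, 0), (0, 5)]:
    mid = Fraction(lo + hi, 2)
    signs[(lo, hi)] = "positive" if f(mid) > g(mid) else "negative"
    print(f"on [{lo}, {hi}]: f - g is {signs[(lo, hi)]}")


# Antiderivative of f(x) - g(x) = x^3 - 3x^2 - 10x.
def H(x):
    x = Fraction(x)
    return x**4 / 4 - x**3 - 5 * x**2


# f is on top on [-2, 0]; g is on top on [0, 5], so that piece is negated.
area = (H(0) - H(-2)) - (H(5) - H(0))
print(area)
```

Negating the second piece is the step that turns signed area into geometric area on the interval where g is the top curve.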

Example 2.1.6

A Region without a Single Top Curve

Compute the area enclosed by the curves y = 1, $y = \frac{16}{x}$ and $y = 2\sqrt{x}$.

We should start by drawing this region and ﬁnding the coordinates of the intersections.

There are three intersections to solve for, one using each pair of equations.

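$$\begin{array}{lll}
\dfrac{16}{x} = 2\sqrt{x} & \dfrac{16}{x} = 1 & 2\sqrt{x} = 1 \\
16 = 2x^{3/2} & 16 = x & \sqrt{x} = \tfrac{1}{2} \\
8 = x^{3/2} & & \\
x = 4 & x = 16 & x = \tfrac{1}{4}
\end{array}$$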

If we write this area as an integral $\int g(x) - f(x)\,dx$, the top function would need to be piece-wise:

$$g(x) = \begin{cases} 2\sqrt{x} & \text{if } \tfrac{1}{4} \le x \le 4 \\[4pt] \dfrac{16}{x} & \text{if } 4 \le x \le 16 \end{cases}$$

We don’t know the anti-derivative of a piece-wise function. Instead, we consider a few different approaches. Since the upper boundary is defined by a different function for different values of x, one approach is to break the region into two integrals.

Figure: Two subregions whose areas can be expressed by integrals

The area of the region on the left is $\int_{1/4}^{4} 2\sqrt{x} - 1\,dx$. The area of the region on the right is $\int_{4}^{16} \frac{16}{x} - 1\,dx$. Adding these together gives the total enclosed area.

Another approach would be to obtain the area by subtraction. Find the following two areas on the diagram:

$$\int_{1/4}^{16} 2\sqrt{x} - 1\,dx \qquad\qquad \int_{4}^{16} 2\sqrt{x} - \frac{16}{x}\,dx$$

You should be able to convince yourself that

$$\text{Enclosed Area} = \int_{1/4}^{16} 2\sqrt{x} - 1\,dx - \int_{4}^{16} \left(2\sqrt{x} - \frac{16}{x}\right)dx$$

Both of these approaches require us to evaluate two integrals. With vertical rectangles that is unavoidable, because our integrals are limits of an approximation by rectangles of different heights, and those heights are determined by different enclosing graphs, depending on which x value we measure at. For this particular region, there is a way around it.

Instead we can approximate the region by horizontal rectangles of different widths.

Notice the left endpoint always lies on $y = 2\sqrt{x}$ and the right endpoint always lies on $y = \frac{16}{x}$. As the height of the rectangles goes to 0, the approximation becomes exact.

Let’s derive a formula for this rectangle approximation and compute the exact area.

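Let ∆y be the height of each rectangle. The widths are given by the horizontal distance between the graphs $y = 2\sqrt{x}$ and $y = \frac{16}{x}$ at the heights $y_i^*$ corresponding to the bottom of each rectangle. Horizontal distance is the difference in x values. What x values correspond to $y_i^*$? We can plug in $y_i^*$ and solve for x.

$$\begin{array}{ll}
y_i^* = 2\sqrt{x} & y_i^* = \dfrac{16}{x} \\[6pt]
\dfrac{y_i^*}{2} = \sqrt{x} & x\,y_i^* = 16 \\[6pt]
\dfrac{(y_i^*)^2}{4} = x & x = \dfrac{16}{y_i^*}
\end{array}$$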

These computations should be familiar. Finding x in terms of y is called finding the inverse function. These inverse functions give the left and right bounds of our region. To find the area, we take a sum of the areas of these rectangles of different widths. Then we take a limit. Notice that to make the width positive we subtract the smaller x value from the larger x value. Geometrically, this is the right endpoint $\frac{16}{y_i^*}$ minus the left endpoint $\frac{(y_i^*)^2}{4}$.

$$\lim_{\Delta y \to 0} \sum_{i=1}^{n} \underbrace{\left(\frac{16}{y_i^*} - \frac{(y_i^*)^2}{4}\right)}_{\text{width}}\ \underbrace{\Delta y}_{\text{height}} = \int_1^4 \frac{16}{y} - \frac{y^2}{4}\,dy$$

This limit is an integral, but the variable of integration is y, not x. The bounds of integration are

the set of y values in the region. The lowest point in the region is at y = 1. The highest is at y = 4.

We evaluate the integral using the Fundamental Theorem of Calculus, but with y instead of x.

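$$\begin{aligned}
\text{Area Enclosed} &= \int_1^4 \frac{16}{y} - \frac{y^2}{4}\,dy \\
&= \left[16\ln|y| - \frac{y^3}{12}\right]_1^4 \\
&= \left(16\ln 4 - \frac{64}{12}\right) - \left(16\ln 1 - \frac{1}{12}\right) \\
&= 16\ln 4 - \frac{21}{4}
\end{aligned}$$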

Main Idea

The area to the right of $x = f^{-1}(y)$ and to the left of $x = g^{-1}(y)$ for y from a to b can be computed by

$$\int_a^b g^{-1}(y) - f^{-1}(y)\,dy.$$

Strategy

Changing an integral to dy may be more work than breaking it into two or more parts. When solving

an area problem, consider both methods and use the one that seems more promising. If you run into

problems with your chosen approach, give the other method a try.
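Both methods can be compared numerically on the region from Example 2.1.6. The sketch below assumes the boundary curves are y = 1, $y = 2\sqrt{x}$ and $y = 16/x$, the reading consistent with the intersections at x = 1/4, 4 and 16 and with the 16 ln|y| term in the solution. The helper `midpoint_sum` is a hypothetical name, and sampling at midpoints is a choice made here for accuracy.

```python
import math


def midpoint_sum(h, a, b, n=200_000):
    """Midpoint Riemann sum of h on [a, b] with n rectangles."""
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) for i in range(n)) * step


# Two dx-integrals: the top curve is 2*sqrt(x) on [1/4, 4] and 16/x on
# [4, 16]; the bottom curve is y = 1 throughout.
area_dx = (midpoint_sum(lambda x: 2 * math.sqrt(x) - 1, 0.25, 4)
           + midpoint_sum(lambda x: 16 / x - 1, 4, 16))

# One dy-integral: right boundary x = 16/y minus left boundary x = y^2/4.
area_dy = midpoint_sum(lambda y: 16 / y - y**2 / 4, 1, 4)

exact = 16 * math.log(4) - 21 / 4
print(area_dx, area_dy, exact)
```

The two setups describe the same region, so their values should agree up to the error of the approximation.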

Section 2.1

Exercises

Summary Questions

What is the geometric signiﬁcance of f(x)−g(x) in the formula for the area between two graphs?

How do we determine which curve is the top of a region and which is the bottom? Describe the

diﬃculties that can arise.

How do we use boundaries of the form y = g(x) and y = f(x) in a dy-integral to compute

geometric area?

When setting up a dy-integral, how can we visually identify which graph’s function will be sub-

tracted from which?

An integral can be positive or negative. If we are solving for area (which may not be negative)

describe the steps we take to guarantee our area is positive.

Explain the diﬀerence between “The region enclosed by y = f(x) and y = g(x)” and “The region

f(x) ≤ y ≤ g(x).”

2.1.1

Suppose the graph y = f(x) is above the x-axis.

How much would the geometric area between y = f (x) and the x-axis for a ≤ x ≤ b increase

if the graph were shifted up by k units? Try to argue geometrically or with a visual.

Would shifting the graph down by k instead decrease the area by the same amount? Draw

a graph for which it wouldn’t.

How would we use integrals to calculate the geometric area of the shaded region below?

The expressions

$$\int_a^b |f(x)|\,dx \quad\text{and}\quad \left|\int_a^b f(x)\,dx\right|$$

are not equivalent. Explain why, and draw the graph of a function on which these expressions disagree.

Q10

Given a differentiable function f(x), the signed area between the graph $y = f'(x)$ and the x-axis from x = a to x = b is denoted $\int_a^b f'(x)\,dx$ and is equal to the change in f(x) from x = a to x = b. In what sense does the geometric area between the graph of $y = f'(x)$ and the x-axis represent a change in f(x)?

2.1.2

Q11

Suppose y = f (x) and y = g(x) are below the x-axis. What integral computes the geometric

area between them? How does this compare to the situation when they are above the x-axis?

Q12

Here is another way to derive the formula for the area between curves. Consider the functions

graphed here:


Indicate on the graph what areas are denoted by $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$. How are they related to the region between y = f(x) and y = g(x)?

Is $\int_a^b g(x)\,dx - \int_a^b f(x)\,dx$ equivalent to the expression for area we derived in 2.1.2? What integral rule(s) would you apply to justify this?

If y = f(x) is below the x-axis, how does this change the meaning of $\int_a^b f(x)\,dx$? Does the formula still work? Explain.

2.1.3

Q13

Compute the area between y = 4x and y = x

from x = 3 to x = 5

Q14

Compute the area between y = e

and y = sin(πx) from x = −1 to x = 0

2.1.4

Q15

Compute the area enclosed by y =

√

x and y = x

Q16

Compute the area enclosed by y = x

− 5 and y = 4x.

Q17

Compute the area enclosed by y = x

, y = 2x − 1 and x = −3.

Q18

Compute the area enclosed by y = x + 2 and $y = 3\sqrt{x}$.

2.1.5

Q19

Compute the area between y = sin x and y = cos x over the interval [0, 2π].

Q20

Erica and Carter were asked to compute the area enclosed by y = 4x and $y = x^3$. They agree that $4x = x^3$ when x = −2 and when x = 2. Erica thinks the area is

$$\int_{-2}^{2} 4x - x^3\,dx$$

Carter thinks it is

$$\int_{-2}^{2} x^3 - 4x\,dx$$

Who is correct?

How do you think the mistake could reasonably have happened, and how can you avoid it?

Q21

Compute the area enclosed by y = xe

, and y = ex.

Q22

Set up an integral or integrals to compute the region enclosed by the curves f (x) = x

− 4)

and g(x) = x

− 4).

Q23

Often the top curve of an enclosed region alternates between f(x) and g(x) at each intersection.

Can you explain what about the previous problem caused this pattern to fail?

Q24

Suppose y = f(x) and y = g(x) intersect multiple times, with x = a their leftmost intersection

and x = b their rightmost. We can express the area enclosed between them by

|g(x)−f(x)| dx.

Explain why this formula works.

Explain why this formula isn’t particularly helpful.


2.1.6

Q25

Compute the area enclosed by y = 6, $y = \sqrt{x}$ and y = −2x.

Q26

Compute the area enclosed by y = e

, y = e

4−x

, and y = 1.

Q27

You have been taught at least three ways to set up an expression that will compute the area

enclosed by (all of) y = 3, y = 3x, y = 9 and x + y = −5. Set up all the methods you know

that will do this. You do not need to evaluate them.

Q28

Write the area in the ﬁrst quadrant enclosed by y =

√

3x, y = 0, and x

+ y

= 4 as a single

integral.

Q29

Write the area enclosed by y =

√

x and y = x

an integral in x

an integral in y

Q30

Write the area in the ﬁrst quadrant enclosed by y = x

, y = 3x

, and y = 18 − 3x as

a sum of integrals in x

a sum of integrals in y

Extension and Synthesis

Q31

Suppose you’ve found that y = f(x) and y = g(x) intersect at x = a (along with perhaps other

places). What could knowing the values of f

′

(a) and g

′

(a) tell you about where each graph is

above the other? Be as speciﬁc as possible.

Q32

Suppose you are given that for all x:

′

(x) > 0

′

(x) < 0

We approximate area between y = f(x) and y = g(x) from x = a to x = b by rectangles,

letting the x

∗

be the right endpoints of each subinterval. What can we say about whether the

approximation will overestimate or underestimate the true area?

Section 2.2

Volumes

Goals:

1 Recognize cross sections of a solid object.

2 Write the area of each cross section as a function.

3 Compute the volume of a solid.

4 Visualize and compute the volume of a solid of revolution.

The motivation for the deﬁnite integral was computing an area. However, the deﬁnition turns out

to be more useful than that. With the correct setup, we can express a volume as an integral as well.

Question 2.2.1

What Is Volume?

Dimension

In mathematics, we deﬁne the dimension of an object. Dimension measures the number of degrees of

freedom available to a point traveling in the object.

The deﬁnition may not match your intuition for dimension. For example, you only encounter a

parabola in two (or more)-dimensional space. However, the parabola itself is one-dimensional. If you

imagine that you are an insect crawling on the parabola, you can only travel forward or backward, not

side to side. If you were small enough, the parabola would seem indistinguishable from a line.

Example

1 A plane is two dimensional. You can travel left/right or up/down.

2 A circle is one dimensional. You can only travel clockwise/counterclockwise.

3 A point is zero dimensional. There is nowhere to travel within it.

We measure objects of diﬀerent dimensions diﬀerently. In all cases, measuring is counting how many

units of measurement ﬁt inside the object. A 6 unit by 3 unit rectangle has area 18 square units, because

18 unit squares can ﬁt inside it. For less regular objects we need to consider parts of square units. This

requires a lot of work to do formally, but the intuition should be straightforward.


Figure: Objects of several dimensions and their units of measurement

We use diﬀerent names to describe objects and their measurements in diﬀerent dimensions:

Dimension   Names                                      Measurement
0           point                                      none
1           line, circle, curve                        length
2           square, polygon, disc, sphere, surface     area
3           cube, polyhedron, ball, solid              volume

Vocabulary Check

It doesn’t make sense to talk about the volume of a surface. No unit cubes will ﬁt inside it.

Similarly it doesn’t make sense to talk about the area of a solid. Inﬁnitely many unit squares will ﬁt

in any solid. However, solids have boundary surfaces, and we do sometimes measure their areas.

The simplest solid to measure is a (right) prism. If a prism has height h, we can see that each unit

square (or part thereof) in the base has h unit cubes stacked above it. Thus we have

Formula for Volume of a Prism

volume = area of base × height

Figure: A prism divided into unit cubes and its base divided into unit squares.

Here we see the base of the prism and the square units (or parts thereof) that it contains. The prism

has height 3.5. We can see there are 3.5 cubic units above each square unit in the base.

You may be questioning the relevance of studying areas and volumes in the 21st century. Few people

need to compute geometric measurements in their careers. However, geometry is not the end goal of

this investigation.

Remark

Our motivation for studying solids is not to solve geometry problems. Recall that the deﬁnite integral

allowed us to express total change as an area:

total change = rate of change × time

$$f(b) - f(a) = \int_a^b f'(t)\,dt$$

This allowed us to use our geometric intuition of areas to better understand rates of change. Similarly, volume will allow us to use geometry to understand different types of rates later on.

Question 2.2.2

How Do We Visualize 3-Dimensional Solids?

Without computer graphics, it can be diﬃcult to visualize anything but the simplest solids. Taking

an arbitrary solid like a lamp or a sculpture, computing its volume by ﬁlling it with cubes is a hopeless

endeavor (though a computer could make a decent estimate using small enough cubes). In the absence

of a computer rendering, how do we give our brains a visual reference, and how can we leverage this to

make measurements? We use cross sections.

Deﬁnition

A cross section of a solid object is its intersection with some transversal plane.

Transversal means the plane cuts across the solid. In the case of this square-based pyramid, a

transversal plane parallel to the base intersects the pyramid in a square. If it intersects at a diﬀerent

height, the intersection would be larger or smaller. If it intersects at a diﬀerent angle, it wouldn’t produce

a square at all.

Figure: A cross section of a pyramid

A solid can be reassembled from its cross sections. This is valuable because cross sections are two-

dimensional, making them easier to draw or visualize. If you have a set of parallel cross sections, you

can imagine them side by side and infer the shape of the original solid.

Figure: A set of parallel cross sections of a solid

Question 2.2.3

How Can We Approximate or Compute the Volume of a Non-Prism Solid?

Suppose we want to ﬁnd the volume of a pyramid. Diﬀerent square units of the base have a diﬀerent

number of cubic units above them. Thus we need a more robust approach than counting cubes.

Figure: A pyramid with its base divided into unit squares

We will approximate the pyramid by prisms, whose bases are cross sections.


Figure: A pyramid approximated by prisms

The key insight is to represent the different heights of these cross sections by the variable x. We can imagine the x-axis running through the solid in the direction of its height. The bases of the prisms are cross sections. We let $x_i^*$ denote the height at which the $i$th prism’s base lies. The distance between consecutive heights $x_i^*$ is denoted ∆x, which is also the height of each prism. At different heights, we have different cross sections with different areas. Area is what we really care about, since we want to compute the volume of these prisms. We write cross sectional area as a function.

A(x) = Area of the cross section at height x

The sum of the volumes of these prisms can be written:

$$\sum_{i=1}^{n} A(x_i^*)\,\Delta x.$$

Taking a limit gives the exact volume of the solid:

$$\text{Volume} = \lim_{\Delta x \to 0} \sum_{i=1}^{n} A(x_i^*)\,\Delta x$$

Notice that this fits the definition of a definite integral, where A(x) is the function being integrated.

That is excellent news for us. Instead of having to learn a new way of evaluating this limit, we can use

the tools of integration that we already know.

Theorem

If the cross section of a solid, perpendicular to the x-axis, has area A(x) at each x, then the volume of the solid is

$$\int_a^b A(x)\,dx$$

where a and b are the values of x at the bottom and top of the solid.
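The prism approximation behind this theorem can be sketched in code. The function name `volume_from_cross_sections`, the midpoint choice for $x_i^*$, and both test solids are assumptions made for illustration: a prism with base area 18 and height 3.5, and a hypothetical solid whose cross section at height x is a square of side x.

```python
def volume_from_cross_sections(A, a, b, n=100_000):
    """Sum the volumes A(x*) * dx of n thin prisms between x = a and x = b.

    x* is taken at the midpoint of each subinterval (a choice made here;
    any sample point gives the same limit).
    """
    dx = (b - a) / n
    return sum(A(a + (i + 0.5) * dx) for i in range(n)) * dx


# Sanity check on a prism: a constant cross-sectional area should
# reproduce "area of base times height".
prism = volume_from_cross_sections(lambda x: 18.0, 0.0, 3.5)

# Hypothetical solid: square cross sections of side x for 0 <= x <= 3.
squares = volume_from_cross_sections(lambda x: x**2, 0.0, 3.0)

print(prism, squares)
```

The prism case recovers the formula for the volume of a prism, which is a quick way to confirm that the integral generalizes it rather than replacing it.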

Example 2.2.4

A Solid with Its Cross-Sections Given

Suppose a solid S extends from x = 2 to x = 6 and the cross section at each x is a right triangle

of height

and base x

. Compute the volume of S.

Solution

We will let the x direction be the height of our solid. Then the cross sectional area at each x is the area

of the triangle at that x.

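$$A(x) = \frac{1}{2}bh = \frac{1}{2}x$$

Integrating this from x = 2 to x = 6 gives the volume.

$$\begin{aligned}
\text{Volume} &= \int_2^6 A(x)\,dx \\
&= \int_2^6 \frac{1}{2}x\,dx \\
&= \left[\frac{x^2}{4}\right]_2^6 \\
&= \frac{36}{4} - \frac{4}{4} \\
&= 8
\end{aligned}$$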

The volume is 8 cubic units.

Example 2.2.5

A Solid Obtained by Rotation

Suppose the region under the graph $y = \frac{5}{x+1}$ from x = 1 to x = 5 is rotated around the x-axis. Compute the volume of the resulting solid.


Figure: The solid obtained by rotating the region under $y = \frac{5}{x+1}$ about the x-axis

Solution

When we cut the region under the graph perpendicular to the x-axis, we obtain a line segment whose

height is the value of the function. When that line segment is rotated around the axis, it sweeps out a

circle, with the line segment as the radius. We can use the formula for the area of a circle.

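$$A(x) = \pi r^2 = \pi\left(\frac{5}{x+1}\right)^2 = \frac{25\pi}{(x+1)^2}$$

We apply our volume formula, using the u-substitution u = x + 1, du = dx, under which x = 1 ⇒ u = 2 and x = 5 ⇒ u = 6.

$$\begin{aligned}
\text{Volume} &= \int_1^5 A(x)\,dx \\
&= \int_1^5 \frac{25\pi}{(x+1)^2}\,dx \\
&= \int_2^6 \frac{25\pi}{u^2}\,du \\
&= \left[-\frac{25\pi}{u}\right]_2^6 \\
&= -\frac{25\pi}{6} + \frac{25\pi}{2} \\
&= \frac{25\pi}{3}
\end{aligned}$$

The volume of the solid is $\frac{25\pi}{3}$ cubic units.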

Main Idea

When the region under a graph y = f(x) is rotated around the x-axis, the cross sections are discs of radius f(x). Their areas are $\pi[f(x)]^2$.
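A numerical version of the disc computation is a handy check on the algebra. The sketch below assumes the curve in Example 2.2.5 is $y = 5/(x+1)$ and uses the bounds x = 1 to x = 5 that appear in the substitution step; `disc_volume` is a hypothetical helper name.

```python
import math


def disc_volume(f, a, b, n=100_000):
    """Volume swept out when the region under y = f(x), a <= x <= b,
    is rotated around the x-axis, using n thin discs of area pi * f(x)^2."""
    dx = (b - a) / n
    return sum(math.pi * f(a + (i + 0.5) * dx) ** 2 for i in range(n)) * dx


# Assumed curve and bounds: y = 5 / (x + 1) for 1 <= x <= 5.
approx = disc_volume(lambda x: 5 / (x + 1), 1, 5)
exact = 25 * math.pi / 3
print(approx, exact)
```

Each term in the sum is the volume of one disc-shaped prism, so this is the same prism approximation as before with A(x) = π[f(x)]².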

Example 2.2.6

A Solid Deﬁned by Its Base

Suppose we have a solid S with the following properties:

The base of S is the region enclosed by y = 0 and $y = 4x - x^2$.

The cross-sections of S perpendicular to the x-axis are trapezoids which have one base in the base

of S, another base twice as long, and whose heights are 6 units.

Compute the volume of S.

Solution

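We find the x-bounds of S by computing the x-bounds of the base. We solve

$$\begin{aligned}
0 &= 4x - x^2 \\
0 &= x(4 - x) \\
x &= 0 \text{ or } 4
\end{aligned}$$

So x ranges from 0 to 4. The base of the trapezoid at each x is the height from y = 0 to $y = 4x - x^2$. Note $4x - x^2 > 0$ when 0 < x < 4. Thus the base $b_1 = 4x - x^2$. The other base is twice as long, so it is $b_2 = 8x - 2x^2$. The height is 6, regardless of x.

$$\begin{aligned}
A(x) &= \frac{1}{2}(b_1 + b_2)h && \text{(area of a trapezoid)} \\
&= \frac{1}{2}\left(4x - x^2 + 8x - 2x^2\right)6 \\
&= 36x - 9x^2 \\[6pt]
\text{Volume} &= \int_0^4 36x - 9x^2\,dx \\
&= \left[18x^2 - 3x^3\right]_0^4 \\
&= 96
\end{aligned}$$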

Figure: A solid with base between two graphs and trapezoidal cross-sections


Main Idea

The cross section of the base of a solid is a segment. If we know what role this segment plays in the

cross section of the solid, we can use the expression for the length of this segment to derive an expression

for A(x).

Remark

Notice it is not necessary to be able to visualize the solid to compute its volume from cross sections. It

is not even necessary to know what the cross-sections look like precisely. For instance, our trapezoids

may or may not have a right angle. As long as we can compute the area, the exact shape is irrelevant.
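The remark can be made concrete: the code below never needs the shape of the trapezoids in Example 2.2.6, only their area. It is a sketch under the assumption that the base region lies under $y = 4x - x^2$; the function names are hypothetical.

```python
from fractions import Fraction


def trapezoid_area(x):
    """Cross-sectional area at x: bases b1 and 2*b1, height 6."""
    b1 = 4 * x - x**2  # length of the base region's segment at this x
    b2 = 2 * b1        # the other base is twice as long
    return Fraction(1, 2) * (b1 + b2) * 6


# The area should simplify to 36x - 9x^2; spot-check at several points.
points = [Fraction(k, 2) for k in range(9)]
assert all(trapezoid_area(x) == 36 * x - 9 * x**2 for x in points)


# Antiderivative of 36x - 9x^2, evaluated from 0 to 4.
def F(x):
    return 18 * x**2 - 3 * x**3


volume = F(4) - F(0)
print(volume)
```

Only the two base lengths and the height enter the calculation, which is why skewing the trapezoids would not change the volume.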

Example 2.2.7

A Solid Described by Measurements

Compute the volume of a pyramid with a square base of side length s and a height of h.

Solution

Let x = 0 be the base of the pyramid and x = h be the vertex. The cross sections are squares. Since

the edges of the pyramid are straight, the squares shrink linearly from s at x = 0 to 0 at x = h. The

line that goes through these two points is

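$$\text{Side length} = -\frac{s}{h}x + s$$

The cross sections have area

$$A(x) = (\text{Side length})^2 = \left(-\frac{s}{h}x + s\right)^2 = s^2\left(-\frac{1}{h}x + 1\right)^2$$

We can plug this into the formula for volume.

$$\begin{aligned}
\text{Volume} &= \int_0^h s^2\left(-\frac{1}{h}x + 1\right)^2 dx \\
&= s^2\int_0^h \frac{x^2}{h^2} - \frac{2}{h}x + 1\,dx \\
&= s^2\left[\frac{x^3}{3h^2} - \frac{x^2}{h} + x\right]_0^h \\
&= s^2\left(\frac{h}{3} - h + h - 0\right) \\
&= s^2 h\left(\frac{1}{3} - 1 + 1\right)
\end{aligned}$$

The volume of the pyramid in cubic units is $V = \frac{1}{3}s^2h$.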
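The closed form $V = \frac{1}{3}s^2h$ can be compared against a direct sum of thin square prisms. The measurements s = 4 and h = 3 and the function name below are hypothetical values chosen only to exercise the formula.

```python
def pyramid_volume_numeric(s, h, n=100_000):
    """Midpoint sum of the square cross-sectional areas of the pyramid."""
    dx = h / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        side = -(s / h) * x + s  # shrinks linearly from s at x = 0 to 0 at x = h
        total += side**2 * dx
    return total


s, h = 4.0, 3.0  # hypothetical measurements
numeric = pyramid_volume_numeric(s, h)
closed_form = s**2 * h / 3
print(numeric, closed_form)
```

Trying other values of s and h is a quick way to see that the factor of one third does not depend on the proportions of the pyramid.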

Section 2.2

Exercises

Summary Questions

Describe how a cross section of a solid is produced.

What is the signiﬁcance of the function A(x) in the formula for the volume of a solid?

What shapes do we use to approximate the volume of a solid? Why do we choose that shape?

When we rotate the region under y = f(x) around the x axis, how do we compute the area of

each cross-section?

2.2.1

Which of the following shapes have (nonzero) volume?

a square

a ball

a sphere

a cube

a cone

a triangle

Suppose I have a solid S. I tried to ﬁt a unit cube into S but I couldn’t do it, no matter where

I placed the cube or how I rotated it. I conclude that the volume of S is less than 1 unit cube.

What do you think of my conclusion?

Will the volume of an object be greater if measured in cubic centimeters or cubic inches? Explain

using the deﬁnition of how we measure volume.

Suppose I create a solid by stacking a cone on top of a cylinder. How is the volume of my

new solid related to the volume of the cone and the volume of the cylinder? Explain using the

deﬁnition of how we measure volume.


2.2.2

Let S be a sphere of radius 5 centered at the origin. What are the cross sections, perpendicular

to the x-axis? How do they change as you travel along the axis from −5 to 5?

Q10

Describe or draw the cross sections of the pyramid below when it is cut by planes parallel to the

one pictured.

Q11

Suppose all of the cross sections of a solid S, perpendicular to the height, are identical (same

shape and same size). What kind of solid is S?

Q12

Describe the cross sections of a cube

perpendicular to an edge.

perpendicular to the line connecting the midpoints of two opposite edges.

perpendicular to the diagonal that connects two opposite vertices.

2.2.3

Q13

Suppose I’m trying to approximate the volume of a solid S of height 12 using four prisms of equal

height. Suppose those prisms have volumes 5.1, 6, 7.2 and 9.6.

What is the approximate volume of S?

What are the areas of the cross sections I used to produce each prism?

Q14

Suppose I’m trying to approximate the volume of the half-ball below by prisms. I subdivide the

height into n subheights and use the cross section at the left hand side of each as the base of each

prism. Will I overestimate or underestimate the volume? Explain how you know in a sentence or

two.

Q15

Produce an approximation of the volume of a pyramid with height 9 and square base of side

length 6 using 3 prisms. There are multiple correct answers to this, corresponding to different

choices of where to take the cross sections.

Q16

Suppose a solid S has height 16. Suppose all of its cross-sections perpendicular to the height

have a diﬀerent shape, but all of those shapes have area 5.

What is the volume of S?

Do you really need calculus to solve the previous part? Discuss.

2.2.4

Q17

Compute the volume of the solid between x = 0 and x = 3 whose cross sections at each x are

squares of side length e

Q18

Compute the volume of the solid between x = 0 and x = 2 whose cross sections at each x are

trapezoids of bases x + 1 and x + 3 and height x

Q19

Compute the volume of the solid whose cross sections, perpendicular to the x-axis, are triangles

whose bases lie between y = 3x and y = x

from x = 0 to x = 3 and whose heights are equal

to the length of their bases.


Q20

Compute the volume of a solid between x = 1 and x = e

whose cross sections perpendicular to

the x-axis are rectangles of base ln x and height

ln x

2.2.5

Q21

Compute the volume of the solid created by rotating the region under $y = \sqrt{x}$ from x = 0 to x = 9 around the x-axis.

Q22

Consider the semidisk of radius 3 below:

Write a function y = f(x) that deﬁnes the boundary of this semidisk.

Suppose this semidisk is rotated around the x-axis. Describe the resulting solid.

Compute A(x), the area of the cross section at each value of x.

Write and evaluate an integral that computes the volume of the solid of rotation.

Q23

Compute the volume of the solid created by rotating the region y = 4 − x

from x = −2 to

x = 2 about the x-axis.

Q24

Compute the volume of the solid created by rotating a trapezoid with vertices (2, 0), (5, 0), (5, 8)

and (2, 2) around the x-axis.

2.2.6

Q25

Compute the volume of a solid whose base is the triangle under y = −

x+3 in the ﬁrst quadrant

and whose cross sections, perpendicular to the x-axis are triangles of height 8.

Q26

Compute the volume of a solid whose base is the region enclosed by y =

√

x and y =

and

whose cross sections, perpendicular to the x-axis are squares.

Q27

Compute the volume of a solid whose base is a right triangle with legs 4 and 3 and whose cross

sections, perpendicular to the leg of length 4, are semicircles with their diameter in the base.

Q28

Compute the volume of a solid S whose base is the unit disc and whose cross sections perpendicular

to the x-axis are isosceles right triangles, with one leg in the base.

Extension and Synthesis

Q29

Let D be the region enclosed by y = x

− 6x and the x-axis.

Set up an integral that will compute the geometric area of D. You do not need to evaluate

it.

Let S be a solid whose base is D and whose cross sections perpendicular to the x-axis are

semicircles with their diameter in D. Set up an integral that will compute the volume of S.

You do not need to evaluate it.

Q30

Consider the solid obtained by rotating the triangle below around the x-axis.

Describe the shape of the cross sections. Which measurements of this shape depend on x?

Compute a formula for A(x), the area of the cross section at each value of x.

Compute the volume of the solid.


Q31

A solid S of height 12 has the following cross sections areas A(x) at height x. How would you

approximate the volume?

x     1   5   7   10  12
A(x)  10  12  11  7   2

Section 2.3

Integration by Parts

Goals:

1 Use the integration by parts formula to ﬁnd anti-derivatives and deﬁnite integrals.

2 Choose appropriate decompositions for integrating by parts.

3 Recognize when applying the formula multiple times will be fruitful.

The product rule gives us a reliable method for computing derivatives of products. If you can

diﬀerentiate each factor in a product, you can diﬀerentiate the entire product. This is not the case for

integration. In this section we add another tool to our limited tool set for integrating a product of two

functions. Even with this method, many problems will be permanently out of reach.

Question 2.3.1

How Do We Compute an Anti-Derivative of a Product of Two Functions?

We reversed the chain rule (which computes derivatives) to compute anti-derivatives of certain

functions. This method is called u-substitution. The du term means that we often end up integrating

a product of functions with this method.

Example

Compute the integral:

Solution

dx =



− 1)

u-substitution: $u = x^2$, $du = 2x\,dx$; $x = 0 \Rightarrow u = 0$, $x = 3 \Rightarrow u = 9$

Main Idea

u-substitution is extremely fragile. Our example relies on the fact that the factor x is a constant multiple

of the derivative of the inner function, $x^2$.

Since the chain rule can only produce certain products, we should look for other diﬀerentiation rules

that could produce other products. The product rule is the obvious candidate.


Reminder

The Product Rule states that if f(x) and g(x) are diﬀerentiable, then

$$[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x).$$

Example

Compute $\int \left(x^2\cos x + 2x\sin x\right)dx$

Solution

This integrand looks like it might be the output of the product rule. If we write

$$f'(x)g(x) + g'(x)f(x) = x^2\cos x + 2x\sin x$$

we can match up the factors as

$$f(x) = \sin x \qquad f'(x) = \cos x$$
$$g(x) = x^2 \qquad g'(x) = 2x$$

Since $\left(\sin(x)\,x^2\right)' = x^2\cos x + 2x\sin x$ we can conclude

$$\int \left(x^2\cos x + 2x\sin x\right)dx = \sin(x)\,x^2 + c$$

If anything, this is more fragile than u-substitution. It requires a sum of compatible products. How

can we make the formula $[f(x)g(x)]' = f'(x)g(x) + g'(x)f(x)$ more useful?

A formula that applies to a single product instead of a sum of two products would be much more

useful. We can obtain it by subtracting.

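$$
\begin{aligned}
f'(x)g(x) + g'(x)f(x) &= [f(x)g(x)]' && \text{product rule}\\
\int \big(f'(x)g(x) + g'(x)f(x)\big)\,dx &= f(x)g(x) + c && \text{integrate both sides}\\
\int f'(x)g(x)\,dx + \int g'(x)f(x)\,dx &= f(x)g(x) + c && \text{sum rule of integrals}\\
\int g'(x)f(x)\,dx &= f(x)g(x) - \int f'(x)g(x)\,dx && \text{subtract from both sides}
\end{aligned}
$$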

Notice we don’t need the “+c” anymore. Both sides contain an indeﬁnite integral so the possible

constant of diﬀerence is built in on both sides. We can make one further move to simplify the equation.

Since $g'(x)\,dx$ is the differential of $g(x)$ and $f'(x)\,dx$ is the differential of $f(x)$, it is convenient to
represent these functions with variables. $u$ and $v$ are the traditional choices here.

This method is called integration by parts. Here is the formal statement.

Theorem

Suppose an integral can be written $\int u\,dv$ where

$u$ is a function (more precisely $u(x)$),

and $dv$ is a differential (more precisely $v'(x)\,dx$).

We can apply the following formula:

$$\int u\,dv = uv - \int v\,du$$

The integration by parts formula was not diﬃcult to derive. The more pressing question is whether

it is useful. It replaces the problem of evaluating $\int u\,dv$ with a new problem: evaluating $\int v\,du$. We
need to see some examples to determine whether it is ever any help at all.

Example 2.3.2

Computing an Anti-derivative Using Integration by Parts

Compute $\int xe^x\,dx$.

Solution

To use integration by parts, we need to look at the integrand $xe^x$ and decide which part is $u$ and which
part is $dv$. Let's try letting $u = x$ and $dv = e^x\,dx$. The formula says

$$\int u\,dv = uv - \int v\,du.$$

We can replace $\int xe^x\,dx$ by the right hand side, but we need to know what $du$ and $v$ are. We find $du$
by taking the differential of $u$. We find $v$ by taking the antiderivative of $dv$.

$$u = x \implies du = dx$$
$$dv = e^x\,dx \implies v = e^x$$

Now we can apply the integration by parts formula.

$$\int xe^x\,dx = xe^x - \int e^x\,dx$$

Notice the integrand $v\,du$ is not a product. It is a function whose antiderivative we know. Thus
integration by parts allowed us to replace a product we couldn't integrate with something we could.
Evaluating the integral, we obtain:

$$\int xe^x\,dx = xe^x - e^x + c$$


We can always verify our antiderivatives by diﬀerentiating them. In this case

$$\frac{d}{dx}\left(xe^x - e^x + c\right) = \underbrace{xe^x + e^x(1)}_{\text{product rule}} - e^x = xe^x$$

This verifies that we have found the correct antiderivative of $xe^x$.
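The same check can be done numerically. The sketch below is only an illustration: the interval $[0, 1]$, the helper names, and the choice of 1000 rectangles are assumptions made for this example, not part of the text. It compares a sum of thin rectangles for $\int_0^1 xe^x\,dx$ with the value obtained from the antiderivative $xe^x - e^x$.

```python
import math

def integrand(x):
    # The product we integrated by parts: x * e^x
    return x * math.exp(x)

def antiderivative(x):
    # The result of integration by parts (taking c = 0): x e^x - e^x
    return x * math.exp(x) - math.exp(x)

def midpoint_sum(f, a, b, n):
    # Sum of n rectangles, each evaluated at the middle of its subinterval
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

approx = midpoint_sum(integrand, 0.0, 1.0, 1000)
exact = antiderivative(1.0) - antiderivative(0.0)
print(approx, exact)  # the two values should agree to several decimal places
```

If the antiderivative were wrong, the two printed values would disagree noticeably.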

Remark

The most general antiderivative of $dv = e^x\,dx$ would be $v = e^x + c$. However, we can get away
with using a specific antiderivative instead. To convince yourself of this, try redoing the problem with
$v = e^x + c$, and see that the $c$ cancels out of your answer.
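For readers who want to compare their work, here is one way the computation with $v = e^x + c$ might go. The symbol $C$ for the final constant of integration is introduced here only to keep it distinct from $c$.

$$
\begin{aligned}
\int xe^x\,dx &= x\left(e^x + c\right) - \int \left(e^x + c\right)dx\\
&= xe^x + cx - e^x - cx + C\\
&= xe^x - e^x + C
\end{aligned}
$$

The two $cx$ terms cancel, so the choice of $c$ has no effect on the answer.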

Question 2.3.3

How Do We Choose u and dv?

What would happen if we again solved $\int xe^x\,dx$ by parts, but set $u = e^x$ and $dv = x\,dx$?

In this case we compute

$$\int xe^x\,dx = \frac{x^2}{2}e^x - \int \frac{x^2}{2}e^x\,dx$$

by parts: $u = e^x$, $dv = x\,dx$, $du = e^x\,dx$, $v = \frac{x^2}{2}$

This is no less correct than our previous application of the formula. It is, however, much less useful.
To evaluate this we need to know an anti-derivative of $\frac{x^2}{2}e^x$, which seems like an even harder problem

than the one we started with. As we can see, the choice of u and dv can determine the success or failure

of integration by parts. So what makes a good choice of u and dv?

In integration by parts, $u$ is going to be differentiated. This usually makes functions simpler if
anything. $dv$ is going to be integrated. This could make $\int v\,du$ difficult to compute. The following
mnemonic helps us decide which factor to choose as $u$ and which as $dv$.

I.L.A.T.E.

When deciding which factor of a product should be u and which should be dv, put them into the chart

below.

better u's  <--  Inverse functions | Logarithms | Algebraic expressions (polynomials) | Trig functions | Exponential functions  -->  better dv's

Let’s apply I.L.A.T.E to the following products:

ln x dx

is algebraic. ln x is a logarithm. We should let u = ln x and dv = x

dx.

x sin x dx

x is algebraic. sin x is trigonometric. We should let u = x and dv = sin x dx.

tan

−1

(x) dx

is algebraic. tan

−1

(x) is an inverse function. We should let u = tan

−1

(x) and dv = x

dx.

tan

−1

(x) dx

tan

−1

(x) −

1 + x

tan

−1

(x) −

1 + x

tan

−1

(x) −

1 + x

2x dx

tan

−1

(x) −

u − 1

tan

−1

(x) −

1 −

tan

−1

(x) −

(u − ln |u|) + c

tan

−1

(x) −

(1 + x

− ln |1 + x

|) + c

u = tan

−1

(x) dv = x

du =

1+x

dx v =

by parts

u = 1 + x

du = 2x dx

u-substitution

Example 2.3.4

Using Integration by Parts More than Once

Compute $\int_0^\pi x^2\cos x\,dx$

Solution

I.L.A.T.E. suggests $u = x^2$ and $dv = \cos x\,dx$. When we apply integration by parts to a definite integral,
the $\int v\,du$ maintains the same bounds of integration. The $uv$ is evaluated at those bounds, because it
is part of the antiderivative.

$$\int_0^\pi x^2\cos x\,dx = x^2\sin x\,\Big|_0^\pi - \int_0^\pi 2x\sin x\,dx$$

by parts: $u = x^2$, $dv = \cos x\,dx$, $du = 2x\,dx$, $v = \sin x$

Unfortunately, we don't know the anti-derivative of $2x\sin x$. It is still a product. We can try applying
integration by parts again to replace $\int 2x\sin x\,dx$ with something we can evaluate.

$$
\begin{aligned}
\int_0^\pi x^2\cos x\,dx &= x^2\sin x\,\Big|_0^\pi - \int_0^\pi 2x\sin x\,dx\\
&= x^2\sin x\,\Big|_0^\pi - \left(-2x\cos x\,\Big|_0^\pi - \int_0^\pi -2\cos x\,dx\right)\\
&= x^2\sin x\,\Big|_0^\pi + 2x\cos x\,\Big|_0^\pi - 2\sin x\,\Big|_0^\pi\\
&= (\pi^2)(0) - (0)(0) + (2\pi)(-1) - (0)(1) - (0) + (0)\\
&= -2\pi
\end{aligned}
$$

by parts (again): $u = 2x$, $dv = \sin x\,dx$, $du = 2\,dx$, $v = -\cos x$
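A quick numerical check of the value $-2\pi$ can catch sign errors, which are easy to make when integrating by parts twice. The sketch below is illustrative only: the helper name and the choice of 2000 rectangles are assumptions, not part of the text.

```python
import math

def midpoint_sum(f, a, b, n):
    # Sum of n rectangles, each evaluated at the middle of its subinterval
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Integrand of the example: x^2 cos x on [0, pi]
approx = midpoint_sum(lambda x: x ** 2 * math.cos(x), 0.0, math.pi, 2000)
by_parts_value = -2 * math.pi
print(approx, by_parts_value)  # the two values should be close
```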

Change of Variables?

Notice that despite deﬁning functions u and v, we continue to work in terms of the variable x. Contrast

this with u-substitution where the variable x can be completely eliminated in a deﬁnite integral. That

approach isn’t possible here. We’d have to write v as a function of u. This would be complicated or

impossible.

Example 2.3.5

Using Integration by Parts to Produce an Equation

Compute

cos x dx

Solution

I.L.A.T.E. suggests u = cos x and dv = e

dx. To integrate dv we use a u-substitution. We apply the

integration by parts formula, factoring the −

from the integrand:

cos x dx

cos x −

−

sin x dx

cos x +

sin x dx

u = cos x dv = e

du = −sin x dx v =

by parts

Did this help? We don’t know the antiderivative of e

sin x. Even worse, it doesn’t seem to have

improved in any way. It is just as complicated as what we started with. Our intuition might be to give

up and try another approach. Perhaps I.L.A.T.E. has done us wrong and we should choose a diﬀerent

u and dv. In this case, however, we should reject that intuition and continue. We’ll apply integration

by parts again.

cos x dx

cos x +

sin x dx

cos x +



sin x −

cos x dx



cos x +

sin x −

cos x dx

u = sin x dv = e

du = cos x dx v =

by parts again

Does this help? Again the integrand does not seem to have improved, until we notice that the

integrand is exactly what we began with. We could add

cos x dx to both sides of the equation,

and we could solve for

cos x dx algebraically.

cos x dx =

cos x +

sin x −

cos x dx

cos x dx =

cos x +

sin x + c

cos x dx =



cos x +

sin x



+ c

cos x dx =

cos x +

sin x + c


Main Idea

We’ve seen a variety of techniques to apply when integration by parts does not give us an immediate

answer. The success of integration by parts depends on the $\int v\,du$ term. You might use the following

ﬂow chart to decide how to proceed once you have applied integration by parts.

Flow chart:

Is $\int v\,du$ still a product?
  If not: integrate it. You are done.
  If yes: can you apply a u-sub?
    If yes: use u-sub. You are done.
    Otherwise: how does $\int v\,du$ compare to the original integrand?
      simpler or similar: apply integration by parts again.
      more complicated: use another approach.
      a constant multiple: write an equation and solve.

Section 2.3

Exercises

Summary Questions

What type of integrands are good candidates for integration by parts?

How is u handled diﬀerently in integration by parts than in u-substitution?

How is the acronym I.L.A.T.E. used?

Under what conditions would we want to apply integration by parts more than once?

2.3.1

Compute $\int \frac{\sin x}{1 + x^2} + \cos x\tan^{-1}x\,dx$

Which of the following can be integrated using u-substitution?

2.3.3

Evaluate

ln x

dx.

Evaluate

x sin x dx.

Use integration by parts to compute $\int \tan^{-1}x\,dx$. Note that $\frac{d}{dx}\tan^{-1}x = \frac{1}{1+x^2}$.

Q10

We can write $\int \ln x\,dx$ as a product: $\int (1)(\ln x)\,dx$.

How does I.L.A.T.E. suggest we proceed?

Use integration by parts to compute the antiderivative.

Q11

Compute

sin

−1

x dx.

Q12

Compute

π/4

tan

−1

x dx.


2.3.4

Q13

Compute

cos(x + 2) dx.

Q14

Compute

dx.

Q15

Compute

−7

sin(x

−2

) dx. Hint: The easiest way to split this is not the correct way. You’ll

need some factors of x to ﬁnd an antiderivative of your trig function.

Q16

Compute

x sin x dx.

2.3.5

Q17

Compute

sin x dx.

Q18

Compute

−x

cos 2x dx.

Extension and Synthesis

Q19

Compute

dx. Choose your dv carefully. You want something that you can integrate.

Q20

Compute

sin(ln x) dx. Perform a u-substitution before trying by parts.

Q21

Compute the area enclosed by y = xe

and y = ex.

Q22

Let S be a solid between x = 0 and x = 3 whose cross-sections perpendicular to the x-axis are

triangles of base x and height e

. Compute the volume of S.

Q23

Let S be the solid obtained by rotating the region below y = ln x from x = 1 to x = 5 about

the x-axis. Compute the volume of S.

Q24

Suppose that S is a solid between x = 1 and x = 5 whose cross sections (perpendicular to the

x-axis) are triangles of height x

and base ln x at each x. Compute the volume of S.


Section 2.4

Approximate Integration

Goals:

1 Use several methods to approximate deﬁnite integrals.

2 Assess the accuracy of an approximation.

3 Approximate integrals given incomplete information.

One of the first applications of integration is to measure total change. If $v(t)$ is our velocity, $\int_a^b v(t)\,dt$
computes the total displacement between the times $a$ and $b$. In practice, to evaluate such an integral,
we need to know the antiderivative of $v$. Can we realistically expect to do this? Except in theoretical

situations (say a physics experiment), we cannot. A person driving a car will not produce a velocity

function that can be expressed in terms of algebra or trigonometry. While every continuous function has

an antiderivative, it doesn’t help us if we don’t know what it is or how to evaluate it.

Our best option in these situations is to approximate the integral. For instance, if we measure

velocity once per second, we could multiply each velocity by one second to approximate the distance

traveled in that second. Adding these up would approximate the total displacement. What we’ve done

is approximated the integral by rectangles of width 1. The natural question to ask is: how accurate is

such an approximation? How can we make it more accurate? These are the questions we’ll need to

address whenever we want to apply calculus to data sets instead of abstract functions.
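The once-per-second idea described above can be sketched in a few lines of Python. The velocity readings below are made-up numbers chosen only for illustration; they do not come from any real measurement.

```python
# Hypothetical velocity readings in meters per second, one per second
velocities = [0.0, 1.5, 3.0, 4.0, 4.5, 4.0, 3.0]
dt = 1.0  # seconds between measurements

# Each reading is treated as the velocity for the whole second that follows,
# so each term v * dt is the area of one rectangle of width 1.
displacement = sum(v * dt for v in velocities)
print(displacement)  # approximate total displacement in meters
```

Measuring more often (a smaller `dt`) would give more, thinner rectangles.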

Question 2.4.1

What $x_i^*$ Can We Use when Approximating an Integral?

Recall the following

Deﬁnition

The deﬁnite integral is given by the formula

$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0}\sum_{i=1}^n f(x_i^*)\Delta x$$

where $\Delta x$ are the lengths of the subintervals of $[a, b]$, and $x_i^*$ is a number in the $i$th subinterval.

Without the limit (which is diﬃcult or impossible to compute anyway) the sums on the right are

approximations of the integral. Once we choose an $x_i^*$ for each $i$, we can evaluate this approximation.
The simplest idea is to just use the left endpoint of each subinterval as $x_i^*$.


Notation

The notation $L_n$ refers to the approximation of $\int_a^b f(x)\,dx$ by $n$ rectangles,

$$\sum_{i=1}^n f(x_i^*)\Delta x,$$

where the $x_i^*$ are the left endpoints of each subinterval.

Similarly $R_n$ refers to the approximation using the right endpoints for $x_i^*$.

approximation

Example 2.4.2

Computing an $L_3$ Approximation

Compute an $L_3$ approximation of $\int_{-1}^5 x^2\,dx$.

Does $L_3$ over or underestimate the actual value of $\int_{-1}^5 x^2\,dx$?

Solution

Let $f(x) = x^2$. The interval $[-1, 5]$ has length $5 - (-1) = 6$. Three rectangles means that
$\Delta x = \frac{6}{3} = 2$. We can divide up the interval to find all three subintervals. A diagram is a good
way to avoid mistakes.

−1 1 3 5

The left endpoints are −1, 1 and 3. Our approximation is

$$
\begin{aligned}
\sum_{i=1}^3 f(x_i^*)\Delta x &= f(x_1^*)\Delta x + f(x_2^*)\Delta x + f(x_3^*)\Delta x\\
&= \Delta x\left(f(x_1^*) + f(x_2^*) + f(x_3^*)\right)\\
&= 2\left((-1)^2 + 1^2 + 3^2\right)\\
&= 22
\end{aligned}
$$


When the function increases, it has more signed area beneath it than the left-endpoint rectangles.
When it decreases it has less. $f(x) = x^2$ increases and decreases, but on the interval $[-1, 5]$, it
spends much more time increasing than decreasing. Thus we expect that $L_3$ underestimates the
true integral. We can verify our intuition with a computation.

$$\int_{-1}^5 x^2\,dx = \frac{x^3}{3}\,\Big|_{-1}^5 = \frac{126}{3} > 22$$
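The left-endpoint computation is mechanical enough to hand to a computer. The sketch below reproduces the approximation 22 and the exact value $\frac{126}{3} = 42$ for $f(x) = x^2$ on $[-1, 5]$. The function name `left_sum` is a label chosen for this illustration.

```python
def left_sum(f, a, b, n):
    """Left-endpoint approximation of the integral of f over [a, b] with n rectangles."""
    dx = (b - a) / n
    # The left endpoints are a, a + dx, ..., a + (n - 1) dx
    return sum(f(a + i * dx) for i in range(n)) * dx

l3 = left_sum(lambda x: x ** 2, -1, 5, 3)
exact = (5 ** 3 - (-1) ** 3) / 3  # from the antiderivative x^3 / 3
print(l3, exact)
```

Raising `n` moves `l3` toward `exact`, though slowly.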

Question 2.4.3

How Accurate Is an $L_n$ or $R_n$ Approximation?

An approximation is much more useful, if we have some idea of how accurate (or inaccurate) it might

be. The way we quantify this inaccuracy is error.


Deﬁnitions

The error in an approximation is given by

error = approximated value − actual value

In a real world approximation, we do not know the exact error (why?). We will settle for putting a

bound on error. This is a number N such that we are sure that

|error| ≤ N.

Determining error bounds can be diﬃcult. Here are some questions to ask.

1 In what circumstances is the approximation exact?

2 What property or measurement seems to correspond to the amount of error?

3 Is there a “worst case scenario” associated to that property or measurement?

The following exercise explores these questions.

Exercise

Draw a function for which $L_n$ is always an overestimate.

Draw a function for which $L_n$ is always an underestimate.

What has to be true of a function for $L_n$ to always be exact?

What familiar calculus measurement appears to measure whether you are in the situations you
described in the previous parts?


Solution

A decreasing function will be overestimated by $L_n$.

An increasing function will be underestimated by $L_n$.

If $L_n$ is always exact, then $f(x)$ is a constant function.

Functions can be classified as increasing, decreasing or constant by their first derivative. $f'(x)$
seems to determine the sign (and maybe size) of the error.

Figure: The error of an $L_n$ approximation

Let's use the results of the exercise to formulate an error bound for $L_n$.

Higher derivatives seem to produce more negative errors. If we allow for steeper and steeper slopes,
there is no limit to how large the error could be. So let's put a bound on how big the derivative is.
Suppose we know that $f'(x) \le S$ on $[a, b]$. Over each interval $[x_i, x_{i+1}]$ we know that $f(x)$ lies below
the line of slope $S$ through $(x_i, f(x_i))$:

$$f(x) \le S(x - x_i) + f(x_i)$$


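The region below the graph $y = f(x)$ and above the $i$th rectangle is smaller than the region below the
line and above the rectangle, but we can compute the area of the larger region. It is a triangle. Its base
is $\Delta x = \frac{b-a}{n}$. Its height can be determined by the slope of the line.

Figure: The error and the error bound over one rectangle of an $L_n$ approximation

$$\frac{\text{height}}{\text{base}} = \frac{\text{rise}}{\text{run}} = S \quad\Longrightarrow\quad \frac{\text{height}}{\Delta x} = S \quad\Longrightarrow\quad \text{height} = S\Delta x$$

$$\text{area} = \frac{1}{2}(\text{base})(\text{height}) = \frac{1}{2}S\Delta x^2 = \frac{S}{2}\left(\frac{b-a}{n}\right)^2$$

So the error over each subinterval can be no larger than $\frac{S}{2}\left(\frac{b-a}{n}\right)^2$. There are $n$ subintervals, so the
total $L_n$ approximation underestimates $\int_a^b f(x)\,dx$ by no more than $\frac{S(b-a)^2}{2n}$.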

We can make a similar argument that if $f'(x) \ge -S$ then $L_n$ overestimates $\int_a^b f(x)\,dx$ by no more
than $\frac{S(b-a)^2}{2n}$. We can combine these two statements into one by using absolute values. $-S \le f'(x) \le S$
is rewritten $|f'(x)| \le S$.

We could make the same argument for the $R_n$ approximation. We'd only need to swap the
overestimate with the underestimate. The error bounds it produces are the same. Our result can be
stated as a theorem:

Theorem

If $E_L$ and $E_R$ are the errors in the $L_n$ and $R_n$ approximations of $\int_a^b f(x)\,dx$ and $|f'(x)| \le S$ on $[a, b]$
then

$$|E_L| \le \frac{S(b - a)^2}{2n} \quad\text{and}\quad |E_R| \le \frac{S(b - a)^2}{2n}$$
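To see the bound $\frac{S(b-a)^2}{2n}$ in action, the sketch below compares it with the actual error of a left-endpoint approximation of $\int_{-1}^5 x^2\,dx$ with 3 rectangles. The choice $S = 10$ is an assumption made for this illustration: it is the largest value of $|f'(x)| = |2x|$ on $[-1, 5]$. The function names are also just labels for this example.

```python
def left_sum(f, a, b, n):
    """Left-endpoint approximation of the integral of f over [a, b] with n rectangles."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def left_right_error_bound(S, a, b, n):
    """The bound S(b - a)^2 / (2n), valid when |f'(x)| <= S on [a, b]."""
    return S * (b - a) ** 2 / (2 * n)

a, b, n = -1, 5, 3
S = 10  # assumed bound: |2x| <= 10 on [-1, 5]
exact = (b ** 3 - a ** 3) / 3
actual_error = left_sum(lambda x: x ** 2, a, b, n) - exact  # approximated - actual
bound = left_right_error_bound(S, a, b, n)
print(actual_error, bound)
```

The actual error is well inside the bound here, which is typical: the bound describes a worst case, not the error itself.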


Remark

The argument that the line of slope S is the “worst case” scenario is a useful heuristic, but you may be

unsatisﬁed with its lack of rigor. A formal argument relies on the following ideas:

Larger functions have larger integrals. If $f(x) \le g(x)$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ as long as
$a \le b$.

The Fundamental Theorem of Calculus tells us we can write $f(x) = f(x_i) + \int_{x_i}^x f'(t)\,dt$.

The line of slope $S$ would be $L(x) = f(x_i) + \int_{x_i}^x S\,dt$. Over the interval $[x_i, x_{i+1}]$, comparing these
integrals shows that $f(x) \le L(x)$. Thus $\int_{x_i}^{x_{i+1}} f(x)\,dx \le \int_{x_i}^{x_{i+1}} L(x)\,dx$. This tells us that there is
more error, and thus a larger underestimate in the left hand approximation of $L(x)$ than there is in the
left hand approximation of $f(x)$.

Example 2.4.4

Computing an $E_L$ Bound

Suppose we want to understand the error of an $L_5$ approximation of $\int_1^{16}\sqrt{x}\,dx$.

What bounds can we put on $|f'(x)|$ for our error calculation?

What bound can we put on the error of the $L_5$ approximation?

What $n$ would we need in order to guarantee that the $L_n$ approximation has error at most $\frac{1}{100}$?

What problem would result, if we tried to bound the error of an $L_n$ approximation of $\int_0^{16}\sqrt{x}\,dx$?
How might you resolve this?

Solution

$f'(x) = \frac{1}{2\sqrt{x}}$. This is always positive, and it decreases as $x$ increases. The largest value of $f'(x)$
on $[1, 16]$ occurs when $x = 1$. If we let $S = f'(1) = \frac{1}{2}$, we are guaranteed that for all $x$ in $[1, 16]$,
$|f'(x)| \le \frac{1}{2}$.


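By our theorem

$$|E_L| \le \frac{S(b - a)^2}{2n} = \frac{\frac{1}{2}(16 - 1)^2}{2(5)} = \frac{45}{4}$$

So the error lies between $-\frac{45}{4}$ and $\frac{45}{4}$.

We can set our error bound (with $n$ as a variable) to be less than $\frac{1}{100}$ and solve for $n$.

$$
\begin{aligned}
|E_L| \le \frac{\frac{1}{2}(16 - 1)^2}{2n} &\le \frac{1}{100}\\
\frac{225}{4n} &\le \frac{1}{100}\\
(225)(100) &\le 4n\\
(225)(25) &\le n\\
5625 &\le n
\end{aligned}
$$

We conclude that the error will be less than $\frac{1}{100}$ as long as $n$ is at least 5625. Note that since this
is an error bound, the actual error may shrink below $\frac{1}{100}$ with fewer rectangles. We would need a
different method to verify that, though.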
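Solving the inequality $\frac{S(b-a)^2}{2n} \le \text{tolerance}$ for $n$ gives $n \ge \frac{S(b-a)^2}{2\cdot\text{tolerance}}$, which is easy to automate. The sketch below uses exact fractions so that rounding in floating-point arithmetic cannot push the answer up by one. The function name `min_rectangles` is a label chosen for this illustration.

```python
import math
from fractions import Fraction

def min_rectangles(S, a, b, tol):
    """Smallest n for which the bound S(b - a)^2 / (2n) is at most tol."""
    return math.ceil(S * (b - a) ** 2 / (2 * tol))

# Values from the example: S = 1/2 on [1, 16], target error 1/100
n = min_rectangles(Fraction(1, 2), 1, 16, Fraction(1, 100))
print(n)
```

Tightening the tolerance by a factor of 10 multiplies the required `n` by 10, since the bound is proportional to $\frac{1}{n}$.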

If we want to apply our theorem to $\int_0^{16}\sqrt{x}\,dx$, we need an $S$ such that $|f'(x)| \le S$. This derivative
is $f'(x) = \frac{1}{2\sqrt{x}}$, which increases without bound as $x \to 0^+$. Thus there is no $S$, and we cannot
apply the error bound theorem.

To get around this problem we could break the interval into two parts and bound them by different
methods. We can bound the error on rectangles 2 through $n$ over the interval $[\Delta x, 16]$ using the
theorem as above. In this case $S = \frac{1}{2\sqrt{\Delta x}}$ will work. To bound the error over the first rectangle
$[0, \Delta x]$, note that $f(x)$ is increasing. The first rectangle of $L_n$ will underestimate the integral,
while the first rectangle of $R_n$ will overestimate it. Thus the actual error can be no bigger than
the difference between them, which is $\sqrt{\Delta x}\,\Delta x - 0\,\Delta x$. The total error can be no larger than the
sum of the error bound over $[0, \Delta x]$ and the error bound over $[\Delta x, 16]$.


Question 2.4.5

How Can We Make our Approximation Less Sensitive to Slope?

$L_n$ and $R_n$ have large errors when the function is increasing or decreasing rapidly. We'll examine two
approximations that are more resilient. The first is the midpoint approximation.

Notation

The $M_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:

$$\sum_{i=1}^n f(x_i^*)\Delta x$$

where the $x_i^*$ are the midpoints of each subinterval.

Our ﬁnal approximation abandons rectangles entirely. Using trapezoids instead allows for shapes that

reﬂect the value of the function at both the right and left endpoint. In this construction, the trapezoids

are sideways from the way you may be used to looking at them when you learned their area formula
$A = \frac{1}{2}(b_1 + b_2)h$. The parallel bases are vertical. The height is along the x-axis.

Notation

The $T_n$ approximation of $\int_a^b f(x)\,dx$ is calculated by summing:

$$\sum_{i=1}^n \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right)\Delta x$$

where $x_i$ and $x_{i+1}$ are the two endpoints of the $i$th subinterval.

$T_n$ can also be calculated as $\frac{L_n + R_n}{2}$.

Example 2.4.6

A Midpoint Approximation

Calculate the $M_3$ approximation of $\int_{-1}^5 x^2\,dx$.

Solution

$\Delta x = \frac{5-(-1)}{3} = 2$. We can sketch the intervals:


−1 1 3 5

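The midpoints are $x_1^* = 0$, $x_2^* = 2$ and $x_3^* = 4$.

$$
\begin{aligned}
\sum_{i=1}^3 f(x_i^*)\Delta x &= \Delta x\left(f(x_1^*) + f(x_2^*) + f(x_3^*)\right)\\
&= 2\left(0^2 + 2^2 + 4^2\right)\\
&= 40
\end{aligned}
$$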
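The midpoint computation can be sketched in Python as follows. It reproduces the value 40 for $f(x) = x^2$ on $[-1, 5]$ with 3 rectangles. The function name `midpoint_sum` is a label chosen for this illustration.

```python
def midpoint_sum(f, a, b, n):
    """Midpoint approximation of the integral of f over [a, b] with n rectangles."""
    dx = (b - a) / n
    # The midpoint of the i-th subinterval (counting from 0) is a + (i + 0.5) dx
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

m3 = midpoint_sum(lambda x: x ** 2, -1, 5, 3)
print(m3)
```

The exact value of this integral is 42, so three midpoint rectangles already land much closer than the three left-endpoint rectangles, which gave 22.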

Example 2.4.7

A Trapezoid Approximation Using a Table of Values

Approximation has no practical use for algebraic functions. We would rather get the exact answer

by taking an antiderivative and applying the Fundamental Theorem of Calculus. In many real-world

applications, our data about a function consists of a ﬁnite number of measurements. In this case, we

don’t even have an expression for the function, let alone its antiderivative. Here is an example where

approximation is the best we can do.

Suppose we have the following table of values for a function f(x)

x 0 2 4 6 8 10 12 14 16

f(x) 2 5 3 4 7 8 5 4 1

Calculate the $T_3$ approximation of $\int_2^{14} f(x)\,dx$.

Solution

$\Delta x = \frac{14-2}{3} = 4$. We can sketch the intervals:

2 6 10 14


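$$
\begin{aligned}
\sum_{i=1}^3 \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right)\Delta x &= \sum_{i=1}^3 \frac{1}{2}\Delta x\left(f(x_i) + f(x_{i+1})\right)\\
&= \frac{1}{2}\Delta x\left(f(2) + f(6) + f(6) + f(10) + f(10) + f(14)\right)\\
&= \frac{1}{2}(4)(5 + 4 + 4 + 8 + 8 + 4)\\
&= 66
\end{aligned}
$$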
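When the data is a table rather than a formula, the trapezoid sum only needs the tabulated points. The sketch below stores the table from this example in a dictionary and uses the points $x = 2, 6, 10, 14$. The function name `trapezoid_from_points` is a label chosen for this illustration.

```python
# Table of values from the example: x -> f(x)
table = {0: 2, 2: 5, 4: 3, 6: 4, 8: 7, 10: 8, 12: 5, 14: 4, 16: 1}

def trapezoid_from_points(xs, values):
    """Trapezoid approximation using consecutive points in xs as subinterval endpoints."""
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        # Area of one sideways trapezoid: average of the two bases times the width
        total += 0.5 * (values[left] + values[right]) * (right - left)
    return total

t3 = trapezoid_from_points([2, 6, 10, 14], table)
print(t3)
```

Because the width `right - left` is computed for each pair, the same function also works when the $x$ values are not evenly spaced.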

Question 2.4.8

How Do the Error Bounds of the Approximations Compare?

$T_n$ and $M_n$ have zero error when $f(x)$ is a straight line, regardless of slope. Larger errors result

from high rates of curvature. You can see this by using a small number of rectangles/trapezoids and

increasing the curvature of the function. Proving an error bound involves using a quadratic as a “worst

case scenario.” Any function with second derivative smaller than the quadratic will have a smaller error.

Here is the result.


Theorem

Suppose $|f''(x)| \le K$ for $a \le x \le b$. If $E_T$ and $E_M$ are the error in the trapezoid and midpoint
approximations of $\int_a^b f(x)\,dx$ then

$$|E_T| \le \frac{K(b - a)^3}{12n^2} \quad\text{and}\quad |E_M| \le \frac{K(b - a)^3}{24n^2}$$

Remarks

1 The maximum error is smaller when the function has less curvature.

2 The error is also reduced by increasing n, the number of subintervals.

3 These formulas indicate that we can usually expect $M_n$ to have half as much error as $T_n$.

4 As $n$ increases, the error bounds for $M_n$ and $T_n$ approach 0 much more quickly than $L_n$ and $R_n$.
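These remarks can be observed in a small experiment. The sketch below computes left-endpoint, midpoint, and trapezoid approximations of $\int_{-1}^5 x^2\,dx = 42$ for a few values of $n$. The test function and the values of $n$ are choices made for this illustration, and the function names are just labels.

```python
def left_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

def midpoint_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def trapezoid_sum(f, a, b, n):
    dx = (b - a) / n
    return sum(0.5 * (f(a + i * dx) + f(a + (i + 1) * dx)) for i in range(n)) * dx

f = lambda x: x ** 2
a, b, exact = -1, 5, 42.0

# errors[n] = (left error, midpoint error, trapezoid error), each as approximated - actual
errors = {}
for n in (3, 6, 12):
    errors[n] = (left_sum(f, a, b, n) - exact,
                 midpoint_sum(f, a, b, n) - exact,
                 trapezoid_sum(f, a, b, n) - exact)
    print(n, errors[n])
```

For this function, doubling $n$ roughly halves the left-endpoint error but cuts the midpoint and trapezoid errors to a quarter, and the midpoint error is half the size of the trapezoid error.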

Example 2.4.9

Choosing n to Meet an Error Target

Suppose we wish to approximate $\int_1^{16}\sqrt{x}\,dx$ by a midpoint approximation. How many rectangles
must we use to guarantee that the error is smaller than $\frac{1}{1000}$?

Solution

The midpoint error formula requires us to have a bound $K$ on $|f''(x)|$ on $[1, 16]$.

$$f'(x) = \frac{1}{2\sqrt{x}} \qquad f''(x) = -\frac{1}{4x^{3/2}}$$

As $x$ gets larger, the denominator of $f''(x)$ gets larger, meaning $|f''(x)|$ gets smaller (we could also
verify this by checking the sign of $f'''(x)$). Thus it will be largest at $x = 1$. We can safely use the value
there as our $K$.

$$|f''(x)| \le |f''(1)| = \frac{1}{4} = K$$


We can now apply the error bound formula, leaving $n$ as a variable. We will set the error bound to be
less than $\frac{1}{1000}$ and solve for $n$.

$$
\begin{aligned}
|E_M| \le \left|\frac{K(b - a)^3}{24n^2}\right| &\le \frac{1}{1000}\\
\left|\frac{\frac{1}{4}(16 - 1)^3}{24n^2}\right| &\le \frac{1}{1000}\\
\frac{\frac{1}{4}(16 - 1)^3}{24n^2} &\le \frac{1}{1000} && \text{all factors are positive}\\
\frac{(1000)(15)^3}{(4)(24)} &\le n^2 && \text{isolate } n^2\\
\frac{140{,}625}{4} &\le n^2\\
\frac{375}{2} &\le n && \text{square root of both sides}
\end{aligned}
$$

Thus any $n$ bigger than $375/2$ will work. We need to use at least 188 rectangles to guarantee that the
error is less than $\frac{1}{1000}$. Note that we might achieve a sufficiently small error with fewer rectangles, but
our error bound theorem cannot guarantee it.
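The same algebra can be packaged as a small function. Solving $\frac{K(b-a)^3}{24n^2} \le \text{tolerance}$ for $n$ gives $n \ge \sqrt{\frac{K(b-a)^3}{24\cdot\text{tolerance}}}$. The sketch below applies this to the values in the example. The function name `min_midpoint_rectangles` is a label chosen for this illustration.

```python
import math
from fractions import Fraction

def min_midpoint_rectangles(K, a, b, tol):
    """Smallest n for which the midpoint bound K(b - a)^3 / (24 n^2) is at most tol."""
    n_squared_min = K * (b - a) ** 3 / (24 * tol)  # exact when K and tol are Fractions
    return math.ceil(math.sqrt(n_squared_min))

# Values from the example: K = 1/4 on [1, 16], target error 1/1000
n = min_midpoint_rectangles(Fraction(1, 4), 1, 16, Fraction(1, 1000))
print(n)
```

Compare this with the left-endpoint bound, which needed 5625 rectangles to guarantee an error of only $\frac{1}{100}$ on the same integral.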

Section 2.4

Exercises

Summary Questions

How is the error in an approximation deﬁned?

What does the ﬁrst derivative of f(x) tell you about the error in the right-hand approximation

f(x) dx?

As the number of subintervals gets large, which approximation(s) converge most quickly to the

actual value?

Under what situation is a midpoint approximation preferable to a trapezoid approximation? When

would trapezoid be preferable?


2.4.1

Seong-ju and Anthony are both approximating

−4

dx with 4 rectangles. They know that

they can use any combination of test points in their rectangles. What is the maximum diﬀerence

between their approximations?

Q6 a

What ∆x and x

∗

’s would you use for the L

approximation of

f(x) dx?

Can you write a general expression for ∆x and the x

∗

’s for

f(x) dx?

2.4.2

Compute the L

approximation of

3/2

dx.

Compute the R

approximation of

x sin



πx



dx.

Compute the L

approximation of

Q10

Compute the L

approximation of

dx.


2.4.3

Q11

Compute the theoretical error bound on the L

approximation of

√

x dx.

Q12

Compute the theoretical error bound on the R

approximation of

+ 1

dx.

Q13

How large would n need to be to guarantee that the L

approximation of

log

x dx is within

10000

of the actual value?

Q14

How large would n need to be to guarantee that the R

approximation of

−1

dx is within

1000

of the actual value?

2.4.4

Q15

Suppose we make the following approximations of

4x + 7 dx. Without computing them, put

them in order from least to greatest (some may be equal).

The actual value

Q16

Yiming has a great idea. He approximates

f(x) dx by 12 rectangles. In order to mitigate the

error of left and right hand approximations, he takes the right endpoint of the ﬁrst subinterval as

a test point, but the left endpoint of the second subinterval. He continues to alternate for all 12

subintervals. What is another name for the approximation Yiming has produced?


2.4.5

Q17

Compute the T

approximation of

− x dx.

Q18

Compute the M

approximation of

− x dx.

Q19

Compute the M

approximation of

cos



πx



dx.

Q20

Compute the T

approximation of

+2x

2.4.6

Q21

Given the following table of values of f (x)

x 0 3 6 9 12 15 18 21

f(x) 10 13 11 15 13 11 9 12

Compute the M

approximation of

f(x) dx.

Compute the T

approximation of

f(x) dx.

Q22

Given the following table of values of h(x)

x 1 2 3 4 5 6 7 8 9

h(x) 2 −1 3 4 2 1 −3 5 4

Compute the T

approximation of

h(x) dx.

Compute the M

approximation of

h(x) dx.


2.4.7

Q23

Let f (x) =

. If you wanted to use a midpoint approximation with n rectangles to approximate

f(x) dx. How large must n be to guarantee your approximation had an error of no more

than

10000

? Your answer should have the form n ≥ . . ., but you do not need to simplify any

arithmetic.

Q24

Suppose we want to approximate

√

x dx.

Produce the T

approximation. Don’t bother simplifying the arithmetic.

Solve for a value n such that T

has an error of at most

1000000

. Don’t simplify the arithmetic.

Q25

Consider the following data about an unknown function g(x).

x 0 2 4 6 8 10 12 14

g(x) 3 5 8 9 7 4 3 1

Compute a M

approximation of

g(x) dx.

If you are given that |g

′′

(x)| <

, what bound can you put on the error of the previous

approximation?

Q26

Sasha is trying to bound the error of her M

approximation of

sin x dx. She computes

′′

(0) = 0 and f

′′

(π) = 0 and so decides to use K = 0.

What does her choice of K imply about the accuracy of her approximation?

Explain what is wrong with Sasha’s reasoning.

Compute the actual error bound for the M

approximation.


Extension and Synthesis

Q27

Give an example of a function for which L

and R

are both overestimates on some interval. You

may want to express your function by drawing its graph.

Q28

Suppose we want to estimate

f(x) dx and have the following table of values

x 4 6 8 10 12 14 16 18 20

f(x) 3 5 4 2 −1 6 2 5 8

What estimates are possible with this data?

Would you expect the M

or the T

approximation to give you a better estimate?

Q29

Consider T

, the trapezoid approximation of

dx.

Produce this approximation. Do not simplify the arithmetic.

Compute the theoretical error bound for this approximation.

Explain in a couple sentences how you can tell whether the error is positive or negative. You

can include a diagram, if you’d like to.

Q30

Suppose you are interested in the value of

f(x) dx, but you have only the following data.

x 1 2 6 8 13 14 20 23 25

f(x) 12 19 20 20 28 34 50 57 66

How might you approximate

f(x) dx?

Q31

Suppose you invent your own approximation for a deﬁnite integral. You name it the “ultimate

approximation” and denote it U

. Its formula is

+ R

+ M

+ T

Will U

overestimate or underestimate the integral of a linear function? Justify your answer.

Q32

Suppose we compute an L

approximation of

−7

f(x) dx.


What formula that we learned would give a bound on the error of this approximation? Fill in

all the information you can, and indicate the information that you would need to complete

the calculation. Be as speciﬁc as possible.

Suppose that, instead of the information you need for the formula, you were only given that

f is an increasing function on [−7, 13]. How could you compute an error bound in this case?

Justify your answer.


Section 2.5

Improper Integrals

Goals:

1 Integrate a function that has a discontinuity.

2 Recognize when an integral is improper.

3 Determine whether an improper integral converges or diverges.

4 Compute the value of an improper integral.

5 Use comparison to determine convergence.

So far we have been content to evaluate integrals of continuous functions over bounded intervals.

Not all functions are continuous. We may be interested in the area under a discontinuous function, even

one with a vertical asymptote. We may be interested in the area under the entire graph of a function,

not just over some subset. In many cases these areas will be inﬁnite, but in some cases they are not.

We will need to develop the methods to determine which case is which.

Question 2.5.1

What Is Inﬁnity?

In this section we’ll be revisiting ideas about inﬁnity.

Notation

The symbol ∞ implies that a variable or function is increasing without bound. It eventually gets bigger

than every number.

∞ is not a number. We cannot evaluate

∞

or $\infty \cdot 0$ or $\tan^{-1}(\infty)$.

The main way that we’ve encountered this notation is with limits. Limits at inﬁnity will also be

relevant to improper integrals, so you may want to review them.


Exercise

Evaluate the following limits:

lim

x→∞

$\lim_{x \to \infty} \sqrt{x}$

lim

t→−∞

$\lim_{y \to \infty} \sin y$

$\lim_{w \to \infty} \ln w$

lim

x→−∞

+ 7

− 5x

Solution

lim

x→∞

= 0.

$\lim_{x \to \infty} \sqrt{x} = \infty$.

lim

t→−∞

= 0.

$\lim_{y \to \infty} \sin y$ does not exist.

$\lim_{w \to \infty} \ln w = \infty$.

lim

x→−∞

+ 7

− 5x

= 3.


Question 2.5.2

How Do We Integrate a Discontinuous Function?

Consider the function

$$f(x) = \begin{cases} 3x^2 & \text{if } x \le 2 \\ 10 - 2x & \text{if } x > 2 \end{cases}$$

What is $\int_0^5 f(x)\,dx$?

Figure: The area beneath a discontinuous graph

$\int_0^5 f(x)\,dx$ is the signed area under $f(x)$ from $x = 0$ to $x = 5$. It is equal to a limit

$$\int_0^5 f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$$

If we look at the rectangle approximations in this equation, we see that they can badly estimate the

function near the point of discontinuity.

Figure: Rectangle approximations of the area beneath a discontinuous graph


Remarks

We might worry that the approximations are so bad that the limit $\lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x$ does not exist. Fortunately, it does, as long as $f$ is bounded and has only finitely many discontinuities.

$f(x)$ almost has an antiderivative function. $F(x) = \int_0^x f(t)\,dt$ has derivative $f(x)$ at all $x$, except perhaps at the points of discontinuity.

While it may be comforting to know that an antiderivative function exists, it doesn’t help us evaluate

the integral. We don’t know what number to assign to $F(x)$ for many values of $x$. So how do we compute $\int_0^5 f(x)\,dx$? Instead of dealing with a function whose antiderivative we don’t know, we break this into two integrals that we do know.

$$\int_0^5 f(x)\,dx = \int_0^2 f(x)\,dx + \int_2^5 f(x)\,dx = \int_0^2 3x^2\,dx + \int_2^5 f(x)\,dx$$

Why can’t we replace $\int_2^5 f(x)\,dx$ with $\int_2^5 (10 - 2x)\,dx$? At $x = 2$, $f(x) = 3x^2$, not $10 - 2x$. This is unfortunate, because for any number $t > 2$ we could replace $\int_t^5 f(x)\,dx$ with $\int_t^5 (10 - 2x)\,dx$. We will need to break our integral down further.

$$\int_0^5 f(x)\,dx = \int_0^2 f(x)\,dx + \int_2^5 f(x)\,dx = \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 (10 - 2x)\,dx$$

We still don’t know the value of the middle integral, but we know that as t approaches 2, the domain

of integration shrinks to 0. We can take advantage of this by taking a limit.

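$$\int_0^5 f(x)\,dx = \lim_{t \to 2^+} \left( \int_0^2 3x^2\,dx + \int_2^t f(x)\,dx + \int_t^5 (10 - 2x)\,dx \right)$$

$$= \lim_{t \to 2^+} \left( x^3\,\Big|_0^2 + \int_2^t f(x)\,dx + \left(10x - x^2\right)\Big|_t^5 \right)$$

$$= \lim_{t \to 2^+} \left( 8 - 0 + \int_2^t f(x)\,dx + (50 - 25) - (10t - t^2) \right)$$

$$= \lim_{t \to 2^+} \left( 33 - 10t + t^2 + \int_2^t f(x)\,dx \right)$$

$$= 33 - 10(2) + 2^2 + \int_2^2 f(x)\,dx$$

$$= 17$$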

Notice that we had to evaluate an integral with the variable t as a bound. Once we had applied the

Fundamental Theorem of Calculus and plugged in t, this integral became a continuous function and we

could evaluate the limit.

Notice also the strange role the limit played in this computation. Usually we take limits to see what

value a changing function approaches. Our function has the same value for any choice of t (make sure

you see why), so technically we were taking the limit of a constant function. The limit was a purely

computational tool.
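As a quick numerical sanity check of the value 17, the following sketch approximates the same integral with a midpoint Riemann sum. It assumes the piecewise definition of f read off from the example ($3x^2$ for $x \le 2$ and $10 - 2x$ for $x > 2$); the helper name `midpoint_sum` and the number of rectangles are illustrative choices, not part of the text.

```python
def f(x):
    # Piecewise function from the example (as read from the text):
    # 3x^2 on x <= 2 and 10 - 2x on x > 2.
    return 3 * x**2 if x <= 2 else 10 - 2 * x


def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal rectangles."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx


# Any rectangle that straddles the jump at x = 2 may be estimated badly,
# but its width shrinks with dx, so the sums still approach 17.
approx = midpoint_sum(f, 0.0, 5.0, 100_000)
print(approx)
```

Rerunning with smaller values of n shows the estimate settling toward 17 even though no limit in t is taken explicitly.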

Remark

The discontinuity at $x = 2$ meant that we were stuck with an integral $\int_2^t f(x)\,dx$. With a less well-behaved function we might have also needed an integral on the left side of 2, like $\int_s^2 f(x)\,dx$. However, for a bounded function these two integrals can always be sent to zero by a limit, so when solving integrals of discontinuous functions, we can leave these out of our calculations.

We can summarize the method as follows:

Integrating discontinuous functions

If $f(x)$ is discontinuous at $x = c$ and $a \le c \le b$, then

$$\int_a^b f(x)\,dx = \lim_{t \to c^-} \int_a^t f(x)\,dx + \lim_{s \to c^+} \int_s^b f(x)\,dx$$

provided that both of these limits exist.

A removable discontinuity should not slow us down even this much. The area under a single point

of discontinuity is zero. We can use the following theorem for a function with any ﬁnite number of

removable discontinuities.

Theorem

If $f(x)$ and $g(x)$ are equal on $[a, b]$ except at a finite number of points, then $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$.

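This theorem eliminates the need to use limits in our example

$$\int_0^5 f(x)\,dx = \int_0^2 \underbrace{f(x)}_{=\,3x^2}\,dx + \int_2^5 \underbrace{f(x)}_{\substack{=\,10 - 2x \\ \text{except at } x = 2}}\,dx = \int_0^2 3x^2\,dx + \int_2^5 (10 - 2x)\,dx$$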

Most discontinuities can be handled this way, but there is one type that will still require limits.


Example 2.5.3

Integrating a Function with a Vertical Asymptote

Deﬁnition

When $f(x)$ has a vertical asymptote at $c$ in $[a, b]$ we call $\int_a^b f(x)\,dx$ an improper integral.

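How can we compute $\int_0^4 \frac{1}{\sqrt{x}}\,dx$?

In this case, breaking this integral into 2 doesn’t help.

$$\int_0^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t \to 0^+} \left( \int_0^t \frac{1}{\sqrt{x}}\,dx + \int_t^4 \frac{1}{\sqrt{x}}\,dx \right)$$

We cannot take for granted that $\lim_{t \to 0^+} \int_0^t \frac{1}{\sqrt{x}}\,dx$ goes to 0. The interval is getting smaller, but the values of the function may be so large that its rectangle approximations stay arbitrarily large and do not limit to 0. If there were an unbounded amount of area in $\int_0^t \frac{1}{\sqrt{x}}\,dx$, then as $t \to 0^+$, $\int_t^4 \frac{1}{\sqrt{x}}\,dx$ would absorb more and more of that area and tend to $\infty$. Thus if (and only if) $\lim_{t \to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx$ exists, we can assume that the remaining piece $\int_0^t \frac{1}{\sqrt{x}}\,dx$ limits to 0 and can be ignored.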

Solution

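$$\lim_{t \to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t \to 0^+} 2\sqrt{x}\,\Big|_t^4 = \lim_{t \to 0^+} \left( 2\sqrt{4} - 2\sqrt{t} \right) = 4 - 0$$

Since $\lim_{t \to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx$ exists, we conclude that

$$\int_0^4 \frac{1}{\sqrt{x}}\,dx = \lim_{t \to 0^+} \int_t^4 \frac{1}{\sqrt{x}}\,dx = 4$$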


Figure: The area beneath a function with a vertical asymptote
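The limit in this example can also be watched numerically. The sketch below evaluates the closed form $2\sqrt{4} - 2\sqrt{t}$ of $\int_t^4 \frac{1}{\sqrt{x}}\,dx$ for shrinking $t$ and, purely as a contrast chosen for illustration (it is not an example from the text), the closed form $\ln 4 - \ln t$ of $\int_t^4 \frac{1}{x}\,dx$, which grows without bound. The function names are hypothetical.

```python
import math


def area_inv_sqrt(t):
    # Integral of 1/sqrt(x) from t to 4, via the antiderivative 2*sqrt(x).
    return 2 * math.sqrt(4) - 2 * math.sqrt(t)


def area_inv_x(t):
    # Illustrative contrast: integral of 1/x from t to 4, antiderivative ln(x).
    return math.log(4) - math.log(t)


for t in (1e-2, 1e-4, 1e-8, 1e-12):
    # First value approaches 4; second value keeps growing as t -> 0+.
    print(t, area_inv_sqrt(t), area_inv_x(t))
```

Both integrands have a vertical asymptote at 0, yet only the first improper integral converges, which is why the limit has to be checked case by case.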

Main Idea

To compute an improper integral, we introduce a dummy variable t and take limit(s) as t → c. If the

limit(s) exist, we say the integral converges. If any do not, we say it diverges.

Remark

Convergent and divergent are the terms that describe whether the limit which defines an integral approaches a single, finite numerical value. They perform a similar role to “exists” and “does not exist” for limits or “defined” and “undefined” for arithmetic.

Question 2.5.4

How Can We Compute an Integral over an Unbounded Region?

So far we have been interested in integrals over bounded intervals: a ≤ x ≤ b. We approximated

these with rectangles.

Figure: The area beneath a graph, approximated by rectangles


Consider how this approach would work with an unbounded interval: a ≤ x.

Rectangles will not approximate the area we want, but we can compute any finite subsection of it: $\int_a^t f(x)\,dx$. Like with a discontinuity, we’ll take a limit.

Deﬁnition

An integral of the form $\int_a^\infty f(x)\,dx$ is also called an improper integral. We evaluate it by computing

$$\int_a^\infty f(x)\,dx = \lim_{t \to \infty} \int_a^t f(x)\,dx$$

assuming this limit exists. If the limit exists we say the improper integral converges. Otherwise we say it diverges.

Similarly, we can compute $\int_{-\infty}^b f(x)\,dx = \lim_{t \to -\infty} \int_t^b f(x)\,dx$.
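A minimal numerical sketch of this definition follows. The integrand $\frac{1}{1 + x^2}$ and the lower limit 0 are illustrative assumptions, chosen because the antiderivative $\tan^{-1} x$ is well known; the function name `partial_integral` is hypothetical.

```python
import math


def partial_integral(t):
    # Integral of 1/(1 + x^2) from 0 to t, using the antiderivative arctan(x).
    return math.atan(t) - math.atan(0.0)


# The improper integral is the limit of these values as t grows.
for t in (1.0, 10.0, 1e3, 1e6):
    print(t, partial_integral(t))

# The values level off near pi/2, so this improper integral converges.
limit_estimate = partial_integral(1e9)
```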

Example 2.5.5

Evaluating an Improper Integral

Compute

∞

dx.

Figure: An integral over an unbounded domain


Solution

We’ll compute the limit.

lim

t→∞

∞

dx = lim

t→∞

−



= lim

t→∞

−

+ 4

= 4

Since the limit exists, it is the value of the improper integral.

∞

dx = 4.

Example 2.5.6

An Integral over the Entire Real Line

So far we have looked at intervals unbounded in one direction. If the interval is (−∞, ∞), the entire

real line, then we use the following deﬁnition.

Deﬁnition

The improper integral $\int_{-\infty}^\infty f(x)\,dx$ is computed:

$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^\infty f(x)\,dx$$

for any number $a$, so long as both integrals on the right converge. If either integral diverges, then we say $\int_{-\infty}^\infty f(x)\,dx$ diverges as well.

Let

$$f(x) = \begin{cases} e^x & \text{if } x < 1 \\ \dfrac{e}{\sqrt{x}} & \text{if } x \ge 1 \end{cases}$$

Compute $\int_{-\infty}^\infty f(x)\,dx$.


Figure: An integral over the real line, broken into two limits

Solution

We break this integral into two limits. The natural breaking point is a = 1 since that is where the

function changes branches anyway. Both limits must converge for the integral to converge.

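$$\lim_{s \to -\infty} \int_s^1 f(x)\,dx = \lim_{s \to -\infty} \int_s^1 e^x\,dx = \lim_{s \to -\infty} e^x\,\Big|_s^1 = \lim_{s \to -\infty} \left( e - e^s \right) = e$$

$$\lim_{t \to \infty} \int_1^t f(x)\,dx = \lim_{t \to \infty} \int_1^t \frac{e}{\sqrt{x}}\,dx = \lim_{t \to \infty} 2e\sqrt{x}\,\Big|_1^t = \lim_{t \to \infty} \left( 2e\sqrt{t} - 2e \right) = \infty \quad \text{(diverges)}$$

One limit converges to $e$. The other diverges. This means that $\int_{-\infty}^\infty f(x)\,dx$ diverges.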

Question 2.5.7

Can We Take a Limit of $\int_{-t}^{t} f(x)\,dx$ Instead?

We might wonder whether we need to break an integral $\int_{-\infty}^\infty f(x)\,dx$ into two integrals. Instead of two dummy variables, one going to $-\infty$ and one going to $\infty$, could we replace them by one? The integral $\int_{-\infty}^\infty x^3\,dx$ is a useful test case. We can certainly compute

$$\lim_{t \to \infty} \int_{-t}^{t} x^3\,dx = \lim_{t \to \infty} \frac{x^4}{4}\,\Big|_{-t}^{t} = \lim_{t \to \infty} \left( \frac{t^4}{4} - \frac{t^4}{4} \right) = \lim_{t \to \infty} 0 = 0$$

This might even seem right because the area above the axis seems to cancel out the area below the

axis. However, intuitively, we expect that the area of a region should be preserved if we shift it in some

direction. Let’s shift this graph one unit to the left.

$$\lim_{t \to \infty} \int_{-t}^{t} (x + 1)^3\,dx = \lim_{t \to \infty} \frac{(x + 1)^4}{4}\,\Big|_{-t}^{t} = \lim_{t \to \infty} \left( \frac{(t + 1)^4}{4} - \frac{(-t + 1)^4}{4} \right)$$

$$= \lim_{t \to \infty} \left( \frac{t^4 + 4t^3 + 6t^2 + 4t + 1}{4} - \frac{t^4 - 4t^3 + 6t^2 - 4t + 1}{4} \right) = \lim_{t \to \infty} \left( 2t^3 + 2t \right) = \infty$$

We can see that, for any choice of $t$, there will be more area above the axis than below, and the difference grows quickly as $t$ increases. If the area of a region changes when we shift it to the side, then that area was not well defined to begin with. We thus say that these integrals diverge, not because they go to $\infty$ or $-\infty$, but because they are not defined at all. The formal definition above handles this example correctly. $\int_{-\infty}^0 x^3\,dx$ diverges, so $\int_{-\infty}^\infty x^3\,dx$ also diverges.

Figure: The area under functions of the form $f(x) = (x - a)^3$


Main Idea

Do not replace the correct deﬁnition:

$$\lim_{t \to -\infty} \int_t^a f(x)\,dx + \lim_{t \to \infty} \int_a^t f(x)\,dx$$

with the “shortcut:”

$$\lim_{t \to \infty} \int_{-t}^{t} f(x)\,dx$$

The “shortcut” can suggest that the integral converges, when in fact it diverges.
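The $x^3$ test case can be replayed with exact arithmetic. In this sketch the function names are hypothetical, and each function simply evaluates the relevant antiderivative ($\frac{x^4}{4}$ or $\frac{(x+1)^4}{4}$) at the endpoints $\pm t$.

```python
from fractions import Fraction


def symmetric_x_cubed(t):
    # "Shortcut" integral of x^3 from -t to t, via the antiderivative x^4/4.
    return Fraction(t**4 - (-t) ** 4, 4)


def symmetric_shifted(t):
    # Same shortcut applied to (x + 1)^3, via the antiderivative (x + 1)^4/4.
    return Fraction((t + 1) ** 4 - (-t + 1) ** 4, 4)


for t in (1, 10, 100):
    # The first value is always 0; the second equals 2t^3 + 2t, which is unbounded.
    print(t, symmetric_x_cubed(t), symmetric_shifted(t))
```

Because a sideways shift changes the answer from 0 to an unbounded quantity, the symmetric limit cannot serve as the definition of the integral over the whole real line.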

Synthesis 2.5.8

A Comparison Test

Recall the following theorems

Theorem

If $f(x) \le g(x)$ on $[a, b]$ then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Theorem

Let $a$ be a real number or $\pm\infty$. If $F(x) \le G(x)$ for all $x$ near $a$, then $\lim_{x \to a} F(x) \le \lim_{x \to a} G(x)$.

Suppose we have a function f(x) whose anti-derivative we don’t know, and a function g(x) whose

anti-derivative we do know. What can the divergence or convergence of $\int_a^\infty g(x)\,dx$ tell us about $\int_a^\infty f(x)\,dx$?


Solution

If we know that $f(x) \le g(x)$ then for all $t \ge a$, $\int_a^t f(x)\,dx \le \int_a^t g(x)\,dx$. This allows us to also compare their limits, which are the improper integrals: $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$. This could be useful in a couple ways.

If $\lim_{t \to \infty} \int_a^t g(x)\,dx = -\infty$ then $\lim_{t \to \infty} \int_a^t f(x)\,dx = -\infty$ as well, meaning $\int_a^\infty f(x)\,dx$ diverges.

If on the other hand $f(x) \ge g(x)$ and $\lim_{t \to \infty} \int_a^t g(x)\,dx = \infty$ then $\lim_{t \to \infty} \int_a^t f(x)\,dx = \infty$ as well, which also means $\int_a^\infty f(x)\,dx$ diverges.

We might like to reverse these and say that if $\int_a^\infty g(x)\,dx$ converges, $\int_a^\infty f(x)\,dx$ must as well, but $\int_a^\infty f(x)\,dx$ can diverge without going to infinity. $f(x)$ could oscillate between positive and negative so that $\int_a^t f(x)\,dx$ increases and decreases and does not have a limit as $t \to \infty$.

We can actually solve the last issue by adding the assumption that f(x) is non-negative. The result is

not easy to prove, but it is useful.

Theorem

Suppose $0 \le f(x) \le g(x)$ for all $x$.

1 If $\int_a^\infty f(x)\,dx$ diverges, then $\int_a^\infty g(x)\,dx$ diverges.

2 If $\int_a^\infty g(x)\,dx$ converges, then $\int_a^\infty f(x)\,dx$ converges.

There are similar versions of this theorem for integrals to −∞ or for functions that are non-positive.
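To see the convergence half of the theorem in action, the sketch below uses an illustrative pair that does not appear in the text: $f(x) = e^{-x^2}$ and $g(x) = e^{-x}$ on $[1, \infty)$, where $0 \le f(x) \le g(x)$ because $x^2 \ge x$ when $x \ge 1$. The partial integrals of $f$ are approximated with a midpoint sum (the helper name is hypothetical) and compared against the known value $\int_1^\infty e^{-x}\,dx = e^{-1}$.

```python
import math


def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal rectangles."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    # Integrand with no elementary antiderivative.
    return math.exp(-x * x)


bound = math.exp(-1)  # exact value of the integral of e^(-x) over [1, infinity)

# Approximate partial integrals of f over [1, t]; each stays below the bound.
partials = [midpoint_sum(f, 1.0, t, 20_000) for t in (1.5, 2.0, 3.0, 4.0)]
print(partials, bound)
```

The partial integrals grow with $t$ but never pass $e^{-1}$, which is the behavior the theorem turns into a guarantee of convergence.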


Section 2.5

Exercises

Summary Questions

What is an improper integral?

Under what conditions were we able to conclude that $\int_a^b f(x)\,dx = \int_a^b g(x)\,dx$?

What does it mean for an improper integral to converge or diverge?

If we know that $\int_a^\infty g(x)\,dx$ converges, what condition on $f(x)$ would guarantee that $\int_a^\infty f(x)\,dx$ converges?

2.5.1

In the expressions below, which of the boxes can legally be replaced by an ∞ symbol?

lim

x→ 1

x + 2 = 3

f(x) dx = e

+ 2x − log

|x|



Evaluate lim

x→∞

− 2x + 1.

Evaluate the following limits:

lim

x→∞

+ 3x + 5

lim

x→−∞

+ 3x + 5

Evaluate lim

w→∞






2.5.2

Evaluate

dx. Explain how you dealt with any discontinuities.

Q10

Let

f(x) =

(

4 x = 1, 4, or 6

2 otherwise

Sketch the graph y = f(x).

Evaluate

f(x) dx. State what tool you used to deal with any discontinuities.

Q11

Let

g(x) =











√

x if 0 ≤ x ≤ 4

3 if 4 < x < 6

if 6 ≤ x

Compute

g(x) dx.

Q12

The sign function has the form

σ(x) =

(

1 if x > 0

−1 if x < 0

Write a formula (in terms of $a$ and $b$) for $\int_a^b \sigma(x)\,dx$. Your answer will be a piecewise expression.

2.5.3

Q13

Consider the integral

−2

dx.

Sketch the graph of y =

Set up the limits that would compute this integral.

Do these limits exist?


Q14

Evaluate

ln x dx.

Q15

Evaluate

√

4 − x

dx.

Q16

Evaluate

dw.

2.5.4

Q17

How large will the base (∆x) of each rectangle be, if we want to approximate:

The area over the interval [4, 16] with 3 rectangles?

The area over the interval [a, b] with n rectangles?

The area over the interval [a, ∞) with n rectangles?

Q18

Compute

∞

dx.

Q19

Compute

−∞

dx.

Q20

Evaluate

∞

−2x

dx.

Q21

Evaluate

ln x dx. You may need l’Hôpital’s rule.

Q22

Compute

∞

dx, showing all necessary steps.


2.5.5

Q23

Compute

∞

−∞

−x

dx.

Q24

Show how to evaluate

∞

−∞

1/3

dx or show that it diverges.

Q25

Let

f(x)

(

if x < −2

(x+4)

if x ≥ −2

Evaluate

∞

−∞

f(x) dx.

Q26

How would you write $\int_{-\infty}^\infty \frac{1}{1 + x^2}\,dx$ as a sum of two limits? You might recall that $\int \frac{1}{1 + x^2}\,dx = \tan^{-1} x + c$. Use this to evaluate the integral.

Extension and Synthesis

Q27

Let

f(x)

(

√

x if x < 8

10 − x if x ≥ 8

Is f (x) continuous? Justify your answer with a calculation

What is the area enclosed by y = f(x) and y = 0?

Q28

Let

f(x)











−4/3

if x < −8

√

if −8 ≤ x < 0

−x

if x ≥ 0

Evaluate

∞

−∞

f(x) dx.


Q29

Consider the region R below y =

, above y = 0 and to the right of x = 1.

Try to compute the area of R using an integral.

Suppose R is rotated around the x-axis to create a solid S. Compute the volume of S.

How annoying are the conclusions of

and

Q30

Consider the region in the ﬁrst quadrant whose boundary is the curves y =

, y = 2x − 1 and

y = 0.

Write the area of this region as an integral in the variable y. Do not evaluate.

Suppose this region is rotated around the x-axis. Write the resulting volume using one or

more integrals. Do not evaluate.


Section 2.6

Probability

Goals:

1 Test the properties of a probability density function.

2 Use a probability density function to describe the underlying random variable.

3 Use the uniform, exponential, and normal distributions.

4 Compute probabilities and expected values.

The main problem facing every planner is uncertainty. When will the next epidemic strike? Will the

stock market go up or down? How many rare particles will ﬂow through a detection device? These

outcomes cannot be known ahead of time, but they can be modeled as probabilities. Knowing when the

epidemic is likely to happen can guide our decision of how much to invest in mitigation. Knowing how

many particles are likely to pass through an area can inform us how sensitive our detection device needs

to be.

On the other hand, probabilities can also help us understand what has already happened. Probabilities

tell us whether the results of an experiment are likely to be a coincidence. Is an apparent pattern just

the variation inherent in random sampling, or is it likely to be present if the procedure is repeated? This

is in fact the basic model for statistical reasoning:

1 Assume that the type of pattern you’re looking for does not exist (a null hypothesis).

2 Collect observations.

3 Compute the probability of seeing those observations, given your assumption.

4 If the probability is very low, then the assumption is probably false.

Such reasoning allows us to conclude that a survey is representative of the population as a whole. It allows us to understand what outcome will occur on average, or how much outcomes are likely to vary.

Such statistics help us understand the way the world works. We can design our next experiment or plan

our future behavior around that understanding. For example, on average, the stock market goes up.

This is one of the most powerful ﬁnancial facts available to long-term investors, and it can be grounded

in a probabilistic study of past performance.

Question 2.6.1

What Is a Continuous Probability Distribution?

Deﬁnition

A random variable encodes the possible outcomes of a random selection. We use the notation

P (outcome) to denote the probability that a particular outcome occurs. If an outcome is impossible,

we write P (outcome) = 0. If it is certain we write P (outcome) = 1.


Example

Our outcome can be any expression concerning the random variable, for instance:

If $S$ is the sum of the rolls of two six-sided dice, then $P(S = 8) = \frac{5}{36}$.

If $T$ is the number of tails when two coins are flipped then $P(T \ge 1) = \frac{3}{4}$.

We can encode these probabilities with a distribution function. The value of the function at each

number a is the probability that the outcome is a.

Example

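If $T$ is the number of tails obtained from two fair coins then

$$f_T(t) = \begin{cases} \frac{1}{4} & \text{if } t = 0 \\ \frac{1}{2} & \text{if } t = 1 \\ \frac{1}{4} & \text{if } t = 2 \\ 0 & \text{if } t = \text{anything else} \end{cases}$$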

Notice

The sum of the probabilities adds to 1.

There are only ﬁnitely many values of T that are possible.

What if we wanted to model height with a random variable? No one is exactly 68 inches tall. Even

people who say they are “ﬁve feet eight inches” are slightly taller or shorter. A distribution function

like we made for coins is unsuitable. It would have the property $f_H(h) = 0$ for all $h$. To handle this

situation, we need to deﬁne a diﬀerent kind of random variable with a diﬀerent relationship to a deﬁning

function.


Deﬁnition

A continuous random variable $X$ is a random variable whose outcomes are real numbers, and whose probability is modeled by a probability density function $f_X(x)$ such that

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx.$$

$f_X(x)$ must satisfy

1 $f_X(x) \ge 0$ for all $x$.

2 $\int_{-\infty}^\infty f_X(x)\,dx = 1$

Remark

The term density should give us a hint about how to think about these functions. Density is a rate.

The value of a probability density function tells you the rate of likelihood per unit of length on the real

number line. Integrating this rate over an interval gives the total likelihood of lying on that interval,

much like integrating a rate of change over an interval computes the total change.

An integral is the natural way to measure probability. The rules of integration are compatible with

our intuition of probability. Suppose we have an interval [a, b] broken into two or more subintervals. The

total probability of X having an outcome in [a, b] is equal to the sum of the probabilities of the outcome

lying in each subinterval. Similarly, the area above $[a, b]$ and below the graph $y = f_X(x)$ is equal to the sum of the areas above each subinterval. In equations, these are the laws:

$$P(a \le X \le c) + P(c \le X \le b) = P(a \le X \le b)$$

$$\int_a^c f_X(x)\,dx + \int_c^b f_X(x)\,dx = \int_a^b f_X(x)\,dx$$


Example 2.6.2

Describing a Random Variable from its Density Function

Consider the function

$$f_X(x) = \begin{cases} \frac{x^2}{9} & \text{if } 0 \le x \le 3 \\ 0 & \text{if } x > 3 \text{ or } x < 0 \end{cases}$$

Verify that $f_X$ is a probability density function.

If $f_X$ is the density function of $X$, compute $P(X \ge 2)$.

What does $f_X$ tell us about the likely values of $X$?

Solution

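We need to check that $f_X(x)$ is never negative and $\int_{-\infty}^\infty f_X(x)\,dx = 1$.

$f_X(x)$ is never negative, because it is either a square or 0.

$$\int_{-\infty}^\infty f_X(x)\,dx = \int_{-\infty}^0 f_X(x)\,dx + \int_0^3 f_X(x)\,dx + \int_3^\infty f_X(x)\,dx$$

$$= \int_{-\infty}^0 0\,dx + \int_0^3 \frac{x^2}{9}\,dx + \int_3^\infty 0\,dx = 0 + \frac{x^3}{27}\,\Big|_0^3 + 0 = \frac{1}{27}(27 - 0) = 1$$

$$P(X \ge 2) = \int_2^\infty f_X(x)\,dx = \int_2^3 f_X(x)\,dx + \int_3^\infty f_X(x)\,dx$$

$$= \int_2^3 \frac{x^2}{9}\,dx + \int_3^\infty 0\,dx = \frac{x^3}{27}\,\Big|_2^3 + 0 = \frac{1}{27}(27 - 8) = \frac{19}{27}$$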

Outcomes outside of [0, 3] are impossible. Among the outcomes in [0, 3], outcomes closer to 3 are

more likely than outcomes closer to 0, because the density function has a greater value there.
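The two checks in this example are easy to reproduce numerically. The sketch below assumes the density reads $\frac{x^2}{9}$ on $[0, 3]$ (consistent with the values $27 - 0$ and $27 - 8$ in the arithmetic above) and uses a hypothetical midpoint-sum helper. Since the density is 0 outside $[0, 3]$, integrating over $[0, 3]$ stands in for the whole real line.

```python
def density(x):
    # Assumed density from the example: x^2 / 9 on [0, 3], zero elsewhere.
    return x * x / 9 if 0 <= x <= 3 else 0.0


def midpoint_sum(func, a, b, n=10_000):
    """Midpoint Riemann sum of func over [a, b] with n equal rectangles."""
    dx = (b - a) / n
    return sum(func(a + (i + 0.5) * dx) for i in range(n)) * dx


total = midpoint_sum(density, 0.0, 3.0)         # should be close to 1
p_at_least_2 = midpoint_sum(density, 2.0, 3.0)  # should be close to 19/27
print(total, p_at_least_2, 19 / 27)
```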

Figure: The density function of X and the area representing P (X > 2)

Main Ideas

To verify that a function is a probability density function, we need to check that it is never negative

and that it integrates, over the entire real line, to 1.

We compute the probability that $X$ has an outcome in an interval by integrating $f_X(x)$ over that interval.

Outcomes of $X$ where $f_X(x)$ is large are more likely than outcomes where $f_X(x)$ is small.


Figure: The density function of X and the areas that represent the likelihood of larger and smaller

outcomes

Question 2.6.3

What Density Functions Arise Naturally?

The requirements to be a probability density function are not very strict. The vast majority of prob-

ability density functions do not model a real life phenomenon or even an intriguing thought experiment.

What follows are three families of density functions that are especially useful. The ﬁrst is the simplest.

When we lack data to suggest otherwise, it is a common choice when creating a model with some

randomness.

Deﬁnition

Given an interval $[a, b]$, the uniform distribution on $[a, b]$ is given by

$$f_X(x) = \begin{cases} \frac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{if } x > b \text{ or } x < a \end{cases}$$

Notice that the shorter the interval [a, b] is, the higher density is required to integrate to a total

probability of 1.


Figure: The density function of a uniform distribution

An intuitive but imprecise way to describe a random variable with a uniform distribution is to say that

all outcomes in [a, b] are equally likely. Since every outcome of a continuous random variable occurs with

probability 0, this is unhelpful. X is remarkable, because all outcomes in [a, b] have equal probability

density. To connect this to actual probabilities, we might say that all subintervals of [a, b] are equally

likely to contain the outcome of X, but this is incorrect. X is 3 times as likely to have an outcome in

an interval of length 6 as an interval of length 2. A precise statement would be: the likelihood of the

outcome of X occurring in each subinterval of [a, b] is proportional to the length of the subinterval.

Our second family of random variables naturally measures waiting time. It answers questions like:

when will the next customer come in? When will this device next detect a certain type of ambient

particle? Here is the formal deﬁnition.

Deﬁnition

Suppose an event happens randomly and uniformly at an average rate of $\lambda$ times per unit of time ($x$). Then the amount of time until it next occurs is given by the exponential distribution:

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } 0 \le x \\ 0 & \text{if } x < 0 \end{cases}$$

Observe the following

1 Higher λ means that X is likely to be smaller, as the event occurs sooner.

2 The probability of the event occurring in a given interval, given that it did not occur before that

interval, depends only on the length of the interval.


Figure: The density function of an exponential distribution

The second point is best illustrated with a concrete example.

Example

Gravitational waves large enough to detect pass through the earth from time to time. Suppose we switch on a gravitational wave detector, and the time (in days) until the first detection is modeled by the exponential random variable $X$ with density function $0.7e^{-0.7x}$.

The probability that the first detection occurs within two days is about 0.75.

If the first detection does not occur in the first two days, then the probability that it occurs in the following two days is about 0.75.

If the first detection does not occur in the first four days, then the probability that it occurs in the following two days is about 0.75.

And so on.

From this we can compute

$$P(2 \le X \le 4) = \underbrace{(1 - P(X \le 2))}_{\substack{X \text{ is not in} \\ \text{the first two days}}}(0.75) = (0.25)(0.75) = 0.1875$$
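The figures in this example are rounded. A short sketch using the exponential cumulative probability $P(X \le x) = 1 - e^{-\lambda x}$ (obtained by integrating the density from 0 to $x$) shows the unrounded values and confirms the memoryless pattern. The function name `prob_within` is hypothetical.

```python
import math

RATE = 0.7  # lambda from the example, in detections per day


def prob_within(x, rate=RATE):
    # P(X <= x) for an exponential random variable:
    # the integral of rate * e^(-rate * s) over [0, x].
    return 1 - math.exp(-rate * x)


p_first_two = prob_within(2)                     # about 0.7534
p_two_to_four = prob_within(4) - prob_within(2)  # P(2 <= X <= 4), about 0.1858
# Conditional probability of a detection in days 2 to 4, given none in days 0 to 2.
p_conditional = p_two_to_four / (1 - p_first_two)
print(p_first_two, p_two_to_four, p_conditional)
```

The conditional probability matches the two-day probability exactly. The unrounded $P(2 \le X \le 4)$ is about 0.186; the value 0.1875 comes from rounding 0.7534 to 0.75 before multiplying.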

Our ﬁnal family is the most famous, because it is the most generally applicable.

Deﬁnition

The normal distribution is sometimes called a bell curve. Many natural phenomena are normally distributed. The formula is

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$


The anti-derivative of this density function cannot be expressed in terms of elementary functions. Instead we can look up values in a table or compute them numerically. The normal distribution has a special role in statistics:

Theorem [The Central Limit Theorem]

The average of $n$ independent identically distributed random variables with finite variance (for instance performing the same experiment $n$ times) is approximately normally distributed when $n$ is large, and the approximation improves as $n$ grows.

This theorem helps explain why many natural measurements are approximated by bell curves. For

example, human height is aﬀected by hundreds of factors, including individual genes, nutrition and

environment. If we view human height as an average of these factors, scaled with appropriate units,

then we expect human heights to be modeled by a normal random variable. Viewing a histogram of

human height statistics shows the expected bell curve.

The parameters in $f_X$ can be interpreted as follows:

µ is the average value of X. It corresponds to the peak of the bell curve.

σ is the standard deviation of X. Larger σ means that X has a larger probability of being far

from µ.

Figure: The density function (bell curve) of a normal distribution
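In place of a printed table, normal probabilities can be computed from the error function, which Python's standard library exposes as `math.erf`. The sketch below uses the standard identity $P(X \le x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right)$; the function name `normal_cdf` and the sample parameters are illustrative.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x) for a normal random variable with mean mu
    # and standard deviation sigma.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Probability of landing within one standard deviation of the mean.
within_one_sigma = normal_cdf(1.0) - normal_cdf(-1.0)
print(within_one_sigma)
```

The result, roughly 0.68, is the same for every choice of $\mu$ and $\sigma$, since shifting and rescaling the bell curve does not change the proportion of area within one standard deviation of the peak.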

Question 2.6.4

What Is the Expected Value of a Random Variable?

Expected value will be the ﬁrst statistic we can compute for a random variable. Statistics of a data

set tell us something about the numbers in the data set. Statistics of a random variable should tell us

something about the outcomes of the random variable.

The expected value or average value of X describes what the average result will be, if you

let X take a value at random many times. It is typically denoted E[X] or with the letter µ.


Example

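Suppose we average our rolls of a six-sided die. As the number of rolls $n$ gets large, we’ll roll each number close to $\frac{n}{6}$ times. The sum of the rolls will be approximately

$$1\left(\frac{n}{6}\right) + 2\left(\frac{n}{6}\right) + 3\left(\frac{n}{6}\right) + 4\left(\frac{n}{6}\right) + 5\left(\frac{n}{6}\right) + 6\left(\frac{n}{6}\right)$$

To compute the average, we divide by $n$. Fortunately, every term already has an $n$.

$$\mu = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = 3.5$$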

In general, the number of occurrences of the result $a$ in $n$ evaluations of $X$ will be approximately $n f_X(a)$. When we divide out $n$, we obtain the following weighted average:

Formula

The expected value of a (discrete) random variable $X$ with probability distribution function $f_X$ is

$$E[X] = \sum_x x\, f_X(x)$$

where $x$ is summed over all possible outcomes of $X$.
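As a small check of this weighted average, the sketch below computes $E[X]$ exactly for a fair six-sided die using Python's `fractions.Fraction`. The dictionary `pmf` is an illustrative encoding of the distribution function.

```python
from fractions import Fraction

# Distribution function of a fair die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: sum of (outcome times probability) over all outcomes.
expected = sum(face * prob for face, prob in pmf.items())
print(expected)  # 7/2
```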

To produce the corresponding formula for a continuous random variable, instead of multiplying each outcome by its probability and summing, we multiply each outcome by its density and integrate.

Formula

The expected value of a continuous random variable $X$ with probability density function $f_X$ is

$$E[X] = \int_{-\infty}^\infty x\, f_X(x)\,dx$$


Example 2.6.5

The Expected Value of a Uniform Random Variable

Compute the expected value of a uniform random variable on [a, b].

Solution

We’ll apply the formula. Since $f_X(x)$ has discontinuities at $a$ and $b$, we will break it into three parts.

$$E[X] = \int_{-\infty}^\infty x\, f_X(x)\,dx = \int_{-\infty}^a x(0)\,dx + \int_a^b \frac{x}{b - a}\,dx + \int_b^\infty x(0)\,dx$$

$$= \frac{x^2}{2(b - a)}\,\Big|_a^b = \frac{b^2}{2(b - a)} - \frac{a^2}{2(b - a)} = \frac{b^2 - a^2}{2(b - a)} = \frac{(b - a)(b + a)}{2(b - a)} = \frac{b + a}{2}$$

Notice that this is the midpoint of the interval $[a, b]$. Since $X$ is uniformly distributed across the interval, we’d expect the average value to occur at the midpoint.

Main Ideas

E[X] typically occurs somewhere in the middle of the possible outcomes of X. With symmetric

density functions, it is the midpoint.

Example 2.6.6

The Expected Value of an Exponential Random Variable

Compute the expected value of an exponential random variable.

Explain why the role of λ in the answer makes sense.


Solution

We will use the formula. Even after removing the region of 0 density, we are left with an improper

integral. We therefore will compute a limit.

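$$E[X] = \int_{-\infty}^\infty x\, f_X(x)\,dx = \int_{-\infty}^0 x(0)\,dx + \int_0^\infty x\lambda e^{-\lambda x}\,dx = \lim_{t \to \infty} \int_0^t x\lambda e^{-\lambda x}\,dx$$

Integrating by parts with $u = x$, $dv = \lambda e^{-\lambda x}\,dx$, $du = dx$, $v = -e^{-\lambda x}$, this becomes

$$= \lim_{t \to \infty} \left( -x e^{-\lambda x}\,\Big|_0^t - \int_0^t -e^{-\lambda x}\,dx \right)$$

$$= \lim_{t \to \infty} \left( -x e^{-\lambda x} - \frac{1}{\lambda} e^{-\lambda x} \right)\Big|_0^t$$

$$= \lim_{t \to \infty} \left( -t e^{-\lambda t} - \frac{1}{\lambda} e^{-\lambda t} + 0e^0 + \frac{1}{\lambda} e^0 \right)$$

$$= \lim_{t \to \infty} \left( -t e^{-\lambda t} \right) - 0 + 0 + \frac{1}{\lambda}$$

$$= \frac{1}{\lambda} + \lim_{t \to \infty} -\frac{t}{e^{\lambda t}} \quad \left( \tfrac{\infty}{\infty} \text{ form} \right)$$

$$= \frac{1}{\lambda} + \lim_{t \to \infty} -\frac{1}{\lambda e^{\lambda t}} \quad \text{(l’Hôpital’s rule)}$$

$$= \frac{1}{\lambda} + 0$$

Our final answer is $E[X] = \frac{1}{\lambda}$.

$X$ measures the time until an event with average frequency $\lambda$ occurs. Thus on average, we expect to wait $\frac{1}{\lambda}$ for it. For example, if an event occurs three times per hour, we would expect to wait about 20 minutes for it to occur.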
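The limit can be checked numerically with the antiderivative found by parts. In this sketch, `partial_expectation` (a hypothetical name) evaluates $\int_0^t x\lambda e^{-\lambda x}\,dx$ in closed form, and the rate of 3 events per hour matches the closing example.

```python
import math


def partial_expectation(rate, t):
    # Integral of x * rate * e^(-rate * x) from 0 to t, from integration by parts:
    # [-x e^(-rate x) - e^(-rate x) / rate] evaluated between 0 and t.
    return -t * math.exp(-rate * t) - math.exp(-rate * t) / rate + 1 / rate


rate = 3.0  # events per hour
for t in (0.5, 1.0, 5.0, 20.0):
    # Values climb toward 1/rate = 1/3 hour, i.e. 20 minutes.
    print(t, partial_expectation(rate, t))

expected_minutes = 60 * partial_expectation(rate, 50.0)
```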


Figure: The expected value of an exponential random variable

Main Idea

For asymmetric density functions, E[X] will not be in the middle of the range of values. It will be pulled

toward regions of higher likelihood.

Synthesis 2.6.7

Median Wait Time

Suppose that an exponential random variable models the wait time of a random caller to a call

center.

What is the median wait time?

Explain graphically why the median wait time is less than the expected wait time.

Solution

The median is the number m such that half the outcomes are larger than m and half are smaller.


We can write this as the following equation and solve for m.

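$$P(X \le m) = 0.5$$

$$\int_{-\infty}^m f_X(x)\,dx = 0.5$$

$$\int_{-\infty}^0 f_X(x)\,dx + \int_0^m f_X(x)\,dx = 0.5 \quad \text{(presumably } m > 0\text{)}$$

$$\int_{-\infty}^0 0\,dx + \int_0^m \lambda e^{-\lambda x}\,dx = 0.5$$

$$-e^{-\lambda x}\,\Big|_0^m = 0.5$$

$$-e^{-\lambda m} + e^0 = 0.5$$

$$-e^{-\lambda m} = -0.5$$

$$-\lambda m = \ln 0.5$$

$$m = \frac{\ln 2}{\lambda}$$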

The median is the point such that half the area under $y = f_X(x)$ lies on either side. The expected value is weighted. A few outcomes far to one side can balance many outcomes slightly to the other side. The outcomes of $X$ extend to $\infty$ on the right but only to 0 on the left. These distant outcomes pull the average to the right, but their distant position has no effect on the median.

Figure: The median M and expected value µ of an exponential random variable
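A brief numerical comparison of the two statistics follows, under the assumption of an illustrative rate $\lambda = 0.5$ (not a value from the text). The helper `cdf` is a hypothetical name for $P(X \le x) = 1 - e^{-\lambda x}$.

```python
import math

rate = 0.5  # illustrative rate (events per unit time)

median = math.log(2) / rate  # solves P(X <= m) = 0.5
mean = 1 / rate              # expected value of an exponential random variable


def cdf(x):
    # P(X <= x) = 1 - e^(-rate * x) for x >= 0.
    return 1 - math.exp(-rate * x)


# Half the probability lies below the median, but more than half lies below the mean.
print(median, mean, cdf(median), cdf(mean))
```

For any rate, the share of outcomes below the mean works out to $1 - e^{-1}$, about 63%, which is another way to see that the mean sits to the right of the median.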


Main Idea

The median is the value $m$ such that half the area under $y = f_X(x)$ lies on either side of $x = m$.

We compute the median by setting $P(X \le m) = 0.5$ and solving for $m$.

Median is not the same as expected value. $y = f_X(x)$ may have more area on one side of $E[X]$ than the other, if the smaller side’s area is farther from the middle.

Section 2.6

Exercises

Summary Questions

Describe the diﬀerence between a continuous random variable and a non-continuous (discrete)

one.

How do we use a probability density function to compute the probability of an outcome?

What must be true about a probability density function?

How do you compute the expected value of a random variable?

2.6.1

How many possible outcomes does a continuous random variable have?

Which of the following probability questions can be answered without any further information?

Explain.

i. If you spin a prize wheel 3 times, what is the probability that your winnings add up to exactly $80?

ii. If you ﬂip two weighted (unfair) coins, what is the probability that exactly one of them comes

up tails?


iii. If you pick a random person, what is the probability that her height is exactly 68 inches?

iv. If I spin a wheel of names, what is the probability that it takes exactly 7 spins to land on my

own name?

Let X be a continuous random variable. Compute P (X = 13).

Another book might teach you that $P(a < X < b) = \int_a^b f_X(x)\,dx$, instead of $P(a \le X \le b) = \int_a^b f_X(x)\,dx$. Why shouldn't this bother you?

Let f

(t) be a probability density function of a random variable T . What quantity is represented

−∞

(t) dt?

Q10

Let f

(x) be a probability density function of a random variable X. What quantity is represented

∞

(x) dx?

Q11

Given a density function f

(u) for a random variable U, write an integral or integrals to compute

P (4 ≤ U

≤ 9).

Q12

Suppose the height of a mature sunflower is given by the random variable $H$ with density function $f_H(h)$. If your friend tells you that her sunflower is in the top quintile in height, explain how you could use $f_H$ to determine a range that the height of her sunflower must lie in.

2.6.2

Q13

Let W be a random variable with density function

\[
f_W(w) = \begin{cases} \dfrac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}
\]

Compute P (2 ≤ W ≤ 9)

Q14

Let T be a random variable with density function

(t) =

(

√

if 0 ≤ t ≤ 1

0 otherwise

Compute (0 ≤ T ≤

)


2.6.3

Q15

If U is a uniform random variable on [4, 7.5], compute the probability that U ≤ 5.5.

Q16

If X is a uniform random variable on [2, c] and P (0 ≤ X ≤ 4) = 0.25, what is c?

Q17

If W is an exponential random variable such that P (W ≥ 1) =

, then compute the value of the

parameter λ in its density function f

Q18

Juan looks at the density function of an exponential random variable X and says “X is more

likely to have the value 1 than 5.” “That’s silly,” replies Neha, “X has exactly zero probability

of being either of those. They are equally likely.” What do you think of their argument?

2.6.4

Q19

\[
\text{Let } f(x) = \begin{cases} bx^{-3} & x \ge 2 \\ 0 & x < 2 \end{cases}
\]

Compute a number b so that f is a probability density function.

If f is the density function for some random variable Z, compute E[Z].

Q20

Suppose $X$ is a random variable with density function $f_X(x)$. Suppose $f_X(x)$ is 0 outside [3, 11] and decreasing on [3, 11]. Is $E[X]$ greater or less than 7? Explain.

Q21

Suppose X is a continuous random variable with probability density function

(x) =

(

√

if 0 ≤ x ≤ 4

0 if x > 4 or x < 0

In a sentence or two, state what you would need to check to ensure that $f_X(x)$ is a valid probability density function. You do not need to actually perform the calculations.

Compute E[X].


Q22

Explain how you can use the graph of a normal random variable to identify the expected value.

Then compute that value using the expected value formula.

2.6.5

Q23

Give the expected value of a uniform random variable on [5.2, 9.4].

Q24

If the uniform random variable on [a, b] has expected value 7, and a = 3, what is b?

Q25

In this example, we divided by (b − a). What would happen if b − a = 0?

Q26

If you know the expected value µ of a uniform random variable X, what is the probability that X ≥ µ? Is this problem answerable without the assumption that X is uniform? Explain.

2.6.6

Q27

Suppose X and Y are two diﬀerent exponential random variables modeling events that occur on

average p and 2p times per day respectively. How are their expected values related?

Q28

Does our expected value formula make sense if λ < 0? Why should this not bother us?

Q29

On bus route 70, 3 buses come per hour, on average.

Write a probability density function for X, the amount of time until the next bus arrives.

What is the expected amount of time until the next bus comes?

How likely is it that you will wait more than an hour for the bus?

Q30

If X is an exponential random variable, what is the probability that X ≤ E[X]?


2.6.7

Q31

Compute the median value of a uniform random variable on [a, b].

Q32

Let W be a random variable with density function

\[
f_W(w) = \begin{cases} \dfrac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}
\]

Compute the median value of W .

Q33

Let T be a random variable with density function

(t) =

(

√

if 0 ≤ t ≤ 1

0 otherwise

Compute the median value of T .

Q34

Examine the graph of the density function of a normal random variable X. What is the median

of X? Explain how you can see this in the graph.

Extension and Synthesis

Q35

Suppose X is a uniform random variable on [a, b] and P (3 ≤ X ≤ 4) =

. Describe all possible

values of a and b.

Q36

Suppose the random variable W has the density function

\[
f_W(w) = \begin{cases} k(7 - w) & \text{if } 1 \le w \le 7 \\ 0 & \text{if } w > 7 \text{ or } w < 1 \end{cases}
\]

What values of W are possible?

What can you say about which values of W are more likely than others?

Given that $f_W$ is a density function, what is the value of the constant $k$?


What is the average value of W ?

Can you compute the median value of W ? This might be easier with geometry than with

calculus.

Q37

Suppose that g(x) is a probability distribution for a random variable X and g(x) = 0 for all

x ≥ 0.

What is the value of

−∞

g(x) dx? Justify your answer with a sentence or computation.

Give a formula for E[X]. Is it positive or negative? Justify your answer in a sentence or two.

Q38

Recall that an even function f (x) has the property that f(x) = f(−x) for all x. If the density

function of a random variable is even, what does that say about the expected value and median

of X? Explain your answer.


Section 2.7

Functions of Random Variables

Goals:

1 Compute expected values of functions of a random variable.

2 Compute the average value of a function.

3 Compute the variance of a random variable.

Sometimes the quantity modeled by a random variable is not the quantity we actually care about. For

example, while we might have a model for how many people will contract a disease, what we actually

would like to predict is how many healthcare resources they will require. The number of patients

determines the required resources, so mathematically, resources is a function of patients. Expected

values of such functions turn out to be straightforward to compute. A natural way to generate statistics

about a random variable is to write a function that measures something interesting and compute its

expected value.

Question 2.7.1

What Is a Function of a Random Variable?

When we write a function g(X) of a random variable X, then the output Y of this function is itself

a random variable. These functions are most intuitive with a discrete random variable. In this case we

can compute Y ’s probability distribution function by applying g to each outcome of X and summing

the probabilities that produce each output.

Example

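Let $X$ be a discrete random variable with probability distribution function $f_X(x)$. If $Y = g(X) = X^2$ then $Y$ is a random variable and we can compute its probability distribution function $f_Y(y)$.

\[
f_X(x) = \begin{cases} 0.1 & \text{if } x = 0 \\ 0.2 & \text{if } x = 2 \\ 0.3 & \text{if } x = 3 \\ 0.4 & \text{if } x = -2 \\ 0 & \text{otherwise} \end{cases}
\qquad
f_Y(y) = \begin{cases} 0.1 & \text{if } y = 0 \\ 0.6 & \text{if } y = 4 \\ 0.3 & \text{if } y = 9 \\ 0 & \text{otherwise} \end{cases}
\]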

Since X = 2 and X = −2 both produce Y = 4, we added their probabilities together.
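The bookkeeping in this example can be sketched in a few lines of Python. The sketch assumes $g(x) = x^2$ (the function that sends both 2 and −2 to 4) and stores each distribution as a dictionary from outcomes to probabilities. The helper name `distribution_of_function` is hypothetical.

```python
from collections import defaultdict


def distribution_of_function(f_X, g):
    """Build the distribution of Y = g(X) from the distribution of a discrete X.

    f_X maps each outcome x to its probability. Outcomes of X that g sends
    to the same value y have their probabilities added together.
    """
    f_Y = defaultdict(float)
    for x, probability in f_X.items():
        f_Y[g(x)] += probability
    return dict(f_Y)


# Distribution of X from the example above.
f_X = {0: 0.1, 2: 0.2, 3: 0.3, -2: 0.4}

f_Y = distribution_of_function(f_X, lambda x: x ** 2)
for y in sorted(f_Y):
    print(f"P(Y = {y}) = {f_Y[y]:.1f}")
```

Nothing here requires `g` to be an algebraic formula. Any Python function of the outcome, such as one that counts digits, works the same way.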

The function g does not need to be algebraically deﬁned.


Example

Let $X$ be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed (meaning each occurs with probability $\frac{1}{100}$). Let $N$ give the number of digits of $X$. Then $N$ has distribution function

\[
f_N(n) = \begin{cases} \frac{9}{100} & \text{if } n = 1 \\ \frac{90}{100} & \text{if } n = 2 \\ \frac{1}{100} & \text{if } n = 3 \\ 0 & \text{otherwise} \end{cases}
\]

Question 2.7.2

How Do We Compute Expected Value of a Function?

In the case of a discrete random variable, we can compute expected value directly from the distribution function.

Example

Let X be a discrete random variable whose outputs are integers from 1 to 100, uniformly distributed.

Let N give the number of digits of X.

\[
E[N] = (1)\left(\frac{9}{100}\right) + (2)\left(\frac{90}{100}\right) + (3)\left(\frac{1}{100}\right) = 1.92
\]

Alternately, we could avoid using $f_N$ by directly applying the digits function to each outcome of $X$ and taking a weighted average.

Example

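\[
E[N] = \underbrace{(1)\left(\frac{1}{100}\right) + \cdots + (1)\left(\frac{1}{100}\right)}_{9 \text{ times}} + \underbrace{(2)\left(\frac{1}{100}\right) + \cdots + (2)\left(\frac{1}{100}\right)}_{90 \text{ times}} + (3)\left(\frac{1}{100}\right) = 1.92
\]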


In general this gives us two ways to compute the expected value of a function.

Formulas

If $Y = g(X)$ then we can compute $E[Y]$ from $f_Y$ or from $f_X$:

\[
E[Y] = \sum_{\text{outcomes } y_i} y_i\, f_Y(y_i)
\qquad\qquad
E[Y] = \sum_{\text{outcomes } x_j} g(x_j)\, f_X(x_j)
\]

Remarks

We can equate these formulas by substituting $f_Y(y_i) = \sum_{g(x_j) = y_i} f_X(x_j)$. All that remains is to distribute the $y_i$.

Both formulas will get us to the answer, but one of them skips the step of ﬁnding a distribution

function for Y .
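To see both routes produce the same number, here is a small sketch using the digit-count example (integers 1 to 100, each with probability 1/100). It uses exact fractions so the two sums can be compared without rounding. The names `num_digits`, `expected_from_f_N`, and `expected_from_f_X` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

outcomes = range(1, 101)       # outcomes of X: the integers 1 through 100
p = Fraction(1, 100)           # each outcome is equally likely


def num_digits(x):
    """The function g: number of decimal digits of a positive integer."""
    return len(str(x))


# Route 1: first build the distribution of N = g(X), then sum n * f_N(n).
f_N = {}
for x in outcomes:
    n = num_digits(x)
    f_N[n] = f_N.get(n, 0) + p
expected_from_f_N = sum(n * prob for n, prob in f_N.items())

# Route 2: skip f_N entirely and sum g(x) * f_X(x) over the outcomes of X.
expected_from_f_X = sum(num_digits(x) * p for x in outcomes)

print(f_N)
print(expected_from_f_N, expected_from_f_X)
print(float(expected_from_f_X))
```

Route 2 is the one that generalizes to continuous random variables, where building a distribution for the output is usually the hard part.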

In the case of a continuous random variable $X$, we might find it difficult to find the expected value of $Y = g(X)$ directly. We would need to:

1. Find a density function $f_Y(y)$ such that $\int_a^b f_Y(y)\,dy = P(a \le g(X) \le b)$ for all $a$ and $b$.

2. Integrate $E[Y] = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy$.

The ﬁrst step is diﬃcult for any but the simplest functions.

Fortunately, there is an integration analogue of the substitution and distribution argument we used for discrete variables. It allows us to compute the expected value of $Y$ as a weighted average of the values $g(x)$, weighted by the density of $X$.

Theorem

If $Y = g(X)$ is a function of a continuous random variable $X$ with density function $f_X(x)$, then

\[
E[Y] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx
\]


Notice that the expected value of X is a special case of this theorem. In this case, we are computing

the expected value of the function g(X) = X.

Example 2.7.3

Computing the Expected Value of a Function

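Consider the random variable $X$ with density function

\[
f_X(x) = \begin{cases} \dfrac{x^2}{9} & \text{if } 0 \le x \le 3 \\ 0 & \text{if } x > 3 \text{ or } x < 0 \end{cases}
\]

What is the expected value of $e^X$?

Solution

Since we want $E[e^X]$, our function is $g(x) = e^x$.

\[
\begin{aligned}
E[e^X] &= \int_{-\infty}^{\infty} e^x f_X(x)\,dx \\
&= \int_0^3 e^x\, \frac{x^2}{9}\,dx \\
&= \left. \frac{x^2}{9} e^x \right|_0^3 - \int_0^3 \frac{2}{9} x e^x\,dx
  && \text{by parts: } u = \tfrac{x^2}{9},\ dv = e^x\,dx,\ du = \tfrac{2}{9}x\,dx,\ v = e^x \\
&= e^3 - \left( \left. \frac{2}{9} x e^x \right|_0^3 - \int_0^3 \frac{2}{9} e^x\,dx \right)
  && \text{by parts again: } u = \tfrac{2}{9}x,\ dv = e^x\,dx,\ du = \tfrac{2}{9}\,dx,\ v = e^x \\
&= e^3 - \frac{2}{3} e^3 + \left. \frac{2}{9} e^x \right|_0^3 \\
&= e^3 - \frac{2}{3} e^3 + \frac{2}{9} e^3 - \frac{2}{9} \\
&= \frac{5e^3 - 2}{9}
\end{aligned}
\]

We can check whether our answer is reasonable. Since $X$ has outcomes between 0 and 3, $e^X$ should have outcomes between 1 and $e^3$. Our expected value should also fall in that range, and it does: $\frac{5e^3 - 2}{9} \approx 10.94$, while $e^3 \approx 20.09$.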
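A numerical check is another way to test an answer like this. The sketch below approximates $\int g(x) f_X(x)\,dx$ with a midpoint Riemann sum, assuming the density in this example is $f_X(x) = x^2/9$ on $[0, 3]$ (the choice consistent with the two integration-by-parts steps, and an assumption of this sketch). The helper name `expected_value` and the step count `n` are illustrative.

```python
import math


def expected_value(g, density, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g(x) * density(x) on [a, b].

    The density is assumed to be zero outside [a, b].
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx   # midpoint of the i-th subinterval
        total += g(x) * density(x)
    return total * dx


def density(x):
    """Assumed density for this example: x^2 / 9 on [0, 3], zero elsewhere."""
    return x ** 2 / 9 if 0 <= x <= 3 else 0.0


total_probability = expected_value(lambda x: 1.0, density, 0, 3)
approx = expected_value(math.exp, density, 0, 3)
exact = (5 * math.exp(3) - 2) / 9   # closed form from integrating by parts twice

print(f"total probability: {total_probability:.6f}")
print(f"numerical E[e^X]:  {approx:.6f}")
print(f"closed form:       {exact:.6f}")
```

Passing `lambda x: x` as `g` gives $E[X]$, matching the remark that the ordinary expected value is the special case $g(X) = X$.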


Application 2.7.4

The Average Value of a Function

Sometimes people refer to the average value of a function without any reference to a random variable.

In this case, we understand the input variable to be uniformly distributed.

Deﬁnition

The average value of a function $f$ from $x = a$ to $x = b$ is the expected value of $f(X)$, where $X$ is a uniform random variable on $[a, b]$. The density function is a constant, so we can factor it out of the integral. We obtain the formula:

\[
f_{\text{ave}} = \frac{1}{b - a} \int_a^b f(x)\,dx.
\]

The number $f_{\text{ave}}$ has geometric significance as well. The signed area under the graph $y = f(x)$ from $x = a$ to $x = b$ is

\[
\text{Area} = \int_a^b f(x)\,dx.
\]

The region under the horizontal line $y = f_{\text{ave}}$ is a rectangle with equal signed area:

\[
\text{Area} = \text{width} \times \text{height} = (b - a)\,\frac{1}{b - a} \int_a^b f(x)\,dx
\]

In other words, if we flattened the area under $f$ into a rectangle, $f_{\text{ave}}$ would be its height.

Figure: The graph of $y = f(x)$ and the constant function $y = f_{\text{ave}}$


Example 2.7.5

Computing The Average Value of a Function

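Compute the average value of $f(x) = xe^{x^2}$ between $x = 1$ and $x = 3$.

Solution

\[
\begin{aligned}
f_{\text{ave}} &= \frac{1}{3 - 1} \int_1^3 x e^{x^2}\,dx \\
&= \frac{1}{2} \int_1^9 \frac{1}{2} e^u\,du
  && u\text{-substitution: } u = x^2,\ du = 2x\,dx,\ \tfrac{1}{2}\,du = x\,dx \\
&= \left. \frac{1}{4} e^u \right|_1^9
  && x = 1 \Rightarrow u = 1,\quad x = 3 \Rightarrow u = 9 \\
&= \frac{1}{4}\left(e^9 - e\right)
\end{aligned}
\]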
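The same kind of numerical check works for average values. The sketch below assumes $f(x) = xe^{x^2}$ on $[1, 3]$, the integrand suggested by the substitution $u = x^2$, and compares a midpoint Riemann sum with the closed form $\frac{1}{4}(e^9 - e)$. The helper name `average_value` and the step count are illustrative.

```python
import math


def average_value(f, a, b, n=100_000):
    """Approximate (1 / (b - a)) * integral of f over [a, b] with the midpoint rule."""
    dx = (b - a) / n
    integral = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
    return integral / (b - a)


def f(x):
    """Assumed integrand for this example: x * e^(x^2)."""
    return x * math.exp(x ** 2)


approx = average_value(f, 1, 3)
exact = (math.exp(9) - math.e) / 4   # closed form from the u-substitution

print(f"numerical average: {approx:.4f}")
print(f"closed form:       {exact:.4f}")
```

Dividing by the interval length is exactly the factor $\frac{1}{b-a}$ that the uniform density contributes.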

Application 2.7.6

Variance

Suppose we wanted to plan ahead for the outcome of some random variable X. We might choose

to prepare for the circumstance in which X takes on the value E[X]. This is most likely to be a good

bet, but how much eﬀort should we expend preparing for outcomes far from E[X]? It would help to

know how likely X is to be far from E[X]. We can model this with a distance function (actually we’ll

use distance squared) and compute the expected value of the distance function.

Deﬁnition

The variance of a random variable $X$ is the expected value of $(X - E[X])^2$. If $X$ is continuous with density function $f_X(x)$, we obtain the formula

\[
E\left[(X - E[X])^2\right] = \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x)\,dx
\]

The square root of variance is the standard deviation. Standard deviation is often denoted by $\sigma$, and variance is often denoted by $\sigma^2$.

If the expected value of $(X - E[X])^2$ is larger, then $X$ is more likely to be far from its expected value.


Figure: A density function with less variance and a density function with more variance

For example, we can compute the variance of X where X is a uniform random variable on [0, 8].

Solution

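Variance is the expected value of $(X - E[X])^2$, so first we need to know the number $E[X]$. We showed earlier that for a uniform random variable, $E[X]$ is the midpoint of the interval. In this case that is $\frac{8 + 0}{2} = 4$. Armed with this value, we can compute the variance.

\[
\begin{aligned}
E\left[(X - 4)^2\right] &= \int_{-\infty}^{\infty} (x - 4)^2 f_X(x)\,dx \\
&= \int_0^8 (x - 4)^2\, \frac{1}{8 - 0}\,dx && \text{because } f_X(x) = 0 \text{ outside } [0, 8] \\
&= \frac{1}{8} \int_0^8 x^2 - 8x + 16\,dx && \text{factor out } \tfrac{1}{8} \\
&= \frac{1}{8} \left[ \frac{x^3}{3} - 4x^2 + 16x \right]_0^8 \\
&= \frac{1}{8} \left( \frac{512}{3} - (4)(64) + (16)(8) - 0 + 0 - 0 \right) \\
&= \frac{1}{8} \left( \frac{128}{3} \right) \\
&= \frac{16}{3}
\end{aligned}
\]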

Remarks

In order to solve for variance, we need to know the expected value. We may have to compute $E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$.

Variance is larger when the area under $y = f_X(x)$ is spread farther to both sides, away from $E[X]$.
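The two-step recipe in these remarks (first $E[X]$, then the expected value of the squared distance) can be sketched numerically. The example below uses the uniform random variable on $[0, 8]$ from this application. The helper names `integrate` and `mean_and_variance` are hypothetical, and the midpoint rule stands in for exact integration.

```python
def integrate(h, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx


def mean_and_variance(density, a, b):
    """Return (E[X], variance of X) for a density that is zero outside [a, b]."""
    # Step 1: the expected value, the integral of x * f_X(x).
    mu = integrate(lambda x: x * density(x), a, b)
    # Step 2: the expected value of (X - mu)^2.
    variance = integrate(lambda x: (x - mu) ** 2 * density(x), a, b)
    return mu, variance


def uniform_density(x):
    """Density of a uniform random variable on [0, 8]."""
    return 1 / 8 if 0 <= x <= 8 else 0.0


mu, variance = mean_and_variance(uniform_density, 0, 8)
print(f"E[X] = {mu:.4f}")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {variance ** 0.5:.4f}")
```

Swapping in a different density function and interval is enough to check a hand computation of another variance.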


Section 2.7

Exercises

Summary Questions

What kind of object is a function of a random variable?

How do we compute the expected value of a random variable?

If someone mentions the “average value” of a function without mentioning what random variable

to use, what do you assume?

What function’s expected value is the variance?

2.7.1

Let X be a random variable that indicates how long from now an event will occur (in hours).

How could a random variable indicating how long until the event happens in minutes be deﬁned

in terms of X?

Suppose the radius of a circle R is a random variable. How could we deﬁne a random variable to

express the area of the circle?

Dominic buys 200 shares of a stock for $60 each. At the end of the day, the stock is worth $V

per share, where V is a random variable. How could you express Dominic’s proﬁt or loss from his

stock purchase with a random variable?

Suppose X is a random variable with outcomes in the range [2, 7]. What is the range of outcomes

of the random variable Y =


2.7.2

Suppose X is a random variable and Y = cX for some number c. Explain using one or more

rules of integration why E[Y ] = cE[X].

Q10

Suppose X is a random variable and Y = X + d for some number d. Explain using one or more

rules of integration why E[Y ] = E[X] + d.

Q11

Let $X$ be a uniform random variable on [2, 5] with density function $f_X$. Write a density function for $Y = 10X$. Explain how your density function differs from $f_X$.

Q12

Let $X$ be a uniform random variable on [0, 3]. Is $Y = X^2$ a uniform random variable on [0, 9]? Provide evidence for your answer.

2.7.3

Q13

Let W be a random variable with density function

\[
f_W(w) = \begin{cases} \dfrac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}
\]

Compute E





Q14

Let T be a random variable with density function

(t) =

(

√

if 0 ≤ t ≤ 1

0 otherwise

Compute E[T

Q15

Let X be an exponential random variable. Compute E[X

Q16

Let g(x) = c be a constant function. Let X be a random variable. Compute E[g(X)].


2.7.4

Q17

Suppose that you are told that the average value of f(x) from x = a to x = b is 0.

What geometric information does this give you about the graph y = f(x)? Be specific.

Suppose you are told that f(x) is non-negative for all x. How does that affect your answer?

Q18

Suppose you know that f(x) =

√

x has a positive average value over [a, b]. What does this tell

you about a and b?

2.7.5

Q19

Compute the average value of f(x) = x

over [0, 3].

Q20

Compute the average value of g(x) = x sin x over [0, π].

Q21

Compute the average value of f(x) = x

over [0, 2]

Q22

What happens if we try to compute the average value of h(x) =

over [−2, 2]?

2.7.6

Q23

Compute the variance of an exponential random variable X. Note that you may already know

some components of this computation from earlier examples and exercises.

Q24

Compute the variance of a uniform random variable on [2, 7].


Q25

Let W be a random variable with density function

\[
f_W(w) = \begin{cases} \dfrac{36 - w^2}{144} & \text{if } 0 \le w \le 6 \\ 0 & \text{otherwise} \end{cases}
\]

Compute the variance of W . I’d suggest using a computer to help with the algebra.

Q26

Let T be a random variable with density function

(t) =

(

√

if 0 ≤ t ≤ 1

0 otherwise

Compute the variance of T .

Synthesis and Extension

Q27

Let $X$ be a random variable with density function $f_X$. Let $Y = cX$ for some number $c$. Write a formula for $f_Y$.

Compute the value b such that the average value of f(x) = x

over [0, b] is 1.

Q29

Some people compute variance using the formula $\sigma^2 = E[X^2] - E[X]^2$. Explain why this formula is equivalent to the one we gave. (This is a famous calculation, so if you can't figure it out, look it up and try to explain each step.)
