No hardcore math this week, but we have something very interesting that any biologist can read, i.e. all you need to know is high school calculus and probability. (!!!)
Someone on livejournal posted this problem. Suppose you have a (hyper)sphere in $n$ dimensions (don’t worry, no one can really imagine anything in dimensions higher than 3 or 4). Suppose you pick two points uniformly at random on the surface of the sphere. What is the expected distance? Here is my “solution”.
Without loss of generality, fix one point to be the north pole. Note that though you do not need to imagine a (hyper)sphere, you do need to imagine a 3D sphere, and know spherical coordinates. Let $S_n(r)$ denote the surface area of a sphere with radius $r$ in $n$ dimensions. So $S_2(r) = 2\pi r$, $S_3(r) = 4\pi r^2$, $S_4(r) = 2\pi^2 r^3$ and so on.
Before proceeding, you might find it helpful to review spherical coordinates, or even look at hyperspherical coordinates by checking out http://en.wikipedia.org/wiki/Hypersphere. Alright let us move on.
Let us work in $n$ dimensions. (You may like to substitute $n = 3$.)
Remember the first point is the north pole. The main idea is to integrate circular strips from top to bottom. Each circle has radius $r\sin\theta$, where $r$ is the radius of our sphere and $\theta$ is the “zenith angle” (http://en.wikipedia.org/wiki/Spherical_coordinates). Each of these strips has a certain probability. Its surface area is $S_{n-1}(r\sin\theta)\, r\, d\theta$. Here $r\, d\theta$ is like the thickness of this strip. Convince yourself at least for $n = 3$. So the probability of picking a point on this strip is $S_{n-1}(r\sin\theta)\, r\, d\theta / S_n(r)$.
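In fact, $S_n(r) = \int_0^\pi S_{n-1}(r\sin\theta)\, r\, d\theta$. For example, $S_3(r)$ is equal to $\int_0^\pi 2\pi r\sin\theta \cdot r\, d\theta$ or $4\pi r^2$.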
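The strip decomposition $S_n(r) = \int_0^\pi S_{n-1}(r\sin\theta)\, r\, d\theta$ can be sanity-checked numerically. The sketch below is only an illustration: it assumes the standard closed form $S_n(r) = 2\pi^{n/2} r^{n-1}/\Gamma(n/2)$, which the post does not use, and the function names `sphere_area` and `area_from_strips` are made up for this example.

```python
import math

def sphere_area(n, r=1.0):
    """Closed-form surface area of a radius-r sphere in n dimensions (assumed formula)."""
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2) * r ** (n - 1)

def area_from_strips(n, r=1.0, steps=20000):
    """Midpoint-rule estimate of the integral of S_{n-1}(r sin t) * r over t in [0, pi]."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # zenith angle at the middle of the k-th strip
        total += sphere_area(n - 1, r * math.sin(t)) * r * h
    return total

# Compare the closed form with the sum over circular strips.
for n in (3, 4, 5, 10):
    print(n, sphere_area(n, 2.0), area_from_strips(n, 2.0))
```

The two columns should agree to several digits for each $n$, which is all the "convince yourself" step asks for.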
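Every point on each strip has equal distance from the north pole. The distance is $2r\sin(\theta/2)$. (Just draw a line between the two points and bisect $\theta$.) So, the expected distance is $\frac{1}{S_n(r)} \int_0^\pi 2r\sin(\theta/2)\, S_{n-1}(r\sin\theta)\, r\, d\theta$.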
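Let $S_n$ be the surface area of the unit sphere. Obviously, by rescaling, similarity etc, $S_n(r) = S_n r^{n-1}$. Express the denominator as the integral over circular strips. Substitute in $S_{n-1}(r\sin\theta) = S_{n-1}\, r^{n-2} \sin^{n-2}\theta$. Cancel these constants to obtain the expected distance as $\frac{\int_0^\pi 2r\sin(\theta/2)\, \sin^{n-2}\theta\, d\theta}{\int_0^\pi \sin^{n-2}\theta\, d\theta}$. Great. At this point, you can crank up mathematica, type in the expression, get ugly expressions and send $n \to \infty$. Very tempting but not nice. Let’s try something else.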
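Before doing anything clever, it is cheap to evaluate the ratio $\int_0^\pi 2r\sin(\theta/2)\sin^{n-2}\theta\, d\theta \,/ \int_0^\pi \sin^{n-2}\theta\, d\theta$ numerically. This is a minimal sketch using a plain midpoint rule; the function name `expected_distance` and the step count are choices made for this example only.

```python
import math

def expected_distance(n, r=1.0, steps=20000):
    """Midpoint-rule estimate of the expected distance on a radius-r sphere in n dimensions."""
    h = math.pi / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        w = math.sin(t) ** (n - 2)           # relative weight of the strip at angle t
        num += 2.0 * r * math.sin(t / 2) * w  # distance to the north pole times weight
        den += w
    return num / den  # the step size h cancels in the ratio

for n in (2, 3, 10, 100, 1000):
    print(n, expected_distance(n))
print("sqrt(2) =", math.sqrt(2))
```

For $n = 2$ and $n = 3$ the integrals can be done by hand and give $4r/\pi$ and $4r/3$, so those rows double as a check on the formula. The later rows creep up towards $\sqrt{2}\, r$.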
If you plot $\sin^{n-2}\theta$ for increasing $n$, you will notice that it becomes sharper and sharper at $\theta = \pi/2$. Aha, it is becoming like a delta function multiplied by a constant $c_n$. Therefore, the expected distance is becoming $2r\sin(\theta/2)$ with $\theta = \pi/2$, which is $\sqrt{2}\, r$. We are more or less done.
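The heuristic answer $\sqrt{2}\, r$ can also be checked by brute force, without any integrals. The sketch below draws uniform points on the unit sphere by normalizing Gaussian vectors, a standard trick that is not part of the post; `random_unit_vector`, `mean_distance`, the sample size and the seed are all choices made for this illustration.

```python
import math
import random

def random_unit_vector(n, rng):
    """Uniform point on the unit sphere in n dimensions (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_distance(n, samples=5000, seed=0):
    """Monte Carlo estimate of the expected distance between two random points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        p = random_unit_vector(n, rng)
        q = random_unit_vector(n, rng)
        total += math.dist(p, q)  # Euclidean (chord) distance
    return total / samples

for n in (3, 10, 100):
    print(n, mean_distance(n))
print("sqrt(2) =", math.sqrt(2))
```

The estimates are noisy, but for large $n$ they should cluster near $\sqrt{2}$: two random directions in high dimensions are almost orthogonal.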
This is the idea, but it is not rigorous. The rest of this post will make this as robust as titanium. However, it may also be a good point to stop reading.
I recall Stein and Shakarchi, chapter 3, section 2 on “good kernels and approximations to the identity”. This is what it says.
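Suppose there is a family of functions $K_n$ such that
- (1) $\int_{-\infty}^{\infty} K_n(x)\, dx = 1$
- (2) $\int_{-\infty}^{\infty} |K_n(x)|\, dx \le A$ for some real number $A$ independent of $n$
- (3) For every $\eta > 0$, $\int_{|x| \ge \eta} |K_n(x)|\, dx \to 0$ as $n \to \infty$
Then for any bounded function $f$ that is continuous at $x$, we have $(f * K_n)(x) \to f(x)$ as $n \to \infty$, where $f * K_n$ is the convolution between $f$ and $K_n$.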
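This is like $K_n \to \delta$, a Dirac delta function at the origin. Often, we write $\int f(x)\, \delta(x)\, dx = f(0)$. To be mathematically correct, we should write $\int f(x) K_n(x)\, dx \to f(0)$ as $n \to \infty$ instead. Now, the integral becomes a convolution between $f$ and $K_n$ (evaluated at $0$, for an even kernel $K_n$) and the above “theorem” gives us the answer $f(0)$. Now back to our problem.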
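Scratch work: we want the expected distance to “equal” $\int f(x) K_n(x)\, dx$ with $K_n$ supported on $[-\pi/2, \pi/2]$. Let $x = \theta - \pi/2$, so that $\sin\theta = \cos x$. Now we know we have to define $K_n(x) = \cos^{n-2}(x)/c_n$ for $|x| \le \pi/2$ (and $0$ elsewhere), where $c_n = \int_{-\pi/2}^{\pi/2} \cos^{n-2} x\, dx$ is just some useless normalizing factor to make $K_n$ satisfy condition 1. Notice that $K_n$ is basically our $\sin^{n-2}\theta$ over some constant, shifted to the left by $\pi/2$. Let us verify further that $K_n$ is a good kernel/approximation.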
Condition 2 is trivial since our function is nonnegative and condition 1 implies 2. Condition 3 is more tricky to check. Let’s work on it. Fix some $\eta$ with $0 < \eta < \pi/2$. Let $a = \cos\eta$, which is strictly less than 1.
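By symmetry, the numerator of the integral (see condition 3 and the definition of $K_n$) is $2\int_\eta^{\pi/2} \cos^{n-2} x\, dx \le \pi a^{n-2}$ with $a = \cos\eta < 1$, i.e. $O(a^n)$. The denominator $c_n = \int_{-\pi/2}^{\pi/2} \cos^{n-2} x\, dx$ also grows small with $n$, but not as rapidly, as we will show. Integrating by parts, we know that $c_n = \frac{n-3}{n-2}\, c_{n-2}$. Its lower bound turns out to be $C/n$ for some small constant $C$.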
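For example, $c_{11} = \int_{-\pi/2}^{\pi/2} \cos^{9} x\, dx = \frac{8}{9} \cdot \frac{6}{7} \cdot \frac{4}{5} \cdot \frac{2}{3} \cdot c_3$, where $c_3 = \int_{-\pi/2}^{\pi/2} \cos x\, dx = 2$. Insert extra factors of the form $\frac{k-1}{k}$, namely $\frac{7}{8}, \frac{5}{6}, \frac{3}{4}, \frac{1}{2}$, each less than 1, to get a lower bound. Now, consecutive factors cancel off and we are left with some constant divided by 9. The same reasoning applies to any $n$.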
In short, the numerator dies faster than the denominator and condition 3 is satisfied.
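Condition 3 can be watched numerically. The sketch below assumes the kernel is proportional to $\cos^{n-2} x$ on $[-\pi/2, \pi/2]$ and measures how much of its mass lies outside $|x| < \eta$. The helper names, the choice $\eta = 0.3$, and the explicit lower bound $2/(n-2)$ on the normalizing integral (one concrete instance of a "constant over $n$" bound) are assumptions made for this illustration.

```python
import math

def cos_power_integral(m, lo, hi, steps=20000):
    """Midpoint-rule estimate of the integral of cos(x)**m over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(math.cos(lo + (k + 0.5) * h) ** m for k in range(steps)) * h

def tail_fraction(n, eta):
    """Share of the kernel's mass outside |x| < eta (the ratio in condition 3)."""
    m = n - 2
    tail = 2.0 * cos_power_integral(m, eta, math.pi / 2)      # both tails, by symmetry
    whole = cos_power_integral(m, -math.pi / 2, math.pi / 2)  # normalizing integral
    return tail / whole

eta = 0.3
for n in (5, 20, 100, 500):
    c_n = cos_power_integral(n - 2, -math.pi / 2, math.pi / 2)
    # columns: n, tail mass, normalizing integral, telescoping lower bound
    print(n, tail_fraction(n, eta), c_n, 2.0 / (n - 2))
```

The tail mass should collapse towards zero as $n$ grows, while the normalizing integral shrinks only slowly and stays above the telescoping bound.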
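Next, let $f(x) = 2r\sin(x/2 + \pi/4)$, which is the distance $2r\sin(\theta/2)$ written in terms of $x = \theta - \pi/2$. Working backwards using the “scratch work”, we see that the expected distance is just integrating $f(x) K_n(x)$, which is $(f * K_n)(0)$ because $K_n$ is even, and must converge to $f(0) = \sqrt{2}\, r$ as $n \to \infty$. That is it.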
Key ideas: Recognizing good kernels.