I don’t think i want to make this weekly. I shall write as and when i feel like it and that should be more frequent than 1/week i hope. Today is the first day of school. I am taking far less classes now, but strangely not strangely, i think i am going to learn more.

I don’t think i am taking any class on stochastic calculus. The reason: i heard the person teaching this class is very theory oriented and probably won’t provide any applications. Bad. I am a simple person. I need to see some applications amidst the theory building. Instead I plan to take another standard algorithms class (i took one such grad class back at berkeley. but basically i just slogged thru the homework and forgot everything that wasn’t tested in hw/test. Perhaps about 20% of this class i might already know, but it would be a good review. I’m starting from ground zero essentially. We talk about fibonacci heaps which i didn’t study the last time. Amortized analysis is like magic. You come up with some brilliant potential function and the key idea is a_i=c_i+\Phi_i-\Phi_{i-1} where c_i is the actual cost incurred. Looking at a_i, you sometimes overcount, sometimes undercount, but in the end, \sum c_i =\sum a_i - \Phi_k-\Phi_0 \leq \sum a_i is all that matters and we can look at a_i the amortized cost of every operation instead, and blithely ignore the actual cost c_i. A link I find useful is www.cs.duke.edu/courses/fall05/cps230/L-11.pdf.

Another class I want to take is combinatorics: the probabilistic method. This is cool. I learn some basics as an undergrad but this class i hope is going to go beyond that. Simple tricks like the union bound (P(\cup A_i) \leq \sum P(A_i)) and linearity of expectation can be used to obtain powerful bounds. That’s all for now. Tomorrow I’ll be attending two classes on commutative algebra and computational algebra, one of which will use the book i am supposed to be reading (Cox, Little, O’Shea). Kleiman is teaching both courses. I hope he’s a thorough and considerate prof and I hope (too) that these can turn out to be useful for me in future. It’s about the connecting the dots, right Steve?

I have to make an effort to attend seminars too. CS seminars to find applications for what i’m learning. Math seminars to gather more tools. It ain’t easy trying to bridge these two sides.

I decide to rest until school starts. Certainly I will have a lot to write about when I start taking classes again. :D

No hardcore math this week, but we have something very interesting that any biologist can read, i.e. all you need to know is high school calculus and probability. (!!!)

Someone on livejournal posted this problem. Suppose you have a (hyper)sphere in n dimensions (don’t worry, no one can really imagine anything dimensions higher than 3 or 4). Suppose you randomly pick two points on the surface of the sphere. What is the expected distance? Here is my “solution”. :)


Without loss of generality, fix one point to be the north pole. Note that though you do not need to imagine a (hyper)sphere, you do need to imagine a 3D sphere, and know spherical coordinates. Let S_n(R) denote the surface area of a sphere with radius R, in n+1 dimensions. So S_1(R)=2 \pi R, S_2(R)=4 \pi R^2, S_3(R)=2 \pi^2 R^3 and so on.

Before proceeding, you might find it helpful to review spherical coordinates, or even look at hyperspherical coordinates by checking out http://en.wikipedia.org/wiki/Hypersphere. Alright let us move on.

Let us work in n+1 dimensions. (You may like to substitute n=2.)
Remember first point is the north pole. The main idea is to integrate circular strips from top to bottom. Each circle has radius R \sin \theta where R is radius of our sphere and \theta is the “zenith angle” (http://en.wikipedia.org/wiki/Spherical_coordinates). Each of these strips have a certain probability. Its surface area is S_{n-1}(R \sin \theta) \times R d\theta. R d\theta is like the thickness of this strip. Convince yourself at least for n=2. So the probability of picking a point on this strip is S_{n-1}( R\sin \theta) \times R d\theta / S_{n}(R).

In fact, S_{n}(R)=\int_{\theta=0}^{\pi} S_{n-1}(R \sin \theta) R d\theta. For example, \int_{\theta=0}^{\pi} 2 \pi (R \sin \theta) R d\theta is equal to 4\pi R^2 or S_{2}.


Every point on each strip has equal distance from the north pole. The distance is 2 \sin (\theta/2). (just draw a line between the two points and bisect \theta.) So, the expected distance is \int_{\theta=0}^{\pi} \frac{S_{n-1}(R \sin \theta) (R d\theta)}{S_n} \left( 2\sin (\theta/2) \right).

Let K_{n-1}=S_{n-1}(1) be surface area of unit sphere. Obviously, by rescaling, similarity etc, S_n=K_n R^n. Express the denominator as the integral over circular strips. Substitute in K_{n-1}. Cancel these constants to obtain the expected distance as

\frac{\int_{\theta=0}^{\pi} (\sin^{n-1} \theta) (2 \sin (\theta/2)) d \theta}{\int_{\theta=0}^{\pi} ( \sin^{n-1} \theta ) d\theta}. Great. At this point, you can crank up mathematica, type in the expression, and get ugly expressions and send n\rightarrow \infty. Very tempting but not nice. Let’s try something else ;) .

If you plot \sin^{n} \theta for \theta \in [0,\pi], you will notice that it becomes sharper and sharper at \theta=\pi/2. Aha, it is becoming like a delta function multiplied by a constant \int_{\theta=0}^{\pi} (\sin^{n} \theta) d\theta. Therefore, the expected distance is becoming 2 \sin (\theta/2) with \theta=\pi/2 which is \sqrt{2}.. We are more or less done.


This is the idea, but it is not rigorous. The rest of this post will make this as robust as titanium. However, it may also be a good point to stop reading.

I recall Stein and Shakarchi, chapter 3, section 2 on “good kernels and approximations to the identity”. This is what it says.

Suppose there is a family of functions \{ K_{\delta} (x) \} such that

  1. \int_{\mathbb{R}^d} K_{\delta}(x) dx=1
  2. \int_{\mathbb{R}^d} | K_{\delta}(x) | dx \leq A for some real number A independent of \delta
  3. For every \eta >0, \int_{|x|\geq \eta} | K_{\delta}(x)| dx \rightarrow 0 as \delta \rightarrow 0

Then for any bounded function f, we have (f * K_{\delta})(x) \rightarrow f(x) as \delta \rightarrow 0, where (f*K_{\delta})(x)=\int_{\mathbb{R}^d} f(x-y) K_{\delta}(y) dy is the convolution between f and K_{\delta}.

This K_{\delta}(x) is like \delta_{0}(x), a Dirac delta function at the origin. Often, we write \int \delta_{x_0}(x) f(x) dx = f(x_0). To be mathematically correct, we should write \delta_{x_0}(x) as K_{\delta}(x-x_0) instead. Now, the integral becomes a convolution between K_{\delta} and f and the above “theorem” gives us the answer f(x_0). Now back to our problem.


Scratch work: we want \delta_{\pi/2}(\theta) “equal” \frac{ \sin^{n} \theta }{\int_{\theta=0}^{\pi} (\sin^{n} \theta) d\theta} supported on \left[0,\pi \right]. Let \delta=1/n. Let u=\theta-\pi/2. Now we know we have to define K_{\delta}(u)=\frac{\sin^{1/\delta} (u+\pi/2)}{F_{\delta}} where F_{\delta}=\int_{\theta=0}^{\pi} \sin^{1/\delta}(\theta) d\theta is just some useless normalizing factor to make K_{\delta} satisfy condition 1. Notice that K_{\delta} is basically our \sin^{n}(x) over some constant, shifted to the left by \pi/2. Let us verify further that K_{\delta} is a good kernel/approximation.

Condition 2 is trivial since our function is nonnegative and condition 1 implies 2. Condition 3 is more tricky to check. Let’s work on it. Fix some \eta>0. Let \varepsilon= \cos \eta which is strictly less than 1.

By symmetry, the numerator of the integral (see condition 3 and the definition of K_{\delta}) is 2 \int_{\eta}^{\pi/2} \sin^{n}(x+\pi/2) dx \leq \pi \times \cos^n \eta, i.e. o(\varepsilon^n). The denominator F_{1/n} also grows small with n, but not as rapidly as we will show. Integrating by parts, we know that F_{1/n}=\frac{n-1}{n} F_{1/(n-2)}. Its lower bound turns out to be C/n for some small constant C.

For example, F_{1/9.5}=\frac{8.5}{9.5} \frac{6.5}{7.5} \frac{4.5}{5.5} \frac{2.5}{3.5} F_{1/1.5}. Insert extra factors of the form \frac{k-1}{k} to get a lower bound. Now, consecutive factors cancel off and we are left with some constant divided by 9.5. The same reasoning applies to any \delta=1/n.

In short, the numerator dies faster than the denominator and condition 3 is satisfied.

Next, let f(x)=2 \sin(x/2). Working backwards using the “scratch work”, we see that the expected distance is just integrating K_{1/n}(\theta-\pi/2) f(\theta), which is (K_{1/n}*f)(\pi/2), and must converge to f(\pi/2)=\sqrt{2} as n \rightarrow \infty. That is it. :)


Key ideas: Recognizing good kernels.

Going to cheat this week by talking about some stuff that is already kind of familiar.. We’re going to see how the rational and Jordan canonical forms come about, from the algebra standpoint. i.e. Dummit and Foote again. (chapter 12 on modules over PIDs)

At the heart of this chapter is Fundamental Theorem, invariant factor form and elementary divisor form. Let us write this down nicely.


(invariant factor form) Let R be a PID and M be a finitely generated R-module. Then, M can be written as the direct sum of finitely many cyclic modules. More precisely, M \approx R^r \oplus R/(a_1) \oplus \ldots \oplus R/(a_m), for some nonnegative integer r, and a_1,\ldots,a_m (the annihilators of each cyclic module) are nonunits and satisfy a_1 \mid a_2 \mid \ldots a_m.

Proof of this in Dummit and Foote relies on a “technical lemma”, aka “don’t read me because you will forget me immediately anyway”, which says:

Let R be PID. Let N be any submodule of a free R-module M which has rank n (max number of R-linearly independent elements). Then rank of N is free of rank m with m \leq n, and there exists a basis y_1,\ldots,y_n of M such that a_1 y_1,\ldots,a_m y_m is a basis of N and a_1 \mid a_2 \ldots \mid a_m. (the a_i‘s are known as the invariant factors)

If we assume this result, the fundamental theorem falls easily. Let b_1,\ldots,b_n be basis for R^n and x_1,\ldots,x_n be set of generators of M of minimal size. Consider the map \pi: R^n \rightarrow M mapping x_i to b_i. This is surjective since x_1,\ldots,x_n generates M, so R^n/ \ker \pi \approx M. By the lemma, we can find another basis y_1,\ldots,y_n of R^n such that a_1 y_1, \ldots,a_m y_m is a basis for \ker \pi. This means R^n/ \ker \pi \approx (R y_1 \oplus \ldots \oplus R y_n)/(R a_1 y_1 \oplus \ldots \oplus R a_m y_m). See that this is isomorphic to R/(a_1) \ldots R/(a_m) \oplus R^{n-m}. (consider the map from R y_1 \oplus \ldots \oplus R y_n to R/(a_1) \oplus \ldots \oplus R/(a_m) by “modding” the coefficients of y_i by a_i, for i\leq m. The kernel is exactly R a_1 y_1 \oplus \ldots \oplus R a_m y_m).


If we look at each cyclic module, say R/(a), we can write a=u p_1^{\alpha_1} \ldots p_s^{\alpha_s} (remember R is PID and UFD). By Chinese Remainder Theorem, R/(a) \approx R/(p_1^{\alpha_1}) \oplus \ldots \oplus (p_s^{\alpha_s}) where p_i is a prime in R. For each cyclic module annihilated by an invariant factor, we can do this.

Now GROUP the cyclic modules annihilated by powers of the same prime together. Example: M \approx R/(3^2 5^3) \oplus R/(3^7 5^{10}) can be rewritten as N_1 \oplus N_2 where N_1 = R/(3^2) \oplus R/(3^7) and N_2=R/(5^3) \oplus R/(5^{10}). Notice that N_1 is annihilated by 3^7, so 3^7 is an elementary divisor. N_1 is NOT a cyclic module, but it is generated by 2 elements, both of which are annihilated by 3^7.

This is the so-called “elementary divisor form” of the Fundamental theorem.


Now, let R=F[x] act on V (or M) by x \circ v=Tv. The “invariant factor form” of the Fundamental theorem states that V \approx F[x]/(a_1(x)) \oplus \ldots \oplus F[x]/(a_m(x)) where a_1(x) \mid \ldots \mid a_m(x). The minimal polynomial is clearly a_m. If it’s anything smaller, then elements in the “bigger” cyclic modules will not be annihilated by it.

Consider a cyclic module, annihilated by say a(x)=x^k+b_{k-1}x^{k-1} + \ldots+b_0. Say the cyclic module is generated by some v, then consider the basis v, x \circ v, x^2 \circ v, \ldots, x^{k-1} \circ v, i.e. v_i=x^i \circ v. They are linearly independent, otherwise there is a smaller annihilator. They obviously spans the cyclic module. Moreover, T v_i = T(x^i \circ v) = v_{i+1} and T v_{k-1}=x^k \circ v=(a(x)-b_{k-1} x^{k-1} -\ldots - b_0) \circ v=-b_{k-1} v_{k-1}-\ldots-b_0 v_0. Representing T in this basis \{v_0,\ldots,v_{k-1}, we have the “companion matrix”. (google rational canonical form).

The discussion applies to only one cyclic module. Since M is a sum of such modules, it follows that T can be represented as a matrix with companion matrices on the diagonal.


To get the Jordan canonical form, do roughly the same thing. This time, M is decomposed into N_i, each annihilated by some p_i^{\alpha_i}. For Jordan form, we assume an algebraically closed F ??, i.e. the only interesting primes are linear factors. Therefore, each N_i is annihilated by some power of some linear factor x-\lambda. Pick that one element which is annihilated by (x-\lambda)^{\alpha} (and no less). Call it y.

Consider the set \{ y, (x-\lambda) \circ y, \ldots, (x-\lambda)^{\alpha-1} \circ y \}. This is a basis since y is annihilated only by (x-\lambda)^{\alpha} and no smaller polynomial. It has the right dimensions and must span. Let v_i=(x-\lambda)^{i-1} \circ y=(x-\lambda) \circ v_{i-1}. Notice that T v_{i-1}=v_i+\lambda v_{i-1}. Representing T in this basis gives us a Jordan block. Since M is direct sum of such modules, T is represented as a matrix with Jordan blocks on the diagonal. This is the Jordan canonical form.


Key ideas: decomposition into cyclic modules, and their annihilators

Busy week. I’m going to just review the mean ergodic theorem from Stein and Shakarchi, III, chapter 6, theorem, 5.1, and redo an interesting exercise inside.

Mean Ergodic Theorem: Let T be an isometry of the Hilbert space H. Let S=\{ f\in H:Tf=f\} be the invariant subspace of T. Let P be the orthogonal projection onto S. Let A_n=\frac{1}{n}(1+T+T^2+\ldots+T^{n-1}. Then for each f\in H, A_n(f) converges to P(f) in norm, as n \rightarrow \infty.

(I think Prof Tao has like 4 different proofs of this theorem, but unfortunately I can only give the “slick but not particularly illuminating” proof by von Neumann.)


Define S_*=\{ f\in H: T^* f=f\} and S_1 =\{f\in H:f=g-Tg, g\in H\}. Note that S,S_* are both closed subspaces because \| Tf \|=\|T^* f\|=\| f\|. (write \| Tf-f \| = \|Tf-Tf_i+f_i-f\| =\ldots). Denote closure of S_1 as \bar{S}_1.

Lemma: S=S_* and orthogonal complement of \bar{S}_1 is S.

Basic facts about isometry:
1) If T is an isometry, (Tf,Tg)=(f,g) for all f,g\in H.
I remember the way I did it is to write (T(f-g),T(f-g))=(f-g,f-g).

2) Therefore (T^* T f,g) for all f,g and conclude that T^* T=I.

Proof of lemma: If Tf=f, then T^* Tf=T^* f, i.e. f=T^* f. So S \subset S_*. Now suppose T^* f=f. Check that now, (Tf,f)=\| f,f \|. ((f,T^* f)-(f,f)=(f,T^* f-f)=0) Notice that by Cauchy-Schwarz, | (Tf,f) | \leq \|Tf\| \|f\|=\|f\|^2. But we actually have equality. This is only possible when Tf=cf, and obviously, c=1.

For second part, f \in \bar{S}_1^{\perp} means (f,g-Tg)=0 for all g\in H. Since 0=(f,g)-(f,Tg)=(f-T^* f,g) for all g, we have f\in S_*, which is equal to S.


Main proof is short. Write f=f_0+f_1 where f_0 \in S and f_1 \in S^{\perp}=\bar{S}_1. Now apply A_n to f.

A_n(f_0)=\frac{1}{n}(f_0+Tf_0+\ldots+T^{n-1}f_0)=f_0=P(f) since f_0\in S, the subspace invariant under T. All we need to show that A_n(f_1) goes to 0. We can pick some f_1'=g-Tg\in S_1 arbitrarily close to f_1. The idea is to bound the norm of A_n(f_1-f_1') and A_n(f_1') separately.

A_n(f_1')=A_n(g-Tg)=\frac{1}{n}(g-T^n g) (telescoping sum). Check that this converges to 0.

\|A_n(f_1-f_1') \| \leq \frac{1}{n} \sum_{k=0}^{n-1}\| T^k (f_1-f_1')\| =\|f_1-f_1'\|. We can pick f_1' arbitrarily close to f_1.

This concludes the proof.


I’m too lazy to study the pointwise ergodic theorem (by Birkoff?). Shall just quote its interesting corollary.

Let \tau be an ergodic measure-preserving transformation. For an integrable function f, we have

\frac{1}{m} \sum_{k=0}^{m-1} f(\tau^k (x)) converges to \int_x f d\mu for a.e. x \in X as m  \rightarrow \infty.

The LHS can be interpreted as the “time average” of f. RHS is like the “space average”. \tau is like modifying x every time step.

Recall some definitions. \tau (assumed to be a measure preserving map, i.e. \mu(\tau^{-1}(E))=\mu(E)) is ergodic if for any measurable set E which is invariant under \tau (i.e. E and \tau^{-1}(E) differs by measure 0), either E or E^{c} has measure 0. (in probability space, this just says any invariant set under \tau has either measure 0 or 1)

We shall skip the proof.


An interesting exercise. Consider the map \tau_m defined on \left[0,1 \right) as \tau(x)=mx \mod 1. Show that \tau is measure preserving, mixing (hence ergodic). Prove as a consequence that for “almost every” number x, x is “normal”.

Let us define what we mean by “normal”. Suppose we do a “m-adic” expansion of x, i.e. write x in base m, i.e. x=\sum_{j=1}^{\infty} \frac{a_j}{m^j}. Then x is normal if for every k between 0 and m-1, we have \frac{\#\{j:a_j=k, 1\leq j \leq N\}}{N} \rightarrow \frac{1}{m} as N\rightarrow \infty.

This seems quite profound. Almost every number in \left[ 0,1 \right) is “normal”, i.e. every digit is repeated equal number of times!

Let’s do this exercise. Clearly, \tau is measure-preserving. Its inverse is m times smaller but repeated m times. Showing that it is mixing amounts to generalizing the proof for the doubling map on page 305.

Finally, apply the “space average=time average” corollary. Define f(x) to be characteristic function on the interval \left[ 0.k, 0.(k+1) \right) (in m-adic form). Then \int_x f d\mu=\frac{1}{m}. Notice that f(\tau^j(x))=1 only when a_j=k. Hence the time average is the LHS. Simple application of a deep theorem gives us an interesting result.


Key ideas: Ergodic measure-preserving maps, properties of an isometry.

This week we will study localization. The bonus/motivation is an algorithm to check whether an ideal in k[x_1,\ldots,x_n] is prime.

Localizing a ring R by its subset D really just means constructing a ring of “fractions” such that elements of D becomes units. This ring of fractions shall be denoted as D^{-1} R.

Here’s an example from Wikipedia. Let R be the ring of functions defined on some algebraic variety V. Suppose we are interested in studying this variety “locally” at a point p\in V. We will let D be set of functions not zero at point p and “localizes” R with D. The resulting ring D^{-1} R contains additional information about the behavior of V near p.

Reference as usual is Dummit and Foote (chapter 15.4).


The formal construction of D^{-1} R is as follows. Assume D \subset R to be multiplicatively closed. Define a relation on R \times D: (r,d) \sim (s,e) iff x(er-ds)=0 for some x \in D.

This is an equivalence relation. Let r/d denote the equivalence class of (r,d). Define addition and multiplication like fractions. e.g. a/b+c/d=(ad+bc)/(bd) and a/b \times c/d = ac/bd. Note that for every d\in D, d/1 is a unit in D^{-1} R. Its inverse is simply 1/d.

Define the map \pi: R \rightarrow D^{-1} R as \pi(r)=r/1 for the rest of the post. Two initial observations.

1) If D contains no zero divisors, then \pi is an injection. Easy to check. \pi(r)=r/1=0/1 implies that there exists some x\in D such that xr=0, implies that D has zero divisors. Example: if we localize \mathbb{Z} by \mathbb{Z}-\{0\}, then the map \pi(z)=z/1 is an injection. The ring of fractions here is just \mathbb{Q}.

2) D^{-1}R=0 iff 0 \in D. Proof: LHS iff 1/1=0/1, i.e. there exists x\in D such that x1=0, i.e. 0 \in D. So almost all the time, we assume D does not contain 0.

Do not bogged down by the formalism! Really just think of D^{-1}R as fractions where denominators are taken from D.


Now, we introduce two operators c (contraction) and e (extension). Given an ideal I \subset R, e(I)=\pi(I) D^{-1} R. NOT just \pi(I). It is best to interpret e(I) as \{ a/d: a\in I, d\in D \}. (anything in e(I) is of the form (r/1)(a/d)=ra/d where r\in I and a\in R and d \in D and ra\in I). The contraction is simply \pi^{-1}. We can think of \pi^{-1}(J) as J \cap R.

An important fact is that c,e is a bijective correspondence between prime ideals of R disjoint from D and prime ideals of D^{-1}R. So if P is a prime ideal of R, disjoint from D, then e(P) is a prime ideal in D^{-1}R and moreover, c(e(P))=P. On the other hand, if Q is a prime ideal of D^{-1}R, then c(Q) is a prime ideal in R, disjoint from D and also, e(c(Q))=Q.

We shall skip the routine but glorious details. This is proposition 38, page 709.

Note that for any ideal J \subset D^{-1}R, we always have e(c(J))=J. And for any ideal I \subset R, c(e(I))=\{r\in R: dr\in I \text{ for some } d\in D\}. i.e. these are elements of R such that there are some x\in I, d\in D such that x/d=r. i.e. we are adding elements of I divided by elements of D. Typically c(e(I)) is larger than I. If they are equal, we say I is saturated. Giving denominators to it does not add any new elements.


Let us get to the algorithm. Say we want to know whether P is a prime ideal in k[x_1,\ldots,x_n]. To do so, we try to show iteratively/inductively that P_i=P \cap k[x_1,\ldots,x_i] is prime. Think of k[x_1,\ldots,x_i] as k[x_1,\ldots,x_{i-1}][x_i], i.e. let R=k[x_1,\ldots,x_{i-1}]. Assume by induction that P_{i-1} is prime. Let S=R/P_{i-1}. This is an integral domain. Let F be the quotient field of S, i.e. the localization of S by D=S-\{0\}. Note that F[x] (or F[x_i]) is an Euclidean domain (smell something nice). Let us draw the following very important picture:

R[x] \xrightarrow{\varphi} (R/P \cap R)[x]=(R/P_i)[x]=S[x] \xrightarrow{\iota} F[x]

Let us put proposition 39 in this context. For the forward direction, it says that if P=P_i is prime, then P_{i-1}=P_i \cap R has to be prime. Moreover, let \bar{P}=\varphi(P), e(\bar{P}) is a prime ideal in F[x] and \bar{P} is saturated, i.e. e(c(\bar{P})=\bar{P}F[x] \cap S[x]=\bar{P}.

The other direction says that if these conditions are true, then P is a prime ideal. Inductively, the condition about S being an integral domain is already satisfied. So it suffices to check that \bar{P} F[x]=e(\bar{P}) is prime AND if \bar{P} is saturated. The first conditon is easy since F[x] is an Euclidean domain, PID, and every prime ideal is maximal and we just need to test if the generator is irreducible or not, or equal to 0$. Checking whether \bar{P} is saturated sounds more hairy.


Again, I’m not going into proof details.. Say \bar{P} F[x]=(h(x)). As we mention earlier, we need to check if h(x) is 0 or irreducible in F[x]. That aside, we need to check if \bar{P}F[x] \cap S[x]=\bar{P}. Proposition 40 comes to the rescue. Before that, note that we can assume that h(x) \in S[x]. This is because h(x) \in F[x] and we can clear the denominators of all the coefficients to make it a polynomial in S[x]. Let a \in S be the leading coefficient of h(x). Let S_a be the localization of S with respect to powers of a, i.e. introduce powers of a as denominators.

Prop 40 says that \bar{P} F[x] \cap S[x]= \bar{P} S_a[x] \cap S[x]. Let B be ideal generated by \bar{P} and 1-at in S[x,t] (weird). Then \bar{P} S_a[x] \cap S[x] = B \cap S[x].

We just need to check if B \cap S[x] = S[x]. Exercise 3 says that this is the same as checking the unmodded version, i.e. whether the ideal generated by P and 1-at in R[x,t] (instead of S[x,t]) is equal to P, which can be achieved by working with the reduced Groebner basis of P_i=P.


Although we didn’t go into details, we gain an idea of the main principles behind the algorithm. By localization, we can turn rings into fields, work in Euclidean domains, PIDs, and then translate back.

Key ideas: extension and contraction of ideals, bijective correspondence between prime ideals, I prime in R[x] implies that R \cap I is prime, i.e. S=R/(R \cap I) is integral domain which can be extended/localized to a field, and image of I in S[x] is saturated.

I’m late again. Really quite busy recently. Anyway, we are studying Dummit and Foote again, Chapter 15.3. My algebraic geometry is quite weak.

First, we define what is an integral extension. Note that we are dealing with commutative rings. Say R is a subring of S. An element s\in S is integral over R if s is the root of a monic polynomial in R[x]. We say the ring S is integral over R, i.e. integral extension, if every s \in S is integral over R.

Next, we define what is meant by “algebraically independent”. Let k be a field and y_1,\ldots,y_q be in some k-algebra. We say they are algebraically independent (over k) if there is no nonzero polynomial p in k[x_1,\ldots,x_q] such that p(y_1,\ldots,y_q)=0. Another way of saying this is that k[x_1,\ldots,x_q] is isomorphic to k[y_1,\ldots,y_q] by the map x_i \mapsto y_i. (kernel is trivial)


The Noether’s Normalization Lemma says: Let k be a field and A=k[r_1,\ldots,r_m] be a finitely generated k-algebra. Then we can find y_1,\ldots,y_q \in A such that these elements are algebraically independent (over k) and A is integral over k[y_1,\ldots,y_q] and q \leq m.

The proof looks rather technical. We shall skip this, despite its importance..


Weak Form of Hilbert’s Nullstellensatz (WFHN): Let k be an algebraically closed field. Then M is a maximal ideal in the polynomial ring k[x_1,\ldots,x_n] iff M=(x_1-a_1,\ldots,x_n-a_n ) for some a_1,\ldots,a_n \in k. Equivalently, there is a correspondence between points in \mathbb{A}^n and maximal ideals in k[\mathbb{A}^n].

Remark: Note that I((a_1,\ldots,a_n))=(x_1-a_1,x_2-a_2,\ldots,x_n-a_n). To see this, consider the map from k[x_1,\ldots,x_n] to k by mapping x_i \mapsto a_i. Clearly, k[x_1,\ldots,x_n]/I((a_1,\ldots,a_n)) \approx k. So LHS has to be a maximal ideal. The RHS is obviously a subset of LHS. Moreover, the RHS is a maximal ideal. If we add any polynomial to RHS, polynomial division gives us an element in k. This expands the ideal to the whole ring. In short, RHS is a subset of LHS, but both are maximal. So they have to be the same.

Remark: Let us see why the two statements are equivalent. Suppose WFHN holds. Pick a point (a_1,\ldots,a_n). Its ideal is (x_1-a_1,\ldots,x_n-a_n). By WFHN, this is a maximal ideal. If we apply V to this ideal, we get back the same point.

Now, suppose we pick a maximal ideal. By WFHN, it is of the form (x_1-a_1,\ldots,x_n-a_n). The variety of is a single point (a_1,\ldots,a_n). Apply I to this single point to get back (x_1-a_1,\ldots,x_n-a_n). So we got a bijective correspondence between maximal ideals and points.

Now consider the other direction. Assume the correspondence. We already know that M=(x_1-a_1,\ldots,x_n-a_n) is a maximal ideal. Suppose we have a maximal ideal M. By the bijective correspondence, I(V(M))=M. But V(M) (by the correspondence) is a single point and its ideal is of the form (x_1-a_1,\ldots,x_n-a_n).


Finally proof of WFHN: One direction we have already explained, that is (x_1-a_1,\ldots,x_n-a_n) is a maximal ideal. Consider the other direction. Let M be a maximal ideal. Let E=k[x_1,\ldots,x_n]/M be a field finitely generated by \{ \bar{x}_1 ,\ldots,\bar{x}_n \}. Apply Noether’s Normalization Lemma to say E is integral over some polynomial ring k[y_1,\ldots,y_q]. Since E is a field, k[y_1,\ldots,y_q] must be a field too!

(in general, if S is integral over R and S is a field, then R is also a field. reason: given r \in R \subset S. S is a field. So there is some r^{-1} \in S. But S is integral over R. So there is some monic polynomial satisfied by r^{-1}, say r^{-m}+a_{m-1}r^{-m+1}+\ldots+a_0=0. Multiply everything by r^{m-1} to get r^{-1}=-(a_{m-1}+\ldots + a_0 r^{m-1} \in R.)

A polynomial ring is in general not a field, unless q=0. So E is integral over k. Both are fields, so E is an algebraic extension of k. But k is algebraically closed, i.e. \bar{x}_i \in k, i.e. \bar{x}_i=a_i+M for some a_i\in k, i.e. x_i-a_i \in M. Hence, (x_1-a_1,\ldots,x_n-a_n) \subset M. Both are maximal ideals and are thus equal.

Important remark: The significance of the weak form is.. If V(I)=\phi, then I has to be the whole polynomial ring. If not, I \subset M for some maximal ideal M=(x_1-a_1,\ldots,x_n-a_n). V(I) must therefore contain V(M), i.e. contain the point (a_1,\ldots,a_n). Contradiction.

In English, the only ideal which has no common roots is the whole polynomial ring.


Now the stronger form says: I(V(I))=rad(I). Moreover, there is now a correspondence between affine algebraic sets and radical ideals.

Proof is rather slick. Clearly, rad(I) \subset I(V(I)). Focus on the other direction. By Hilbert Basis Theorem, assume I=(f_1,\ldots,f_m). Let g\in I(V(I)). Introduce new variable x_{n+1}. Consider the ideal I'=(f_1,\ldots,f_m,x_{n+1}g-1).

Claim is that V(I')=\phi. Reason: Pick any point (a_1,\ldots,a_n,a_{n+1}) \in V(I'). By definition of I', f_i(a_1,\ldots,a_n)=0. So, (a_1,\ldots,a_n)\in V(I). But g\in I(V(I)). So g(a_1,\ldots,a_n)=0. But (a_1,\ldots,a_n,a_{n+1})\in V(I') implies that a_{n+1}g-1=-1 \neq 0, contradiction.

By WFHN’s important remark, we know I' is the whole ring, i.e. 1\in I'. We can write 1=a_1 f_1+\ldots+a_m f_m + a_{m+1} (x_{n+1}g-1) for some a_i \in k[x_1,\ldots,x_{n+1}]. Now replace x_{n+1} with g and multiply both sides by extremely high powers of g, say g^N, to wipe out g in the denominator of the a_i‘s. We will obtain g^N=c_1 f_1 + \ldots + c_m f_m where c_i is a_i with x_{n+1} replaced with g, multiplied by g^N where N is huge so that c_i \in I. Thus, g \in rad(I).

As for the correspondence, it’s easy once we have the main result. We already know that V(I(V(I))=V(I). So all we need to show is that I(V(I))=I when I is a radical ideal. The Nullstellensatz says I(V(I))=rad(I)=I.


Some of the tricks are just too clever.

Key ideas: maximal ideals corresponds to points, radical ideas corresponds to affine algebraic sets/varieties, polynomial magic

Today, we will work on exercise 11 of Dummit and Foote (again?!) section 14.7 and some other results/exercises it uses. I think we will be working on this book quite often since I didn’t take any grad class in algebra (officially) but hope to work on algebraic stuff in grad school. Without further ado,

Question 10: Let K=\mathbb{Q}(\zeta_p) be cyclotomic field for some prime p and let G=Gal(K/\mathbb{Q}). Let \zeta be any p-th root of unity. Prove that \sum_{\sigma \in G} \sigma(\zeta) (the trace) is -1 or p-1 depending on whether \zeta is or is not a primitive p-th root of unity.

Proof: This is quite obvious. If \zeta is a primitive root of unity, the sum is over all the roots of unity, except 1. This has to be -1 since 1+\zeta+\ldots+\zeta^{p-1}=(\zeta^p-1)/(\zeta-1)=0. If \zeta is not a primitive root of unity, i.e. \zeta=1, then the sum will give p-1 since |G|=p-1. H therefore corresponds to \{1,2,4,6,\ldots,p-3\}


Question 11. The setup: Let K=\mathbb{Q}(\zeta_p) be the cyclotomic field, for some odd prime p. Let G=Gal(K/\mathbb{Q}). Say G is generated by \sigma. Let H be a subgroup of index 2 in cyclic group G. Define \eta_0=\sum_{\tau\in H} \tau(\zeta_p) and \eta_1=\sum_{\tau \in \sigma H} \tau(\zeta_p).

Let’s think about this set-up: |G|=p-1 and G=\{1,\sigma,\ldots,\sigma^{p-2}\}. Let \sigma_a denote the map that sends \zeta_p to \zeta_p^a. Recall there is an isomorphism between G and \mathbb{Z}_p^{\times} by sending \sigma_a to a. H turns out to be \{1^2,2^2,3^2,\ldots,\left(\frac{p-1}{2}\right)^2\}. The proof lies in elementary number theory. i^2=j^2 \mod p iff i=j \mod p or i=p-j \mod p. So the set \{i^2:i \in \mathbb{Z}\} has exactly |G|/2 elements. Also, it is a group since the product of squares is a square.


Part (a): Show that \sigma(\eta_0)=\eta_1 and \sigma(\eta_1)=\eta_0 and that \eta_0 = \sum_{a=\text{square}}\zeta_p^a and \eta_1=\sum_{b\neq\text{square}} \zeta_p^b.

Proof: For first part, just apply the definitions. The second part follows from our remark earlier about H being a set of squares. The coset \sigma H is disjoint from H, and has to contain nonsquares.


Part (b): Prove that \eta_0+\eta_1=(\zeta_p,1)=-1 and \eta_0-\eta_1=(\zeta_p,-1) where those bracketed terms are the Lagrange resolvents of \zeta_p.

Proof: What is the Lagrange resolvent in our context? (\zeta_p,\alpha)=\sum_{i=0}^{p-2} \alpha^i \sigma^i(\zeta_p). So (\zeta_p,1)=\sum_{i=0}^{p-1} \sigma^i(\zeta_p)=\sum_{\tau \in G} \tau(\zeta_p)=\eta_0+\eta_1. By same argument as in question 10, this sum is -1. Showing \eta_0-\eta_1=(\zeta_p,-1) is just as straightforward.


Part (c): Let g=\sum_{i=0}^{p-1} \zeta_p^{i^2} (the classical Gauss sum). Prove that g=(\zeta_p,-1)=\sum_{i=0}^{p-2} (-1)^i \sigma^i (\zeta_p).

Proof: This part is nice. Remember that i^2=(p-i)^2 in \mathbb{Z}_p. So g=1+\eta_0+\eta_0=-(\eta_0+\eta_1)+2\eta_0=\eta_0-\eta_1=(\zeta_p,-1).


Part (d): Prove that \tau g=g if \tau \in H and \tau g=-g if \tau \notin H. Conclude that \left[ \mathbb{Q}(g):\mathbb{Q}\right]=2. Recall that complex conjugation is the automorphism \sigma_{-1} on K. Conclude that \bar{g}=g if -1 is a square mod p, i.e. p=1 \mod 4, and \bar{g}=-g if -1 is not a square mod p, i.e. p=3 \mod 4.

Proof: It’s late now but I will persevere! H is of index 2 and must be a normal subgroup. Recall that g=\eta_0-\eta_1=\sum_{\tau'\in H} \tau'(\zeta_p)-\sum_{\tau'\in \sigma H} \tau'(\zeta_p). If we apply \tau\in H to g, clearly the first term is still a sum over H. As for the second term, we will be summing over \tau \sigma \tau'' where \tau'' \in H. Usual trick here. Write \tau \sigma \tau''=\sigma (\sigma^{-1} \tau \sigma) \tau'' \in \sigma H (noting that H is a normal subgroup). Hence, \tau g=\tau \eta_0-\tau \eta_1=\eta_0-\eta_1=g.

Actually this is stupid. We are dealing with a cyclic group. So H=\{1,\sigma^2,\sigma^4,\ldots\}. So clearly, if \tau \in \sigma H, then \tau is some odd power of \sigma and thus \tau \eta_0=\eta_1 and \tau \eta_1=\eta_0. In this case, \tau g=-g.

We have so far shown that H is the subgroup which fixes \mathbb{Q}(g). By the fundamental theorem of Galois theory, \left[ \mathbb{Q}(g):\mathbb{Q}\right] is the index of H, i.e. 2.

If -1 is a square \mod p, then \sigma_{-1}\in H and must fix g, i.e. \bar{g}=\sigma_{-1}(g)=g. Otherwise, it must send g to -g as what we have just shown.


Part (e): Prove that g \bar{g}=p.

Proof: Use the hint. Conjugate g in part (c), we have \bar{g}=\sum_{j=0}^{p-2} (-1)^j (\sigma^{j}(\zeta_p))^{-1}. The product is therefore \sum_{i,j=0}^{p-2} (-1)^{i-j} \sigma^j \frac{\sigma^{i-j}(\zeta_p)}{\zeta_p}. We let k=i-j and sum in a different manner: \sum_{k=0}^{p-2} (-1)^k \sum_{j=0}^{p-2} \sigma^j \frac{\sigma^k(\zeta_p)}{\zeta_p}. If k=0, then the inner term \frac{\sigma^k(\zeta_p)}{\zeta_p} is 1 and the inner sum is p-1. Otherwise, that inner term is some primitive root of unity and the inner sum is -1 (by question 10). Thus, the sum becomes p-1+\sum_{k=1}^{p-2}(-1)^{k+1}=p.


Part (f): Conclude that g^2=(-1)^{(p-1)/2}p and that \mathbb{Q}(\sqrt{(-1)^{(p-1)/2}}) is the unique quadratic subfield of \mathbb{Q}(\zeta_p).

Proof: We know from part (d) that \bar{g}=g or -g. Split into 2 cases. If p=1\mod 4, then the exponent on RHS is even and from RHS: p=g \bar{g}=g^2. If p=3 \mod 4, then from RHS: -p=-g \bar{g}=g^2.

Part (d) shows that \mathbb{Q}(g) is a quadratic subfield. In addition, it is unique. Given any other quadratic subfield, we can let H be the corresponding subgroup in G and apply the results above to show that its fixed field is \mathbb{Q}(g).


Key ideas: working with Gal(\mathbb{Q}(\zeta_p)/\mathbb{Q}), changing order of summing

This week I will study some important basics of modules from Chapter 10 of Dummit and Foote, particularly exact sequences and projective modules. Studying exact sequences will prepare us for studying homological algebra some time later.

First, a reminder about direct products/sums of modules. If M=N_1 \oplus \ldots \oplus N_k where N_i‘s are submodules of M, we mean that every element in N_1+\ldots+N_k is written uniquely as a_1+\ldots+a_k where a_i\in N_i, or equivalently the map \pi: N_1 \times \ldots \times N_k \mapsto N_1+\ldots+N_k defined as \pi(a_1,\ldots,a_k)=a_1+\ldots+a_k is an isomorphism. So direct sum or product, it doesn’t matter.

Next, recall what are free modules. When we say \mathcal{F} is a free R-module, we have to specify a subset A such that every element of \mathcal{F} can be written uniquely as a linear combination of elements from A, i.e. if x\in \mathcal{F}, then there exist uniquely a_1,\ldots, a_n in A and r_1,\ldots, r_n in R such that x=\sum_{i=1}^n r_i a_i. We call A a basis for \mathcal{F}. (Think linear algebra.) If R is commutative, then all bases have the same size, called the rank. If not commutative, the bases may be of different size. An important property of free modules is the “universal property”:

Given any R-module M and any map \phi:A \mapsto M, we can always extend \phi linearly to a R-module homomorphism between F and M. (Think linear algebra again.)

Note that if A is finite, say \{a_1,\ldots, a_n\}, then any free module on A is isomorphic to Ra_1 \oplus \ldots \oplus Ra_n as expected.


Now, we get to the main topic: exact sequences. Given X \xrightarrow{\alpha} Y \xrightarrow{\beta} Z, we say the sequence is exact at Y if \alpha(X)=\ker \beta. Notice that 0 \rightarrow A \xrightarrow{\psi} B is exact at A iff \psi is injective, and that B \xrightarrow{\phi} C \rightarrow 0 is exact at C iff \phi is surjective. In this chapter, we often deal with “short exact sequences”: 0 \rightarrow A \xrightarrow{\psi} B \xrightarrow{\phi} C \rightarrow 0. When we see such a sequence, we should intuitively think of B being an extension of C (by A) because B/ \psi(A) \approx C.

Notice that there is always an easy extension of a module C by A, by letting B= A \oplus C. In a certain sense, B got “split” by A and C. Let us define the notion of a “split sequence” now. If 0 \rightarrow A \xrightarrow{\psi} B \xrightarrow{\phi} C \rightarrow 0 is “split”, we mean that up to isomorphism, B=A \oplus C. What’s happening in more precise terms, is that B=\psi(A) \oplus C' where C' \approx C by \phi_1, where \phi_1 is \phi restricted to C'\subset B. We can say B is a “split extension” of C by A.

Another way to think of a split sequence is that there is a R-module homomorphism \mu:C \mapsto B such that \phi \circ \mu=id. Let us see why this is equivalent. If \mu is provided, then we let C'=\mu(C). If C' is given, we define \mu=\phi_1^{-1}.


Before the definition of a projective module, we need to consider and understand the following scenario.

Suppose we have a short exact sequence 0 \rightarrow L \xrightarrow{\psi} M \xrightarrow{\phi} N \rightarrow 0 and some R-module D. What is the relationship between \hom(D,M) and \hom(D,L) and \hom(D,N)? Given f\in \hom(D,L), we can easily get a f'\in \hom(D,M) by letting f'=\psi \circ f. We denote this obvious map between \hom(D,L) and \hom(D,M) as \psi'. Since \psi is injective, \psi' has to be injective. Reason is that if \psi \circ f=\psi \circ g and \psi is injective, then f=g. i.e. 0 \rightarrow \hom(D,L) \xrightarrow{\psi'} \hom(D,M) is exact.

The relationship between \hom(D,M) and \hom(D,N) is less clear. It turns out that the map \phi' is not always surjective! Given some f\in \hom(D,N), we cannot always “lift” it to the extension M, i.e. find a F\in \hom(D,M) such that \phi \circ F=f, i.e. \phi'(F)=f.

Nevertheless, it is known that 0 \rightarrow \hom(D,L) \xrightarrow{\psi'} \hom(D,M) \xrightarrow{\phi'} \hom(D,N) is exact. The left part is done, so we just need to show the exactness at \hom(D,M), i.e. \ker \phi' = \psi'(\hom(D,L)). This looks hairy and a little routine, so we shall skip.

Some category theory at this point :D We can think of \hom(D,\cdot) as a covariant functor between R-modules and abelian groups. If X is a R-module, then this functor will map it to \hom(D,X). And if X,Y are R-modules and \alpha: X \mapsto Y, we can define a group homomorphism \alpha': \hom(D,X) \rightarrow \hom(D,Y), defined by \alpha'(f)=\alpha \circ f. Our discussion earlier shows that \hom(D,\cdot) is a “left exact functor”.


SO, what are projective modules? We say P is a projective R-module if: for any exact sequence 0 \rightarrow L \xrightarrow{\psi} M \xrightarrow{\phi} N \rightarrow 0 (L,M,N can be anything), 0 \rightarrow \hom(D,L) \xrightarrow{\psi'} \hom(D,M) \xrightarrow{\phi'} \hom(D,N) \rightarrow 0 is also exact. We know this is not always true in general by the previous discussion. But to a projective module, this is true for any exact sequences. This is equivalent (by previous discussion) to saying that: for any R-modules M,N such that M \xrightarrow{\phi} N \rightarrow 0 is exact, we can lift any f\in \hom(P,N) to some F\in \hom(P,M) such that \phi \circ F=f, i.e. \phi':\hom(D,M) \mapsto \hom(D,N) is surjective. We call this definition 1.


We will end this post by establishing two more equivalent definitions of a projective module. Particularly, we are curious why are such modules called “projective”?

Definition 2 (of P being projective): If P is a quotient of M, then P is isomorphic to a direct summand of M, i.e. every short exact sequence 0 \rightarrow L \rightarrow M \rightarrow P \rightarrow 0 splits.

Definition 3: P is a direct summand of a free R-module.


First we do (1) \Rightarrow (2).

Say we are given some short exact sequence 0 \rightarrow L \rightarrow M \rightarrow P \rightarrow 0. From definition 1, let N=P, i.e. lift the identity map in \hom(P,N)=\hom(P,P) to a map in \hom(P,M). So there exists some F \in \hom(P,M) such that \phi \circ F=id. Think of F as \mu in the section about split sequences.


Second, we do (2) \Rightarrow (3).

We can always construct a free module on the generators of P. Say S is a set of generators for P. Define \mathcal{F}(S) as a free module on S. Define \varphi:\mathcal{F}(S) \mapsto P as the unique “linear extension” of inclusion map from S to P (recall the “universal property of free module \mathcal{F}(S)). Hence we have an short exact sequence 0 \rightarrow \ker \varphi \xrightarrow{\iota} \mathcal{F}(S) \xrightarrow{\varphi} P \rightarrow 0, i.e. P is the quotient of a free module. Apply definition 2.


Lastly, we do (3) \Rightarrow (1).

Assume M \xrightarrow{\phi} N \rightarrow 0.

Suppose \mathcal{F}(S)=P\oplus K where \mathcal{F}(S) is a free R-module on set S. Let f \in \hom(P,N). We want to lift it to some F\in \hom(P,M).

The main trick here I would say is “when you lift a bigger thing, you lift a smaller thing on it together”. The bigger thing is f \circ \pi : \mathcal{F}(S) \mapsto N where \pi is a natural projection from \mathcal{F}(S) to P. We want to lift it to some F' \in \hom(\mathcal{F}(S),M). The “lifting” means that \phi \circ F' = f \circ \pi. So how is the smaller thing lifted together?

We can define F as F' (the bigger thing lifted) restricted to P. F will be the smaller thing f is lifted to. Let us check this. Let x\in P. Then \phi(F(x))=\phi(F'(x,0))=f(\pi(x,0))=f(x).

It remains to “lift the bigger thing”, i.e. define F' \in \hom(\mathcal{F}(S),M) such that for every s\in S, \phi(F'(s))=f(\pi(s)).

To do so, we exploit the “universal property” of \mathcal{F}(S) to define F'. We will define a function g:S \mapsto M such that \phi(g(s))=f(\pi(s)) and then extend it to get F' by the “universal property” of \mathcal{F}(S). Getting g is easy. f \circ \pi maps into N and \phi\in \hom(M,N) is surjective: given any s\in S, there is a m_s\in M such that \phi(m_s)=f(\pi(s)). Define g(s) to be m_s and we are done.


Definition 2 is like saying that given any module M which projects onto P has P as a direct summand (up to isomorphism). That sort of explains why these modules are called “projective”.

Another way of stating definition 1 is: P is projective iff the functor \hom(P,\cdot) is exact, i.e. take short exact sequences to short exact sequences.


Key ideas: universal property, split exact sequences, “lift big to lift small”

This week (or rather last week) I am a little late. This time I am writing about Wedderburn’s Theorem on Finite Division Rings which says that any finite division ring D is commutative (i.e. a field). Recall that a division ring is just a ring where every nonzero element has a multiplicative inverse. To prove this, I will work through exercise 13 of Dummit and Foote’s chapter 13.6.

Step 1: Let Z be center of D. Show that Z is a field containing \mathbb{F}_p for some prime p. If Z=\mathbb{F}_q, then D has order q^n for some integer n.

We want to show that Z is a finite field. Remember D is a division ring, so every element of Z already has a multiplicative inverse. Moreover, the inverse is also in Z because: Let x \in Z and a \in D. Since x a^{-1}=a^{-1} x, we have a x^{-1}=x^{-1} a. So x^{-1} \in Z and Z is a finite field with some characteristic p. It acts on D like a field acting on a vector space. The last statement follows readily from counting. (q has to be some power of p)


Step 2: The nonzero elements D^{\times} of D form a multiplicative group. For any x \in D^{\times}, show that the elements of D which commute with x form a division ring which contains Z. Show that this division ring is of order q^m and m<n if x is not an element of Z.

The proof is similar to the previous. This is the centralizer of x and obviously contains Z. By the vector space argument, division ring has size q^m. If x is not in Z, then its centralizer cannot equal D and thus m<n.


Step 3: Show that the class equation for group D^{\times} is q^n-1=(q-1)+\sum_{i=1}^{r} \frac{q^n-1}{|C_{D^{\times}}(x_i)|} where x_1,\ldots,x_r are representatives of distinct conjugacy classes in D^{\times} not contained in the center of D^{\times}. Conclude from step 2 that for each i, |C_{D^{\times}}(x_i)|=q^{m_i}-1 for some m_i<n.

Let us revise the class equation. Assume that D^{\times} acts on itself by group conjugation. That |D^{\times}|=\sum_{x \text{with orbit size 1}} 1 + \sum_{i=1}^{r} |K_i| where K_i is orbit of x_i. LHS is q^n-1. First term on RHS is |Z|=q^m-1. And second term is \sum_{i=1}^{r} \frac{q^n-1}{|C_{D^{\times}}(x_i)|} using the stabilizer-orbit relationship.

The previous step says that the centralizer has order q^{m_i}. Its multiplicative group has zero removed.


Step 4: Prove that since \frac{q^n-1}{q^{m_i}-1} is an integer (it’s the size of K_i), then m_i divides n. Conclude that \Phi_n(x) divides \frac{x^n-1}{x^{m_i}-1} and hence that the integer \Phi_n(q) divides \frac{q^n-1}{q^{m_i}-1} for i=1,\ldots r.

For the first part: Let m=m_i and n=Am+r where 0\leq r <m. Write q^{Am+r}-1=q^r(q^{Am}-1)+(q^r-1). The first term is divisible by q^m-1 by considering geometric series. That means q^m-1 divides q^r-1 which is smaller. Contradiction.

For the second part: Recall that \Phi_n(x) is the cyclotomic polynomial with roots \zeta_n^a and (a,n)=1. This excludes m. So x^n-1 (whose roots are all powers of \zeta_n) is divisible by \Phi_n(x).


Step 5: Prove that step 3 and 4 imply that \Phi_n(q) divides q-1. Prove that |q-\zeta|>q-1 (complex absolute value) for any root of unity \zeta \neq 1. Conclude that n=1, i.e. that D=Z is a field, i.e. commutative.

Look at the class equation from step 3. \Phi_n(q) divides LHS as well as every term in the summation. Hence it divides q-1.

Since q is some integer, it lies on the real line and its closest point on unit circle is 1. The inequality is clear. So |\Phi_n(q)| cannot divide |q-1| unless n=1.


Key ideas: class equation, cyclotomic polynomial, “small cannot divide big” for integers

Next Page »

Follow

Get every new post delivered to your Inbox.