Squares and factoring

Cryptography can be very useful, but our goal as mathematicians is to solve problems. So it is to find ways to break codes, not to provide them. For example, one big goal is to find efficient ways to decompose numbers into prime factors. The other, finding primes large enough to form $N=pq$, will be very useful but of no mathematical interest.

Gauss considered that the honour of mathematics required us to solve such an elegant problem as factoring. With his methods he could factor numbers of 7, 8 or even more digits. In 1970 the limit was 20 digits, with the Brillhart-Morrison continued fraction method it jumped to 50 digits in 1980. In 1994 the Pomerance quadratic sieve method doubled this amount (100-digit numbers). Two important methods still emerged, Pollard’s algebraic body sieve and Lenstra’s method using elliptic curves, each with its own peculiarities.

The factorisation problem is in the class $\textbf{NP}$ (if we know the factorisation it is easy to check), but it does not seem to be $\textbf{NP}-\text{complete}$ (a fast factorisation algorithm does not imply the solution of any other problem in the class $\textbf{NP}$). There is a fair consensus that it is a hard problem, but the above developments raise doubts that perhaps there is a magic solution! Just as the current methods would seem magical to Gauss.

Fermat and his squares

Fermat was trying to put $n$ as the difference of two squares $$n=a^2-b^2=(a+b)(a-b).$$ If we want to factor 1927 we can notice that the year of the war is a perfect square $1936=44^2$. $$1927=1936-9=(44+3)(44-3)=47\times41.$$ The method is general, if $n$ is odd and has two factors $n=\alpha\beta$, then $$n=\bigl(\frac12(\alpha+\beta)\bigr)^2- \bigl(\frac12(\alpha-\beta)\bigr)^2,$$ but in practice the method is very limited.

Maurice Kraitchik in 1920 had an idea that improved Fermat’s method substantially. He realised that it was enough to find a difference of squares $a^2-b^2$ divisible by $n$. In other words, such that $a^2\equiv b^2\bmod n$. In this case $n$ divides $(a+b)(a-b)$ then there is some probability, if $n$ is composite, that $n$ does not divide $(a+b)$ nor $(a-b)$. Some of the factors of $n$ are in $a+b$ and some are in $a-b$ and the greatest common divisor $\mathop{\rm mcd}(n,a-b)$ will be a proper divisor of $n$.

Most of the methods we have considered before follow this route, finding numbers $a$ and $b$ satisfying $a^2\equiv b^2\bmod n$. You end up applying Euclid’s algorithm to obtain the greatest common divisor.

Kraitchik had yet another important idea, which is still used in almost all later methods. Although it does not seem easy to find two squares whose difference is divisible by $n$, it is easy to take a number $b$, square it and divide by $n$ to obtain a remainder $r$ (relatively small) and such that $r\equiv b^2\bmod n$. This doesn’t seem to do any good, until one realises that if I have a lot of these $r_j\equiv b_j^2\bmod n$ whose prime factors are small, we could form a product of them $r_1r_2\cdots r_k=a^2$ which is a square, then

$$r_1r_2\cdots r_k=a^2\equiv (b_1b_2\cdots b_k)^2\bmod n.$$

An example of Kraitchik’s method

With $n=21449$ we have $147^2=21609$, so we calculate

\begin{align*}147^2-n&\equiv 2^5\cdot 5 \bmod{n}\\148^2-n&\equiv 5\cdot7\cdot 13\bmod{n}\\151^2-n&\equiv 2^3\cdot 13^2\bmod{n}\\153^2-n&\equiv 2^3\cdot 5\cdot7^2\bmod{n}\\\end{align*}

Where we have chosen 145, 148, 151 because their remainders are factored with small numbers. We can then note that

$$(147\cdot153)^2=(2^5\cdot 5)\cdot(2^3\cdot 5\cdot7^2)\equiv (2^4\cdot 5\cdot 7)^2 \bmod 21449.$$

Let’s see if we have obtained a factor, calculating the greatest common divisor of $a-b$ $$\mathop{\rm gcd}(147\cdot153-2^4\cdot 5\cdot 7, 21449)=\mathop{\rm mcd}(21931,21449)=241.$$ Which is effectively a proper divisor and therefore we have the factorisation

Improvements to the method

One of the surprising effects of trying to do something explicitly, let’s say when you are programming, is the fact that when you try to do it, slight modifications of the method become evident. And yes, they are apparently slight modifications of the method, but they have a big impact on the result.

For example, we want to find $a$ such that $a^2-n\bmod n$ is factored with small primes. Clearly we want $a^2$ to be close to $n$. We can also use $a$ numbers with $a^2<n$ as long as we write $-1$ as if it were a prime factor. This simple observation doubles the number of convenient $a$.

In 1931 D. H. Lehmer and R. E. Powers found another way to find small remainders. Instead of using the polynomial $Q(x)=x^2-n$ they consider rational approximations to $\frac{a}{b}\approx\sqrt{n}$. It is to be expected that in that case $a^2-b^2n$ will be small and $a^2\equiv a^2-b^2n\bmod n$. There is a very important theory for finding good approximations to an irrational number: continued fractions. This way you get many more small numbers congruent to squares. So it is an obvious improvement on Kraitchik’s method.

Linear algebra

The weak point of the method is that one finds smooth numbers (that is with small prime factors) that are congruent modulo $n$ with a square. But then you have to combine them to get a square. Like when we saw earlier that $(2^5\cdot 5)(2^3\cdot5\cdot7^2)$ was a square. When the numbers are really big the question can be much more difficult. We can be talking about tens of numbers with prime factors all $\le 100$. How do we figure out which ones we have to multiply to get a square?

The answer to this problem was given in 1975 by M. A. Morrison and J. Brillhart. Assuming that the numbers are of the form $$(-1)^a 2^{r_1}3^{r_2}\cdots p_{k-1}^{r_{k-1}}$$ the important information is the parity of $(a,r_1,r_2,\dots, r_{k-1})$. We can consider an array of $0$ and $1$’s recording the parities of each of our numbers $$\begin{array}{c|ccccccc}& 2 & 3 & 5 & 7 & 11 & 13\\ \hline 147^2-n & 1 & & 1 & & & \\148^2-n & & & 1 & 1 & & 1\\151^2-n & 1 \\153^2-n & 1 & & 1\end{array}$$ Our problem is to find several rows whose sum has all its coordinates even. We can consider each row as a vector $\mathbf{v}$ in a vector space $\mathbf{F}_2^{k}$ on the field with only two elements. What we want to do is to find a set of these vectors $\mathbf{v_1} \dots \mathbf{v_\ell}$ that add up to $0$. We can apply the methods of linear algebra to quickly find this subset. In fact, if we have $k+1$ numbers with this factoring, the general theorems tell us that these vectors are linearly dependent and, therefore, we can form a square with them.

Therefore Morrison and Brillhart fix a number $B$ and consider only the prime numbers $\le B$. Given $n$ now the difficulty is how to choose the number $B$. If we take it too small there will be few numbers $a^2\bmod n$ that factor with the primes $p\le B$. If we take it too large the final linear algebra problem becomes more complicated. The advance we want to highlight today relates to this problem and its theoretical study.

The complexity of factoring

With the arrival of computers, the problem of studying the difficulty of problems has arisen. Given a large $n$ number, how long does it take to factor it. This depends on the method used. We want to consider the current methods.

Richard Schroeppel towards the end of the 1970s and in private correspondence suggests considering the numbers $Q(a)=a^2-n$ or the $Q_j$ of the continued fraction method as random. They are not really random, we have obtained them very carefully so that they are congruent to a square modulo $n$. The point is that from the point of view of obtaining a square with them, we can consider them random. Let us call a number $B$-smooth if it is decomposed into prime factors $\le B$. The first question is then, what is the probability that a random number in the interval $[1, X]$ is $B$-smooth? The answer is $\frac{\psi(X,B)}{X}$, where $\psi(X,B)$ is the number of $B$-smooth numbers $\le X$.

If we take $\le X$ numbers at random, the expected number of attempts before finding a $B$-smooth number will be the inverse of this probability $\frac{X}{\psi(X,B)}$. But by the Morrison-Brillhart method we need the order of $\pi(B)$ (the number of primes $\le B$) to obtain a set of linearly dependent vectors. So the total number of attempts to get a square will be $\frac{X\pi(B)}{\psi(X,B)}$. Finally, how much work does it take to check that a given number is $B$-smooth? If what we do is to divide by the small primes, the job is approximately $\pi(B)$ divisions. Then at the end the number of steps of the factorization of the number, choosing $B$ as the base of the primes is $$\frac{X\pi(B)^2}{\psi(X,B)}$$

Thus we have a job for Analytic Number Theory: Given $X$, how can we choose $B$ so that this quantity is minimum? What is the value of this minimum?

The answer is that $B$ must be of order $\exp(\frac12\sqrt{\log X\log\log X})$ and the value of the minimum is approximately $\exp(2\sqrt{\log X\log\log X})$. To factor $n$, the numbers we try are of the order $X=\sqrt{n}$. Therefore the method factors $n$ in $\exp(\sqrt{2\log n\log\log n})$ steps.

This is no more than a (well-founded) guess, the problems are that the numbers we use are not random and that it could happen that we find squares that fail in the final greatest common divisor step. This argument, however, is very sound as we will see in the rest of the entry.

The quadratic sieve method

According to Carl Pomerance, Schroeppel’s reasoning led him to improve the method in a spectacular way. He replaced one of the steps of the method with a much better alternative.

He discovered that in the above argument the process of checking whether a number is $B$-smooth has a large effect on the result. So in 1981, he thought he could use a sieve method, similar to Eratosthenes’ sieve to find the primes.

We want to find $Q(x)=x^2-n$ numbers that are $B$-smooth. We make a table of the numbers. For each $p\le B$ we see that $Q(x)$ is divisible by $p$ if $x\equiv a$ or if $x\equiv b$ for two particular numbers $a$ and $b$ that we can easily calculate. Then in the table it is easy to locate those numbers and divide them by $p$. When we finish with all the primes the positions where a $1$ appears, it is because the corresponding number $Q(x)$ is $B$-smooth. The process is much faster than dividing. With the previous procedure most of the time we divide to find that the number at the end is not divisible or is not $B$-smooth.

With this method the number of operations goes from $\exp(\sqrt{2\log n\log\log n})$ to $\exp(\sqrt{\log n\log\log n})$. This came about as an idea of Pomerance’s, but when it was implemented the result was spectacular, factoring 100-digit numbers.

There are other methods that have improved on the quadratic sieve, but what we have seen is sufficient to see the importance of the result we are discussing.

Generation of squares

We have seen that many of the methods for factoring $n$ are based on the idea of generating numbers $a_i\equiv b_i^2\bmod n$ and then finding subsets of the $a_i$ whose product $\prod_{i\in I}a_i\equiv z^2\bmod n$. So that $z^2\equiv w^2:= \prod_{i\in I}b_i^2\bmod n$ with which we have candidates for factors of $n$ the numbers $\mathop{gcd}(z-w,n)$.

The idea suggested by Schroeppel is to imagine the $a_i$ taken at random from a set of $[1,X]$ numbers. At the 1994 International Congress Pomerance posed the following question: consider $[1,X]$ as a probability space (each number with the same probability) and given a succession of independent variables $a_i$ in that space consider the random variable $$T(X):=\min\{ N\in \mathbf{N}\colon \exists\ I\subset [1,X] \text{ such that }\prod_{i\in I} a_i \text{ is a square }\}$$ Pomerance proved that for all $\varepsilon>0$ one has with high probability that $$\exp((1-\varepsilon)\sqrt{2\log X\log\log X})\le T(X)\le \exp((1+\varepsilon)\sqrt{2\log X\log\log X}).$$ and conjectured that $T(X)$ has a very sharp jump in the sense that there exists a function $f(X)$ with the property that with high probability $$(1-\varepsilon)f(X)\le T(X)\le (1+\varepsilon)f(X).$$ That is, if we take more than $f(X)$ samples $a_i$ with high probability we will obtain a square (which we can extract by means of linear algebra).

In 2012 E. Croot, A. Granville, R. Pemantle and P. Tetali almost succeeded in proving the existence of the jump. They proved that with high probability $$\frac{\pi}{4}(e^{-\gamma}-\varepsilon)J(x)\le T(x)\le (e^{-\gamma}+\varepsilon)J(x)$$ where $$J(X)=\min_{2\le B\le X}\frac{\pi(B)X}{\psi(X,B)}.$$ In other words, except for the factor $\pi/4$ they had found the function $J(X)$ that determines the position of the jump.

This post is focused on a paper published this year in Annals of Mathematics: P. Balister, B. Bollobás, R. Morris, have proved that indeed with high probability one has $$(e^{-\gamma}-\varepsilon)J(X)\le T(X)\le (e^{-\gamma}+\varepsilon)J(X)\qquad (1).$$

The article is about 100 pages long. The proof is very technical and combines techniques from combinatorics, probability and analytic number theory. This makes difficult to explain the proof. Instead I will try to explain the ideas that apply. This is an example of how mathematics solving seemingly unconnected problems creates tools that are then unexpectedly combined to solve new problems. These ideas that are used in the proof of (1) are:

Analytic number theory

The function $\psi(X,B)$ that we have used above represents the number of natural numbers $n\le x$ that are $B$-smooth, i.e. that are not divisible by any prime $p>B$. Its study began in 1930 when Dickman proved $$\lim_{x\to\infty}\frac{\psi(x,x^{1/u})}{x}=\rho(u),$$ where $\rho(u)$ is the unique solution of the delay differential equation $$u\rho'(u)+\rho(u-1)=0.$$ The function $\rho(u)$ is known today as the Dickman-de Bruijn function. The proof of (1) makes use of numerous results on $\psi(x,y)$ and $\rho(u)$ that have been published since the 1980s.

Differential equations and the self-correcting martingale method

Many processes that are truly discrete are modelled by differential equations. For example, the differential equation $$\frac{d}{dt}M=\lambda M$$ is used to describe population growth or radioactive decay. Many of these processes can also be described by a Markov process. For example in the above case by a random process $X(t)$ tal que $$\mathbb{E}(X(t))=X(0)e^{\lambda t}.$$

Both treatments are related. The values of the process are concentrated near the solution of the differential equation. More precisely, if we have a succession of initial values of the process $X_n(0)$ such that $\lim_n X_n(0)/n=M(0)$, then for all $\varepsilon>0$, we have $$\lim_{n\to\infty} \mathbb{P}\Bigl\{\sup_{s\le t}\Bigl|\frac{X_n(s)}{n}-M(s)\Bigr|>\varepsilon\Bigr\}=0.$$

In 1970 T. G. Kurtz published a paper extending this result to many other differential equations. This work, and above all, the techniques he employs, is the germ of the method known today as the self-correcting martingale method. And it is the main ingredient in the proof of (1).

The three protagonists

Learn more

The reason for this entry is the article:

R. Balister, B. Bollobás, R. Morris, The sharp threshold for making squares, Ann. Math. 188 (2018) 49-143.

as we have said it is quite technical. We can see an exposition by one of its authors (Robert Morris) at the Alan Turing Institute, which can be found on Youtube.

I have based much of the entry on the article by Carl Pomerance, one of the main characters in the story. It is certainly an easy and helpful read. The journal Notices of the American Mathematical Society is free and every month brings interesting news about Mathematics.

C. Pomerance, A Tale of Two Sieves, Notices Amer. Math. Soc. 43 nº 12 (1996) 1473-1485.

Finally, we have also used Kurtz’s article on differential equations

T. G. Kurtz, Solutions of Ordinary differential equations as limits of Pure Jump Markov Processes, J. Appl. Probability, 7 (1970) 49–58.