# Notes on “An Introduction to Mathematical Cryptography”

## Chapter 1: An Introduction to Cryptography

Origin of the word Cryptography:

Cryptography, the methodology of concealing the content of messages, comes from the Greek root words kryptos, meaning hidden, and graphikos, meaning writing. The modern scientific study of cryptography is sometimes referred to as cryptology.

### Cryptanalysis of Simple Substitution Ciphers

Cryptanalysis explaination:

The process of decrypting a message without knowing the underlying key is called cryptanalysis.

Powerful tools against simple Substitution Cipher: Frequency Analysis.

### Divisibility and Greatest Common Divisors

The Euclidean Algorithm: The division step is executed at most $2\log_2(b) + 2$ times. It has been proven that the Euclidean algorithm takes no more than $1.45\log_2(b) + 1.68$ iterations, and that the average number of iterations for randomly chosen $a$ and $b$ is approximately $0.85\log_2(b) + 0.14$.

Extended Euclidean Algorithm: 用来求二元一次方程$au + bv = gcd(a, b)$的整数解。

### Modular Arithmetic

The theory of congurences is a powerful method in number theory that is based on the simple idea of clock arithmetic.

The ring of integers modulo m: $$\mathbb{Z}/m\mathbb{Z} = {0, 1, 2, …, m-1}$$

Numbers that have inverses are called units.

$$(\mathbb{Z}/m\mathbb{Z})* = {a \in \mathbb{Z}/m\mathbb{Z}: gcd(a, m) = 1}$$

The set $(\mathbb{Z}/m\mathbb{Z})*$ is called the group of units modulo m.

Euler’s phi function (Euler’s totient function): $$\phi(m) = #(\mathbb{Z}/m\mathbb{Z})* = #{0 \le a < m: gcd(a,m)=1}$$

The Fast Powering Algorithm == Fast Exponentiation Algorithm == Square and Mutilply Algorithm 快速幂算法

### Prime Numbers, Unique Factorization, and Finite Fields

The Fundamental Theorem of Arithmetic: $$a = p_1^{e_1}\cdot p_2^{e_2}\cdot p_3^{e_3}\cdot \cdot \cdot p_r^{e_r}$$

We denote this power by $ord_p(a)$ and call it the order (or exponent) of p in a. View $ord_p$ as a function, called valuation. $$ord_p: {1, 2, 3, …} \longrightarrow {0, 1, 2, 3, … }$$

Field:

Finite fields are of fundamental importance throughout cryptography, and indeed throught all of mathematics.

Finite fields == Galois fields. $\mathbb{F}_p == GF(p)$.

1. Fermat’s little Theorem
2. Extended Euclidean algorithm

Primitive Root Theorem:

Primitive roots of $\mathbb{F}p$ or generators of $\mathbb{F}{p}^{}$ are the elements of $\mathbb{F}_{p}^{}$ having order p-1.

### Cryptography Before the Computer Age

Caesar $\longrightarrow$ 单表替换 $\longrightarrow$ Vignere(简单的多表替换)$\longrightarrow$ Enigma(复杂的多表替换)

### Symmetric Ciphers

$$e: \mathcal{K} \times \mathcal{M} \rightarrow \mathcal{C}$$

$$d: \mathcal{K} \times \mathcal{C} \rightarrow \mathcal{M}$$

Kerckhoff’s principle: The security of a cryptosystem should depend only on the secrecy of the key, and not on the secrecy of the encryption algorithm itself.

If ($\mathcal{K}$, $\mathcal{M}$, $\mathcal{C}$, $e$, $d$) is to be a successful cipher, it must have the following properties:

1. For any key $k \in \mathcal{K}$ and plaintext $m \in \mathcal{M}$, it must be easy to compute the ciphertext $e_k(m)$.
2. For any key $k \in \mathcal{K}$ and ciphertext $c \in \mathcal{C}$, it must be easy to compute the plaintext $d_k(C)$.
3. Given one or more ciphertexts $c_1, c_2, …, c_n \in \mathcal{C}$ encrypted using the key $k \in \mathcal{K}$, it must be very difficult to compute any of the corresponding plaintexts $d_{k}(c_1), …, d_{k}(c_n)$ without knowledge of $k$.
4. Given one or more pairs of plaintexts and their corresponding ciphertexts, ($m_1, c_1$), ($m_2, c_2$), …, ($m_n, c_n$), it must be very difficult to decrypt any ciphertext $c$ that is not in the given list without knowing $k$. This property is called security against a known plaintext attack.
5. For any list of plaintexts $m_1, …, m_n \in \mathcal{M}$ chosen by the adversary, even with knowledge of the corresponding ciphertexts $e_k(m_1), …, e_k(m_n)$, it is very difficult to decrypt any ciphertexts $c$ that is not in the given list without knowing $k$. This is known as security against a chosen plaintext attack. N.B. In this attack, the adversary is allowed to choose $m_1, …, m_n$, as opposed to a known plaintext attack, where the attacker is given a list of plaintext/ciphertext pairs not of his choosing.

Encoding Schemes: 编码方案 != 加密方案 两者目的不同。编码方案只是为了将数据从一种类型转化为另一种类型，应当被公众知晓。而加密方案则是为了隐藏信息，为了保证信息的机密性。

Symmetric Encryption of Encoded Blocks: 字节序与整数之间的一个转化。

exhaustive search attack(brute-force attack), meet-in-the-middle(collison) attacks

Examples of Symmetric Ciphers: 模(质数)运算下的乘法, 模运算下的加法(shift or Caesar cipher), 模运算下的线性运算(affine cipher), 模运算下的矩阵运算(Hill cipher), 异或运算(stream cipher)。

Random Bit Sequences and Symmetric Cipher: 伪随机数的一些理论上的东西

Asymmetric Ciphers Make a First Appearance: 对称密码算法中，如何交换密钥key? $\longrightarrow$ *public key (or asymmetric) cryptography.

Elgamal, RSA, Goldwassser-Micali, ECC, GGH and NTRU.

## Chapter 2: Discrete Logarithms and Diffie-Hellman

### The Birth of Public Key Cryptography

An extremely important event: in 1976, Whitfield Diffie and Martin Hellman published their now famous paper entitled “New Directions in Cryptography.”

We stand today on the brink of a revolution in cryptography.

Public key encryption cryptosystem 的概念第一次被提出。

Two years later, two major papers describing the PKC were published:

• the RSA scheme of Rivest, Shamir, and Adleman
• the knapsack scheme of Merkle and Hellam (broken)

### The Discrete Logarithm Problem

The discrete logarithm problem is a mathematical problem that arises in many settings, including the mod $p$ version and the elliptic curve version.

$$\mathbb{F}_p == \mathbb{Z}/p\mathbb{Z}.$$

$\mathbb{F}_p$ is a field with a prime number of elements.

$\mathbb{F}_p^{*}$ is the list of elements $1, g, g^2, g^3, …, g^{p-2}$ where $g$ is the generator (or primitive element) of the group.

Definition of the DLP: Let $g$ be a primitive root for $\mathbb{F}_p$. The Discrete Logarithm Problem (DLP) is the problem of finding an exponent $x$ such that $$g^x \equiv h \quad (\text{mod}\ p).$$

The number $x$ is called the discrete logarithm of $h$ to the base $g$ and is denoted by $\log_g(h)$.

Powers $627^i\ \text{mod}\ 941\ \text{for}\ i = 1,2,3,…$

Generalized definition of the DLP: Let $G$ be a group whose group law we denote by the symbol $*$. The Discrete Logarithm Problem for $G$ is to determine, for any two given elements $g$ and $h$ in $G$, an integer $x$ satisfying $$\begin{matrix} \underbrace{g * g * g * … * g} \newline \text{totally}\ x \ \text{times} \end{matrix} = h.$$

### The Elgamal Public Key Cryptosystem

One probabilistic encryption scheme.

### An Overview of the Theory of Groups

Definition of group

A group consists of a set $G$ and a rule, which we denote by $$, for combining two elements a, b \in G to obtain an element a * b \in G. The composition operation$$ is required to have the following three properties: [Identity Law] There exists an $e \in G$ such that $e * a = a * e = a \quad \text{for every } a \in G$. [Inverse Law] For every $a \in G$ there is a (unique) $a^{-1} \in G$ satisfying $a * a^{-1} = a^{-1} * a = e$. [Associative Law] $a * (b * c) = (a * b) * c$ for all $a, b, c \in G$.

If, in addition, composition satisfies the [Commutative Law] $a * b = b * a$ for all $a, b \in G$, then the group is called a commutative group or an abelian group.

order: the smallest integer $d$ such that $a^d = e$.

Two important properties of the orders of group elements:

• Let $G$ be a finite group. Then every element of $G$ has finite order. Further, if $a \in G$ has order $d$ and if $a^k = e$, then $d|k$.
• (Lagrange’s Theorem). Let $G$ be a finite group and let $a \in G$. Then the order of $a$ divides the order $G$.

### How Hard Is the Discrete Logarithm Problem?

How can we quantify “hard”?

A natural measure of hardness is the approximate number of operations necesssary for a person or a computer to solve the problem using the most efficient method currently known.

Definition of Order Notation:

Let $f(x)$ and $g(x)$ be functions of $x$ taking values that are positive. We say that “$f$ is big-$\mathcal{O}$ of $g$” and write $$f(x) = \mathcal{O}(g(x))$$ if there are positive constants $c$ and $C$ such that $$f(x) \le c g(x) \quad \text{for all}\ x \ge C.$$ In particular, we write $f(x) = \mathcal{O}(1)$ if $f(x)$ is bounded for all $x \ge C$.

If the limit $$\lim_{x \rightarrow \infty}{\frac{f(x)}{g(x)}}$$ exists (and is finite), the $f(x) = \mathcal{O}(g(x))$.

Order notation allows us to define several fundamental concepts that are used to get a rough handle on the computational complexity of mathematical problems.

Suppose that there is a constant $a \ge 0$, independent of the size of the input, such that if the input is $\mathcal{O}(k)$bits long, then

• polynomial time (easy): there is a constant $A \ge 0$, such that it takes $\mathcal{O}(k^A)$ steps to solve the problem.

• linear time: it takes $\mathcal{O}(k^1)$ steps to solve the problem, i.e., $A = 1$.

• quadratic time: it takes $\mathcal{O}(k^2)$ steps to solve the problem, i.e., $A = 2$.

• exponential time (hard): there is a constant $c \gt 0$, such that it takes $\mathcal{O}(e^{ck})$ steps to solve the problem.

• subexponential time: for every $\epsilon \gt 0$, they solve the problem in $\mathcal{O}_{\epsilon}(e^{\epsilon k})$ steps.

Method to solve DLP in $\mathbb{F}_p^{*}$:

• Trial-and-error method: $\mathcal{O}(p) = \mathcal{O}(2^k)$, which takes expoential time.
• Pohlig-Hellman algorithm: $p-1$ is smooth, then quite easy.
• Shanks’s Babystep-Giantstep algorithm: $\mathcal{O}(\sqrt{N}\cdot \log{N})$ + $\mathcal{O}(\sqrt{N})$ storage.
• Index Calculus algorithm: $\mathcal{O}(e^{c \sqrt{(\log_p)(\log{log_p})}})$, which is a subexponential algorithm.

Method to solve DLP in $\mathbb{F}_p$: $$x\cdot g \equiv h \quad (\text{mod}\ p)$$

• Extended Euclidean algorithm: $\mathcal{O}(\log{p})$, which is a linear-time algorithm.

The discrete logarithm problems in different groups may display levels of difficulty for their solutions.

Another sort of group called the elliptic curve: the best known algorithm to solve the DLP requires $\mathcal{O}(\sqrt{N})$ steps. Thus it takes exponetial time to solve the elliptic curve discrete logarithm problem (ECDLP).

### A Collision Algorithm for the DLP

In this section we describe a discrete logarithm algorithm due to Shanks. It is an example of a collision, or meet-in-the-middle, algorithm.

Shanks’s algorithm works in any group, not just $\mathbb{F}_p^*$.

The idea behind a collision algorithm is to make two lists and look for an element that appears in both lists.

[https://github.com/soreatu/Cryptography/blob/master/Shank’s%20Babystep-Giantstep%20Algorithm.py](https://github.com/soreatu/Cryptography/blob/master/Shank's Babystep-Giantstep Algorithm.py)

### The Chinese Remainder Theorem

Number Theory里学过，不多说。

from paper

### The Pohlig-Hellman Algorithm

$$g^x \equiv h \pmod{p}.$$

The Pohlig-Hellman algorithm thus tells us that the discrete logarithm problem in a group $G$ is not secure if the order of the group is a product of small primes.

We now explain the algorithm that reduces the discrete logarithm problem for elements of prime power order to the discrete logarithm problem for elements of prime order.

The idea is simple: if $g$ has order $q^e$, then $g^{q^{e-1}}$ has order $q$. The trick is to repeat this process several times and then assemble the information into the final answer.

$$g^x \equiv h \pmod{p}$$

### Rings, Quotient Rings, Polynomial Rings, and Finite Fields

Groups are fundamental objects that appear in many areas ofmathematics. A group $G$ is a set and an operation that allows us to “multiply” two elements to obtain a third element.

Another fundamental objects in mathematics, called a ring, is a set having two operations. These two operations are analogous to ordinary addition and multiplication, and they are linked by the distribution law.

#### An Overview of The Theory of Rings

• $\mathbb{Q}$: field
• $\mathbb{Z}$: ring
• $\mathbb{Z}/n\mathbb{Z}$: ring, and field if $n$ is a prime
• $\mathbb{F}_p$: field
• $\mathbb{Z}$: ring
• If $R$ is any ring, we can form a ring of polynomials whose coefficients are taken from the ring $R$.

#### Divisibility and Quotient Rings

The concept of divisibility, originally introduced for the integers $\mathbb{Z}$ can be generalized to any ring.

Elements that have multiplicative inverses and elements that have only trivial factorizations are special elements of a ring.

#### Polynomial Rings and the Euclidean Algorithm

Especially important are those polynomial rings in which the ring $R$ is a field.

One reason why it is so useful to take R to be a field $\mathbb{F}$ is because virtually all of the properties of $\mathbb{Z}$ that we proved are also true for the polynomial ring $\mathbb{F}[x]$.

It is not hard to see that the units in a polynomial ring $\mathbb{F}[x]$ are precisely the nonzero constant polynomials, i.e., the nonzero elements of $\mathbb{F}$.

#### Quotients of Polynomial Rings and Finite Fields of Prime Power Order

$\mathbb{F}_p$是一个有$p$个元素的finite field，如果我们尝试进行$\mathbb{F}_p[x]/m$，其中$m$的度数为$d$，那么就能构建出一个拥有$p^d$（prime order）个元素的finite field。

If $\mathbb{F}$ is a finite field, then $\mathbb{F}$ has $p^d$ elements for some prime $p$ and some $d \ge 1$.

## Chapter 3: Integer Factorization and RSA

### Euler’s Formula and Roots Modulo $pq$

$$\text{RSA} : x^e \equiv c \pmod{pq}$$

DLP是计算指数的困难，而RSA则是计算底数的困难。

The security of RSA relies on the assumption that it is difficult to take $e$th roots modulo $N=pq$.

If we know how to factor $N$, then it is again easy to compute $e$th roots.

We can often make the computation faster by using a smaller value of $d$.

Let $g = \gcd(p-1, q-1)$ and suppose that we solve the following congruence for $d$: $$d e \equiv 1 \pmod{ \frac{(p-1)(q-1)}{g}}$$ Euler’s formula says that $a^{(p-1)(q-1)/g} \equiv 1 \pmod{pq}$.

Hence, $$(c^d)^e = c^{de} = c^{1 + k(p-1)(q-1)/g} = c \cdot (c^{(p-1)(q-1)/g})^k \equiv c \pmod{pq}$$

$\frac{(p-1)(q-1)}{g}$实际上就是$\text{lcm}(p-1, q-1)$。

### Factorization via Difference of Squares

The most powerful factorization methods known today rely on one of the simplest identities in all of mathematics.

$N$的倍数在模$N$的情况下，同余0。因此，去寻找$kN = a^2 - b^2$，也就是去寻找 $$a^2 \equiv b^2 \quad (\text{mod}\ N).\tag{3-1}$$ 这样就可以用模运算来阐明我们的目标。

This procedure, in one form or another, underlies most modern methods of factorization.

### Smooth Numbers, Sieves, and Building Relations for Factorization

Smooth Numbers那边没怎么看懂。。

Definidtion. An integer $n$ is called B-smooth if all of its prime factors are less than or equal to B.

polynomial < subexponential < exponential

Proposition. Let $L(X) = e^{\sqrt{(\ln{X})(\ln{\ln{X}})}}$, let $N$ be a large integer, and set $B = {L(N)}^{\frac{1}{\sqrt{2}}}$.

(a) We expect to check approximately ${L(N)}^{\frac{1}{\sqrt{2}}}$ random numbers modulo $N$ in order to find $\pi(B)$ numbers that are B-smooth.

(b) We expect to check approximately ${L(N)}^{\frac{1}{\sqrt{2}}}$ random numbers of the form $a^2 (\text{mod}\ N)$ in order to find enough B-smooth numbers to factor $N$.

Hence the factorization procedure described above should have a *subexponential running time.

Two sieves:

• Number field sieve

How can we efficiently find many numbers $a > \sqrt{N}$ such that each $a^2\ (\text{mod}\ N)$ is B-smooth?

Allow slightly larger values of $a$ and use an efficient cancellation process called a sieve to simultaneously create a large number of values $a^2 \ (\text{mod}\ N)$ that are B-smooth.

Pomerance’s quadratic sieve: still the fastest known method for factoring large numbers $N = pq$ up to about $2^{350}$.

For numbers considerably larger than this, say larger than $2^{450}$, the more comlicated number field sieve holds the world record for quickest factorization.

Number Field Sieve

Theorem. Under some reasonable assumptions, the expected running time of the number field sieve to factor the number $N$ is $L_{1/3}(N)^c$ for a small value of $c$.

### The Index Calculus Method for Computing Discrete Logarithms in $\mathbb{F}_p$

The indes calculus is a method for solving the discrete logarithm problem in a finite field $\mathbb{F}_p$.

In any case, the index calculus is a subexponential algorithm for solving the discrete logarithm problem in $\mathbb{F}_p$.

If $g$ is a primitive root modulo $p$, then g^m\ \text{is a} \left{ \begin{aligned} &\text{quadratic residue} &\text{if}\ m\ \text{is even},\newline &\text{quadratic nonresidue} &\text{if}\ m\ \text{is odd}. \end{aligned} \right. 二次剩余，二次非剩余；Legendre symbol；二次互反律；Jacobi symbol.

Case 1: $(\frac{a}{p}) = (\frac{a}{q}) = 1$. $a$是一个模$pq$的平方.

Case 2: $(\frac{a}{p}) = (\frac{a}{q}) = -1$. $a$不是一个模$pq$的平方.

### Probabilistic Encryption and the Goldwasser-Micali Cryptosystem

Base on the idea:

Let $p$ and $q$ be (secret) prime numbers and let $N = pq$ be given. For a given integer $a$, determine whether $a$ is a square modulo $N$, i.e., determine whether there exists an integer $u$ satisfying $u^2 \equiv a \pmod{N}.$

Note that Bob, who knows how to factor $N = p q$, is able to solve this problem very easily, since $$a \ \text{is a square modulo } pq \quad \text{if and only if}\quad (\frac{a}{p}) = 1 \quad \text{and}\quad (\frac{a}{q}) = 1$$ Eve, on the other hand, has a harder time, since she knows only he value of N. She can compute $(\frac{a}{N})$, but cannot tell whether $a$ is a square modulo $N$.

Note: The Goldwasswer-Micali public key cryptosystem has a message expansion ratio of 1000, hence impractical.

## Chapter 7: Lattices and Cryptography

Public key cryptosystems exclusively base on the difficulty of:

1. factoring large numbers,
2. finding discrete logarithms in a finite group.

A new type of hard problem arising in the theory of lattices can be uesd as the basis for a public key cryptosystem.

• Faster encryption/decryption
• quantum resistence

### A Congruential Public Key Cryptosystem

A toy model of a real public key cryptosystem(NTRU).

2维的NTRU。

Due to Gauss, there is an extremely rapid method for finding short vectors in two-dimensional lattices.

CTF Challenge:

• 2019巅峰极客线上赛——NTURE

wp：https://xz.aliyun.com/t/7163

### Subset-Sum Problems and Knapsack Cryptosystems

The Subset-Sum Problem

Suppose that you are given a list of positive integers ($M_1, M_2, …, M_n$) and another integer $S$.

Find a subset of the elements in the list whose sum is $S$.

$n$个元素的集合，有$2^n$个子集。要找到一个确切的满足条件的子集，很难。

n很大的话，想要求解一个子集和问题是很困难的。

Definition.

A superincreasing sequence of integers is a list of positive integers $\mathbf{r} = (r_1, r_2, \dots, r_n)$ with the property that $$r_{i+1} \ge 2r_i \quad \text{for all}\ 1 \le i \le n-1.$$

Let $\mathbf{r} = (r_1, r_2, \dots, r_n)$ be a superincreasing sequence. Then $$r_k > r_{k-1} + \dots + r_2 + r_1 \quad \text{for all}\ 2\le k \le n.$$

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  def solve(M, S): n = len(M) x = [0]*n i = n - 1 while i >= 0: if S >= M[i]: x[i] = 1 S -= M[i] i -= 1 return x M = (3, 11, 24, 50, 115) S = 142 print(solve(M, S)) # [1, 0, 1, 0, 1] 

Merkle和Hellman由此提出了一个基于超增序列的subset-sum problem的public key cryptosystem。

Cryptosystems based on disguised subset-sum problem are known as subset-sum cryptosystems or knapsack cryptosystems. The general idea is to start with a secret superincreasing sequence, disguise it using secret modular linear operations, and publish the disguised sequence as the public key.

If $r_1$ is too small, then there are easy attacks, so we must insist that $r_1 > 2^n$. $$r_n > 2r_{n-1} > 4r_{n-2} > \dots > 2^nr_1 > 2^{2n}.$$

$$M_i = O(2^{2n}), \quad S = O(2^{2n})$$

n = 160 —> 51200bits

Attacks:

• The first attack, by Shamir, Odlyzko, Lagarias and others, used various ad hoc methods.
• After the publication of LLL paper in 1985, it became clear that knapsack-based cryptosystems have a fundamental weakness.

Suppose S as a subset-sum from the set $\mathbf{M} = (m_1, \dots, m_n)$.

• LLL
• LLL-BKZ (LLL variants)

### A Brief Review of Vector Space

Definition:

。。。定义太多了。

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29  # sage def GramSchmidtAlgorithm(M): ''' Given a basis for a vector space $V \in R^m$, creates an orthogonal basis for $V$. INPUT: * "M" -- a matrix. OUTPUT: * "G" -- a matrix consisting of vectors that are pairwise orthogonal. NOTE: Better way to do this -- sage: G, _ = M.gram_schmidt() ''' n = M.nrows() G = matrix(RR, n) for i in range(n): vi = M[i] for j in range(i): mu = (M[i]*G[j]) / G[j].norm()^2 vi -= mu * G[j] G[i] = vi return G 

### Lattice: Basic Definition and Properties

Definition of lattice.

Lattice：一组基向量的整系数线性组合。

Lattice可能有很多个基底。它们之间被由整数组成且秩为$\pm1$的矩阵所联系。

An integral (or integer) lattice is:

• a lattice with all of whose vectors have integer coordinations
• an additive subgroup of $\mathbb{Z}^m$ for some $m \ge 1$

Lattice is:

• a set of points
• a discrete additive subgroup of $\mathbb{R}^m$

$\mathbb{R}^n$里的每一点$w$都可以写为 $$w = t + v \quad \text{for a unique}\ t \in \mathcal{F} \quad \text{and a unique}\ v \in L.$$

All fundamental domains of a lattice $L$ have the same volume, which turns out to be an extremely important invariant of the lattice.

Upper bound for the derterminat of a lattice:

$\text{Vol}(\mathcal{F})$就是由基向量所组成的矩阵的秩。 $$\text{det}(L) = \text{Vol}(\mathcal{F}) = \text{det}(M)$$ 是lattice中的不变量。

### Short Vectors in Lattices

#### The Shortest Vector problem and the Closest Vector Problem

lattice中的两大根本难题：

• The Shortest Vector Problem (SVP)

Find a shortest nonzero vector in a lattice $L$, i.e., find a nonzero vector $v \in L$ that minimizes the Euclidean norm $| v|$.

• The Closest Vector Problem (CVP)

Given a vector $w \in \mathbb{R}^m$ that is not in $L$, find a vector $v \in L$ that is closest to $w$, i.e., find a vector $v \in L$ that minimizes the Euclidean norm $| w-v|$.

In full generality, CVP is known to be $NP$-hard and SVP is $NP$-hard under a certain “randomized reduction hypothesis”.

CVP is considered to be “a little bit harder” than SVP, since CVP can often be reduced to SVP in a slightly higher dimension.

In real world scenarios, cryptosystems based on $NP$-hard or $NP$-complete problems tend to rely on a particular subclass of problems, either to achieve efficiency or to allow the creation of a trapdoor.

• Shortest Basis Problem (SBP)

Find a basis $v_1, \dots, v_n$ for a lattice that is shortest in some sense.

• Approximate Shortest Vector Problem (apprSVP)

Let $\Psi(n)$ be a function of n. In a lattice $L$ of dimension n, find a nonzero vector that is no more than $\Psi(n)$ times longer than a shortest nonzero vector.

• Approximate Closest Vector Problem (apprCVP)

This is the same as apprSVP, but now we are looking for a vector that is an approximate solution to CVP, instead of an approximate solution to SVP.

#### Hermite’s Theorem and Minkowski’s Theorem

Lattice中的最短向量到底有多长？取决于lattice的维度和$det(L)$。

Hermite’s Theorem

Every lattice $L$ of dimension $n$ contains a nonzero vector $v \in L$ satisfying $$| v | \le \sqrt{n}\ \det(L)^{1/n}.$$

Hadamard ratio of the basis $mathcal{B} = {v_1, \dots, v_n}$: $$\mathcal{H}(\mathcal{B}) = (\frac{\det(L)}{|v_1| |v_2| \dots |v_n|})^{1/n}.$$ Thus $0 < \mathcal{H}(\mathcal{B}) \le 1$, and the closer that the value is to 1, the more orthogonal are the vectors in the basis.

Minkowski’s Theorem

#### The Gaussian Heuristic

gamma函数：

Gaussian expected shortest length

### Babai’s Algorithm and Using a “Good” Basis to Solve apprCVP

$$w = v + \epsilon_1 v_1 + \epsilon_2 v_2 + \dots + \epsilon_n v_n \quad \text{for some}\ 0 \le \epsilon_1, \epsilon_, \dots, \epsilon < 1.$$

  1 2 3 4 5 6 7 8 9 10 11 12 13 14  # sage def BabaisClosestVertexAlgorithm(L, w): ''' INPUT: * "L" -- a matrix representing the LLL-reduced basis (v1, ..., vn) of a lattice. * "w" -- a target vector to approach to. OUTPUT: * "v" -- a approximate closest vector. ''' vt = L.solve_left(w) v = vector(ZZ, [0]*len(L[0])) for t, vi in zip(vt, L): v += round(t) * vi return v 

### Cryptosystems Based on Hard Lattice Problems

• Ajtai-Dwork cryptosystem
• Provably secure unless a worst-case lattice problem can be solved in polynomial time.
• key size: $O(n^4)$.
• Nguyen and Stern showed that any practical and efficient implementation of the Ajtai-Dwork system is insecure.
• GGH cryptosystem of Goldreich, Goldwasser, and Halevi
• Private key: good basis; public key: bad basis.
• Message: a binary vector $m$
• Encryption: linear combination $\sum{m_iv_i^{\text{bad}}}$with the bad basis, and then perturbs the sum by adding a small random vector $r$.
• Decryption: knowing a good basis, use Babai’s algorithm to recover $m$.
• Nguyen showed that a transformation of the original GGH encryption scheme reduced the problem to an easier CVP.
• NTRU cryptosystem proposed by Hoffstein, Pipher, and Silverman
• is most naturally described in terms of quotients of polynomial rings.
• hard problem underlying is easily transformed into an SVP (for key recovery) or a CVP (for plaintext recovery) in a special class of lattices.
• public key size: $O(n \log{n})$.

Roughly speaking, in order to achieve $k$ bits of security, encryption and decryption for Elgamal, RSA, and ECC require $O(k^3)$ operations, while encryption and decryption for lattice-based systems require only $O(k^2)$ operations. Further, the simple linear algebra operations used by lattice-based systems are very easy to implement in hardware and software. However, it must be noted that the security analysis of lattice-based cryptosystems is not nearly as well understood as it is for factorization and discrete logarithm-based systems.

### The GGH Public Key Cryptosystem

GGH is an example of a probabilistic cryptosystem.

### Convolution Polynomial Rings

NTRU的一些前置知识。

$x^i$取模$x^n$，系数$a_i$取模$q$。

### The NTRU Public Key Cryptosystem

#### NTRUEncrypt

NTRU(pronounced $en-tr\bar{u}$): “N-th degree Truncated polynomial Ring Units”

• N：维度，prime
• p, q：两个模数，gcd(N, q) = gcd(p, q) = 1

• Public parameters：(N, p, q, d)
• Private parameters：$f(x) \in \mathcal{T}(d+1, d) \quad \text{and}\ \quad g(x) \in \mathcal{T}(d, d)$

The condition $q>(6d+1)$ ensures that decryption never fails.

For additional efficiency and to reduce the size of the public key, it may be advantageous to choose a smaller value of $q$.

Decryption failures have the potential to reveal private key information to an attacker.

NTRUEncrypt is an example of a probabilistic cryptosystem.

It is a bad idea for Bob to send the same message twice using different random elements.

$F_q(x) \in R_q$ tend to be randomly and uniformly distributed modulo $q$, same as $h(x)$ and $e(x)$.

The most time consuming part of encryption and decryption is the convolution product.

NTRUEncrypt encryption and decryption take $O(N^2)$ steps, where each step is extremely fast.

#### Mathematical Problems for NTRUEncrypt

A hidden relationship: $$f(x) \star h(x) \equiv g(x) \pmod{q}$$ where $f(x)$ and $g(x)$ have very small coefficients, and $h(x)$ appear to be random integers modulo $q$.

Thus breaking NTRUEncrypt by finding the private key comes down to solving the following problem:

Any pair of polynomials $(f(x), g(x))$ with sufficiently small coefficients and satisfying $f(x) \star h(x) \equiv g(x) \pmod{q}$ serves as an NTRU decryption key.

The problem is not practically solvable by a brute-force or collision search.

It is proved that solving the NTRU key recovery problem is (almost certainly) equivalent to solving SVP in a certain class of lattices.

The use of lattice reduction is currently the best known method to recover an NTRU private key from the public key.

Example. NTRUEncrypt with parameters $$(N, p, q, d) = (251, 3, 257, 83)$$ requires $2^{381.6}$ times check.

A collision search take $O(3^{N/2}/\sqrt{N})$ steps.

### NTRUEncrypt as a Lattice Cryptosystem

#### The NTRU Lattice

$$h(x) = h_0 + h_1 x + \dots + h_{N-1} x^{N-1}$$

The NTRU lattice $L_h^{\text{NTRU}}$ associated to $h(x)$ is the 2N-dimensional lattice spanned by the rows of the matirx

#### Quantifying the Security of an NTRU Lattice

The security of NTRUEncrypt depends at least on the difficulty of solving SVP in $L_h^{\text{NTRU}}$.

More generally, if Eve can solve apprSVP in $L_h^{\text{NTRU}}$ to within a factor of approximately $N^{\epsilon}$ for some $\epsilon < \frac{1}{2}$, then the short vector that she finds will probably serve as a decryption key.

Experiments suggest that values of $N$ in the range from 250 to 1000 yield security levels comparable to currently secure implementation of RSA, Elgamal, and ECC.

### Lattice-Based Digital Signature Schemes

#### The GGH Digital Signature Scheme

Sign：只有拥有"good" basis（private key）的人才能解出CVP。

Verify：通过验证给定的lattice point（signature）的确是离目标向量最短的。

The signature $s \in L$ is a solution to apprCVP for the vector $d \in R^n$, so signing a document is equivalent to solving apprCVP.

The GGH signature scheme suffers the same drawback as the GGH cryptosystem, namely security requires lattices of high dimension, which in turn lead to very large public verification keys.

However, both GGH and NTRU signature schemes have a more serious shortcoming which we now describe.

#### Rejection Sampling

An alternative method of thwarting transcript attacks was proposed by Lyubashevsky. It is based on the idea from statistics called rejection sampling, in which one generates samples from a desired probability distribution by using samples from another distribution.

#### The NTRU Modular Lattice Signature Scheme

Lyubashevsky gave an example of a transcript-secure signature scheme based on the learning with errors (LWE) problem.

NTRUMLS：NTRU modular lattice signature scheme

An attacker, using only the publick key $h$, can create a transcript of signed document hashes that is indistinguishable from a transcript created using the private key $(f, g)$. Hence the latter transcript (created using the private key) contains no information about the private key.

### Lattice Reduction Algorithms

In this section we describe an algorithm called LLL that solves apprSVP and/or apprCVP to within a factor of $C^n$, where $C$ is a small constant and $n$ is the dimension of the lattice.

Ultimately, the security of lattice-based cryptosystems depends on the inability of LLL and other lattice reduction algorithms to efficiently solve apprSVP and apprCVP to within a factor of, say, $O(\sqrt(n))$.

#### The LLL Lattice Reduction Algorithm

1982：the publication of the LLL algorithm

A.K. Lenstra, H.W. Lenstra Jr., L. Lova ́sz, Factoring polynomials with ratio- nal coefficients. Math. Ann. 261(4), 515–534 (1982)

$$B^* = MB$$

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43  # sage def LLL(L, delta=0.75): ''' The LLL Lattice Reduction Algorithm. INPUT: * "L" -- a matrix representing a lattice basis. * "delta" -- a constant between 0.25 and 1. OUTPUT: * "Q" -- a matrix which is LLL-reduced. NOTE: Better way to do this -- sage: L.LLL(delta=0.75) ''' Q = matrix(ZZ, L) G, M = Q.gram_schmidt() k = 1 n = Q.nrows() while k < n: # Size Reduction for j in reversed(range(k-1)): Q[k] = Q[k] - round(M[k,j])*Q[j] G, M = Q.gram_schmidt() # Lovasz Condition if G[k].norm()^2 >= (delta - M[k,k-1]^2) * G[k-1].norm()^2: k = k + 1 else: # Swap Step Q[k], Q[k-1] = Q[k-1], Q[k] G, M = Q.gram_schmidt() k = max(k-1, 1) return Q 

If we use 1 instead of 3/4, then it is an open problem whether the LLL algorithm terminates in polynomial time.

If we use 3/4, or any other constant strictly less than 1, then LLL runs in polynomial time, but we may miss an opportunity to reduce the size of a determinant by passing up a swap.

In practice, one often takes a constant larger than 3/4. but less than 1, in the Lovasz condition.

The output from LLL is dependent on the order of the basis vectors.

#### Using LLL to Solve apprCVP

LLL算法能够输出一个quasi-orthogonal basis

Babai’s algorithm有两个版本：

• the closest vertex algorthm
• the closes plane algorithm (更好一些)

#### Generalizations of LLL

Most of these methods involve trading increased running time for improved output.

• The deep insertion method

LLL算法中交换的是$v_k$和$v_{k-1}$，这个算法是把$v_k$插到前面的两个$v_{i-1}$和$v_i$之间。

• BKZ-LLL

基于一个叫做Korkin-Zolotarev (KZ) reduced的东西。

把交换那一步换成了对一个区块的reduction。

维度为$n$，区块大小为$\beta$，那么复杂度为$O(\beta^{c\beta}n^d)$。

### Applications of LLL to Cryptoanalysis

• Attacks on knapsack public key cryptosystems to more recent analysis of lattice-based cryptosystems such as Ajtai-Dwork, GGH, and NTRU.
• Lattice reduction attacks on RSA in certain situations.

#### Congruential Cryptosystems

2维的NTRU，直接用Gauss Lattice Reduction或者LLL把$(f, g)$求出来就可以了。

#### Applying LLL to Knapsacks

When using LLL to solve subset-sum problems, it is often helpful to multiply $m_1, \dots, m_n, S$ by a large constant $C$. 能使Gaussian expected shortest vector 变大$C^{1/(n+1)}$倍。