# **Hardware Architectures**



# **Hardware Architectures**



**Exponentiation** 

- linear algebra

# **NSA-developed Cryptographic Standards**

**Block Ciphers** 



time

# **Cryptographic Standard Contests**



time

# Criteria used to evaluate cryptographic transformations



# Advanced Encryption Standard (AES) Contest 1997-2001

### **June 1998**

**August 1999** 

October 2000

### **15 Candidates**

from USA, Canada, Belgium, France, Germany, Norway, UK, Israel, Korea, Japan, Australia, Costa Rica

**5** final candidates

Mars, RC6, Rijndael, Serpent, Twofish

### Round 1

Security Software efficiency Flexibility

Round 2

Security Hardware efficiency

1 winner: Rijndael Belgium

# Implementations of candidates for the new Advanced Encryption Standard (AES) Speed [Mbit/s] Xilinx, Virtex 1000 FPGA



# **Our Results:** Encryption in cipher feedback modes (CBC, CFB, OFB) - Virtex FPGA





### <u>NSA Results:</u> Encryption in cipher feedback modes (CBC, CFB, OFB) - ASIC, 0.5 μm CMOS Throughput [Mbit/s]



# **Speed of the final AES candidates in Xilinx FPGAs**

# **Speed** [Mbit/s]

K.Gaj, P. Chodowiec, AES3, April, 2000



### Survey filled by 167 participants of the Third AES Conference, April 2000 # votes



# **NIST Report: Security**



# **eSTREAM Stream Cipher Comparison**

- Part of the GMU Fall 2006 & Fall 2007 graduate courses ECE 545 Introduction to VHDL
- Individual 6-week project
- 4 students working independently on each eSTREAM cipher
- best code for each algorithm selected at the end of the semester
- selected designs verified and revised in order to assure
  - correct functionality
  - standard interface & control
  - possibly uniform design & coding style

### **Comparison of 4 Focus Hardware-Oriented Stream Ciphers** FPGA: Xilinx Spartan 3 family



### **Comparison of 8 Final Candidates Sorted by Minimum Area and Maximum Throughput/Area**

| Candidate       | Area<br>(slices) | Candidate             | Throughput/Area<br>(Mbps/slices) |
|-----------------|------------------|-----------------------|----------------------------------|
| Grain v1        | 44               | Trivium (x64)         | 39.26                            |
| Grain 128       | 50               | Grain 128 (x32)       | 7.97                             |
| Trivium         | 50               | Grain v1 (x16)        | 5.98                             |
| DECIM v2        | 80               | Trivium               | 4.80                             |
| DECIM 128       | 89               | F-FCSR-16             | 4.53                             |
| MICKEY 2.0      | 115              | Grain v1              | 4.45                             |
| MICKEY 128 2.0  | 176              | Grain 128             | 3.92                             |
| Moustique       | 278              | F-FCSR-H v2           | 3.23                             |
| F-FCSR-H v2     | 342              | MICKEY 2.0            | 2.03                             |
| Trivium (x64)   | 344              | <b>MICKEY 128 2.0</b> | 1.27                             |
| Grain v1 (x16)  | 348              | Moustique             | 0.81                             |
| F-FCSR-16       | 473              | DECIM v2              | 0.58                             |
| Grain 128 (x32) | 534              | DECIM 128             | 0.49                             |
| Pomaranch       | 648              | Edon80                | 0.10                             |
| Edon80          | 1284             | Pomaranch             | 0.08                             |

# **Conclusions from the Comparison of the eSTREAM Candidates in Hardware**

**Very large differences among 8 leading candidates:** 

- ~30 x in terms of area (Grain v1 vs. Edon80)
- ~500 x in terms of the throughput to area ratio (Trivium (x64) vs. Pomaranch)

### **Current State of Security of Major Hash Functions**



### SHA-2: SHA-256, SHA-384, SHA-512

## **SHA-3 Contest Timeline**

#### <u>2007</u>

- publication of requirements
- 29.X. 2007: request for candidates

#### <u>2008</u>

• **31.X.2008:** deadline for submitting candidates

#### <u>2009</u>

2 Q – first workshop devoted to the presentation of candidates

#### <u>2010</u>

2 Q: <u>second workshop</u> devoted to the analysis of candidates 3 Q: selection of finalists

#### <u>2012</u>

- **1 Q: last workshop**
- 2 Q: selection of the winner
- **3 Q: draft version of the standard published**
- 4 Q: final version of the standard published

# Basic iterative architecture of a typical hash function



# **Unrolled architecture**



# **Unrolled Architectures of Hash Functions - Summary**

Loop unrolling more suitable for hash algorithms than for symmetric-key ciphers

Speed up compared to the basic iterative architecture:SHA-1:1.9 (5 rounds unrolled)SHA-256:1.5 (4 rounds unrolled)SHA-512:1.3 (5 rounds unrolled)

Speed up is a strong function of data dependencies present in the algorithm

# **Hardware Architectures**



# **Montgomery Multipliers: Motivation**

- Fast modular multiplication required in multiple cryptographic transformations
  - RSA, DSA, Diffie-Hellman
  - Elliptic Curve Cryptosystems
  - ECM, p-1, Pollard's rho methods of factoring, etc.
- Montgomery Multiplication invented by Peter L. Montgomery in 1985 is most frequently used to implement repetitive sequence of modular multiplications in both software and hardware
- Montgomery Multiplication in hardware replaces division by a sequence of simple logic operations, conditional additions and right shifts

### **Primary Advantage of our New Architectures**

• Reduction in the number of clock cycles from

to n + e - 1 n - size of operands in bitse - size of operands in words

• Minimum penalty in terms of the area and clock period

### Normalized Product Latency Times Area New Architecture vs. Previous Architectures



# **Hardware Architectures**



# **Pairing Based Cryptography**

- New family of public key cryptosystems, first proposed by Menezes, Okamoto, and Vanstone in 1993
- Application to:
  - Identity Based Cryptography
  - One-round 3-way key exchange
  - Short digital signatures
  - Others: Group signatures, batch signatures, threshold cryptography, broadcast encryption, private information retrieval, electronic voting, etc.
- Not a part of any standard yet
- Very limited number of software and hardware implementations

## **Spectral Modular Exponentiation**

- New method for fast modular exponentiation for very long integers in the range of 10,000-20,000 bits
- First publication in 2007, by Koc and Saldamli
- Intersection of cryptography and Digital Signal Processing
- Better computational complexity than any other algorithm known to date
- No reported software or hardware implementations

# **Hardware Architectures**



### **Workshop Series**

# SHARCS - Special-purpose Hardware for Attacking Cryptographic Systems

 1<sup>st</sup> edition: Paris,
 Feb. 24-25, 2005

 2<sup>nd</sup> edition: Cologne,
 Apr. 3-4,
 2006

 3<sup>rd</sup> edition: Vienna,
 Sep. 9-10,
 2007

See

http://www.ruhr-uni-bochum.de/itsc/tanja/SHARCS/

# Best Algorithm to Factor Large Numbers NUMBER FIELD SIEVE

Complexity: Sub-exponential time and memory



# Factoring 1024-bit RSA keys using Number Field Sieve (NFS)



# **Comparison among technologies**



# SRC 6 reconfigurable computer



SRC 6 from SRC Computers

Basic unit:

2 x Pentium Xeon 3 GHz

2 x Xilinx Virtex II FPGA XC2V6000 running at 100 MHz

Fast communication interface between the microprocessor board and the FPGA board, 1600 MB/s

Multiple basic units can be connected using Hi-Bar Switch and Global Common Memory

### **Factoring Runs per Second**



# ASIC 130 nm vs. Virtex II 6000 - rho (24 units)



Area of an ASIC with equivalent functionality

# Number of rho & ECM computations per second using the same chip area

