CERG Seminars are held in the Engineering Building on the GMU Fairfax campus unless noted otherwise. Parking is available in the Sandy Creek parking deck near the Engineering Building. Directions to the campus can be found here. The seminar talks are usually 45 to 60 minutes long and are open to the public. If you wish to be notified about future seminars, please send an e-mail to Jens-Peter Kaps.
High-Speed Hardware Implementations of Post-Quantum Cryptography Multivariate Signature Schemes
Ahmed Ferozpuri, ECE MS Defense
Date: Wednesday, December 6, 10:30 AM - 12:00 PM
Location: Engineering Building, Room 2901
Multivariate cryptosystems are one of the five most promising families of post-quantum cryptography (PQC) schemes. Among them, the Unbalanced Oil and Vinegar (UOV) and the Rainbow signature schemes have been extensively studied since 1999 and 2005, respectively. The main advantage of UOV is high confidence in its security; its disadvantages include large key and signature sizes. Rainbow is a multi-layer version of UOV that offers better performance, smaller keys, and smaller signatures. In this thesis, we present and compare hardware implementations of both schemes on high-performance Field Programmable Gate Arrays (FPGAs). The designs are optimized for minimum signature generation and verification time; key generation is assumed to be done in software. Compared to the previous state-of-the-art high-speed implementation, the proposed Rainbow design is more than twice as fast and introduces two architectural innovations: a novel pivot calculation circuit and a memory-based microprogrammed architecture. To make benchmarking easier and fairer, our design follows a universal PQC hardware API, which allows for fair comparison with other post-quantum signature schemes, in particular those submitted to the NIST PQC Project. The design is intended to be made open source to speed up further optimizations. Finally, we will discuss a novel matrix binarization method and its potential applications, a projection of scalability to higher security levels, and future optimizations.
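To illustrate what signature verification computes in these schemes: a UOV or Rainbow public key is a system of m quadratic polynomials in n variables, and a signature s is accepted when evaluating the public polynomials at s reproduces the message digest. The sketch below is a minimal model under stated assumptions: it works over a small prime field GF(p) instead of the binary extension fields (such as GF(16) or GF(256)) used by typical parameter sets, keeps only the homogeneous quadratic part of each polynomial, and uses hypothetical names (`eval_quadratic`, `verify`). It is not the hardware design described in the talk.

```python
from typing import List, Sequence

Matrix = List[List[int]]  # one n x n matrix per public quadratic form


def eval_quadratic(Q: Matrix, s: Sequence[int], p: int) -> int:
    """Evaluate the quadratic form s^T Q s over GF(p) (toy prime field)."""
    n = len(s)
    total = 0
    for i in range(n):
        for j in range(n):
            total += s[i] * Q[i][j] * s[j]
    return total % p


def verify(public_key: List[Matrix], signature: Sequence[int],
           digest: Sequence[int], p: int) -> bool:
    """Accept iff P(signature) equals the digest, component by component.

    Assumption: real public keys also contain linear and constant terms,
    which this sketch omits.
    """
    if len(public_key) != len(digest):
        return False
    return all(eval_quadratic(Q, signature, p) == h % p
               for Q, h in zip(public_key, digest))
```

Verification is only a batch of multiply-accumulate operations over the field, which is one reason it maps well to a highly parallel hardware datapath; signing instead requires solving a linear system, where pivot selection appears.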
Methodology for Developing Lightweight Architectures for FPGAs
Panasayya Yalla, ECE PhD Defense
Date: Friday, December 1, 1:30 PM - 3:00 PM
Location: Engineering Building, Room 4801
Until now, application-specific integrated circuits (ASICs) have been the main platform for lightweight cryptography because of their low power consumption and good performance. However, their complex design cycle and very high non-recurring engineering cost limit them to high-volume applications. In recent years, low-cost, low-power Field Programmable Gate Arrays (FPGAs) (Xilinx: Spartan-6 and Artix-7; Altera: Cyclone IV and Cyclone V; Actel: IGLOO and ProASIC3) have started emerging, narrowing the power consumption gap between ASICs and FPGAs. FPGAs are an ideal platform for fast-changing environments and lower-volume applications. In spite of these advantages, very little attention has been paid to FPGAs as a target for lightweight cryptography.
Implementing algorithms for lightweight applications is a complex and time-consuming task due to the interdependencies of the constraints on size, power, energy, and cost. Design choices such as the interface, datapath width, serialization, pipelining, and choice of processing elements determine whether the design meets these constraints. In most cases, this results in designs with a reduced datapath width. However, this alone is not sufficient; one has to carefully evaluate the trade-offs among the various constraints at every step of the design process. The control unit is an additional hurdle: extensive component reuse in the datapath can lead to very complex control logic that might negate the area savings in the datapath. In this research, we tackle these problems in three parts. The first part develops a generalized methodology for making early design choices, together with various optimizations that can be applied to the datapath. The second part proposes control logic optimization techniques using memories. Finally, we develop a tool that takes an existing controller or state matrix as input and transforms it into an optimized controller. This optimized controller combines a traditional FSM, realized using flip-flops and combinational logic but with fewer states, with memories.
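One common way to use memories in a controller, sketched here as an illustration rather than as the tool described above, is to collapse a run of nearly identical FSM states (for example, one state per cipher round) into a single state plus a counter that addresses a control memory holding the per-round control words. The Python model below assumes hypothetical 3-bit control words and compares the two controllers: both emit the same control sequence, but the explicit FSM needs R + 2 states while the memory-based one needs only three.

```python
from typing import Dict, List, Optional, Tuple

Word = Tuple[int, int, int]  # hypothetical 3-bit control word
LOAD_WORD: Word = (1, 0, 0)
DONE_WORD: Word = (0, 0, 1)


def explicit_fsm(round_words: List[Word]) -> Dict[str, Tuple[Word, Optional[str]]]:
    """One state per step (R + 2 states), each with a hard-wired control word."""
    names = ["LOAD"] + [f"ROUND_{i}" for i in range(len(round_words))] + ["DONE"]
    words = [LOAD_WORD] + list(round_words) + [DONE_WORD]
    nxt: List[Optional[str]] = names[1:] + [None]  # DONE has no successor
    return {n: (w, s) for n, w, s in zip(names, words, nxt)}


def run_explicit(table: Dict[str, Tuple[Word, Optional[str]]]) -> List[Word]:
    """Walk the explicit state table and collect the emitted control words."""
    out: List[Word] = []
    state: Optional[str] = "LOAD"
    while state is not None:
        word, state = table[state]
        out.append(word)
    return out


def run_rom_based(round_words: List[Word]) -> List[Word]:
    """Three states (LOAD, ROUND, DONE) plus a counter addressing a control ROM."""
    rom = list(round_words)  # per-round control words stored in memory
    out: List[Word] = []
    state, counter = "LOAD", 0
    while True:
        if state == "LOAD":
            out.append(LOAD_WORD)
            state = "ROUND" if rom else "DONE"
        elif state == "ROUND":
            out.append(rom[counter])  # control word read from memory
            counter += 1
            if counter == len(rom):
                state = "DONE"
        else:  # DONE
            out.append(DONE_WORD)
            return out
```

For a hypothetical ten-round cipher whose last round uses a different control word, the explicit table has 12 states while the memory-based controller keeps 3 states, a 4-bit counter, and a 10-entry memory.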
Using the proposed methodology, we developed lightweight architectures for the block cipher Advanced Encryption Standard (AES) for three different widths, the Secure Hash Algorithm-256 (SHA-256), multipurpose AES and Keccak cores, and the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) candidates Ketje-Sr, Ascon-128, and Ascon-128a. The effectiveness of the optimization tool is tested using AES-128 and Keccak cores. We also developed a hardware package that supports the CAESAR hardware Application Programming Interface (API) for lightweight implementations and evaluated its benefits using Ketje-Sr and Ascon.
Evaluation of the CAESAR Hardware API for Lightweight Implementations
Panasayya Yalla, ECE PhD Seminar
Date: Thursday, November 30, 1:00 PM - 2:00 PM
Location: Engineering Building, Room 3202
The Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) requires that all hardware implementations of candidate algorithms adhere to the CAESAR Hardware Application Programming Interface (API). The CAESAR Hardware API is supported by a development package which includes VHDL code for universal pre- and post-processors for high-speed and, more recently, lightweight implementations. These processors are designed to make a cipher core compliant with the API. However, for lightweight implementations it is generally assumed that using generic pre- and post-processors increases area consumption compared with merging their functionality into the cipher core. We evaluate the lightweight package through two case studies. First, we verify that the lightweight package has a smaller area footprint than the high-speed package. Second, we show that the overhead of using the generic lightweight pre- and post-processors, compared with integrating their functionality into the cipher core, is negligible. As part of these case studies, we have developed the first lightweight implementations of Ketje-Sr, Ascon-128, and Ascon-128a.
Public Key Cryptography Using Hardware/Software Codesign for the Internet of Things
Ahmad Salman, ECE PhD Defense
Date: Wednesday, August 2nd, 1:00 PM - 3:00 PM
Location: Engineering Building, Room 3507
Embedded electronic devices and sensors play a major role in bridging the gap between the physical world and the virtual world. Billions of devices such as smartphones, smart watches, wearables, medical implants, and Wireless Sensor Nodes (WSN) are considered building blocks in making the Internet of Things (IoT) a reality. Such devices often carry sensitive data and are used in critical applications, making it essential to create a secure environment that protects the data they gather, both at rest and in transit. Because these devices are limited in power, energy, area, and memory, choosing a suitable cryptographic system to provide the necessary security services becomes a challenge. Pairing-Based Cryptography (PBC) is among the leading candidates for bringing Public-Key Cryptography (PKC) to lightweight devices, as it provides services that traditional PKC systems lack. Non-interactive key agreement, Identity-Based Encryption (IBE), and revocation of compromised keys are examples of security services that show the benefits of PBC over traditional PKC.
For these reasons and more, creating lightweight software and hardware implementations of the building blocks of PBC is an active research area and a hot topic in the cryptographic community. In this research, we studied bilinear pairings and their lightweight implementations in software, hardware, and hardware/software codesign, in an effort to create a design that is efficient, flexible, and lightweight. We also studied the effect of adding side-channel countermeasures on area usage and power consumption. Finally, we measured the power and energy consumption of the implemented designs. Our goal is to exploit the benefits of PBC over classical public-key cryptography for applications running on resource-constrained devices, and to show that a lightweight PBC implementation on these devices is feasible and practical. The work was divided into two main phases. The first phase focused on selecting pairing parameters (finite field, elliptic curve, embedding degree) that provide an acceptable security level while meeting the efficiency requirements of resource-constrained devices. The second phase focused on designing an efficient hardware accelerator for the computationally intensive operations in pairing-based cryptography, achieving acceptable speed while minimizing area and power consumption.
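As a small illustration of the parameter-selection phase: for a curve over GF(p) with a subgroup of prime order r (r not dividing p), the embedding degree is the smallest k such that r divides p^k - 1, i.e., the multiplicative order of p modulo r. It determines the extension field GF(p^k) in which pairing values live, which is why it drives both security and efficiency. The function below is a straightforward sketch of that definition; the name `embedding_degree` and the search bound `k_max` are assumptions made for this example.

```python
from typing import Optional


def embedding_degree(p: int, r: int, k_max: int = 64) -> Optional[int]:
    """Smallest k in [1, k_max] with r | p^k - 1, or None if there is none.

    r is assumed to be a prime that does not divide p (not checked for primality).
    """
    if r < 2 or p % r == 0:
        raise ValueError("r must be at least 2 and must not divide p")
    x = 1
    for k in range(1, k_max + 1):
        x = (x * p) % r  # x = p^k mod r
        if x == 1:
            return k
    return None
```

For example, the supersingular curve y^2 = x^3 + x over GF(43) has 44 = 4 * 11 points, and `embedding_degree(43, 11)` returns 2, as expected for a supersingular curve of this form.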
Hardware-Software Codesign Approaches to Public Key Cryptosystems
Malik Umar Sharif, ECE PhD Defense
Date: Wednesday, August 2nd, 10:00 AM - 12:00 PM
Location: Engineering Building, Room 4801
If a quantum computer with a sufficient number of qubits were ever built, it would easily break all current American federal standards in the area of public-key cryptography, including the algorithms protecting the majority of Internet traffic, such as RSA, Elliptic Curve Cryptography (ECC), Digital Signature Algorithm (DSA), and Diffie-Hellman. As a result, a new set of algorithms, resistant to all known attacks involving quantum computers, must be developed. These algorithms are collectively referred to as Post-Quantum Cryptography (PQC). The standardization effort for these algorithms is likely to last years and result in an entire portfolio of algorithms capable of replacing current public-key cryptography schemes. As a part of this standardization process, fair and efficient benchmarking of PQC algorithms in hardware and software becomes a necessity. Traditionally, software implementations of public-key algorithms provided the highest flexibility but lacked performance. On the other hand, custom hardware implementations provided the highest performance but lacked flexibility and adaptability to changing algorithms, parameters, and key sizes. Therefore, in this work, we investigate the suitability of hardware/software codesign for implementing and evaluating traditional and post-quantum public-key cryptosystems from the point of view of their implementation efficiency.
As our case studies, we considered one traditional public key cryptosystem, RSA, and one post-quantum public key cryptosystem, NTRUEncrypt. We implemented both of them using custom hardware, as well as software/hardware codesign. The Xilinx Zynq-7000 System on Chip platform, which integrates a dual-core ARM Cortex A9 processing system along with Xilinx programmable logic, was used for our experiments. The performance vs. flexibility trade-off has been investigated, and the speed-up of our software/hardware codesign implementations vs. the purely software implementations on the same platform is reported and analyzed. Similarly, the speed-up of the custom hardware vs. hardware-software codesign is investigated as well. Additionally, we have determined and analyzed different percentage contributions of the execution times for equivalent component operations executed using the aforementioned three different implementation approaches (custom hardware, software/hardware codesign, and pure software). We demonstrate that hardware/software codesign can reliably assist in early evaluation and comparison of various public-key cryptography schemes. Our project is intended to pave the way for the future comprehensive, fair, and efficient benchmarking of the most promising encryption, signature, and key agreement schemes from each of several major post-quantum public-key cryptosystem families.
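The percentage contributions of component operations matter because they bound the achievable speed-up. A standard way to reason about this is Amdahl's law: if a fraction f of the software execution time is moved to hardware that runs it s times faster, the overall speed-up is 1 / ((1 - f) + f / s). The function below simply evaluates that formula; it is a generic model, not the analysis method used in this work, and the numbers in the usage note are hypothetical.

```python
def codesign_speedup(f: float, s: float) -> float:
    """Amdahl's law: overall speed-up when a fraction f of run time is sped up by s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    remaining = (1.0 - f) + f / s  # normalized run time after acceleration
    return float("inf") if remaining == 0.0 else 1.0 / remaining
```

For instance, accelerating 90% of the run time by a factor of 10 yields only about 5.3x overall, and even an infinitely fast accelerator for half of the run time caps the speed-up at 2x.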
A New Approach to the Development of Coprocessors for Pairing-based Cryptosystems
Rabia Shahid, ECE PhD Defense
Date: Monday, July 31st, 2:00 PM - 4:00 PM
Location: Engineering Building, Room 3507
Cryptographic engineering is a field that combines cryptology, algebraic geometry, and number theory with methods from computer arithmetic, digital system design, and computer architecture. Unfortunately, most of the researchers working in this area are either mathematicians/cryptographers or computer engineers, specializing in their respective fields. The theoretical complexities related to number theory and abstract algebra in the majority of public-key cryptosystems can easily prevent computer engineers from fully optimizing their designs. One of the most established state-of-the-art solutions is Elliptic Curve Cryptography (ECC). One of the most promising emerging approaches is Pairing-Based Cryptography (PBC). PBC-based security services, such as non-interactive key agreement, identity-based encryption (IBE), and short signatures, solve problems beyond the range of traditional cryptographic schemes and make cryptographic solutions less costly, less cumbersome, and easier to deploy. The broad spectrum of ECC and PBC schemes available in the literature and the wide range of possible parameter choices require a deep understanding of the possible trade-offs and dependencies among the parameter values and the efficiency of the corresponding hardware implementations.
In this research, we describe a possible bridge between the aforementioned two domains, demonstrated using selected families of Elliptic Curve and Pairing-Based Cryptosystems. We present the design of a configurable and generic execution unit that serves as a coprocessor to perform operations involved in these cryptosystems. The execution unit is supported by a software static scheduler to automate the cumbersome process of manual scheduling of operations required by these algorithms. The arithmetic unit performs the operations at the lowest level of hierarchy, i.e., at the level of prime field arithmetic. We focus on optimizing the overall performance of the crypto-processor by using an optimal number of multiplier units, capable of taking full advantage of the parallelism present in an implemented algorithm and a single modular adder/subtractor, working in parallel with multipliers. An instruction set architecture capable of supporting all required instructions is designed, along with the coprocessor that can process multiple batches of instructions using the arithmetic unit. We report results in terms of latency in clock cycles and in absolute time units. We also demonstrate that the entire setup is generalizable to any cryptosystem that involves modular multiplications and modular additions/subtractions at the lowest level of hierarchy.
A Generic High-Speed Hardware Implementation of NTRUEncrypt SVES
Malik Umar Sharif, ECE PhD Seminar
Date: Monday, July 24th, 11:00 AM - 12:00 PM
Location: Engineering Building, Room 3507
NTRUEncrypt is a polynomial ring-based public-key encryption scheme that was first introduced at Crypto'96. In 2008, an extended version of this algorithm was published as the IEEE 1363.1 Standard Specification for Public Key Cryptographic Techniques Based on Hard Problems over Lattices. Within the standard, the described algorithm is called the Short Vector Encryption Scheme (SVES). The recent renewed interest in NTRU is at least partially driven by its presumed resistance to efficient attacks using quantum computers. In February 2016, NIST announced its plans to start a standardization effort in the area of post-quantum cryptography. This effort is likely to last years and result in an entire portfolio of algorithms capable of replacing current public-key cryptography schemes. As a part of this standardization process, fair and efficient benchmarking of PQC algorithms in hardware and software becomes a necessity.
We present a high-speed hardware implementation of the NTRUEncrypt Short Vector Encryption Scheme (SVES), fully compliant with the aforementioned IEEE standard. Our design supports two representative parameter sets, ees1087ep1 and ees1499ep1, optimized for speed, which provide security levels of 192 and 256 bits, respectively. Our implementation follows a previously proposed Post-Quantum Cryptography (PQC) Hardware Application Programming Interface (API). As the first design following this API, it provides a reference that can be adopted in future implementations of post-quantum cryptosystems. We present the detailed flow and block diagrams, as well as results in terms of latency (in clock cycles), maximum clock frequency, and resource utilization. We also report the speed-up of our implementation in Xilinx Field Programmable Gate Arrays (FPGAs) compared to existing software implementations of NTRUEncrypt SVES with equivalent functionality. Our results show a significant speed-up of hardware vs. software, and very different percentage contributions of the execution times for equivalent operations executed in these two environments. Our project is intended to pave the way for future comprehensive, fair, and efficient hardware benchmarking of the most promising encryption, signature, and key agreement schemes from each of several major post-quantum public-key cryptosystem families.
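The core arithmetic operation of NTRUEncrypt is multiplication of polynomials in the ring Z_q[x]/(x^N - 1), i.e., a cyclic convolution of two length-N coefficient vectors reduced modulo q. The sketch below uses schoolbook convolution with toy parameters so that its behavior can be checked by hand; the function name `ring_mul` is hypothetical, and this is not the hardware architecture presented in the talk, which targets the much larger N of the ees1087ep1 and ees1499ep1 parameter sets.

```python
from typing import List


def ring_mul(a: List[int], b: List[int], q: int) -> List[int]:
    """Multiply a and b in Z_q[x]/(x^N - 1) (cyclic convolution, coefficients mod q).

    Coefficient lists are ordered from x^0 up to x^(N-1).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same length N")
    c = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue  # sparse operands skip most of the work
        for j, bj in enumerate(b):
            k = (i + j) % n  # x^N wraps around to x^0
            c[k] = (c[k] + ai * bj) % q
    return c
```

For example, with N = 3, (1 + 2x)(x + x^2) = x + 3x^2 + 2x^3, and since x^3 = 1 in this ring, `ring_mul([1, 2, 0], [0, 1, 1], 2048)` gives `[2, 1, 3]`.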
A Generic Approach to the Development of Coprocessors for Elliptic Curve Cryptosystems
Rabia Shahid, ECE PhD Seminar
Date: Monday, July 24th, 10:00 AM - 11:00 AM
Location: Engineering Building, Room 3507
Cryptographic engineering is a field that combines cryptology, algebraic geometry, and number theory with methods from computer arithmetic, digital system design, and computer architecture. Unfortunately, most of the researchers working in this area are either mathematicians/cryptographers or computer engineers, specializing in their respective fields. The theoretical complexities related to number theory and abstract algebra in the majority of public-key cryptosystems can easily prevent computer engineers from fully optimizing their designs. In this talk, we describe a possible bridge between these two domains, demonstrated using a family of Elliptic Curve Cryptosystems (ECC). We present the design of a configurable and generic execution unit for ECC that serves as a coprocessor to perform operations involved during a scalar multiplication. The execution unit is supported by a software static scheduler to automate the cumbersome process of manual scheduling of operations involved in ECC. The arithmetic unit performs the operations at the lowest level of hierarchy, i.e., prime field arithmetic. We focus on optimizing the overall performance of the cryptoprocessor by using an optimal number of multiplier units, capable of taking full advantage of the parallelism present in the algorithm and a single modular adder/subtractor, working in parallel with multipliers. An instruction set architecture capable of supporting all required instructions is designed, along with the coprocessor that can process multiple batches of instructions using the arithmetic unit. We report results for an entire scalar multiplication in terms of latency in clock cycles and in absolute time units. We also demonstrate that the entire setup is generalizable to any cryptosystem that involves modular multiplications and modular additions/subtractions at the lowest level of hierarchy.
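To show what a static scheduler does, the sketch below assigns a dependency graph of field operations to a configurable number of modular multipliers and a single modular adder/subtractor, starting each operation as early as its operands and a free unit allow. It is a simple greedy list scheduler in program order, written under assumptions of its own (non-pipelined units with fixed latencies, operations listed in dependency order, a hypothetical operation format); it is not the scheduler developed in this work.

```python
from typing import Dict, List, Tuple

# Hypothetical operation format: (name, kind, dependencies), kind in {"mul", "add"}.
Op = Tuple[str, str, List[str]]


def static_schedule(ops: List[Op], n_mul: int, lat_mul: int,
                    lat_add: int) -> Tuple[Dict[str, int], int]:
    """Greedy list scheduling in program order.

    Assumes ops are listed in dependency order and that each unit is busy for
    its full latency (no pipelining). Returns (start cycle per op, total cycles).
    """
    if n_mul < 1:
        raise ValueError("at least one multiplier is required")
    free_at = {"mul": [0] * n_mul, "add": [0]}  # next free cycle of each unit
    latency = {"mul": lat_mul, "add": lat_add}
    start: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    for name, kind, deps in ops:
        ready = max((finish[d] for d in deps), default=0)  # operands available
        units = free_at[kind]
        # pick the unit of this kind on which the op can start earliest
        idx = min(range(len(units)), key=lambda u: max(units[u], ready))
        start[name] = max(units[idx], ready)
        finish[name] = start[name] + latency[kind]
        units[idx] = finish[name]
    return start, max(finish.values(), default=0)
```

For example, with the operations `m1 = a*b`, `m2 = c*d`, `s = m1 + m2`, `m3 = s*e`, a multiplier latency of 3 cycles, and an adder latency of 1 cycle, two multipliers finish in 7 cycles and one multiplier in 10, which is the kind of trade-off explored when choosing the number of multipliers.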
A Scalable ECC Processor for High-Speed and Light-Weight Implementations with Side-Channel Countermeasure
Ahmad Salman, ECE PhD Seminar
Date: Friday, June 16th, 3:00 PM - 4:00 PM
Location: Engineering Building, Room 3203
With the growing number of devices connected to the Internet, flexible Public Key Cryptosystems (PKC) that can be supported by multiple platforms while maintaining a high level of security have become essential. The performance of PKC based on elliptic curves depends mostly on the performance of the underlying field arithmetic. In this work, we present high-speed and lightweight implementations of a fully scalable architecture of an Elliptic Curve Cryptography (ECC) scalar multiplier processor. The processor supports operations over GF(p) for arbitrary values of p, and field sizes up to 521 bits. The implementations perform modular multiplication using fully scalable Montgomery multiplier architectures, one tailored for high speed and one for lightweight applications. Point addition and point doubling are performed in Co-Z projective coordinates, while transmission and storage use affine coordinates. In addition to having dedicated high-speed and lightweight architectures, both designs also support different bus widths to increase flexibility and allow for a wide range of applications. Our cores include side-channel countermeasures, using the Montgomery Ladder and Exponent Randomization methods to provide resistance to Simple Power Analysis (SPA) and Differential Power Analysis (DPA), respectively.
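The Montgomery ladder resists SPA because every scalar bit triggers exactly one point addition and one point doubling, so the operation sequence does not depend on the key. The sketch below shows that structure on a toy short Weierstrass curve y^2 = x^3 + ax + b over GF(p), using affine coordinates for readability instead of the Co-Z projective coordinates used by the processor. The curve parameters and function names are hypothetical, scalar randomization is not shown, and the Python branch on the key bit is not constant-time; hardware would typically use a conditional swap instead.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Affine point addition/doubling on y^2 = x^3 + a*x + b over GF(p)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = point at infinity
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p  # doubling
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p  # addition
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def montgomery_ladder(k: int, P: Point, a: int, p: int) -> Point:
    """Compute k*P (k >= 0) with one addition and one doubling per scalar bit."""
    R0, R1 = None, P  # invariant: R1 = R0 + P
    for i in reversed(range(k.bit_length())):
        if (k >> i) & 1:
            R0 = ec_add(R0, R1, a, p)
            R1 = ec_add(R1, R1, a, p)
        else:
            R1 = ec_add(R0, R1, a, p)
            R0 = ec_add(R0, R0, a, p)
    return R0
```

On the toy curve y^2 = x^3 + 2x + 3 over GF(97), the point (3, 6) lies on the curve, and `montgomery_ladder(k, (3, 6), 2, 97)` agrees with adding the point to itself k times.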
We have implemented the design on FPGA and All Programmable System on Chip platforms from different vendors, as well as using a standard-cell ASIC library, in order to provide comprehensive results. We also analyzed the power and energy consumption of each implemented design to determine the relation between the area/throughput trade-off and power and energy consumption. We have evaluated our designs for the NIST-recommended field lengths of 192, 224, 256, 384, and 521 bits, using several arbitrary values of the prime p.
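Montgomery multiplication is what allows one multiplier to serve arbitrary odd primes p: it replaces division by p with division by a power of two R > p, computing a * b * R^(-1) mod p. The sketch below models the arithmetic result with Python integers in one step; a scalable hardware multiplier would instead process the operands word by word. The 32-bit word size and the helper names (`mont_setup`, `mont_mul`, `to_mont`, `from_mont`) are assumptions for illustration.

```python
from typing import Tuple


def mont_setup(p: int, word_bits: int = 32) -> Tuple[int, int]:
    """Return (R, p') with R = 2^(word_bits * words) > p and p * p' = -1 (mod R)."""
    if p < 3 or p % 2 == 0:
        raise ValueError("Montgomery arithmetic needs an odd modulus p >= 3")
    n_words = -(-p.bit_length() // word_bits)  # ceil(bit_length / word_bits)
    R = 1 << (word_bits * n_words)
    p_neg_inv = (-pow(p, -1, R)) % R
    return R, p_neg_inv


def mont_mul(a: int, b: int, p: int, R: int, p_neg_inv: int) -> int:
    """Return a * b * R^(-1) mod p, assuming 0 <= a, b < p."""
    t = a * b
    m = (t * p_neg_inv) % R  # R is a power of two: a bit mask in hardware
    u = (t + m * p) // R  # exact: t + m * p is divisible by R
    return u - p if u >= p else u  # u < 2p, so one subtraction suffices


def to_mont(x: int, p: int, R: int) -> int:
    """Convert x into Montgomery form x * R mod p."""
    return (x * R) % p


def from_mont(x: int, p: int, R: int, p_neg_inv: int) -> int:
    """Convert back from Montgomery form by multiplying with 1."""
    return mont_mul(x, 1, p, R, p_neg_inv)
```

Operands are converted into Montgomery form once, all multiplications stay in that form, and results are converted back at the end, so the conversion cost is amortized over a whole scalar multiplication.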
ECE 746 Advanced Cryptography, Project Presentations
Date: Tuesday, May 1st, 4:30 PM - 8:00 PM
Location: Engineering Building, Room 3507
Join us for an evening of exciting presentations by ECE 746 students. The exact schedule is posted here. Farnoud Farahmand, Abubakr Abdulgadir, and Brian Jarvis of our research group will be presenting. Please come over to cheer them on!