Date of Award


Degree Name

Doctor of Philosophy


Electrical and Computer Engineering

First Advisor

Dr. Lina Sawalha

Second Advisor

Dr. Johnson Asumadu

Third Advisor

Dr. Janos Grantner

Fourth Advisor

Dr. Alvis Fong


Hardware/software partitioning techniques, Cloud-scale, CPU-FPGA platform


The diversity of workload characteristics has driven the deployment of heterogeneous architectures in cloud data centers to accommodate workloads’ differing requirements. In heterogeneous computing, co-processors support Central Processing Units (CPUs) in meeting workload demands. Field Programmable Gate Arrays (FPGAs) have advantages over other accelerators in power, performance, and reconfigurability. To get the most benefit from a heterogeneous platform, the workload must be partitioned efficiently between the CPU and the FPGA.

This dissertation first presents the design and implementation of cooperative CPU-FPGA execution techniques, including code partitioning and data partitioning, for an image processing algorithm on Intel’s Hardware Accelerator Research Program (HARP) platform. Data partitioning outperforms the CPU-only and FPGA-only implementations by up to 4.8X and 2.1X, respectively. It also reduces energy consumption by 55.3%, on average, compared to the CPU-only implementation. Code partitioning achieves up to a 2.3X speedup over the CPU-only implementation and improves system utilization.
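As a rough illustration of why data partitioning can beat both single-device implementations, the sketch below splits the rows of an image between the two devices in proportion to their throughput, so that both finish at about the same time. The function names and the throughput figures are hypothetical, and the model ignores data-transfer and synchronization overhead; it is not the dissertation's implementation.

```python
def split_rows(total_rows, cpu_rate, fpga_rate):
    """Split rows in proportion to device throughput (rows per unit time).

    Returns (cpu_rows, fpga_rows). The rates are assumed to be measured
    beforehand; transfer and synchronization costs are ignored.
    """
    if cpu_rate <= 0 or fpga_rate <= 0:
        raise ValueError("rates must be positive")
    fpga_rows = round(total_rows * fpga_rate / (cpu_rate + fpga_rate))
    return total_rows - fpga_rows, fpga_rows


def cooperative_time(total_rows, cpu_rate, fpga_rate):
    """Completion time when both devices process their share concurrently."""
    cpu_rows, fpga_rows = split_rows(total_rows, cpu_rate, fpga_rate)
    return max(cpu_rows / cpu_rate, fpga_rows / fpga_rate)


if __name__ == "__main__":
    rows, cpu_rate, fpga_rate = 1000, 1.0, 3.0  # illustrative numbers only
    print("CPU-only time:   ", rows / cpu_rate)
    print("FPGA-only time:  ", rows / fpga_rate)
    print("Cooperative time:", cooperative_time(rows, cpu_rate, fpga_rate))
```

Under this idealized model the cooperative time is never worse than the faster device alone; in practice the gain depends on how much of it is consumed by moving data between the CPU and the FPGA.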

The dissertation also presents automatic hardware/software partitioning of cloud-scale applications, such as the k-means algorithm, the Canny algorithm, and the Advanced Encryption Standard (AES) algorithm, on HARP. Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA) were used to partition these applications, leveraging a multi-objective utility function. The accuracy and the execution time of PSO depend to a large extent on its parameters; however, researchers and practitioners generally use widely accepted fixed parameter values. In this study, a machine learning-based tuning technique for the PSO parameters was proposed and implemented. The results show an improvement in PSO accuracy of up to 62.9% and in its execution time of up to 29%. Moreover, to mitigate the premature convergence problem from which GA and PSO suffer, the PSO algorithm is extended with a distributed greedy search technique. This approach improves the accuracy of PSO by up to 55.4%. GA was also extended with the distributed local search technique, which improved the accuracy of GA by up to 82.6%.
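To make the PSO-based partitioning concrete, here is a minimal sketch of a binary PSO that assigns each node of an application to the CPU (0) or the FPGA (1). Everything specific in it is an assumption for illustration: the node fields (`cpu_t`, `fpga_t`, `cpu_e`, `fpga_e`, `area`), the weighted-sum utility, the area penalty, and the fixed parameter values `w`, `c1`, and `c2`. It includes neither the machine learning-based parameter tuning nor the distributed greedy search described above.

```python
import math
import random


def utility(assign, nodes, area_budget, w_time=0.7, w_energy=0.3):
    """Hypothetical weighted cost of a partition (lower is better).

    assign[i] is 0 (CPU) or 1 (FPGA). Nodes are assumed to run serially.
    """
    time = sum(n["fpga_t"] if a else n["cpu_t"] for a, n in zip(assign, nodes))
    energy = sum(n["fpga_e"] if a else n["cpu_e"] for a, n in zip(assign, nodes))
    area = sum(n["area"] for a, n in zip(assign, nodes) if a)
    penalty = 1e3 * max(0, area - area_budget)  # discourage exceeding the FPGA area
    return w_time * time + w_energy * energy + penalty


def binary_pso(nodes, area_budget, particles=20, iters=100,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO with a sigmoid transfer function; returns (best, best_cost)."""
    rng = random.Random(seed)
    n = len(nodes)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(particles)]
    pos[0] = [0] * n  # seed one particle with the software-only baseline
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [utility(p, nodes, area_budget) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to keep exp() stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            cost = utility(pos[i], nodes, area_budget)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost


if __name__ == "__main__":
    demo_nodes = [  # made-up profile data, not measurements
        {"cpu_t": 8, "fpga_t": 2, "cpu_e": 5, "fpga_e": 2, "area": 40},
        {"cpu_t": 3, "fpga_t": 4, "cpu_e": 2, "fpga_e": 3, "area": 30},
        {"cpu_t": 9, "fpga_t": 3, "cpu_e": 6, "fpga_e": 2, "area": 50},
        {"cpu_t": 2, "fpga_t": 2, "cpu_e": 1, "fpga_e": 2, "area": 20},
    ]
    print(binary_pso(demo_nodes, area_budget=100))
```

Because one particle starts at the software-only assignment, the returned partition is never worse than running everything on the CPU under this utility. The values of `w`, `c1`, and `c2` are the kind of fixed parameters the paragraph above refers to; they are the natural target for tuning.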

Finally, we propose and implement a variation of the PSO algorithm that partitions both the code and the data of an application between the CPU and the FPGA by assigning some nodes to both devices, each operating on a different data set. This partitioning approach improves the accuracy of PSO by up to 33% for Canny, a data-parallel application.
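One way to picture this combined encoding is to give each node three possible states instead of two. The sketch below is a hypothetical cost model, not the dissertation's formulation: a node assigned to both devices is assumed to have perfectly divisible data, split in proportion to each device's speed, with no transfer overhead.

```python
CPU, FPGA, BOTH = 0, 1, 2  # hypothetical per-node states


def node_time(state, cpu_t, fpga_t):
    """Execution time of one node under the given assignment.

    cpu_t and fpga_t are the times to process the node's full data set on
    each device alone. For BOTH, the data is split so that the two devices
    finish together, which gives cpu_t * fpga_t / (cpu_t + fpga_t).
    """
    if state == CPU:
        return cpu_t
    if state == FPGA:
        return fpga_t
    if state == BOTH:
        return cpu_t * fpga_t / (cpu_t + fpga_t)
    raise ValueError(f"unknown state: {state}")


def partition_time(states, nodes):
    """Total time for serially executed nodes; nodes are (cpu_t, fpga_t) pairs."""
    return sum(node_time(s, c, f) for s, (c, f) in zip(states, nodes))


if __name__ == "__main__":
    demo = [(6.0, 3.0), (2.0, 5.0)]  # made-up timings
    print(partition_time([FPGA, CPU], demo))  # code partitioning only
    print(partition_time([BOTH, CPU], demo))  # first node shared by both devices
```

A search algorithm such as PSO would then explore positions over these three states rather than a binary vector, which is what allows data-parallel nodes to benefit from both devices.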

Access Setting

Dissertation-Campus Only

Restricted to Campus until