M.Eng. Poster Session 2017
Best Overall Poster
Crystallographic chemical etching is an enabling process for semiconductor device technology. Chemical etch processes for GaN materials and devices are severely underdeveloped because GaN appears inert to all common wet etchants. This project revisits the etch characteristics of a particular GaN sample using general geometric principles and investigates the anisotropic etching effect of tetramethylammonium hydroxide (TMAH) solution.
The motivation for this research project comes from real-world fabrication of novel GaN-based devices such as GaN FinFETs, where the geometries resulting from the extremely high etch rate anisotropy must be predicted. This new understanding can aid in realizing smoother sidewalls during fabrication of GaN FinFETs.
Best in AI / Pattern Recognition (Computer Vision, Machine Learning, Robotics)
Deep learning has quickly risen to dominate fields such as computer vision and speech recognition. Although machine learning itself has been a research field for decades, there has never been a more promising time for deep learning. The rise is partly due to the enormous amount of data currently available and to the computational power of current systems. Deep learning systems thrive on data: the models become much better as they obtain more data to learn from. To process that data, deep learning algorithms require high-performance computing, which was not cost-efficient until the past decade. A general-purpose CPU does not have the performance necessary for deep learning research. Current practice relies on GPUs, which offer high performance but also consume a lot of power. A possible middle ground between the two is an FPGA, which can provide better performance than a CPU while consuming less power than a GPU.
The objective of this M.Eng. project is to accelerate a deep learning model, specifically a Convolutional Neural Network (CNN), using FPGAs to classify images. We use an analytical design approach in which we optimize computational throughput, but only up to the point where the platform can supply the required off-chip memory bandwidth. The approach considers the computation-to-communication ratio to size the on-chip memory arrays. The network was built in C++ with Xilinx Vivado High-Level Synthesis (HLS). The computation engine uses several loop transformations to achieve high throughput. The primary loop optimization was loop tiling, which allowed us to maximize data reuse and parallelism. We used loop unrolling to create multiple hardware copies that execute loop iterations in parallel, supplemented by loop pipelining at the kernel level to reduce latency. While the computation optimizations were critical, it was just as important to have memory optimizations that could match the throughput of the computation engine. On-chip memory accesses were optimized using cyclic array partitioning, which allows the engine to access multiple RAM ports in a single cycle and thereby boosts the benefit of loop unrolling. Additionally, the accelerator used local memory promotion to reduce redundant memory accesses. Although memory access was the primary bottleneck for the accelerator, its effect was reduced by analyzing the access patterns of neural networks, and the same analysis could be applied to almost all convolutional neural network architectures.
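The loop transformations named above can be illustrated with a small software sketch. It is not the project's accelerator code: the layer shape, the tile sizes `TM` and `TN`, and the function names are assumptions made for illustration. The Vivado HLS directives (`PIPELINE`, `UNROLL`, `ARRAY_PARTITION`) appear only as comments marking where they would typically be placed, so the code compiles with any standard C++ compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layer shape (stride 1, no padding); not taken from the project.
struct ConvShape {
    int M;  // output feature maps
    int N;  // input feature maps
    int R;  // output rows
    int C;  // output columns
    int K;  // kernel height and width
};

constexpr int TM = 4;  // assumed tile size over output feature maps
constexpr int TN = 2;  // assumed tile size over input feature maps

// Untiled reference loop nest, used only to check the tiled version.
std::vector<float> conv_reference(const ConvShape& s,
                                  const std::vector<float>& in,
                                  const std::vector<float>& w) {
    const int inR = s.R + s.K - 1, inC = s.C + s.K - 1;
    assert(in.size() == static_cast<std::size_t>(s.N) * inR * inC);
    assert(w.size() == static_cast<std::size_t>(s.M) * s.N * s.K * s.K);
    std::vector<float> out(static_cast<std::size_t>(s.M) * s.R * s.C, 0.0f);
    for (int m = 0; m < s.M; ++m)
        for (int n = 0; n < s.N; ++n)
            for (int r = 0; r < s.R; ++r)
                for (int c = 0; c < s.C; ++c)
                    for (int i = 0; i < s.K; ++i)
                        for (int j = 0; j < s.K; ++j)
                            out[(m * s.R + r) * s.C + c] +=
                                w[((m * s.N + n) * s.K + i) * s.K + j] *
                                in[(n * inR + r + i) * inC + c + j];
    return out;
}

// Tiled loop nest: the feature-map loops are split into tiles of TM x TN,
// and the two innermost loops are the ones that would be unrolled in hardware.
std::vector<float> conv_tiled(const ConvShape& s,
                              const std::vector<float>& in,
                              const std::vector<float>& w) {
    const int inR = s.R + s.K - 1, inC = s.C + s.K - 1;
    assert(in.size() == static_cast<std::size_t>(s.N) * inR * inC);
    assert(w.size() == static_cast<std::size_t>(s.M) * s.N * s.K * s.K);
    std::vector<float> out(static_cast<std::size_t>(s.M) * s.R * s.C, 0.0f);
    for (int m0 = 0; m0 < s.M; m0 += TM) {
        for (int n0 = 0; n0 < s.N; n0 += TN) {
            const int mEnd = std::min(m0 + TM, s.M);  // handles partial tiles
            const int nEnd = std::min(n0 + TN, s.N);
            for (int r = 0; r < s.R; ++r) {
                for (int c = 0; c < s.C; ++c) {
                    // Local accumulators stand in for on-chip registers
                    // (local memory promotion): out[] is updated once per tile.
                    // HLS: #pragma HLS ARRAY_PARTITION variable=acc complete
                    float acc[TM] = {};
                    for (int i = 0; i < s.K; ++i) {
                        for (int j = 0; j < s.K; ++j) {
                            // HLS: #pragma HLS PIPELINE II=1
                            for (int m = m0; m < mEnd; ++m) {      // HLS: #pragma HLS UNROLL
                                for (int n = n0; n < nEnd; ++n) {  // HLS: #pragma HLS UNROLL
                                    acc[m - m0] +=
                                        w[((m * s.N + n) * s.K + i) * s.K + j] *
                                        in[(n * inR + r + i) * inC + c + j];
                                }
                            }
                        }
                    }
                    for (int m = m0; m < mEnd; ++m)
                        out[(m * s.R + r) * s.C + c] += acc[m - m0];
                }
            }
        }
    }
    return out;
}
```

In this sketch, the TM x TN multiply-accumulates in the two innermost loops are independent of one another, which is what makes them candidates for unrolling into parallel hardware. Larger tile sizes increase parallelism and data reuse but also raise the on-chip buffer and bandwidth requirements.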
Best in Bio-Signals (Neural, Controls, Imaging, Bioinformatics)
Cardiovascular disease is a leading cause of death in developed countries. The interventricular septum, the wall separating the ventricles from one another, is an indicator of heart health. An abnormal position or geometry of the septum can indicate heart disease. To diagnose heart disease, a contrast agent is usually injected into the patient’s body to obtain contrast-enhanced images, but it can have adverse effects on patients and may even cause death. Low-dose CT, which is used in lung cancer screening, delivers a lower radiation dose than standard-dose CT and does not require a contrast agent. Thus, identifying the septum in low-dose CT allows patients' heart health to be monitored without these risks.
This poster describes an atlas-based segmentation approach to identify the interventricular septum in low-dose CT scans. Partial results and visualizations are also presented.
Best in Communications (Information Theory, Network Coding, Digital Communications)
Optimal data detection for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems requires prohibitive computational complexity. Therefore, practical data-detector designs typically rely on near-optimal algorithms, such as linear minimum mean-square error (L-MMSE) equalization. This method, however, requires accurate knowledge of the signal-to-noise ratio (SNR), which is difficult to acquire in practice due to the time-variant nature of wireless channels. We develop a VLSI design of a novel, nonparametric data-detection algorithm for massive MU-MIMO systems that provably achieves the error-rate performance of the L-MMSE equalizer without requiring knowledge of the SNR. The algorithm, referred to as NonParametric Equalizer (NOPE), is robust to a broad range of system impairments and exhibits lower complexity than traditional L-MMSE data detectors that require costly matrix inverses and tedious parameter tuning. To demonstrate the effectiveness of NOPE, we develop a coarsely pipelined VLSI architecture and provide preliminary implementation results in 40 nm CMOS for a massive MU-MIMO system in which 16 single-antenna users transmit data to a 64-antenna base station.
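For reference, the SNR dependence mentioned above is visible in the textbook form of the L-MMSE equalizer. The notation is an assumption introduced here, not taken from the poster: the received vector is y = Hx + n, where H is the B x U channel matrix (B = 64 base-station antennas and U = 16 users in the system above), E_s is the per-user symbol energy, and N_0 is the noise variance.

```latex
\hat{\mathbf{x}}
  = \left( \mathbf{H}^{\mathsf{H}} \mathbf{H}
           + \frac{N_0}{E_s}\, \mathbf{I}_U \right)^{-1}
    \mathbf{H}^{\mathsf{H}} \mathbf{y}
```

The regularization term N_0 / E_s is the reciprocal of the SNR, which is why a conventional L-MMSE detector needs both an SNR estimate and a U x U matrix inverse.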
Best in Computer Systems (OS, Embedded, Networks, Architecture, Database)
Internet of Things: Environmental Sensor Control
Rahul Sharma and Amardeep Manak
Best in Electronic Devices and Materials (Analog, Digital, Optics, MEMS, Circuits)
Electronics for Autonomous Construction Robots
Alberto Gutierrez and Boling Hu
To bring automation to construction, Prof. Kirstin Petersen has designed a robot collective for construction of user-specified three-dimensional structures. In this project, we propose and implement updates to the driver electronics of these robots, covering infrared (IR) sensors, digital filters, motor drivers, wireless communication, and package miniaturization. The new design is based on a dsPIC33F microcontroller with digital signal processing capabilities. This unit offers the performance of a DSP with the simplicity of an MCU. To filter out unwanted signals received by the IR sensors, such as ambient or fluorescent light, a bandpass Finite Impulse Response (FIR) filter was designed. The filter provides a more compact and robust design than the previous analog circuit. Programming a single DSP unit, rather than mounting six copies of an analog filter, yields a design that is considerably faster, simpler, and easier to calibrate.
In addition, new motor drivers were added to the system; they comply with the power specification and can drive two motors simultaneously. Bluetooth modules were also included in the design, enabling wireless communication between the robot and a computer. The complete design resulted in a miniaturized printed circuit board with faster fabrication time and improved power consumption. Our design provides a simpler and more compact set of electronics that enhances the capabilities of the current robot, a first step towards the implementation of a large robot swarm that relies on simple, small-scale, inexpensive electronics.
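A bandpass FIR filter of the kind described can be sketched as follows. This is a generic windowed-sinc design in double precision, not the filter that runs on the dsPIC33F. The function names, the Hamming window, the tap count, and the cutoff frequencies are all assumptions chosen for illustration, and a fixed-point implementation on the microcontroller would differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Normalized sinc: sin(pi x) / (pi x), with the value 1 at x = 0.
double sinc_norm(double x) {
    if (std::abs(x) < 1e-12) return 1.0;
    return std::sin(kPi * x) / (kPi * x);
}

// Windowed-sinc bandpass design with a Hamming window (an assumed method).
// f_lo and f_hi are the band edges as fractions of the sample rate.
std::vector<double> design_bandpass(std::size_t num_taps, double f_lo, double f_hi) {
    assert(num_taps >= 3 && num_taps % 2 == 1);
    assert(0.0 < f_lo && f_lo < f_hi && f_hi < 0.5);
    std::vector<double> taps(num_taps);
    const double mid = (num_taps - 1) / 2.0;
    for (std::size_t n = 0; n < num_taps; ++n) {
        const double t = n - mid;
        // Ideal bandpass = ideal lowpass at f_hi minus ideal lowpass at f_lo.
        const double ideal = 2.0 * f_hi * sinc_norm(2.0 * f_hi * t) -
                             2.0 * f_lo * sinc_norm(2.0 * f_lo * t);
        const double window = 0.54 - 0.46 * std::cos(2.0 * kPi * n / (num_taps - 1));
        taps[n] = ideal * window;
    }
    return taps;
}

// Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n] = 0 for n < 0.
std::vector<double> fir_filter(const std::vector<double>& taps,
                               const std::vector<double>& x) {
    assert(!taps.empty());
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            acc += taps[k] * x[n - k];
        y[n] = acc;
    }
    return y;
}

// Magnitude of the frequency response at f (fraction of the sample rate).
double gain_at(const std::vector<double>& taps, double f) {
    std::complex<double> h(0.0, 0.0);
    for (std::size_t n = 0; n < taps.size(); ++n)
        h += taps[n] * std::polar(1.0, -2.0 * kPi * f * n);
    return std::abs(h);
}
```

With a passband placed around the modulation frequency of the IR emitter, constant ambient light (near DC) and slowly varying interference fall in the stopband. Calling `design_bandpass(63, 0.10, 0.20)`, for example, gives a filter whose gain can be inspected with `gain_at` before committing to a tap count.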
Best in Large Scale Systems (Power Systems, Energy)
Although cloud computing has increased in popularity, datacenter utilization has remained low for the most part. This is partly due to the interference that results when applications share hardware and software resources. When interference occurs, the resources of at least one co-scheduled application need to be reduced, forcing it to take a performance penalty. In current proposals, the penalized application is typically a low-priority, best-effort workload. Approximate computing applications present an opportunity to improve datacenter efficiency without performance degradation, since they can absorb the enforced resource reduction as a loss in output quality.
The objective of this project is to build a runtime system, Pliant, that improves datacenter utilization by co-scheduling interactive services with approximate computing applications. When the runtime detects QoS violations in the interactive service, it employs approximation to reduce interference and absorbs the resource reduction as a loss in output accuracy. Pliant enables 90-95% CPU utilization while ensuring that memcached achieves the same throughput (QPS) and tail latency as when run in isolation. The approximate computing applications achieve 18.3% lower execution time on average, with a maximum of 15% loss in accuracy.
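The feedback idea described above (detect a QoS violation, then trade output accuracy for reduced interference) can be sketched as a simple step controller. This is not Pliant's actual policy, which the abstract does not specify. The class name, the discrete approximation levels, and the slack band are hypothetical and serve only to make the control loop concrete.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical controller; none of these names or parameters come from Pliant.
class ApproxController {
public:
    ApproxController(double qos_target_ms, double slack_fraction, int max_level)
        : qos_target_ms_(qos_target_ms),
          slack_fraction_(slack_fraction),
          max_level_(max_level),
          level_(0) {
        assert(qos_target_ms > 0.0);
        assert(slack_fraction >= 0.0 && slack_fraction < 1.0);
        assert(max_level >= 0);
    }

    // Called once per monitoring interval with the measured tail latency of
    // the interactive service. Returns the approximation level to apply to
    // the co-scheduled application (0 means precise execution).
    int update(double tail_latency_ms) {
        if (tail_latency_ms > qos_target_ms_) {
            // QoS violation: approximate more aggressively.
            level_ = std::min(level_ + 1, max_level_);
        } else if (tail_latency_ms < qos_target_ms_ * (1.0 - slack_fraction_)) {
            // Comfortable headroom: recover some output accuracy.
            level_ = std::max(level_ - 1, 0);
        }
        // Otherwise the latency is inside the slack band: hold the level.
        return level_;
    }

private:
    double qos_target_ms_;
    double slack_fraction_;
    int max_level_;
    int level_;
};
```

The slack band keeps the controller from oscillating between two levels when the measured latency sits just under the target.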
Best in Signal and Information Processing
Resilient Communication Against Adversarial Path-Errors
We consider a communication problem in which a transmitter encodes a source into several streams that are subject to modification by an omniscient adversary while en route. More specifically, the encoder generates three distinct messages from the source, and the adversary can intercept at most one of these messages. This project aims to validate a novel coding technique, developed by Professor Wagner's research group, that provides a performance guarantee in the presence of such an adversary. MATLAB will be used to implement the communication system and to validate the codes. The data sources used for the project include audio, images, and video. The results of the project will include plots of distortion versus stream size and will show that a mixture of highly redundant coding and uncoded transmission outperforms schemes that protect all of the transmitted data against adversarial attack. This project is a collaboration with The MITRE Corporation, which has an interest in these results and could use them to further serve the public interest.
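To make the threat model concrete, the sketch below shows the simplest fully redundant baseline for three streams with at most one corrupted: send the same data on every stream and decode by bitwise majority vote. This is not the coding technique from Professor Wagner's group, and unlike the scheme above it sends three identical rather than distinct messages. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Stream = std::vector<std::uint8_t>;

// Repetition baseline: every stream carries a full copy of the source.
std::array<Stream, 3> encode_repetition(const Stream& source) {
    std::array<Stream, 3> streams{{source, source, source}};
    return streams;
}

// Bitwise majority vote across the three received streams. If at most one
// stream was modified, the other two agree on every bit and win the vote.
Stream decode_majority(const std::array<Stream, 3>& rx) {
    assert(rx[0].size() == rx[1].size() && rx[1].size() == rx[2].size());
    Stream out(rx[0].size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        const unsigned a = rx[0][i], b = rx[1][i], c = rx[2][i];
        out[i] = static_cast<std::uint8_t>((a & b) | (a & c) | (b & c));
    }
    return out;
}
```

This baseline recovers the source exactly but triples the transmitted data. It is an example of the fully protected kind of scheme that the abstract says is outperformed by mixing highly redundant coding with uncoded transmission.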