The purpose of the online quizzes is to ensure that you have
read and understood the papers in advance of class.
The quiz questions are not intended to be difficult or tricky;
anyone who has read the papers should already know the answers
or be able to find them easily. However, the questions are
designed so that you cannot easily find the answers within five
minutes if you have not read the papers in advance. Be sure to
read the papers before attempting the quizzes.
Unit 1: Parallel Computing Models |
L1: Introduction |
L2: Message Passing & Shared Memory |
M. D. Hill, S. Adve, L. Ceze, M. J. Irwin, D. Kaeli, M. Martonosi, J. Torrellas, T. F. Wenisch, D. Wood, K. Yelick - 21st Century Computer Architecture, CCC Whitepaper, 2012 |
David Wood and Mark Hill, Cost-Effective Parallel Computing, IEEE Computer, 1995 |
L3: Data-level Parallelism |
Christina Delimitrou and Christos Kozyrakis. Amdahl's law for tail latency. Commun. ACM 61, July 2018 |
H Kim, R Vuduc, S Baghsorkhi, J Choi, Wen-mei Hwu, Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU), Ch. 1 |
L4: GPUs |
Tor M. Aamodt, Wilson Wai Lun Fung, Timothy G. Rogers, General-Purpose Graphics Processor Architectures, Ch. 3.1-3.3, 4.1-4.3 |
V. Narasiman, M. Shebanow, C. J. Lee, R. Miftakhutdinov, O. Mutlu, and Y. N. Patt, Improving GPU performance via large warps and two-level warp scheduling, MICRO 2011. |
Unit 2: Synchronization |
L5,L6: Synchronization |
Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 1, 4.0-4.3.3, 5.0-5.2.5) |
Alain Kagi, Doug Burger, and Jim Goodman. Efficient Synchronization: Let Them Eat QOLB, Proc. 24th International Symposium on Computer Architecture (ISCA 24), June, 1997 |
L7: Transactional Memory |
Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 9.0-9.2.3) |
Ravi Rajwar and James R. Goodman. Speculative lock elision: enabling highly concurrent multithreaded execution. In Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, Dec. 2001. |
Unit 3: Coherence and Consistency |
L8: Snooping Cache Coherence |
Michael Scott, Shared-Memory Synchronization, Synthesis Lectures on Computer Architecture (Ch. 8-8.3) |
M. Herlihy, Wait-Free Synchronization, ACM Trans. Program. Lang. Syst. 13(1): 124-149 (1991) |
Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence (Ch. 6 & 7) |
L9: Snoop-based Multiprocessors |
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. ISCA 2009 |
L10: Directory-based Coherence |
Chaiken et al., Directory-Based Cache Coherence Protocols for Large-Scale Multiprocessors, IEEE Computer, 49-58, June 1990. |
Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence, Ch. 8 |
L12: Coherence Optimization & COMA |
A. Gupta et al. "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes". ICPP 1990. |
Fredrik Dahlgren and Josep Torrellas. Cache-only memory architectures. Computer 6 (1999): 72-79. |
L13-15: Memory Consistency |
Daniel J. Sorin, Mark D. Hill, and David A. Wood, A Primer on Memory Consistency and Cache Coherence, Ch. 3-4 |
L16: Release Consistency and Programming Language MCMs |
K. Gharachorloo, D. Lenoski, J. Laudon, P. B. Gibbons, A. Gupta, and J. L. Hennessy, Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors, ISCA 1990 |
H.J. Boehm, S. Adve, Foundations of the C++ Concurrency Model, PLDI 2008 |
K. Gharachorloo et al. "Two Techniques to Enhance the performance of Memory Consistency Models". ICPP 1991. |
C. Blundell, M. M. K. Martin, T.F. Wenisch, InvisiFence: Performance-transparent Memory Ordering in Conventional Multiprocessors, ISCA 2009 |
Unit 4: Interconnection Networks |
L17: Interconnects: Intro |
D. Lustig, M. Pellauer, M. Martonosi, PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models, MICRO 2014 |
C. Trippel, Y. A. Manerkar, D. Lustig, M. Pellauer, M. Martonosi, TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA, ASPLOS 2017 |
L18: Interconnects: Topology |
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 3 |
Kim, Dally, & Abts. Flattened Butterfly : A Cost-Efficient Topology for High-Radix Networks. ISCA 2007. |
L19: Interconnects: Routing |
Scott & Thorson. The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus, Hot Interconnects 1996. |
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 4 |
L20: Interconnects: Flow Control |
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 5 |
L21: Interconnects: Router uArch |
On-Chip Networks, Synthesis Lecture, Jerger, Krishna, and Peh, Ch. 6 |
Kim, Dally, Towles, & Gupta. Microarchitecture of a High-Radix Router. ISCA 2005. |