Title: Advanced Computer Architectures

Code: ARP
Ac.Year: ended 2006/2007
Term: Summer
Curriculums:
  Programme   Branch   Year                 Duty
  EI-MSC-3    VTN      3rd                  Compulsory
  EI-MSC-5    VTI      2nd Stage/3rd Year   Compulsory
Language: Czech, English
Private info: http://www.fit.vutbr.cz/study/courses/ARP/private/
Credits: 6
Completion: accreditation + exam (written)
Type of instruction:
  Hour/sem   Lectures   Sem. Exercises   Lab. exercises   Comp. exercises   Other
  Hours:     39         16               0                0                 10
             Examination   Tests   Exercises   Laboratories   Other
  Points:    60            10      0           0              30
Guarantee: Dvořák Václav, prof. Ing., DrSc., DCSY
Lecturer: Dvořák Václav, prof. Ing., DrSc., DCSY
Instructor: Dvořák Václav, prof. Ing., DrSc., DCSY
Faculty: Faculty of Information Technology BUT
Department: Department of Computer Systems FIT BUT
 
Learning objectives:
  To familiarize students with the architecture of the newest processors exploiting instruction-level parallelism and with its impact on compiler design. To make them understand the features of parallel systems that exploit functional parallelism at the process or thread level, as well as data parallelism.
Description:
  The course covers the architecture of processors and parallel systems. Instruction- and thread-level parallelism (ILP, TLP) is studied on scalar, superscalar, VLIW and multithreaded processors. Next, in the context of process-level parallelism, the most frequently used bus-based symmetric multiprocessors are dealt with. Then follows the treatment of interconnection networks as a basis of systems with distributed shared memory (NUMA) and of multicomputers with local memories, especially the popular clusters of workstations and massively parallel systems. The last part is devoted to parallel vector processors and SIMD-style processing (data parallelism).
Knowledge and skills required for the course:
  Von Neumann computer architecture, computer memory hierarchy, cache memories and their organization, programming in assembly and in C/C++, compiler tasks and functions.
Learning outcomes and competences:
  Overview of processor microarchitecture and its future trends, principles of parallel system design and interconnection networks, ability to estimate performance of parallel applications.
Syllabus of lectures:
 
  • Function- and data-level parallelism, performance figures and speedup laws.
  • Pipeline instruction processing and instruction dependencies. Typical CPU architecture (DLX).
  • FP unit. Eliminating instruction dependencies. Loop-level parallelism, branch prediction.
  • Superscalar CPU. Dynamic instruction scheduling, register renaming, ROB, speculation.
  • Relaxed models of memory consistency. VLIW processors, software pipelining, predication.
  • Thread-level parallelism, support in hardware. Multithreaded processors.
  • Shared memory architectures. Bus scalability, memory organization, cache coherence.
  • MSI and MESI cache coherence protocols. Synchronization of events in multiprocessors.
  • Interconnection and switching networks. Features and specs, routing, control, group communications.
  • Distributed shared memory architectures, shared virtual memory.
  • Message passing architectures. Hardware support for communication, overlapping communication and computation.
  • Data-level parallelism, vector processors and instructions. SIMD machines and SIMD-like processing. Systolic structures.
  • Accelerators and specific architectures for ANN, architectures of future CPUs.
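The cache-coherence lectures above can be illustrated with a minimal sketch of MESI state transitions for a single cache line on a snooping bus. This is a simplified illustration only, not course material: the function names, the list-of-states representation, and the reduced transition rules (silent E-to-M upgrade, invalidation on every write) are assumptions of the sketch.

```python
# Minimal MESI sketch: one cache line, one state letter per cache.
# States: M(odified), E(xclusive), S(hared), I(nvalid).
M, E, S, I = "M", "E", "S", "I"

def read(caches, i):
    """Processor i reads the line; other caches snoop the implied BusRd."""
    if caches[i] == I:
        others_have = any(st != I for j, st in enumerate(caches) if j != i)
        for j, st in enumerate(caches):
            if j != i and st in (M, E):
                caches[j] = S          # owner supplies data and downgrades
        caches[i] = S if others_have else E
    # On M, E or S the read is a local hit: no bus transaction.

def write(caches, i):
    """Processor i writes the line; others snoop the implied BusRdX/BusUpgr."""
    for j, _ in enumerate(caches):
        if j != i:
            caches[j] = I              # invalidate all other copies
    caches[i] = M                      # writer now holds the only, dirty copy
```

For example, with two caches starting at [I, I]: a read by CPU 0 yields [E, I], a subsequent read by CPU 1 yields [S, S], and a write by CPU 0 yields [M, I].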
Syllabus of numerical exercises:
 
  • Efficiency and speedup of parallel applications, Amdahl's and Gustafson's laws.
  • Instruction dependencies and hazard elimination at pipeline instruction processing, loop unrolling.
  • Superscalar processing.
  • Midterm examination.
  • VLIW and software pipelining.
  • Multithreading, SMT.
  • Shared memory, bus scalability, SM-system performance.
  • Parameters of interconnection networks, routing algorithms.
  • Vector processors, duration of vector operations.
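The first exercise topic, speedup estimation, can be sketched as a short calculation. Below, f denotes the parallel fraction of the workload and p the number of processors; the concrete values are illustrative only.

```python
def amdahl(f, p):
    """Fixed-workload speedup: the serial part (1 - f) dominates as p grows."""
    return 1.0 / ((1.0 - f) + f / p)

def gustafson(f, p):
    """Scaled-workload speedup: the parallel part of the work grows with p."""
    return (1.0 - f) + f * p

print(amdahl(0.95, 16))     # ~ 9.14, limited by the 5 % serial part
print(gustafson(0.95, 16))  # 15.25
```

The contrast between the two results for the same f and p is exactly the point of the exercise: Amdahl's law bounds the speedup of a fixed problem, while Gustafson's law describes scaling the problem with the machine.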
Syllabus of laboratory exercises:
 
  • Pipelined instruction processing in DLX CPU. (WinDLX)
  • Bus-based, shared memory multiprocessor: bus saturation.
  • Parallel matrix transpose on a cluster of workstations.
  • Computing a power of a matrix using the processor pipeline and software pipelining.
Fundamental literature:
 
  • Culler, D.E. et al.: Parallel Computer Architecture. Morgan Kaufmann Publishers, 1999, 1025 p., ISBN 1-55860-343-3.
  • Hennessy, J.L., Patterson, D.A.: Computer Architecture - A Quantitative Approach. 3rd edition, Morgan Kaufmann Publishers, Inc., 2003, 1136 p., ISBN 1-55860-596-7.
Study literature:
 
  • Hennessy, J.L., Patterson, D.A.: Computer Architecture - A Quantitative Approach. 3rd edition, Morgan Kaufmann Publishers, Inc., 2003, 1136 p., ISBN 1-55860-596-7.
Progress assessment:
  Assessment of four small projects and a midterm examination.