CNIT 56100

Advanced Parallel Data Systems

This graduate-level course covers advanced topics in high-performance, cluster, and grid computing in detail, from a systems perspective. The course focuses on the design, implementation, and use of high-performance storage systems, progressing from the hardware layer through the operating system to the application level. Topics include: commodity hardware and novel architectural storage components; the architecture and use of parallel file systems, including PVFS2 and Lustre; reliability and scheduling; virtualization and fault-tolerant strategies for petascale computing; system architectures for data-intensive computing and workflows; parallel I/O systems; and grid and cloud computing architectures.

This course also explores big data, typically defined by three main characteristics: volume, velocity, and variety. As a result, both the data and the underlying systems that host and analyze it are highly complex. These systems demand accurate, timely, cost-effective, and innovative forms of information processing to support insight and decision-making at enterprise scale. Analytics within organizations is also a rapidly evolving field, one in which the simple data warehousing and data mining techniques of the past are no longer sufficient. For these reasons, the course also addresses the architectures, algorithms, and implementations of popular Big Data systems, such as Hadoop/MapReduce, and includes analysis of real-world data sets such as Twitter, Netflix, the Human Genome Project, U.S. Census Data, and Google PublicData.

Prerequisite: Experience in Linux systems administration.

This course is typically offered in the fall semester.

Contact Professor Tom Hacker for additional details.

Credit Hours: 3

Department: Computer and Information Technology

Location: West Lafayette