ACM Prize in Computing
Canada - 2025
citation
For visionary development of distributed data systems and computing infrastructure, which has enabled large-scale machine learning and analytics at global scale.
Matei Zaharia’s contributions are best summarized as the visionary creation of distributed data systems and computing infrastructure that have fundamentally reshaped modern data processing and analytics in both research and industry. His work identified practical bottlenecks that had constrained entire fields and resolved them with rigor and intellectual depth, producing systems that became default building blocks rather than isolated demonstrations.
His most recognized breakthrough, Apache Spark, addressed the severe limitations of earlier big-data frameworks such as MapReduce, which wrote intermediate results to disk between jobs and therefore performed poorly on iterative and interactive workloads. Spark introduced Resilient Distributed Datasets (RDDs), a novel distributed memory abstraction that enables efficient, fault-tolerant, in-memory computation. Spark’s design provided a flexible, unified engine that supports batch processing, streaming, interactive queries, and machine learning (ML) workloads. Spark has become the de facto standard substrate for large-scale analytics, adopted by thousands of organizations worldwide. He translated these innovations into widely deployed infrastructure by co-founding Databricks and through sustained leadership in open-source ecosystems.
Building on Spark, Zaharia continued to solve core infrastructure problems created by the shift to cloud-scale data and ML. He co-developed Delta Lake, a transactional storage layer that brings ACID guarantees and performance-oriented data management to cloud object stores, effectively bridging the gap between data lakes and data warehouses. He also created MLflow, an open-source platform that systematizes the ML lifecycle (experiment tracking, reproducibility, model management, and deployment), helping teams operationalize machine learning across diverse frameworks and environments.
Through both conceptual contributions and durable adoption, his work has advanced computer science while democratizing large-scale data and ML capabilities for organizations and practitioners around the world.
ACM Doctoral Dissertation Award
USA - 2014
citation
For his dissertation, "An Architecture for Fast and General Data Processing on Large Clusters," nominated by the University of California at Berkeley.