Today’s Data Science: Expectations and Constraints

Human expectations of modern data science and machine learning systems are growing. We demand results: as fast as possible; using as little data as possible; in as little space as possible; with as little communication with other entities as possible; leaking as little personal information as possible; and, importantly, as accurately as possible. These constraints, however, are often at odds with one another. A system that provides strong privacy guarantees might require more data and computation, and a system that uses little data might require more computation. Despite the many successes of data science, these trade-offs are poorly understood even in some of the simplest settings.

Jayadev Acharya, Electrical and Computer Engineering, is formulating and studying the fundamental trade-offs between these resources, as well as designing efficient schemes that achieve them. This work is critical for tackling the many challenges in data science that lie ahead.

The project’s outcomes will help design faster, communication-frugal, privacy-preserving, and space-efficient learning systems.

One particular focus is how the availability of shared randomness affects the other constraints in distributed machine learning systems. While the role of randomness has been studied extensively in communication complexity, its role in machine learning systems is often overlooked. Acharya is integrating ideas from computer science, information theory, machine learning, and statistics to bridge researchers from these communities. He is also working with a diverse group of researchers through outreach activities that target undergraduate students and underrepresented communities.

Original article by Cornell Research

Image credit (graphic): Beatrice Jin
