Nemo | A Data Processing System for Flexible Employment With Different Deployment Characteristics.

What is Nemo? [ˈnemoʊ]

Nemo is a data processing system that can be flexibly employed across execution scenarios with different deployment characteristics on clusters. These scenarios include processing data in specific resource environments, such as on transient resources, and running jobs with specific attributes, such as skewed data. Nemo decouples the logical notion of a data processing application from its runtime behaviors and expresses them on separate layers using the Nemo Intermediate Representation (IR). Specifically, through a set of high-level graph pass interfaces, Nemo exposes runtime behaviors so that they can be flexibly configured and modified at both compile time and runtime, and the Nemo Runtime, with its modular and extensible design, executes the Nemo IR.
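
To make the separation between application logic and runtime behavior more concrete, here is a minimal sketch in Python. It is not Nemo's actual API: the names Vertex, IRDag, default_placement_pass, transient_resource_pass, and optimize, as well as the "resource" property and its values, are all hypothetical and chosen only for illustration. The idea shown is that the graph describes what to compute, while passes annotate each vertex with how to run it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    """One operator in the logical dataflow graph (hypothetical IR node)."""
    name: str
    # Execution properties say how to run the vertex, not what it computes.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    """A tiny stand-in for an intermediate representation: vertices plus edges."""
    vertices: List[Vertex]
    edges: List[Tuple[str, str]]  # (source vertex name, destination vertex name)


# A pass takes a DAG and returns a DAG with (possibly) updated annotations.
Pass = Callable[[IRDag], IRDag]


def default_placement_pass(dag: IRDag) -> IRDag:
    """Give every vertex a default placement unless one is already set."""
    for vertex in dag.vertices:
        vertex.properties.setdefault("resource", "reserved")
    return dag


def transient_resource_pass(dag: IRDag) -> IRDag:
    """Place source vertices (no incoming edges) on transient resources.

    This policy is an assumption made up for the example.
    """
    has_incoming = {dst for _, dst in dag.edges}
    for vertex in dag.vertices:
        if vertex.name not in has_incoming:
            vertex.properties["resource"] = "transient"
    return dag


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order; the logical graph structure is left untouched."""
    for graph_pass in passes:
        dag = graph_pass(dag)
    return dag


def build_dag() -> IRDag:
    """The application logic: read -> map -> reduce."""
    return IRDag(
        vertices=[Vertex("read"), Vertex("map"), Vertex("reduce")],
        edges=[("read", "map"), ("map", "reduce")],
    )


if __name__ == "__main__":
    dag = optimize(build_dag(), [default_placement_pass, transient_resource_pass])
    for vertex in dag.vertices:
        print(vertex.name, vertex.properties)
```

In this sketch, the same build_dag() application can be run with a different list of passes to target a different environment, without changing the application itself.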

Flexible

Nemo adapts flexibly to your desired execution environment and workload. Examples include running on transient resources, running on disaggregated computing resources, and handling skewed data.

Modular and Extensible

Nemo is designed to be modular and extensible so that it can support an even wider variety of execution scenarios and deployment characteristics. Users with specific needs can plug the required components in and out and execute their jobs accordingly.

Runs Everywhere

Nemo runs Apache Beam™ programs on its own runtime, and support for Apache Spark™ programs is planned for the near future. Moreover, by using Apache REEF™, Nemo can process data on different resource managers, including Apache Hadoop™ YARN and Apache Mesos™.