SheerMP: Optimized Streaming Analytics-as-a-Service over Multi-site and Multi-platform Settings

George Stamatakis, Antonis Kontaxakis, Alkis Simitsis, Nikos Giatrakos, Antonios Deligiannakis
Abstract. Analytics are at the core of many emerging applications and can greatly benefit from the abundance of data and from progress in the processing capabilities of modern hardware. Still, new challenges arise from the extreme complexity of deciding how to execute analytics workflows, given the plethora of choices offered by various cloud providers, the fragmented nature of diverse Big Data technologies, and the difficult task of provisioning resources to dynamically satisfy the demands of streaming analytics running over time. In this paper, we demonstrate a prototype system that optimizes streaming analytics workflows across Big Data platforms and computer clusters. Our system is the first that (i) considers a multi-user setup, (ii) examines the availability of multiple (potentially geo-dispersed) compute choices, and (iii) provides a holistic framework covering a wide variety of practical optimization and adaptive resource allocation scenarios over a variety of streaming Big Data platforms.