Towards meta-modeling of workload performance in public clouds

Dr. Rizz

Predicting workload behavior sets expectations for execution time and cost [1]. In particular, it enables more reliable guarantees for service level agreements (SLAs) before any workload is executed [2]. Knowing workload behavior a priori is also useful for administrative tasks such as capacity planning, admission control and effective scheduling of workloads [3].

In my PhD, I developed performance models that predict throughput for transactions and response time for queries [1]. These models target workloads executing on a data service hosted in a public cloud. However, they are built offline and are specific to request and virtual machine (VM) types, so they must be retrained for new request or VM types. Further, my performance models provide raw predictions without expressing any confidence in them. This is an important issue because errors accumulate across the components of my framework, presented in my previous post, which predicts the dollar cost of executing data-intensive workloads in public clouds. I need a method for managing errors across the framework's components.
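To make "errors accumulate" concrete, here is a minimal sketch under a hypothetical assumption: the final cost estimate is a product of several component estimates, each carrying its own relative error. It contrasts the worst case, where all errors push in the same direction so the factors (1 + e_i) multiply, with the standard first-order estimate for independent, unbiased errors, where relative errors add in quadrature. The function names are illustrative and are not part of my framework.

```python
import math

def worst_case_relative_error(rel_errors):
    """Worst case: every component errs in the same direction, so the
    factors (1 + e_i) multiply."""
    factor = 1.0
    for e in rel_errors:
        factor *= 1.0 + e
    return factor - 1.0

def independent_relative_error(rel_errors):
    """First-order estimate for a product of independent, unbiased
    estimates: relative standard errors add in quadrature."""
    return math.sqrt(sum(e * e for e in rel_errors))

if __name__ == "__main__":
    # Hypothetical chain of three components, each with 10% relative error.
    errors = [0.10, 0.10, 0.10]
    print(f"worst case:  {worst_case_relative_error(errors):.3f}")
    print(f"independent: {independent_relative_error(errors):.3f}")
```

With three hypothetical components at 10% each, the worst case is about 33% and the independent case about 17%. Either way, the combined error exceeds that of any single component, which is why each component should report its confidence.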

I seek performance models that express confidence in their predictions, reuse prior data, and adapt online to unknown request or VM types. An online model could satisfy these requirements; I came across the topic in a lecture by Prof. Ng at Stanford. While an online model may eventually adapt to an unknown environment, an "appropriate" initial state can speed up that adaptation.

Therefore, I also envision a meta-model that generates the initial version, or bootstrap, of the online model given a resource configuration and a workload type. The meta-model is trained offline for different resource configurations and for workloads containing different request types. The combinations of resource configurations and workload types need not be exhaustive, because the online model adjusts to its environment, including the configuration and the workload type, at run-time. Prediction errors may be large initially, but with a suitable bootstrap they should fall quickly below an acceptable threshold. The meta and online models are complementary and particularly suited to a cloud environment, because a public cloud offers many possible configuration types and exhibits high performance variance [2].
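One way to picture the meta-model/online-model split is a Bayesian linear regression whose Gaussian prior is the bootstrap. This is a swapped-in technique chosen only because it accepts a prior, updates one observation at a time, and reports a predictive standard deviation as its confidence; it is not the model of [1] or [3]. Everything here is an assumption: the linear form, the known noise variance, the hypothetical `bootstrap_prior` meta-model (which averages weights learned offline for the nearest configurations), the configuration descriptors, and the synthetic measurements.

```python
import numpy as np

class OnlineBayesianModel:
    """Bayesian linear regression y = w.x + noise with a Gaussian prior
    N(mean, cov) on w. noise_var is assumed known (an assumption)."""

    def __init__(self, prior_mean, prior_cov, noise_var):
        self.mean = np.asarray(prior_mean, dtype=float)
        self.cov = np.asarray(prior_cov, dtype=float)
        self.noise_var = float(noise_var)

    def predict(self, x):
        """Predictive mean and standard deviation (the confidence) at x."""
        x = np.asarray(x, dtype=float)
        mu = float(self.mean @ x)
        var = self.noise_var + float(x @ self.cov @ x)
        return mu, var ** 0.5

    def update(self, x, y):
        """Condition on one observed (features, measurement) pair using
        the rank-one posterior update."""
        x = np.asarray(x, dtype=float)
        cov_x = self.cov @ x
        denom = self.noise_var + float(x @ cov_x)  # predictive variance at x
        gain = cov_x / denom
        self.mean = self.mean + gain * (y - float(self.mean @ x))
        self.cov = self.cov - np.outer(gain, cov_x)

def bootstrap_prior(trained, config, k=2, base_var=1e-4):
    """Hypothetical meta-model: average the weights learned offline for the
    k configurations nearest to `config`; their disagreement plus a small
    base variance becomes the prior covariance."""
    descriptors = np.array([d for d, _ in trained], dtype=float)
    weights = np.array([w for _, w in trained], dtype=float)
    dist = np.linalg.norm(descriptors - np.asarray(config, dtype=float), axis=1)
    nearest = weights[np.argsort(dist)[:k]]
    dim = weights.shape[1]
    if len(nearest) > 1:
        spread = np.atleast_2d(np.cov(nearest, rowvar=False))
    else:
        spread = np.zeros((dim, dim))
    return nearest.mean(axis=0), spread + base_var * np.eye(dim)

if __name__ == "__main__":
    # Hypothetical offline results: (descriptor, weights), where the
    # descriptor is (vCPUs, memory in GB) and the weights map the features
    # (1, request size) to a response time in seconds.
    trained = [((1, 2), (0.50, 0.020)),
               ((2, 4), (0.30, 0.010)),
               ((8, 32), (0.10, 0.003))]
    mean, cov = bootstrap_prior(trained, config=(4, 8), k=2)
    model = OnlineBayesianModel(mean, cov, noise_var=0.01)
    x = (1.0, 20.0)
    print("bootstrap:", model.predict(x))
    for y in (0.52, 0.48, 0.50):  # synthetic measurements, for illustration
        model.update(x, y)
    print("adapted:  ", model.predict(x))
```

The unseen configuration starts from its neighbors' weights rather than from scratch, and each measurement shrinks the predictive standard deviation while pulling the mean toward what is observed. A real meta-model would need normalized descriptors or a learned mapping, since raw Euclidean distance mixes vCPUs with gigabytes.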

[1] R. Mian, P. Martin and J.L. Vazquez-Poletti, “Towards Building Performance Models for Data-intensive Workloads in Public Clouds,” 4th ACM/SPEC International Conference on Performance Engineering (ICPE), ACM, 2013, in press, Prague, Czech Republic.
[2] J. Schad, J. Dittrich and J.-A. Quiane-Ruiz, “Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance,” Proceedings of the VLDB Endowment, vol. 3, no. 1-2, 2010, pp. 460-471.
[3] M.B. Sheikh, et al., “A Bayesian Approach to Online Performance Modeling for Database Appliances Using Gaussian Models,” 8th ACM International Conference on Autonomic Computing (ICAC), ACM, 2011, pp. 121-130, Karlsruhe, Germany.