Assured Cloud Computing Weekly Seminars Slides and Video Spring 2017

  • Posted on January 19, 2017 at 12:50 pm by whitesel@illinois.edu.

Trustworthy Services Built on Event-based Probing for Layered Defense  slides | video
Read Sprabery, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
February 1, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: Numerous event-based probing methods exist for cloud computing environments, allowing a hypervisor to gain insight into guest activities. Such event-based probing has been shown to be useful for detecting attacks, catching system hangs through watchdogs, and inserting exploit detectors before a system can be patched, among other uses. Here, we illustrate how to use such probing for trustworthy logging and highlight some of the challenges that existing event-based probing mechanisms do not address. One challenge is ensuring that a probe inserted at a given address is trustworthy, despite the lack of attestation available for probes that have been inserted dynamically. We show how probes can be inserted to ensure proper logging of every invocation of a probed instruction. When combined with attested boot of the hypervisor and guest machines, this ensures that the output stream of monitored events is trustworthy. Using these techniques, we build a trustworthy log of certain guest-system-call events. The log powers a cloud-tuned Intrusion Detection System (IDS). We identify new event types that must be added to existing probing systems so that attempts to circumvent probes within the guest appear in the log. We highlight the overhead penalties guests pay for stronger guarantees of log completeness in the face of attacks on the guest kernel. Promising results (less than 10% overhead for guests) are shown when a guest relaxes the trade-off between log completeness and overhead. Our demonstrative IDS detects common attack scenarios with simple policies built using our guest behavior recording system.
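The tamper-evidence goal of such a log can be illustrated with a toy sketch. The hash-chained `ProbeLog` below is a hypothetical stand-in, not the attestation-backed mechanism from the talk: each entry chains the digest of its predecessor, so any edit to or deletion of a recorded event breaks verification.

```python
import hashlib

class ProbeLog:
    """Toy tamper-evident event log: each entry stores a digest chained
    to the previous entry, so modifying or removing any earlier event
    invalidates every later digest."""

    def __init__(self):
        self.entries = []  # list of (event, chained digest)

    def record(self, event):
        # Chain the new event to the digest of the last entry.
        prev = self.entries[-1][1] if self.entries else b""
        digest = hashlib.sha256(prev + event.encode()).digest()
        self.entries.append((event, digest))

    def verify(self):
        # Recompute the whole chain; any tampering breaks a link.
        prev = b""
        for event, digest in self.entries:
            if hashlib.sha256(prev + event.encode()).digest() != digest:
                return False
            prev = digest
        return True
```

A real system would anchor the chain's head in attested hypervisor state; here the chain alone only makes tampering detectable, not impossible.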

Prioritization of Cloud System Monitoring  
Uttam Thakore, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
February 15, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: Rapid identification of and response to incidents is a costly but necessary part of ensuring the reliability and security of large-scale enterprise cloud systems. This functionality requires efficient analysis of heterogeneous monitor and log data, which becomes increasingly challenging as systems grow in size and complexity. In this talk, we describe a novel method for prioritizing the collection and analysis of monitor data in enterprise clouds for incident analysis. In particular, we use statistical correlation analysis to construct a graph of time-lagged correlation relationships between heterogeneous data sources in the system, and use the strength of correlation along paths in the graph to prioritize which data sources an administrator should analyze when performing incident analysis. We discuss our current results from evaluating the approach on incidents in an IBM enterprise cloud, including how well it identifies the data sources that provide evidence of the behavior that causes the incidents.
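The correlation-graph idea can be sketched in miniature. Everything below is illustrative: the lagged-correlation edge weights, the toy metric series, and the product-of-correlations path score are assumptions standing in for the statistical machinery described in the abstract.

```python
from itertools import combinations

def pearson(x, y):
    # Plain Pearson correlation; 0 for constant series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def max_lagged_corr(x, y, max_lag=3):
    # Strongest absolute correlation over time shifts of y against x.
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag] if lag else y
        else:
            a, b = x[:lag], y[-lag:]
        if len(a) > 2:
            best = max(best, abs(pearson(a, b)))
    return best

def prioritize(sources, incident_source, max_lag=3):
    """Rank data sources by the best product of edge correlations along
    any path to the source where the incident was observed."""
    names = list(sources)
    w = {}
    for a, b in combinations(names, 2):
        w[(a, b)] = w[(b, a)] = max_lagged_corr(sources[a], sources[b], max_lag)
    # Dijkstra-style search maximizing the product of correlations.
    score = {n: 0.0 for n in names}
    score[incident_source] = 1.0
    frontier = {incident_source}
    while frontier:
        u = max(frontier, key=score.get)
        frontier.remove(u)
        for v in names:
            if v != u and score[u] * w[(u, v)] > score[v]:
                score[v] = score[u] * w[(u, v)]
                frontier.add(v)
    return sorted((n for n in names if n != incident_source),
                  key=score.get, reverse=True)
```

Because edge weights lie in [0, 1], path scores only shrink with length, so the greedy max-first expansion is the product analog of shortest-path search.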

Label-based Defenses Against Cache Side Channel Attacks in PaaS Cloud Infrastructure  slides | video
Konstantin Evchenko, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
March 15, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: Cache side channels pose a serious risk to cloud computing environments due to multi-tenancy. The move to containers has exacerbated this risk, since containers permit a greater degree of multi-tenancy than VMs.

We introduce a label-based defense for protecting against cache-based side channels that target container-based PaaS infrastructures. Our approach is a novel combination of hardware-enforced spatial separation and software-enforced temporal separation of labeled containers on shared resources.

We present the implementation of this defense as a series of modifications to popular existing platforms that are used to deploy cloud services. Unlike much previous work, our approach does not require modifications to existing hardware or client software, allows hyperthreading to remain enabled, and can be deployed quickly in the cloud as part of a scheduled software-upgrade routine. We evaluate both the effectiveness and the overheads of our approach using representative cloud workloads.
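A minimal sketch of the two separation mechanisms follows. The class name, its methods, and the flush counter are illustrative assumptions, not the modified platforms from the talk: spatial separation gives each label its own cache partition (as partitioning hardware such as Intel CAT could enforce), and temporal separation "flushes" whenever a core is reused by a different label.

```python
class LabelScheduler:
    """Toy label-aware scheduler combining spatial separation (one cache
    partition per label) with temporal separation (a flush whenever a
    core switches between labels)."""

    def __init__(self, num_cores, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = {}                   # label -> cache partition id
        self.core_label = [None] * num_cores   # last label run on each core
        self.flushes = 0

    def _partition(self, label):
        # Spatial separation: one shared-cache slice per label.
        if label not in self.partitions:
            if len(self.partitions) == self.num_partitions:
                raise RuntimeError("out of cache partitions")
            self.partitions[label] = len(self.partitions)
        return self.partitions[label]

    def schedule(self, core, label):
        # Temporal separation: flush per-core state on a label switch.
        if self.core_label[core] not in (None, label):
            self.flushes += 1                  # stands in for a real flush
        self.core_label[core] = label
        return self._partition(label)
```

Together the two rules ensure no two labels ever share a cache partition, and residual per-core state is cleared between labels.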

Exploring Design Alternatives for the RAMP Transaction Systems Through Statistical Model Checking   slides | video
Si Liu, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
March 29, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: In this work we explore and extend the design space of the recent RAMP (Read Atomic Multi-Partition) transaction system for large-scale partitioned data stores. Arriving at a mature distributed system design through implementation and experimental validation is a labor-intensive task, which means that only a limited number of design alternatives can be explored in practice. The developers of RAMP did implement and validate three design alternatives: RAMP-Fast, RAMP-Small, and RAMP-Hybrid. They also sketched three additional designs and presented some conjectures about them. This work addresses two questions: (1) How can the design space of a distributed transaction system such as RAMP be systematically explored with modest effort, so that substantial knowledge about design alternatives can be gained before designs are implemented? and (2) How realistic and informative are the results of such design explorations? We answer the first question by: (i) formally modeling eight RAMP-like designs (five by the RAMP developers and three of our own) in Maude as probabilistic rewrite theories, and (ii) using statistical model checking of those models to analyze key performance metrics such as throughput, average latency, and actual degrees of strong consistency and read atomicity. We answer the second question by showing that the quantitative analyses thus obtained for these models: (i) are consistent with the experimental results obtained by the RAMP developers for their implemented designs; (ii) confirm the conjectures made by the RAMP developers for their other three unimplemented designs; and (iii) uncover a new design, our proposed RAMP-Faster design, that outperforms all other designs for several key properties, such as latency, throughput, and consistency, while providing read atomicity for 99% of the transactions.
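Statistical model checking itself can be shown in miniature: sample independent runs of a probabilistic model and estimate the probability that a property holds, with the sample count fixed by a Hoeffding-style bound. The `one_run` model and its race probabilities below are invented for illustration and bear no relation to the actual RAMP Maude models.

```python
import math
import random

def smc_estimate(run_once, eps=0.05, delta=0.01, rng=None):
    """Statistical model checking in miniature: estimate P(property holds)
    from independent samples. By Hoeffding's inequality,
    n >= ln(2/delta) / (2*eps^2) samples put the estimate within +/- eps
    of the true probability with confidence at least 1 - delta."""
    rng = rng or random.Random()
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(run_once(rng) for _ in range(n))
    return hits / n, n

def one_run(rng):
    # Invented toy model of one probabilistic run: a read is atomic
    # unless it races a concurrent multi-partition write and observes
    # a partial state. The probabilities are arbitrary.
    write_overlap = rng.random() < 0.2
    observed_partial = write_overlap and rng.random() < 0.5
    return not observed_partial        # True iff read atomicity held
```

With these numbers the true probability of read atomicity is 0.9, and the bound calls for 1,060 samples to land within ±0.05 of it at 99% confidence.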

Trustworthy Services Built on Event-based Probing for Layered Defense   slides | video
Read Sprabery, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
March 29, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: This talk continues the February 1 seminar on event-based probing for trustworthy logging (abstract above). It will focus on the algorithm for proper insertion of dynamic probes and on the structure and effectiveness of layered policies.

Formalizing Hardware-Assisted Virtualization Behavior to Verify VM Monitoring Frameworks  slides | video (audio only)
Lavin Devnani, Electrical and Computer Engineering Research Assistant, University of Illinois at Urbana-Champaign
April 5, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: This paper presents an approach to verify virtual machine monitoring frameworks by formalizing guest and hypervisor behavior. We model components of guest environments that are exposed to monitoring frameworks during VM transitions. In addition, we model execution flows at the guest user and guest kernel levels that lead to VM transitions. Explicit-state model checking and state space searches are used to verify monitor properties specified as LTL formulae. We apply this model to verify correctness and security properties of monitors specified under frameworks like hprobes and HyperTap.
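Explicit-state checking of an invariant (the safety fragment of an LTL "G p" formula) can be sketched as follows. The tiny VM-transition model and the monitor property are hypothetical and far simpler than the hprobes/HyperTap formalizations in the talk.

```python
from collections import deque

def check_invariant(init, successors, invariant):
    """Explicit-state model checking of an invariant (LTL 'G p'):
    breadth-first search over all reachable states, returning a
    counterexample path to the first violating state, or None."""
    seen = {init}
    queue = deque([(init, [init])])
    while queue:
        state, path = queue.popleft()
        if not invariant(state):
            return path                    # counterexample trace
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None                            # invariant holds everywhere

def successors(state):
    # Toy VM model: (mode, probe_armed). The hypervisor arms a probe in
    # user mode, and kernel entry (a VM transition) is guarded by it.
    mode, armed = state
    if mode == "user":
        return [("user", True)] + ([("kernel", True)] if armed else [])
    return [("user", armed)]

# Monitor property: kernel mode is only ever entered with the probe armed.
monitored = lambda s: not (s[0] == "kernel" and not s[1])
```

Because the search records paths, a violation comes with a concrete trace, which is the practical payoff of explicit-state checking over plain testing.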

Getafix: Workload-aware Distributed Interactive Analytics  slides | video
Mainak Ghosh, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
April 12, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: Distributed interactive analytics engines (Druid, Redshift, Pinot) need to achieve low query latency while using the least storage space. This paper presents a solution to the problem of replication of data blocks and routing of queries. Our techniques decide the replication level of individual data blocks (based on popularity and access counts), as well as output optimal placement patterns for such data blocks. For the static version of the problem (a given set of queries accessing some segments), our techniques are provably optimal in both storage and query latency. For the dynamic version of the problem, we build a system called Getafix that dynamically tracks data block popularity, adjusts replication levels, dynamically routes queries, and garbage-collects less useful data blocks. We implemented Getafix in Druid, the most popular open-source interactive analytics engine. Our experiments use both synthetic traces and production traces from Yahoo! Inc.'s production Druid cluster. Compared to existing techniques, Getafix either reduces the storage space used by up to 3.5x while achieving comparable query latency, or improves query latency by up to 60% while using comparable storage.
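The popularity-driven replication idea can be sketched as follows. The function names, the proportional-rounding rule, and the greedy placement are illustrative assumptions, not Getafix's actual algorithm.

```python
def replication_levels(access_counts, total_replicas):
    """Replica count per data block, proportional to recent access count
    (popular blocks get more copies), with at least one copy each."""
    total = sum(access_counts.values()) or 1
    return {block: max(1, round(total_replicas * count / total))
            for block, count in access_counts.items()}

def place(levels, servers):
    """Greedy placement: each replica goes to the least-loaded server
    that does not already hold a copy of that block."""
    load = {s: 0 for s in servers}
    placement = {block: [] for block in levels}
    for block, n in sorted(levels.items(), key=lambda kv: -kv[1]):
        for _ in range(min(n, len(servers))):
            s = min((s for s in servers if s not in placement[block]),
                    key=load.get)
            placement[block].append(s)
            load[s] += 1
    return placement
```

In a dynamic system the access counts would be re-measured over a sliding window, with replicas added or garbage-collected as levels shift.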

Deep Learning Inference as a Service  slides | video
Mohammad Babaeizadeh, Computer Science Research Assistant, University of Illinois at Urbana-Champaign
May 3, 2017, 4:00 p.m., 2405 Siebel Center

Abstract: Deep learning technologies are showing up in a vast number of industrial areas, from real-time speech translation and smart cities to self-driving cars and drug discovery. The growing number of models deployed across these applications demands a scalable, highly efficient inference mechanism capable of serving an ever-increasing volume of queries.

However, unlike deep model development and training, which are supported by sophisticated infrastructure and systems, model deployment and inference have received little attention. Currently, developers must combine the necessary pieces from various system components to support inference, and often opt out of shared resources, which makes the whole process highly error-prone and costly.

Compared to other computations in cloud computing, serving a model is unique in several ways. First, it is compute-intensive and often needs a coprocessor, which results in a more complex framework. Second, unlike training, it is a real-time service with a tight service-level objective. Lastly, inference on coprocessors such as GPUs has a non-linear performance model with respect to input size, which calls for a more sophisticated scheduler.
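The non-linear latency point can be made concrete with a toy sketch: given a fitted per-batch latency model, a scheduler might pick the largest batch size that still meets the service-level objective. The model's functional form and constants below are invented for illustration.

```python
def best_batch_size(latency_model, slo_ms, max_batch=64):
    """Largest batch size whose predicted latency still meets the SLO.
    GPU batch latency typically grows sub-linearly in batch size, so
    larger batches improve throughput until the SLO binds."""
    best = 1
    for b in range(1, max_batch + 1):
        if latency_model(b) <= slo_ms:
            best = b
    return best

# Hypothetical fitted model: fixed launch overhead plus sub-linear growth.
gpu_latency_ms = lambda b: 2.0 + 0.5 * b ** 0.8
```

Since throughput is roughly batch size divided by batch latency, and latency grows sub-linearly here, the SLO-feasible maximum is also the throughput-optimal choice under this model.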

In this talk, I will discuss the intriguing computational aspects of deep neural networks at inference time and how they can be exploited to design and implement a scalable deep learning inference service. Such a cloud-based service enables customers to easily deploy pre-trained deep models at scale while maintaining high utilization of available resources to minimize service cost.