This reference architecture is for users who require a production Eucalyptus cloud environment to serve a medium-to-large scalable web-services use case, built from low-cost commodity hardware augmented with enterprise-grade components where necessary. The ultimate size of the deployment is bounded by the capacity described later in this document, and can be expanded through careful deployment and hardware placement planning.
Note: This reference architecture has two variations.
Reference Architecture Sections
Jump to a specific section below:
- Use Case: Scalable Web Services
- Physical Resources
- Deployment Topology
- Data Center Management
- Summary Considerations
Download the Scalable Web Services Large Reference Architecture
The benefits of choosing this architecture are:
- Composed of a combination of commodity and enterprise grade servers, networks, and storage devices for high performance, robust deployments
- Provides self-service environment for fast, stable allocation and execution of scalable Web services workloads and applications
- Designed to be able to scale up as workload/application demands increase
- Accommodates HA deployment option, for increased stability in the face of physical resource failures
This document is intended for readers who are familiar with AWS terminology and the Eucalyptus Installation/Admin/User Guides, and who have experience implementing production data center solutions in Linux environments.
The following diagrams outline the general logical view, as well as the layout of physical servers, networks, and the Eucalyptus components of this reference architecture.
The purpose of this deployment is to cover a large (or one that starts small and can grow larger) scalable Web services (SWS) use case, using enterprise-grade hardware. Generally, for scalable Web services, the deployment will support a number of production, largely independent web-service applications running in parallel within the Eucalyptus cloud.
The SWS use case is one where there are several applications that need to run simultaneously within the cloud, each composed of potentially many different varieties of instances (LAMP stack, for example). These applications may be designed to scale up and down as application workload demand changes over time. There are expected to be several application stacks running within the cloud simultaneously, with each belonging to a relatively small number of cloud accounts. While there is expected to be a fair amount of 'churn' for each application with regard to spinning up and tearing down individual virtual machine instances as workload fluctuates, the SWS application use case is expected to be one where the application itself (minimum number of VM instances needed to consider the application as 'available') is long-lived. In other words, this deployment is designed to permit long-running application services with fluctuating sizes (number of VMs composing the entire application).
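The scale-up/scale-down behavior described above can be sketched as a simple control loop run by an application operator or tool outside the cloud. The function below is an illustrative example only; the thresholds, metric, and names are hypothetical assumptions, not part of Eucalyptus. It captures the key property of the SWS use case: the application stays long-lived (never below a minimum instance count) while its size fluctuates with workload.

```python
# Illustrative autoscaling decision logic for one SWS application tier.
# Thresholds, the CPU metric, and all names are hypothetical examples.

def desired_instance_count(current, avg_cpu, min_instances=2, max_instances=16,
                           scale_up_at=0.75, scale_down_at=0.25):
    """Return the target number of VM instances for one application tier.

    The tier grows and shrinks with workload, but never drops below
    min_instances, so the application itself remains 'available'.
    """
    if avg_cpu > scale_up_at:
        target = current + 1      # add one instance per evaluation period
    elif avg_cpu < scale_down_at:
        target = current - 1      # shed one instance per evaluation period
    else:
        target = current          # load is within the comfortable band
    return max(min_instances, min(max_instances, target))
```

In practice the decision would be fed by a monitoring system and acted on through the AWS-compatible API (e.g., launching or terminating instances); the clamping to a minimum is what makes the application long-lived despite instance 'churn'.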
This use case is expected to have a 'medium to large' number of virtual machine images (EMIs), a 'medium to large' reliance on EBS volumes to store/provide access to static application data, and a 'small' number of Boot from EBS instances, which should only be used in cases where a certain aspect of the application demands a static server that is not intended to be a part of the dynamic 'scaling' of applications themselves (i.e. static servers in support of several applications, etc.).
The following list describes the workload capacity that deployments based upon this reference architecture can support. Some of these capacity boundaries can be exceeded by deviating from the architecture; readers are encouraged to contact Eucalyptus for information on designing production Eucalyptus deployments.
- Max of 512 running virtual machine instances
- Max of four clusters, composed of a maximum of 32 nodes in total
- Max of 16 virtual machines per node
- Max of 128 simultaneous attached elastic block storage volumes per cluster
- Max of 1024 independent active users (max of 64 accounts, each with max of 16 users)
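The capacity bounds above are mutually consistent; the short check below makes the arithmetic explicit (assuming the 32-node limit is the total across all four clusters, as reflected in the component topology later in this document).

```python
# Capacity arithmetic for this reference architecture.
nodes_total = 32                 # across all four clusters (8 per cluster)
vms_per_node = 16
max_instances = nodes_total * vms_per_node          # 512 running instances

accounts = 64
users_per_account = 16
max_users = accounts * users_per_account            # 1024 active users

clusters = 4
volumes_per_cluster = 128
max_attached_volumes = clusters * volumes_per_cluster  # 512 cloud-wide

assert max_instances == 512
assert max_users == 1024
```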
The following are the minimum resource requirements for the physical servers, networks, and storage needed to support this architecture. For each category of physical resource, exceeding the minimum (more cores, more RAM, more local disks, faster interfaces, higher bandwidth networking, more disk capacity, etc.) will not have a negative impact on the deployment.
Minimum Front-end/Middle-tier Server Configuration
- 4 or more modern cores
- 16 or more GB of RAM
- 80 or more GB RAID 1/5/6 local disk for OS/Eucalyptus (see below for special storage requirements, based on component)
- Network: (see below for special network requirements, based on component)
Minimum Node Server Configuration
- 8 or more modern cores
- 32 or more GB of RAM
- 80 or more GB RAID 1/5/6 local disk for OS/Eucalyptus (see below for special storage requirements, based on component)
- Network: (see below for special network requirements, based on component)
- Network 1: Cloud Controller/Walrus/Cluster Controllers/Storage Controllers on 'public' user network. The Cluster Controller and Walrus are connected at 10 Gb; the others are connected at 1 Gb. If additional 10 Gb networking is available, it should be allocated to the Storage Controller, then the Node Controllers, then the other components (in that order).
- Network 2: Cluster Controller/Node Controllers on 'private' 1 Gb cluster network
- Network 3: Node Controller/SAN network at 10 Gb
- Local RAID disk storage for Walrus and Node Controllers
  - Walrus: 1 or more TB of storage, RAID 1/5/6. Walrus capacity limits the total number of template images and the amount of S3-accessible data available.
  - Node Controller: 400 or more GB, RAID 1/5/6. Node Controller capacity limits the total size of all instance-store images that can run concurrently on a single node, and the total number of images that can be cached on a single node.
- RAID volumes are separate from the OS disk (partition/volume) on all servers with Eucalyptus-accessible RAID volumes
- Supported SAN device accessible to each cluster's Storage Controller
The following is a description of the Eucalyptus platform topology atop physical resources. For this use case, the topology is designed to use a minimum number of servers for the Eucalyptus platform, while providing enough capacity for acceptable performance up to the maximums specified at the beginning of this document.
Eucalyptus Component Topology
Each server in the above physical model diagram will be running one or more Eucalyptus software components which together form the Eucalyptus platform. Listed here are the mappings of physical server to Eucalyptus component, where each server is configured to conform to at least the minimum requirements for servers defined previously.
- Front-end machine 1: Cloud Controller
- Front-end machine 2: Walrus
- Front-end machine 3: User Console
- Cluster machine 1: Cluster Controller
- Cluster machine 2: Storage Controller
- Cluster SAN: EMC, NetApp, or EqualLogic
- Node machine 1-32: Node Controller x 32
Eucalyptus Configuration Options
The Eucalyptus platform is highly configurable, covering a wide variety of data center topologies, devices, software management systems and network/security policies. For this reference architecture, we list here certain fundamental configuration options which will provide the necessary service of the reference architecture balanced against minimal performance and management overhead. Please refer to the Eucalyptus Installation and Admin Guides for information on how to implement these configurations.
- Networking mode: MANAGED
- Public addresses: the maximum number of virtual machines, plus 64 per cluster
- Security group size: minimum 32, maximum 512
- Storage Controller Driver: SAN (EMC, EqualLogic, NetApp)
- High Availability: no
- Linux Distribution: CentOS 6 + KVM
- Java components (CLC, Walrus, SC): configured to run with increased heap size (60% of total available memory)
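Several of the options above map onto entries in `eucalyptus.conf`. The fragment below is a hedged sketch for a Eucalyptus 3.x install on CentOS 6: the interface names, address ranges, and DNS server are placeholders for your environment, and the heap value shown assumes a 16 GB front-end (60% ≈ 9 GB). Refer to the Installation Guide for the authoritative settings for your version.

```
# /etc/eucalyptus/eucalyptus.conf (fragment) -- values are illustrative
VNET_MODE="MANAGED"              # networking mode for this architecture
VNET_PUBINTERFACE="eth0"         # 'public' user network (placeholder name)
VNET_PRIVINTERFACE="eth1"        # 'private' cluster network (placeholder name)
VNET_PUBLICIPS="192.0.2.10-192.0.2.250"   # placeholder public address pool
VNET_SUBNET="10.0.0.0"
VNET_NETMASK="255.255.0.0"
VNET_ADDRSPERNET="32"            # addresses per security group (minimum size)
VNET_DNS="192.0.2.1"             # placeholder DNS server

# Java components (CLC, Walrus, SC): increased heap, ~60% of a 16 GB server
CLOUD_OPTS="-Xmx9g"
```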
Eucalyptus includes a number of features that are in place to support specific aspects of production deployments that may or may not be required based on the user's preferences and constraints. Listed here are descriptions of some of these features as they apply to this particular reference architecture. Please refer to the Eucalyptus Installation and Admin Guides for information on how to implement these features, if required.
- The reporting feature should be lightly used (configured either to be disabled or to poll at infrequent intervals) for this architecture. If the deployment requires fine-grained or long-term reporting information, a data warehouse (an extra machine) should be added to the deployment, with tooling in place to enforce periodic export/flush of reporting data to the warehouse.
- LDAP integration should be implemented only if required.
The preceding sections of this reference architecture outlined the design of a Eucalyptus software deployment, along with the minimum physical resource capacities and configurations. Next, we address additional technologies and techniques surrounding the Eucalyptus software and hardware that are required to run a complete Eucalyptus private cloud in production.
Services provided by the Eucalyptus private cloud software platform:
- EC2-compatible private cloud virtual machine management platform
- S3-compatible storage platform
- Eucalyptus end-user Web based GUI console
- Eucalyptus end-user and admin CLI tools
- Service of creating, managing, and cleaning up virtual machines and related resource artifacts (EBS volumes, virtual networks, etc.)
- Eucalyptus service troubleshooting and problem resolution
Additional required services:
- Data-center server, network, storage, OS installation system
- Physical machine health and status monitoring
- Automatic resource performance monitoring and load-balancing
- Virtual machine, storage, network performance optimization
- Linux Distribution OS software and configuration management
- Dynamic deployment topology/physical infrastructure re-configuration
The Eucalyptus cloud platform software, which provides AWS-compatible infrastructure as a service, must be integrated with standard data center configuration, management, and monitoring software for production use. Each Eucalyptus component runs as a Linux process that is configured through both configuration files and run-time parameters, and that must be monitored alongside physical resource health and status. A variety of user interfaces are available for use with a Eucalyptus deployment, including those included with the Eucalyptus platform as well as AWS-compatible third-party API, command-line, and graphical interface software.
While the Eucalyptus software does not currently include the deployment of configuration management or system health/status monitoring solutions itself, there are several third party solutions that existing production deployments rely upon to perform these functions.
Production deployments based on this reference architecture should include the use of a third-party configuration management system in order to ensure that Eucalyptus configuration is correct both for initial deployment as well as under cases where a particular Eucalyptus server and software must be re-deployed.
Several options exist; here we list those produced by organizations that have partnered with Eucalyptus to provide high quality integrations.
- Ansible configuration management and orchestration tool (examples are available on GitHub)
- Puppet Labs configuration management system
- Opscode Chef configuration management system
For an example of how to use/integrate Eucalyptus with Puppet, please refer to the following resources:
In addition to automated/controlled configuration management, a production Eucalyptus deployment based on this reference architecture should also be monitored via a third party solution that watches the health and status of the deployment and notifies the cloud administrator when unexpected conditions occur. Basic monitoring includes but is not limited to:
- Physical resource availability (network ping and/or ssh access to physical servers running Eucalyptus components)
- Physical resource load
- Physical resource faults (as indicated by Linux fault notification mechanisms)
- Eucalyptus component faults (please refer to the Eucalyptus Admin Guide for information on monitoring for Eucalyptus faults)
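As a hedged illustration of the first two checks above, the Nagios object definitions below monitor ping reachability and SSH access on one front-end server. The host name and address are placeholders for your deployment, and any Eucalyptus-specific plugins (component fault checks) would be added separately per the Admin Guide and your monitoring integration.

```
# Nagios object definitions (fragment) -- host name/address are placeholders
define host {
    use                  linux-server
    host_name            euca-frontend-1      ; Cloud Controller machine
    address              192.0.2.11
}

define service {
    use                  generic-service
    host_name            euca-frontend-1
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}

define service {
    use                  generic-service
    host_name            euca-frontend-1
    service_description  SSH
    check_command        check_ssh
}
```

Equivalent definitions would be repeated (or templated) for each physical server running a Eucalyptus component.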
For an example on how to set up an integrated Nagios with Eucalyptus environment, please refer to the following resource:
There are several other solutions for monitoring physical and software components of a data center, and here we list those which are developed by Eucalyptus partners:
As an AWS-compatible platform, Eucalyptus offers both a variety of user interface tools as well as the option to use third party AWS-compatible interfaces that interoperate with AWS and Eucalyptus. For information on installing and using the interfaces that are included by default with Eucalyptus, please refer to the Eucalyptus Install, Admin and User Guides.
- Eucalyptus Admin CLI tools (included with Eucalyptus, see the Admin Guide, and Command-line Reference Guide)
- Euca2ools CLI Guide (included with Eucalyptus)
- Eucalyptus User Console Guide (included with Eucalyptus)
- Eucalyptus and Enstratius (included with Eucalyptus subscription)
There are many other AWS-compatible user interface tools targeted at specific feature sets that are compatible with Eucalyptus:
- s3cmd - for managing AWS S3 and CloudFront, and Eucalyptus Walrus services.
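As an example of pointing s3cmd at Walrus rather than AWS S3, the `.s3cfg` fragment below overrides the service endpoint. The hostname, port, path, and credentials shown are placeholders; substitute your deployment's Walrus endpoint and your account's access/secret keys.

```
# ~/.s3cfg (fragment) -- endpoint and credentials are placeholders
access_key = WALRUS_ACCESS_KEY_ID
secret_key = WALRUS_SECRET_KEY
host_base = walrus.example.com:8773/services/Walrus
host_bucket = walrus.example.com:8773/services/Walrus
use_https = False
```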
In addition to monitoring and managing the deployment's physical resources, application workload images and workflows must also be managed and configured. The Eucalyptus platform offers AWS-compatible APIs and services which allow external workload management systems to interoperate with AWS and Eucalyptus, and works to ensure that VM image environments between AWS and Eucalyptus are interoperable.
- Eucalyptus User Guide (see the Using Images section)
- Eucalyptus Starter Images
- AMI2EMI Project: AWS image to Eucalyptus image conversion tools
The reference architecture presented here is meant to encapsulate a bounded production Eucalyptus system. As with all use cases, there are variations that cannot reasonably be generalized, but we add here some comments and observations that will help tune individual use case variations to achieve efficient, stable performance within Eucalyptus.
- Keep individual system load low. If physical systems are over-provisioned with virtual machines (whether too many VMs running on a single system, or a few resource-intensive VMs that interfere with one another), the underlying operating system and Linux dependencies can become fragile and difficult to debug. Eucalyptus has many features designed to function even if the underlying system is underperforming or misbehaving, but it is always best to give Eucalyptus and your workload environments enough resources to function smoothly.
- Consider bottlenecks. When designing a deployment, deciding on the capacity to provide to your applications, and making capacity and performance hardware decisions, consider the data paths that Eucalyptus either provides or works in concert with at run-time. Please refer to the Datapaths series of diagrams to aid in identifying potential shared resource bottlenecks.