"Scale" is central to the arguments for and against cloud computing but, somewhat curiously, the arguers never seem to provide or agree upon its definition. Given its importance, in this posting I will try to explore the meaning of the term "scale," specifically in the context of cloud computing.
Does the system scale?
Consider the following question: "Does the freeway system scale?" If you have spent any time stuck in traffic you have undoubtedly wondered about the scalability of the freeway or roadway system. Most immediately, the question at hand is "How does the user experience on the freeway degrade as more users share the system?" The term "user experience" here is most often thought to refer to delay, but even for something as familiar as freeway congestion, delay is too simplistic. For example, some drivers prefer a route that is free of congestion over one that minimizes travel time with congestion. That is, a 30-minute trip in stop-and-go traffic may be less desirable than a 40-minute trip to the same destination with no traffic.
Notice also that there are other legitimate user experience metrics that one might consider. Accident rate, for example, as a function of the number of drivers using the road at the same time is a different way to measure freeway scale. Indeed, drivers may consider a combination of both metrics when determining their respective frustration levels with the scalability of the freeway system.
Users, then, are primarily concerned with the experienced response of the system to changes in user load. The same concerns pertain to cloud computing. The scaling of a cloud with respect to shared user load in terms of observed user experience is paramount.
This user scaling is not the type of scaling that is most discussed in "the literature," however, with respect to clouds. Instead, most aficionados of cloud computing are concerned with how a cloud scales with respect to the resources it is managing.
Returning to the freeway example, consider freeway scaling from the perspective of the people who maintain the freeway system. They are certainly concerned with user experience, but usually as a function of resource count or resource capacity. That is, the freeway department thinks about how many lanes, off ramps, emergency shoulders, etc. it should add or remove from the freeway system in order to maintain some minimum level of user experience while, at the same time, optimizing the management burden and the budgetary considerations.
The analogy for cloud computing is resource scaling: the measurable effects on user experience, management burden, and budget of varying the number of resources (machines, disks, network devices, etc.) managed by the cloud platform. Resource scaling is the type of scaling most often discussed in the context of cloud infrastructure but it is only part of the overall scaling question.
For clouds, we believe that the bulk of the scaling discussions can be understood in fairly simple terms using two units of scale (resource count and request load) and three response measurements (observed response time, observed failure rate, and operational complexity).
Resource scaling refers to the response of the system as the number of resources is increased. Typically server count is the primary unit of interest, but disk count, network switch and router count, and even power unit count are sometimes important. Users and administrators worry that response time will degrade, failure rate will increase, and operational complexity will increase as the units of scale are increased in a cloud. The last of these concerns is especially important and often only implied. Many administrator APIs fail to reduce the marginal cost of adding resources to the cloud. If the complexity scales linearly or worse with resource count, the administrative burden of scale becomes too great. Thus the statement that "administrators must have a scripting interface" is really a statement about how a system scales in operational complexity as a function of resource count.
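To make the point about marginal cost concrete, here is a minimal Python sketch. The cost model and every number in it are illustrative assumptions of mine, not measurements from any real cloud. It compares an administrator who repeats the same hands-on work for every server with one who pays a one-time cost to write a script and then a much smaller cost per server.

```python
def manual_admin_minutes(servers: int, minutes_per_server: float = 30.0) -> float:
    """Hypothetical model: every server needs the same hands-on work."""
    return servers * minutes_per_server


def scripted_admin_minutes(
    servers: int,
    script_setup_minutes: float = 480.0,
    minutes_per_server: float = 0.5,
) -> float:
    """Hypothetical model: a fixed scripting cost plus a small per-server cost."""
    return script_setup_minutes + servers * minutes_per_server


def marginal_minutes(cost, servers: int) -> float:
    """Extra administrative minutes caused by adding one more server."""
    return cost(servers + 1) - cost(servers)


if __name__ == "__main__":
    # Print total administrative minutes for a few assumed cloud sizes.
    for n in (10, 100, 1000):
        print(n, manual_admin_minutes(n), scripted_admin_minutes(n))
```

Under these assumed numbers the scripted approach costs more for a handful of servers and far less for a thousand. The quantity to watch is `marginal_minutes`, the cost of adding one more resource, which is what a scripting interface is supposed to drive down.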
Transactional scaling refers to the response of the system to an increase in request load. It is this form of scaling that translates most directly to observed user experience. Requests for service from clouds are transactional. Each request requires an unambiguous response even if the overall effect is asynchronous (e.g. a request to start a VM is acknowledged even when the start will not occur until some time in the future). A cloud must be able to maintain acceptable levels of response as the number of simultaneous transactions presented to it increases.
In short form, then, understanding how a cloud platform scales requires an understanding of its resource scale and transactional scale. Individual users and/or administrators set thresholds on response time, failure rate, and operational complexity when determining whether the scaling response is sufficient for their respective needs.
Going Deeper or, Perhaps, Wider
If you are a little squeamish when it comes to logical abstractions (who isn't?) or just pressed for time, feel free to skip the remainder of this diatribe in which I will attempt to put cloud scaling on a more concrete footing. Also, apologies to those who will rightly point out that this is not a fully rigorous exposition. One is possible but even more tedious.
To begin with, it doesn't make much sense to talk about scale without discussing the units of scaling. That is, the word "scale" refers to the scale of something: physical servers, virtual machines, users, network connections, disk volumes, power distribution units, etc. A system "scales" in terms of one or more measurable quantities that can be varied by the user or operator of the system.
Secondly, to discuss scale in more than a colloquial way, it is necessary to define a measure of the system's behavior as the units of scale are varied. For example, many users are interested in how the observable response time (a measure of system response) is affected by the introduction of more machines (a unit of scale). Any measurable response exhibited by the system is suitable, but typical scale response measurements include user request response time, request throughput, network bandwidth (point-to-point or aggregate), network latency, disk throughput, etc.
Thus the question "Does the cloud computing system scale?" is really short-hand for a significantly more complex question that can be formulated as
"How do measurements of one or more behaviors of the system vary as one or more units of scaling are varied?"
Providing a reasonable answer to a well-formed question about scaling often requires some careful thought. For example, consider the question of whether Amazon's AWS "scales." It is tempting to consider only the number of physical servers as a measure of the scale of AWS. Clearly Amazon has committed some large number of physical servers to AWS (the exact number is justifiably held secret) but what is the behavioral scaling response implied by the question "Does AWS scale?" It turns out that AWS attempts to maintain a maximum response time of 60 seconds for any request. That does not imply that a VM will start in 60 seconds or less, only that the system will be able to generate a response within 60 seconds no matter how many physical servers AWS is managing.
Less well understood is the scaling response of AWS to fluctuations in user requests. Amazon's goal is to support as many simultaneous users accessing AWS as possible while maintaining the 60 second maximum response time for all requests. Thus AWS must simultaneously "scale" with respect to offered request load and resource count in terms of response time.
As a result the scaling graph of observed response time is three dimensional: response time versus the independent scaling units of resource count and request load. That is, there is a response time measurement that can be gathered for every combination of machine count and request load. Gathering this data completely is impractical so the typical methodology is to collect a series of measurements and to intuit ("interpolate" for those with a mathematical bent) the missing measurements from the observed data. Even for response time, which is critical to user experience, clearly the question of scale is one that is more complex than can be answered with a simple "yes" or "no."
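For the mathematically inclined, the "intuiting" step can be sketched in a few lines of Python. This is only one possible approach, and I am assuming things the discussion above does not require: that the measurements sit on a rectangular grid of (resource count, request load) pairs, and that bilinear interpolation between the four nearest measurements is good enough. The function name and the sample numbers are hypothetical.

```python
from bisect import bisect_right


def interpolate_response_time(samples, resource_axis, load_axis, resources, load):
    """Bilinear interpolation of response time on a rectangular grid.

    samples[i][j] is the response time measured with resource_axis[i]
    resources under load_axis[j] simultaneous requests. Both axes must be
    strictly increasing and hold at least two values.
    """

    def bracket(axis, x):
        # Find the two neighbouring grid values around x and how far x
        # sits between them (0.0 at the lower value, 1.0 at the upper).
        if not axis[0] <= x <= axis[-1]:
            raise ValueError("point lies outside the measured range")
        hi = min(bisect_right(axis, x), len(axis) - 1)
        lo = hi - 1
        frac = (x - axis[lo]) / (axis[hi] - axis[lo])
        return lo, hi, frac

    i0, i1, t = bracket(resource_axis, resources)
    j0, j1, u = bracket(load_axis, load)
    # Interpolate along the load axis first, then along the resource axis.
    low = samples[i0][j0] * (1 - u) + samples[i0][j1] * u
    high = samples[i1][j0] * (1 - u) + samples[i1][j1] * u
    return low * (1 - t) + high * t


# Hypothetical measurements: seconds of response time at each grid point.
resource_axis = [100, 200]   # servers
load_axis = [10, 50]         # simultaneous requests
samples = [[2.0, 6.0],
           [4.0, 10.0]]

estimate = interpolate_response_time(samples, resource_axis, load_axis, 150, 30)
```

The sketch refuses to extrapolate beyond the measured range, since guessing how a system behaves at a scale nobody has observed is exactly where scaling claims tend to go wrong.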
Failure rate is another measurable response to fluctuations in units of scale that is often implied but less often stated. As resource count or request load is increased, for example, the rate at which individual requests succeed or fail fluctuates. IT administrators are often more concerned with this form of scaling than with user response time, since a non-functional system is "worse" than a slow one, particularly for those responsible for maintaining it. For the purpose of analyzing what is meant by "scale," however, this failure rate scaling as a function of resource count and request load is another three-dimensional trajectory that answers the question "Does the system scale?"
Notice also that three-dimensional trajectories are poor answers to "yes" or "no" questions. What is happening here is that the question "Does the system scale?" refers to a set of limits in the scaling response that differentiate "yes" from "no." These limits, however, are subjective. For example, one user may deem any request requiring more than 60 seconds of delay before a response as being "too slow." For this user, the system does not scale if the addition of resources causes a degradation of response time beyond 60 seconds. Another user with a 90 second threshold, however, will deem the system to "scale" as long as the 90 second limit is not exceeded.
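The two users above can be sketched in Python as follows. The observations are invented for illustration and the function names are my own; the only point is that the same measurements produce different verdicts under different thresholds.

```python
def scales_for(limit_seconds, observations):
    """True if every observed response time stays within the user's limit."""
    return all(response <= limit_seconds for _, response in observations)


def first_violation(limit_seconds, observations):
    """Smallest resource count at which the limit is exceeded, or None."""
    for resources, response in sorted(observations):
        if response > limit_seconds:
            return resources
    return None


# Hypothetical data: (resource count, worst observed response time in seconds).
observations = [(100, 35.0), (500, 52.0), (1000, 75.0)]

strict_user = scales_for(60.0, observations)     # 60 second threshold
lenient_user = scales_for(90.0, observations)    # 90 second threshold
```

With this made-up data the user with the 60 second threshold concludes that the system does not scale, because the limit is first exceeded at 1000 resources, while the user with the 90 second threshold concludes that it does.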
- Scaling is a relationship between measurable quantities.
- Units of scale define the quantities (typically system components) that can be varied by the user.
- Response measurements record the system's behavior as scaling units are varied by the user.
- There are many different units of scale and different response measurements that are relevant for any given system.
- Often there is a relationship between response measurements that prevents them from being considered independently.
- Answering the question "Does the system scale?" with a "yes" or "no" requires subjective definitions of acceptable ranges of measurable response.
It seems counter-intuitive (to me, anyway) for a term used so ubiquitously to have such a complex and technical underlying meaning. This complexity seems to be at the heart of much of the confusion about scale when discussing cloud computing. When we, at Eucalyptus, work with our users, however, we keep these complex relationships in mind when we provide them with software that "scales."