Designing business-critical applications in the cloud
Perhaps no business service is more critical than telecommunications; reliability and quality are absolute requirements of a phone system. Over the past decade, internet-based VoIP (voice over internet protocol) has taken its place as a viable means of communication for individuals and has emerged as an increasingly popular, cloud-based means of business communication. This is due in part to well-known advances in the reliability and quality of internet communications, but it is also being fueled by a new generation of more reliable and more robust cloud system design patterns and best practices.
These cloud design patterns and techniques are not unique to communication services; they are the same as those used by many popular cloud service providers. A set of best practices in system design has emerged that takes advantage of increasingly abundant resources, such as computing and network capacity, along with reasonably reliable open source software infrastructure, while allowing service providers to make the most of their scarcer resources, such as development and operations personnel. Cloud platform providers such as Amazon, Rackspace, Google and Microsoft are allowing a new generation of service providers to build, grow, and maintain highly reliable, high-quality services at a very affordable cost.
Contained units of clustered services
Cloud computing systems achieve greater performance by drawing on a greater amount of computing resources, meaning more CPU, more memory, and more disk. This is done, of course, not by using larger servers, but by using a greater number of small- and medium-sized servers that are networked together.
In the ideal design, each server is a specialized entity, running one system service and providing a small set of well-identified system functions. These specialized servers are then grouped together into a unit, sometimes referred to as a cluster or a pod, where the specialized servers networked together provide a defined set of user or application services. For example, a communications cluster might consist of registration servers, call-control servers, monitoring servers, queue servers, logging servers, and cache servers. Together, these servers would comprise a complete communications cluster capable of processing user phone calls.
Specialized servers configured into a clustered unit can then be deployed repeatedly, providing unit-level horizontal scalability and redundancy. Deployment and updating of these clusters can also be automated.
Throttled and tested capacity limits
Each specialized server can be tested to specific capacity limits. For example, a database server or a message queue server can be tested to a certain number of transactions. Likewise, each cluster of specialized servers can also be tested to specific capacity limits for the user or application services provided. For example, a communications cluster can be tested for the number of concurrent audio calls it can handle.
For scalability and planning, it is critical to know the tested capacity of each cluster. If a communications cluster is tested and can handle 1,000 concurrent audio calls, and the peak load is 5,000 concurrent calls, then five cluster units (each capable of 1,000 calls) should be deployed.
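The sizing arithmetic above can be written as a small helper. This is a minimal sketch: the function name clusters_needed and the optional spare_clusters parameter are illustrative assumptions, not part of any particular platform.

```python
import math


def clusters_needed(peak_calls: int, calls_per_cluster: int, spare_clusters: int = 0) -> int:
    """Return how many cluster units cover a peak load.

    calls_per_cluster is the tested capacity of one unit.
    spare_clusters is a hypothetical allowance for redundancy.
    """
    if calls_per_cluster <= 0:
        raise ValueError("calls_per_cluster must be positive")
    # Round up: a partially used cluster still has to be deployed.
    return math.ceil(peak_calls / calls_per_cluster) + spare_clusters


print(clusters_needed(5000, 1000))   # 5, the example from the text
print(clusters_needed(5200, 1000))   # 6, since 5 units would fall short
```

Rounding up matters whenever the peak is not an exact multiple of the tested capacity, since the remainder still needs a full unit.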
For reliability, it is critical not only to know the tested capacity of each cluster but also to implement throttles. If a communications cluster can handle 1,000 concurrent calls, throttles should be implemented to ensure that load will not exceed that known and safe limit. Excess calls can be rerouted as discussed below, but throttles ensure that no unit will exceed its safe capacity.
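One way to picture such a throttle is as a counter of active calls that refuses new calls once the tested limit is reached. The following is a minimal sketch under that assumption; the class name CallThrottle and its methods are hypothetical, and a production system would more likely rely on the throttling features of its call-control or proxy software.

```python
import threading


class CallThrottle:
    """Admit calls only while the cluster is below its tested limit."""

    def __init__(self, max_concurrent_calls: int):
        self._limit = max_concurrent_calls
        self._active = 0
        self._lock = threading.Lock()  # protects the counter across threads

    def try_admit(self) -> bool:
        """Return True and count the call, or False if the unit is full."""
        with self._lock:
            if self._active >= self._limit:
                return False  # caller should reroute to another cluster
            self._active += 1
            return True

    def release(self) -> None:
        """Record that a call has ended."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    @property
    def active(self) -> int:
        with self._lock:
            return self._active


throttle = CallThrottle(max_concurrent_calls=1000)
if throttle.try_admit():
    print("call accepted, active calls:", throttle.active)
else:
    print("cluster full, reroute the call")
```

A rejected call is not dropped by the throttle itself; the caller decides where to send it, which keeps the capacity check separate from the rerouting policy.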
Operating within known and tested capacity limits, and employing throttles to enforce those limits, is also recommended for the individual specialized servers within the clusters. Many open source infrastructure systems, including application servers, caches, messaging servers, and database and other storage systems, have fairly well-established guidelines for capacity limits and include throttling capabilities.
Finally, all capacity limits should be monitored, and alerting should be implemented whenever system activity approaches the established limit thresholds.
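A threshold check of this kind can be sketched as follows. The 80 percent warning ratio and the function name capacity_status are assumptions chosen for illustration; real thresholds would come from testing and operational experience.

```python
def capacity_status(current_load: int, tested_limit: int, warn_ratio: float = 0.8) -> str:
    """Classify load against a tested limit.

    warn_ratio is a hypothetical threshold: the fraction of the tested
    limit at which operators should start to be alerted.
    """
    if tested_limit <= 0:
        raise ValueError("tested_limit must be positive")
    utilization = current_load / tested_limit
    if utilization >= 1.0:
        return "critical"   # at or beyond the tested limit
    if utilization >= warn_ratio:
        return "warning"    # approaching the limit
    return "ok"


for load in (400, 850, 1000):
    print(load, capacity_status(load, tested_limit=1000))
```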
DNS for failover and load-balancing
Depending on the nature of the business service provided, DNS (domain name system) can be used in a variety of ways to provide the initial level of load balancing across the unit clusters. Third-party managed DNS providers can enable traffic management, so users accessing the business service can be routed to alternate unit clusters by geography or load.
It may also make sense to preassign users or groups of users to specific unit clusters, a technique sometimes referred to as sharding. In this case, DNS can also be used as an effective means to route users to their preassigned cluster. Each user group or unit cluster can be assigned a distinct URI (uniform resource identifier), and DNS can be used to map the host name in that URI to a specific location. This approach provides flexibility for fast reassignment or rebalancing, simply by changing DNS records.
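The idea can be modeled with a small lookup table. In this sketch a Python dictionary stands in for a managed DNS zone; the host names under example.com and the addresses from documentation ranges are hypothetical, and no real DNS API is being called.

```python
# Stand-in for a DNS zone: stable per-group host name -> cluster address.
dns_records = {
    "group-a.voip.example.com": "192.0.2.10",   # cluster 1 (hypothetical)
    "group-b.voip.example.com": "192.0.2.20",   # cluster 2 (hypothetical)
}


def resolve(host_name: str) -> str:
    """Look up the cluster address currently assigned to a host name."""
    return dns_records[host_name]


def rebalance(host_name: str, new_cluster_address: str) -> None:
    """Move a user group to another cluster by changing only its record."""
    dns_records[host_name] = new_cluster_address


print(resolve("group-a.voip.example.com"))
rebalance("group-a.voip.example.com", "192.0.2.20")
print(resolve("group-a.voip.example.com"))
```

The client configuration never changes in this model; only the record behind the host name does. With real DNS, a change reaches clients only as cached records expire, so record time-to-live values affect how fast a reassignment takes hold.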
DNS and standard internet-client routing can also provide an effective means of achieving high availability and failover. Third-party managed DNS providers can enable automatic failover, so if a unit cluster is down (or at full capacity), service requests can be redirected to an alternate unit cluster. Many internet client devices and applications can also support primary and secondary locations. For example, a VoIP phone can have a primary URI and a secondary URI to use if the primary is unavailable.
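Client-side failover of this kind amounts to trying locations in priority order. Below is a minimal sketch; the function name first_available, the example URIs, and the availability check passed in by the caller are all hypothetical, since the way a real phone probes a server depends on the device and protocol.

```python
from typing import Callable, Optional, Sequence


def first_available(uris: Sequence[str], is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first URI, in priority order, that passes the check."""
    for uri in uris:
        if is_available(uri):
            return uri
    return None  # every configured location is unavailable


# Hypothetical configuration: primary first, secondary second.
configured = ["sip:primary.voip.example.com", "sip:secondary.voip.example.com"]

# Simulate an outage of the primary location.
down = {"sip:primary.voip.example.com"}
print(first_available(configured, lambda uri: uri not in down))
```

Passing the availability check in as a function keeps the ordering logic independent of how a given client actually detects that a location is down.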
Overall, the cloud design techniques discussed here are allowing the latest generation of cloud services to achieve scalability while meeting the performance and reliability demands of businesses and users.