The amount of information residing only in the cloud is relatively small today, especially compared with what it will be in less than two years, when a predicted 10 percent of all data will be maintained in a cloud. Moving to a cloud-based infrastructure requires choosing a database that can take full advantage of what the cloud provides: transparent elasticity, transparent scalability, high availability, strong security, easy data distribution, data redundancy, support for all data formats, simple manageability, and low cost.
The amount of information that currently resides only in the cloud is small, but that's about to change. A recent study by IT industry analyst group IDC estimates that cloud computing accounts for less than 2 percent of IT spending today, but by 2015, nearly 20 percent of all information will be "touched" (stored or processed) in a cloud. Moreover, IDC predicts that by that same year, as much as 10 percent of all data will be maintained in a cloud. Despite the growing movement toward cloud computing, some IT professionals remain standoffish toward the idea of porting a company's data onto a public cloud computing platform such as Amazon or Rackspace. This position is understandable, given the current confusion over whether running a database in a cloud environment actually delivers tangible benefits, technical and otherwise, over keeping that same data on-premises.
Whether they plan to move a small or significant amount of data to a cloud database, today's IT decision-makers need to understand whether the solution they're considering is designed and implemented in a way that uses all the benefits and promises of cloud computing. This paper examines those key characteristics and discusses how Apache Cassandra™ stacks up against them.
Why Move to a Cloud Database?
First, a cloud database is more than traditional relational database management system (RDBMS) software running as an instance on a cloud platform such as Amazon. Such a deployment does not take full advantage of the capabilities of a cloud-computing environment.
But what constitutes a cloud-ready database? What features and functionalities must the database have to deliver on the potential that cloud computing offers? What follows is a discussion of some of the key promises of the cloud and the types of features a database should have to supply real benefits in a cloud environment.
- The Cloud Promises Transparent Elasticity
- The Cloud Promises Transparent Scalability
- The Cloud Promises High Availability
- The Cloud Promises Easy Data Distribution
- The Cloud Promises Redundancy
- The Cloud Promises Support for All Data Types
- The Cloud Promises Easier Manageability
- The Cloud Promises Lower Cost
What Is Apache Cassandra?
Apache Cassandra is a highly scalable and high-performance distributed database management system that excels at being a real-time datastore (i.e., the “system of record”) for online/transactional applications that need extremely fast read and write operations. Cassandra can manage the distribution of data across multiple data centers and offers incremental scalability with no single point of failure.
Cassandra was originally incubated at Facebook and is based upon Google's BigTable and Amazon's Dynamo software. The end result is an extremely scalable and fault-tolerant data infrastructure that solves both small and big data problems, handles write-intensive user traffic, delivers sub-millisecond caching layer reads, and supports demanding workloads involving petabytes of data.
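The "incremental scalability" mentioned above is commonly achieved in Dynamo-style systems with consistent hashing. The following is a minimal Python sketch of that idea, not Cassandra's actual implementation: the `Ring` class, the `token` helper, the node names, and the choice of 64 virtual positions per node are all assumptions made for illustration. It shows that when a node joins, only a fraction of the keys change owner, and every key that moves goes to the new node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring; each node owns several virtual positions."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes   # hypothetical default, chosen for the demo
        self._tokens = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Insert this node's virtual positions, keeping the list sorted.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def owner(self, key):
        # Walk clockwise to the first position at or after the key's token,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


keys = [f"user:{n}" for n in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # grow the cluster by one node
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys changed owner")
```

With a naive `hash(key) % node_count` scheme, by contrast, most keys would change owner whenever the node count changes, which is why ring-based placement suits clusters that grow one node at a time.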
Cassandra is built with the assumption that failures can and will occur in a data center or cloud infrastructure. Therefore, data redundancy to protect against hardware failure and other data loss scenarios is built into and managed transparently by Cassandra. Furthermore, this capability can be configured so that big data applications can use a single large database distributed across multiple, geographically dispersed data centers, between different physical racks in a data center, and between public cloud providers and on-premises managed data centers.
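To make the idea of redundancy across data centers and racks more concrete, here is a simplified Python sketch of topology-aware replica placement. It is a toy model under stated assumptions, not Cassandra's real placement logic: the `Node` type, the `place_replicas` function, the node names, and the two-data-center topology are all hypothetical. For each data center it walks a hash ring from the key's position and fills that data center's replica quota, preferring racks that do not yet hold a copy.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    datacenter: str
    rack: str


def token(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def place_replicas(key, nodes, replicas_per_dc):
    """Choose replica nodes for `key`, honoring a per-data-center quota."""
    key_token = token(key)
    ring = sorted(nodes, key=lambda n: token(n.name))
    # Start at the first node at or after the key's token, wrapping to 0.
    start = next((i for i, n in enumerate(ring) if token(n.name) >= key_token), 0)
    ordered = ring[start:] + ring[:start]

    chosen = []
    for dc, wanted in replicas_per_dc.items():
        candidates = [n for n in ordered if n.datacenter == dc]
        picked, used_racks = [], set()
        # First pass: at most one replica per rack.
        for n in candidates:
            if len(picked) < wanted and n.rack not in used_racks:
                picked.append(n)
                used_racks.add(n.rack)
        # Second pass: reuse racks only if the quota is still unmet.
        for n in candidates:
            if len(picked) < wanted and n not in picked:
                picked.append(n)
        chosen.extend(picked)
    return chosen


# Hypothetical topology: one on-premises data center plus one public cloud region.
cluster = [
    Node("onprem-1", "on-premises", "rack-1"),
    Node("onprem-2", "on-premises", "rack-1"),
    Node("onprem-3", "on-premises", "rack-2"),
    Node("cloud-1", "public-cloud", "zone-a"),
    Node("cloud-2", "public-cloud", "zone-b"),
    Node("cloud-3", "public-cloud", "zone-b"),
]

replicas = place_replicas("order:42", cluster, {"on-premises": 2, "public-cloud": 2})
for node in replicas:
    print(node.name, node.datacenter, node.rack)
```

In this sketch, losing a single rack or an entire data center still leaves at least one copy of every key reachable, which is the property the paragraph above describes.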