Cassandra: An Introduction
So far, we have discussed:
• What is HBase?
• Why is NoSQL required, and where does HBase serve our purpose?
• What is CAP theorem?
Cassandra is an alternative to HBase, but it is less commonly used alongside Hadoop.
Topics:
1. What is Cassandra?
2. CAP theorem & Cassandra
3. What does a Cassandra cluster look like?
4. Tunable read/write consistency
5. Differences between HBase and Cassandra
What is Cassandra?
It is a distributed, column-oriented (wide-column) NoSQL database built for high performance.
Because it is distributed, it is highly scalable: add a few more nodes and the cluster scales out.
When do we use Cassandra? When we need transactional-style activity (frequent inserts and updates) or quick lookups, i.e. low-latency retrieval of data.
So far, this is similar to HBase.
CAP theorem & Cassandra (Eventual Consistency)
HBase is a CP system.
Cassandra is an AP system.
Consistency is eventual: a read may briefly return an older value, but all replicas converge on the latest value after some time.
Consider your LinkedIn profile, where people like your posts.
Suppose a post has 1000 likes and one more person likes it. The system should show 1001 likes, but for a short while some reads may still return 1000, which is not the latest value.
We can tolerate slightly stale information in this use case. Getting a response, i.e. availability, matters more here.
The system does not return an error; it answers with the value it has and becomes consistent eventually.
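The likes example can be sketched as a small simulation. This is only a toy model under stated assumptions, not how Cassandra is implemented: the `Replica` and `LikeCounter` classes, the choice of three replicas, and the explicit `propagate` step are all hypothetical names introduced for illustration.

```python
import random


class Replica:
    """One copy of the like counter held by a single node."""

    def __init__(self, likes):
        self.likes = likes


class LikeCounter:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count, initial_likes):
        self.replicas = [Replica(initial_likes) for _ in range(replica_count)]
        self.pending = []  # (replica_index, value) updates not yet delivered

    def like(self):
        # Apply the write to the first replica immediately...
        new_value = self.replicas[0].likes + 1
        self.replicas[0].likes = new_value
        # ...and queue it for the remaining replicas.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, new_value))

    def read(self, rng=random):
        # Any replica may answer, so the value can be stale but is never an error.
        return rng.choice(self.replicas).likes

    def propagate(self):
        # Deliver every queued update; afterwards all replicas agree.
        for index, value in self.pending:
            self.replicas[index].likes = max(self.replicas[index].likes, value)
        self.pending.clear()


counter = LikeCounter(replica_count=3, initial_likes=1000)
counter.like()
print(sorted({counter.read() for _ in range(50)}))  # may contain both 1000 and 1001
counter.propagate()
print({counter.read() for _ in range(50)})  # every read now sees 1001
```

Before `propagate()` runs, a read can land on a replica that still holds 1000; after it runs, every replica holds 1001. That gap is what "eventual" refers to.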
What does a Cassandra cluster look like?
HBase runs on a Hadoop cluster and, like Hadoop, follows a master-slave architecture, wherein:
- HMaster = master
- Region Server = slave
If the master goes down and no backup master takes over, the cluster can no longer be coordinated, which means non-availability.
Cassandra follows a peer-to-peer architecture, wherein:
- There is no master
- All nodes are peers
- The architecture is decentralized
- Nodes communicate using the gossip protocol
A master-slave architecture can become unavailable when the master fails.
Cassandra, being decentralized with no master, has no such single point of failure and is therefore highly available.
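To give a feel for how peers can learn about each other without a master, here is a minimal gossip-style sketch. It is a simplified illustration and not Cassandra's actual gossip implementation: the functions `gossip_round` and `rounds_until_converged`, and the rule that each node talks to exactly one random peer per round, are assumptions made for this example.

```python
import random


def gossip_round(known, rng):
    """Each node exchanges everything it knows with one randomly chosen peer."""
    nodes = list(known)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = known[node] | known[peer]
        known[node] = merged
        known[peer] = set(merged)


def rounds_until_converged(node_count, seed=0):
    """Count gossip rounds until every node knows about every other node."""
    rng = random.Random(seed)
    # Initially every node knows only about itself.
    known = {n: {n} for n in range(node_count)}
    everyone = set(range(node_count))
    rounds = 0
    while any(view != everyone for view in known.values()):
        gossip_round(known, rng)
        rounds += 1
    return rounds


print(rounds_until_converged(node_count=8))
```

No node plays a special role here: knowledge spreads because every peer keeps passing on what it has heard, so losing any single node does not stop the exchange.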
Tunable Read/Write Consistency
Cassandra is an AP system.
By default, it compromises on consistency in order to stay highly available.
Step 1: The client sends a request to read the value of A.
Step 2: The request goes to any one of the nodes, for instance node 5, which acts as the coordinator.
Step 3: Node 5 contacts node 1, which holds A, to get the result.
Step 4: Node 5 returns the result to the client.
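The four steps above can be sketched in a few lines. This is an illustrative model only: the `Node` class, its `owners` lookup table, the six-node cluster, and the stored value 42 are hypothetical and chosen just to mirror the node 5 / node 1 example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}    # keys stored locally on this node
        self.owners = {}  # hypothetical lookup table: key -> Node holding it

    def handle_read(self, key):
        """Answer a client read, forwarding to the owning node if needed."""
        owner = self.owners[key]
        if owner is self:
            return self.data[key], [self.name]
        value = owner.data[key]                 # Step 3: ask the node holding the key
        return value, [self.name, owner.name]   # Step 4: reply to the client


nodes = {i: Node(f"node{i}") for i in range(1, 7)}
nodes[1].data["A"] = 42  # illustrative value for A, stored on node1
for node in nodes.values():
    node.owners["A"] = nodes[1]

# Steps 1-2: the client sends its request for A to node5.
value, path = nodes[5].handle_read("A")
print(value, path)  # 42 ['node5', 'node1']
```

Because any node can accept the request and forward it, the client does not need to know where A actually lives.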
Cassandra provides tunable consistency through quorums.
You can tune how many nodes must agree before a result is returned, using one of 3 approaches:
- One node
o Default mode.
o Availability is high, consistency is low.
- Quorum based
o The result is returned only when a majority of the replicas agree on it. For example, with 3 replicas, quorum = 2.
o Availability is high, consistency is moderate.
- All nodes
o The result is returned only when all replicas agree on it.
o Availability is low, consistency is high.
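The trade-off between the three approaches can be made concrete with a little arithmetic. The sketch below assumes 3 replicas of the data and uses the level names ONE, QUORUM and ALL; the helper functions `required_replicas` and `can_serve` are hypothetical, written only for this illustration. A quorum is taken as a majority, i.e. `replicas // 2 + 1`.

```python
def required_replicas(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def can_serve(level, replication_factor, replicas_up):
    """A request succeeds only if enough replicas are reachable."""
    return replicas_up >= required_replicas(level, replication_factor)


# Assume 3 replicas, of which one is currently down.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_replicas(level, 3), can_serve(level, 3, replicas_up=2))
# ONE 1 True
# QUORUM 2 True
# ALL 3 False
```

With one replica down, ONE and QUORUM requests still succeed, while ALL fails. This is why requiring all nodes gives the highest consistency but the lowest availability.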
Differences between HBase and Cassandra
Similarities
- Both are NoSQL databases
- Both organize data by column families
- Both are highly scalable
- Both are used when you want to perform:
o transactions (updates/inserts)
o quick reads
o low-latency operations
Differences
HBase | Cassandra |
---|---|
Master-slave architecture | Decentralized architecture (highly available, as there is no dependency on a single master) |
CP | AP + tunable consistency (one node, quorum based, all nodes) |
Runs on top of a Hadoop cluster, so data is kept in HDFS | Runs on its own separate cluster |
Preferred choice when working on a Hadoop cluster | Less commonly used with Hadoop |
Accessed using shell commands, which are somewhat hard to understand; Apache Phoenix can be used on top of HBase to provide a SQL-like interface | Has its own SQL-like language, called Cassandra Query Language (CQL) |