Why Your Cassandra Delete Queries Didn’t Free Up Disk Space (And How to Actually Reclaim It)

So you just ran a bunch of DELETE statements in Cassandra, expecting your disk usage to drop dramatically… and nothing happened.

Welcome to Apache Cassandra — where deleting data doesn’t actually mean deleting data.

Why Cassandra Doesn’t Immediately Free Disk Space

In Cassandra, when you delete data, it doesn’t get physically removed right away. Instead, Cassandra writes a special marker called a tombstone.

Think of tombstones like sticky notes saying:

“This data is deleted… but don’t clean it up just yet.”

These tombstones are necessary because Cassandra is a distributed system and needs to ensure deletes are properly propagated across replicas before physically removing anything.

So what actually happens?

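Your DELETE creates tombstones
Tombstones are written like any other write: first to the memtable, then to SSTables on flush
Disk space is only reclaimed later, when compaction rewrites the SSTables and drops the deleted data
Depending on workload and compaction strategy, this can take hours or even days, and tombstones themselves are kept until they are older than the table's gc_grace_seconds (10 days by default)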
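
To put a number on the waiting period: a tombstone can only be purged once it is older than the table's gc_grace_seconds setting, which defaults to 864000 seconds. The sketch below does that arithmetic in plain shell. The function names and the example timestamp are made up for illustration.

```shell
# Sketch: estimate when a tombstone becomes eligible for removal.
# 864000 is the Cassandra default for gc_grace_seconds; your table may differ.

# Prints the epoch second after which a tombstone written at $1 can be
# purged, given a gc_grace_seconds value in $2.
purgeable_after() {
  echo $(( $1 + $2 ))
}

# Prints "yes" if a tombstone written at $1 is past gc_grace ($2) at time $3,
# otherwise "no".
is_purgeable() {
  if [ "$3" -gt $(( $1 + $2 )) ]; then echo yes; else echo no; fi
}

# Hypothetical example: a delete issued at epoch second 1700000000.
purgeable_after 1700000000 864000   # prints 1700864000
```

You can read the real value for a table with cqlsh from system_schema.tables (Cassandra 3.0 and later): SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = '<keyspace>' AND table_name = '<table>';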

Cassandra ships with nodetool commands that let you trigger this cleanup sooner.

Step 1: Flush Memtables to Disk


nodetool flush <keyspace> <table>

What this does:

This forces in-memory data (memtables) to be written into SSTables on disk. It ensures everything is persisted to disk so compaction can work with it.

In our experience this usually takes only a few seconds, even for tables around 350 GB in size.
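
Once everything is flushed, it is a good moment to record how much disk the table uses, so you have a baseline to compare against later. A minimal sketch, assuming the output of nodetool tablestats contains a "Space used (live)" line with a byte count. That is the usual format, but check your version. The helper names are hypothetical, not part of Cassandra.

```shell
# Reads `nodetool tablestats` output on stdin and prints the first
# "Space used (live)" value, which is reported in bytes.
live_bytes() {
  awk -F': *' '!seen && /Space used \(live\)/ { print $2; seen = 1 }'
}

# Converts a byte count to whole GiB (rounded down).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Intended use against a real node (placeholders as in the command above):
#   nodetool tablestats <keyspace>.<table> | live_bytes
```

Run it once before the cleanup and once after, and compare the two numbers.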

Step 2: Trigger Garbage Collection via Compaction


nodetool garbagecollect <keyspace> <table>

What this does:

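This manually triggers a compaction-style rewrite of the table's SSTables that:

Processes the existing SSTables
Drops data that is shadowed by tombstones, plus tombstones that are past the GC grace period (gc_grace_seconds)
Rewrites the remaining data into new SSTables
Removes the old SSTables when it is safe to do so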
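
The two steps can be chained in a small wrapper. This is only a sketch: the function names and the DRY_RUN variable are our own invention, and it uses nothing beyond the two nodetool commands shown in this post. With DRY_RUN=1 it prints the commands instead of running them, which is a cheap way to double-check the keyspace and table names first.

```shell
# Runs its arguments, or only prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Flush first so recent deletes are in SSTables, then rewrite the SSTables.
# Usage: reclaim_table <keyspace> <table>
reclaim_table() {
  run nodetool flush "$1" "$2" &&
  run nodetool garbagecollect "$1" "$2"
}
```

For example, DRY_RUN=1 reclaim_table my_ks my_table prints both commands without touching the node (my_ks and my_table are placeholder names).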

How long this takes depends largely on how much data was deleted.

Important note: garbagecollect needs a fair amount of free disk space on the server. It rewrites SSTables, and while the rewrite is in progress the old and new SSTables exist side by side, so the table can temporarily grow on disk. Once the old SSTables are removed, usage drops and reflects the true size after the deletion.
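
It is worth checking free space before you start. The sketch below compares the free space on the data volume with the size of the table's directory. Requiring free space at least equal to the table size is a conservative assumption, not an official Cassandra rule, and the path in the usage comment assumes the default data directory. The helper names are hypothetical.

```shell
# Prints the free space in KiB on the filesystem that holds directory $1.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Prints the size in KiB of directory $1.
dir_kib() {
  du -sk "$1" | awk '{ print $1 }'
}

# Succeeds only if the filesystem holding $1 has at least $2 KiB free.
has_headroom() {
  [ "$(free_kib "$1")" -ge "$2" ]
}

# Example (path assumes the default data directory; adjust to your install):
#   table_dir=/var/lib/cassandra/data/<keyspace>/<table>-<id>
#   has_headroom "$table_dir" "$(dir_kib "$table_dir")" && echo "enough room"
```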

You can track progress using the following command:

nodetool compactionstats

Do not be fooled by the progress percentage in this output, however: several compaction jobs can run one after another, so one job reaching 100% does not mean the whole cleanup is done.
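
Besides the per-job percentage, the "pending tasks" count in the same output is a handy second signal. A small sketch that extracts it, assuming the output contains a line of the form "pending tasks: N"; the function name is hypothetical. Treat the number as an estimate: a single reading of 0 is not proof that everything is finished, so check a few times.

```shell
# Reads `nodetool compactionstats` output on stdin and prints the number
# from the first "pending tasks: N" line.
pending_tasks() {
  awk -F': *' '!seen && /^pending tasks/ { print $2; seen = 1 }'
}

# Intended use against a real node:
#   nodetool compactionstats | pending_tasks
```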

Another important note: Do not use watch nodetool compactionstats

We once had a case where the watch command did not exit after Ctrl+C and kept driving up CPU usage on the server, which hovered around 50-60%. Normal usage for us on this server was under 10%.
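
If you still want a periodically refreshing view, one option is a plain loop with a hard upper bound on the number of runs, so it stops on its own even if you forget about it. This is only a sketch with a made-up helper name, and it is not a diagnosis of the watch issue described above.

```shell
# Runs a command every $1 seconds, at most $2 times, then stops on its own.
# The body runs in a subshell so its variables do not leak into your session.
# Usage: run_every <interval_seconds> <max_runs> <command> [args...]
run_every() (
  interval=$1
  max_runs=$2
  shift 2
  i=0
  while [ "$i" -lt "$max_runs" ]; do
    i=$((i + 1))
    echo "--- run $i of $max_runs ---"
    "$@"
    # Skip the sleep after the final run.
    if [ "$i" -lt "$max_runs" ]; then
      sleep "$interval"
    fi
  done
)

# Example: check every 30 seconds, give up after 120 checks (about an hour):
#   run_every 30 120 nodetool compactionstats
```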
