Michael J. Swart

May 20, 2015

It’s Hard To Destroy Data

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Had I been born later, I probably would have gone with Wreck-It Ralph It is surprisingly difficult to delete data permanently. SQL Server has sophisticated features that prevent all kinds of data loss, but the opposite is not true. SQL Server has very few features that help you destroy data.

But why would anyone need to destroy data? Lots of reasons.

  • Some laws and data retention policies require that data older than a certain age gets destroyed.
  • Developers often ask for production data, or a subset. A DBA could decide to give them a backup having first dropped certain tables or columns that contain personally identifiable information.

There are countless other reasons why you might want to destroy data and I’m going to focus on three general use cases: destroying tables, destroying columns and destroying rows.

Straight Up Deleting Is Not Enough

I’m going to demonstrate that deleting data is not equivalent to destroying data. Deleting data is just hiding it.

First create a table, populate it and see what it looks like on disk:

CREATE DATABASE DeletedDataDemo;
GO
 
USE DeletedDataDemo;
 
SELECT col1, col2, col3 
INTO dbo.Table1
FROM 
( 
  VALUES (1, 'aaaaaaaaaaaa', 'bbbbbbbbbbbb'),
         (2, 'cccccccccccc', 'dddddddddddd'),
         (3, 'eeeeeeeeeeee', 'ffffffffffff')
) data (col1, col2, col3)
 
DBCC TRACEON (3604);
DBCC IND(DeletedDataDemo, 'dbo.Table1', 1) -- file 1, page 216 for example
 
--update the following command to look at that page
DBCC PAGE (DeletedDataDemo, 1, 216, 2);

Notice that the last command DBCC PAGE shows the contents of the table as it appears on disk.

Data dump

That data is still there

And now drop the table that was just created:

DROP TABLE dbo.Table1;
 
-- or
-- ALTER TABLE dbo.Table1 DROP COLUMN col3;
 
-- or
-- DELETE dbo.Table1 WHERE col1 IN (2,3);
 
DBCC PAGE (DeletedDataDemo, 1, 216, 2);

Run the DBCC PAGE command and notice that the data values are still there! That data cannot be queried but it remains on disk. Even though this data is in deallocated space, this data can survive backups and restores (even after checkpoints).

Repeat the demo from the beginning. But this time, instead of dropping the table, try deleting rows or dropping a column. Just like the dropped table the data remains on disk.

Try to Destroy Data by Cleaning Pages

So SQL Server doesn’t actually delete data, it hides it and deallocates the space. A ghost cleanup job can eventually clean up this data but you can’t depend on how frequently it runs. That leads to stackoverflow questions like this one: How can I securely destroy some data using sql server 2008?

One of the answers there mentions that sp_clean_db_free_space can force the cleanup to run early. Cleaning pages is exactly what’s required to get rid of that data in deallocated space. But cleaning pages consumes a lot of I/O and maybe that’s why SQL Server 2008 introduced a lot of control over the granularity of cleaning pages:

  • Use sp_clean_db_free_space for a whole database.
  • Use sp_clean_db_file_free_space for a single data file.
  • Use DBCC CLEANPAGE(@dbid, @fileid, @pageid) after a CHECKPOINT for a single page.

These commands force SQL Server to write zeros to any deallocated space.

Mostly…

Dropped-Column Data Sticks Around

Columns are a special case. SQL Server doesn’t reclaim the space that is freed by dropped columns. And this dropped column data is not touched by any of the page cleaning methods. In order to destroy this data, it’s necessary to add a few manual steps. These steps can destroy column data by either overwriting column values or by rebuilding its table’s indexes.

How To Destroy Data

Here’s a table that shows the syntax to use when trying to destroy data.

When destroying … Use this syntax
Tables
DROP TABLE dbo.Table1;
EXEC sp_clean_db_free_space 'DeletedDataDemo';
Rows
DELETE dbo.Table1 WHERE col1 = @col1;
EXEC sp_clean_db_free_space 'DeletedDataDemo';
Columns
(except columns in heaps)
ALTER TABLE dbo.Table1 DROP COLUMN Col3;
ALTER INDEX ALL ON dbo.Table1 REBUILD;
EXEC sp_clean_db_free_space 'DeletedDataDemo';
Columns
(alternative syntax)
UPDATE dbo.Table1 SET col3 = N'';
ALTER TABLE Table1 DROP COLUMN Col3;
EXEC sp_clean_db_free_space 'DeletedDataDemo';
Other data structures
(Columnstore, Full-text, XML indexes, Service Broker Queues etc…)
Not evaluated. I don’t know how long this data sticks around.

The Lessons

Microsoft warns you that these data cleaning procedures are very I/O intensive so I would be reluctant to suggest these methods when alternatives exist.

The clumsiness of the dropped column scenario makes me wonder about other scenarios where SQL Server deletes data but doesn’t destroy it.

And this leads directly to this bit of advice:

Don’t distribute backups of databases that have contained sensitive information.

Powered by WordPress