Michael J. Swart

June 19, 2020

Problem Solving by Cheating

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:34 am

Solving real-world problems is different from answering interview questions or Twitter polls. The biggest difference is that real problems aren’t always fair. There’s not always a right answer.

Answer this multiple choice question:

Which of the following SQL statements is used to modify existing data in a table?
A) SELECT
B) INSERT
C) DELETE

Give it some thought. Which option would you pick? The correct answer is UPDATE but it wasn’t one of the options listed and that’s not fair. But neither is real life. Many real problems don’t have an easy answer and some real problems are impossible to solve. That can be discouraging.

A blue shell from MarioKart is about to attack a runner. Sometimes life just isn't fair.

Real Problems Allow For Creativity

But if your problems are unfair, then maybe you’re allowed to cheat too.

“None of the above” is always an option. Understand the goal so that you can stretch or ignore requirements.

Example – Changing an INT to a BIGINT

I have a table that logs enrollments into courses. It’s append only and looks something like this:

CREATE TABLE dbo.LOG_ENROLL (
    LogId INT IDENTITY NOT NULL,  -- This identity column is running out of space
    UserId INT NOT NULL,
    CourseId INT NOT NULL,
    RoleId INT NULL,
    EnrollmentType INT NOT NULL,
    LogDate DATETIME NOT NULL DEFAULT GETUTCDATE(),
 
    INDEX      IX_LOG_ENROLL_CourseId    CLUSTERED    ( CourseId, UserId ),
    CONSTRAINT PK_LOG_ENROLL PRIMARY KEY NONCLUSTERED ( LogId ),
    INDEX      IX_LOG_ENROLL_UserId      NONCLUSTERED ( UserId, CourseId ),
    INDEX      IX_LOG_ENROLL_LogDate     NONCLUSTERED ( LogDate, LogId )
);

The table has over 2 billion rows, and LogId is about to run out of values because it is defined as an INT, which tops out at 2,147,483,647. I need to change the column to a BIGINT. But changing an INT to a BIGINT is known as a “size of data” operation. This means SQL Server has to process every row to expand the LogId column from 4 to 8 bytes. But it gets trickier than that.
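To get a rough sense of how much room is left, a query along these lines can help. It is only a sketch: it assumes the identity started at a positive seed and increments by 1, which the table definition above implies but does not state outright.

```sql
-- Compare the last identity value handed out with the INT ceiling.
-- IDENT_CURRENT returns NULL if the table name cannot be resolved.
SELECT
    IDENT_CURRENT('dbo.LOG_ENROLL')              AS CurrentIdentity,
    2147483647 - IDENT_CURRENT('dbo.LOG_ENROLL') AS ValuesRemaining;
```

Dividing ValuesRemaining by the typical number of inserts per day gives an estimate of how many days remain.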

The biggest challenge is that the table has to remain “online” (available for queries and inserts).

Compression?
Gianluca Sartori (spaghettidba) had the idea of enlarging the columns with no downtime using compression. It’s promising, but I discovered that for this to work, all indexes need to be compressed, not just the ones that contain the changed column. Also, any indexes which use the column need to be disabled for this to work.
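Before relying on the compression trick, it helps to see which indexes are already compressed. Here is a minimal sketch using the catalog views sys.indexes and sys.partitions against the table defined above:

```sql
-- List every index on the table with its current compression setting.
-- A value of NONE means the index still needs to be compressed (or dropped).
SELECT
    i.name                  AS IndexName,
    i.type_desc             AS IndexType,
    p.partition_number      AS PartitionNumber,
    p.data_compression_desc AS CompressionSetting
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON  p.object_id = i.object_id
    AND p.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.LOG_ENROLL')
ORDER BY i.index_id, p.partition_number;
```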

Cheating
I gave up on solving this problem in general and constrained my focus to the specific problem I was facing. There’s always some context that lets us bend the rules. In my case, here’s what I did.

Ahead of time:

  • I removed extra rows. I discovered that many of the rows were extraneous and could be removed. After thinning out the table, the number of rows went from 2 billion down to 300 million.
  • I compressed two of the indexes online (IX_LOG_ENROLL_UserId and IX_LOG_ENROLL_CourseId) because I still want to use the compression trick.
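Compressing the two indexes online could look something like the following. This is a sketch under assumptions: ROW compression is my guess at the setting (the post doesn’t say whether ROW or PAGE was used), and ONLINE = ON needs an edition of SQL Server that supports online index operations.

```sql
-- Rebuild the two indexes that do not have LogId as a key column,
-- compressing them while the table stays available.
-- DATA_COMPRESSION = ROW is an assumption, not taken from the post.
ALTER INDEX IX_LOG_ENROLL_UserId
    ON dbo.LOG_ENROLL
    REBUILD WITH (DATA_COMPRESSION = ROW, ONLINE = ON);

ALTER INDEX IX_LOG_ENROLL_CourseId
    ON dbo.LOG_ENROLL
    REBUILD WITH (DATA_COMPRESSION = ROW, ONLINE = ON);
```

Rebuilding the clustered index (IX_LOG_ENROLL_CourseId) is the expensive one, since it rewrites the table itself, so thinning out the rows first makes this step cheaper.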

But I’m not ready yet. I still can’t modify the column because the other two indexes depend on the LogId column. If I try, I get this error message:


Msg 5074, Level 16, State 1, Line 22
The index 'IX_LOG_ENROLL_LogDate' is dependent on column 'LogId'.
Msg 5074, Level 16, State 1, Line 22
The object 'PK_LOG_ENROLL' is dependent on column 'LogId'.
Msg 4922, Level 16, State 9, Line 22
ALTER TABLE ALTER COLUMN LogId failed because one or more objects access this column.
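Rather than discovering dependencies one error at a time, the catalog views can list the indexes that name the column as a key or included column. A sketch, assuming the table and column names above:

```sql
-- Find indexes and constraints that explicitly reference LogId.
-- For the table above this should surface PK_LOG_ENROLL and
-- IX_LOG_ENROLL_LogDate.
SELECT
    i.name           AS IndexName,
    i.type_desc      AS IndexType,
    i.is_primary_key AS IsPrimaryKey
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON  ic.object_id = i.object_id
    AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON  c.object_id = ic.object_id
    AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.LOG_ENROLL')
  AND c.name = 'LogId';
```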

So I temporarily drop those indexes!

  • Drop the constraint PK_LOG_ENROLL and the index IX_LOG_ENROLL_LogDate
  • Do the switch! ALTER TABLE LOG_ENROLL ALTER COLUMN LogId BIGINT NOT NULL; This step takes no time, because with the remaining indexes compressed it is a metadata-only change!
  • Recreate the dropped indexes online.

Hang on, that last step is a “size of data” operation. Anyone who needs those indexes won’t be able to use them while they’re being built.
Exactly, and this is where I cheat. It turns out those indexes were used for infrequent reports and I was able to co-ordinate my index rebuild around the reporting schedule.
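Put together, the drop, switch, and recreate steps might look like this. Treat it as a sketch rather than the exact script I ran: the index definitions are copied from the CREATE TABLE statement above, and ONLINE = ON assumes an edition that supports online index operations.

```sql
-- 1. Drop the two objects that depend on LogId.
ALTER TABLE dbo.LOG_ENROLL DROP CONSTRAINT PK_LOG_ENROLL;
DROP INDEX IX_LOG_ENROLL_LogDate ON dbo.LOG_ENROLL;

-- 2. Widen the column. With the remaining indexes compressed,
--    this should not have to rewrite the existing rows.
ALTER TABLE dbo.LOG_ENROLL ALTER COLUMN LogId BIGINT NOT NULL;

-- 3. Recreate what was dropped. These are size-of-data operations,
--    so schedule them when the reports that need them are not running.
ALTER TABLE dbo.LOG_ENROLL
    ADD CONSTRAINT PK_LOG_ENROLL
    PRIMARY KEY NONCLUSTERED ( LogId )
    WITH (ONLINE = ON);

CREATE NONCLUSTERED INDEX IX_LOG_ENROLL_LogDate
    ON dbo.LOG_ENROLL ( LogDate, LogId )
    WITH (ONLINE = ON);
```

Between steps 1 and 3 nothing enforces uniqueness on LogId, which is tolerable here only because the identity property generates the values.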

You can’t always make an operation online, but with effort and creativity, you can get close enough. I have found that every real problem allows for a great degree of creativity when you’re allowed to bend the rules or question requirements.

3 Comments »

  1. […] Michael J. Swart finds a creative way around a problem, and then a creative way around the problem i…: […]

    Pingback by Expanding INT to BIGINT Without (Much) Downtime – Curated SQL — June 22, 2020 @ 8:00 am

  2. Michael, if I understand correctly, you are saying that increasing a column’s width forces a rewrite of the whole table, and that if there is enough empty space in the table the rewrite takes no time. Correct?

    Comment by Herbert Tobisch — July 6, 2020 @ 12:22 pm

  3. Actually no, not quite.
    When an int is compressed then it is stored as a vardecimal. And so are bigints. So when the column is compressed, no extra space is needed. Think of it this way, say I want to store 5 digit numbers:
    00022, 02020, 55001
    If I wanted to store them as 10 digit numbers, that would need more space:
    0000000022, 0000002020, 0000055001

    But if I compress them and get rid of leading zeroes, (like vardecimal format)
    Then I have
    22, 2020, 55001
    The existing data requires no new space.

    Now ints, and bigints are different than 5 or 10 digit numbers but the idea is the same.

    Comment by Michael J. Swart — July 6, 2020 @ 12:29 pm
