Michael J. Swart

December 4, 2013

Overly Complex Views, Procedures And Functions

Filed under: SQL Scripts, SQLServerPedia Syndication, Technical Articles — Michael J. Swart @ 4:03 pm

Takeaway: I define a metric that serves as a code smell for overly complex views, procedures and functions: “FROM” count.

In the programming world, developers have defined metrics for how complex a piece of code is. Visual Studio, for example, calculates several code metrics meant to give developers some idea of how complex their code is getting. These measures can’t be perfect, because code complexity isn’t something that can be measured directly, but many think they indicate complexity often enough to be useful.

Code Smells

That’s what a code smell is meant for. My own definition is that a code smell is an anti-rule-of-thumb <cough>cursors!</cough>. It doesn’t necessarily indicate something’s rotten, but like a bad smell, it’s worth checking into.

Measuring Complexity in SQL Modules

And here’s my idea for measuring complexity in SQL. It’s based on the premise that complex procedures and views contain many queries and subqueries, and that most queries have FROM clauses. So which of my procedures, views and functions might be too complex? Let’s find out:

DECLARE @Odor NVARCHAR(30) = N'FROM';
 
with
   L0 as (select 1 as C union all select 1)       --2 rows
  ,L1 as (select 1 as C from L0 as A, L0 as B)    --4 rows
  ,L2 as (select 1 as C from L1 as A, L1 as B)    --16 rows
  ,L3 as (select 1 as C from L2 as A, L2 as B)    --256 rows
  ,L4 as (select 1 as C from L3 as A, L3 as B)    --65536 rows
  ,Nums as (select row_number() over (order by (select 0)) as N from L4)  --numbers 1 to 65536
SELECT 
    OBJECT_SCHEMA_NAME(m.object_id) as SchemaName,
    OBJECT_NAME(m.object_id) as ObjectName, 
    count(1) as OdorCount
FROM Nums
CROSS JOIN sys.sql_modules m
WHERE Nums.N < LEN(m.definition)
    AND SUBSTRING(m.definition, Nums.N, LEN(@Odor)) = @Odor
GROUP BY m.object_id
ORDER BY count(1) desc, object_name(m.object_id) asc

I’ve found the metric slightly better than the simpler “procedure size” metric:

SELECT 
    OBJECT_SCHEMA_NAME(m.object_id) as SchemaName,
    OBJECT_NAME(m.object_id) as ObjectName, 
    LEN(m.definition) as ModuleSize
FROM sys.sql_modules m
ORDER BY LEN(m.definition) desc, object_name(m.object_id) asc

Try it out on your own environments and let me know if it identifies the monster procedures you know are lurking in your database.

Thanks to Adam Machanic for the substring counting syntax.

Performance Improvement!

Update December 5, 2013: In the comments, George Mastros provided a simpler and faster version of this query which does the same thing:

DECLARE @Odor NVARCHAR(30) = N'FROM';
 
Select	OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
        OBJECT_NAME(object_id) As ObjectName, 
        (DataLength(definition) - DataLength(Replace(definition, @Odor, ''))) / DataLength(@Odor) As OdorCount
From	sys.sql_modules
Order By OdorCount DESC;

This is much simpler and much faster. What’s extra interesting is that George’s query itself has an odor count of 2 while my original one had a count of 7. Thanks so much, George!
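
Incidentally, the same replace-and-measure arithmetic works on any ad-hoc batch, not just on rows in sys.sql_modules. Here’s a minimal sketch for sanity-checking the formula (the sample script and table names are purely illustrative):

DECLARE @Odor NVARCHAR(30) = N'FROM';

DECLARE @Script NVARCHAR(MAX) = N'SELECT c.CustomerId
FROM dbo.Customers AS c
WHERE EXISTS (SELECT 1 FROM dbo.Orders AS o WHERE o.CustomerId = c.CustomerId);';

-- count occurrences of @Odor by measuring how much shorter the text gets after removing them
SELECT (DATALENGTH(@Script) - DATALENGTH(REPLACE(@Script, @Odor, N''))) / DATALENGTH(@Odor) AS OdorCount;
-- returns 2: one FROM for the outer query, one for the subquery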

21 Comments

  1. Just found a 1,176-line monster I didn’t know about, “OdorCount = 164”… is it too bad, doctor? :O

    Comment by Yarik — December 5, 2013 @ 2:50 am

  2. Thank you, Michael. Very interesting approach to the complexity issue.

    Comment by Sergio — December 5, 2013 @ 8:45 am

  3. @Sergio Thanks!

    I’m getting other feedback that the metric seems to measure complexity fairly well.

    Comment by Michael J. Swart — December 5, 2013 @ 9:36 am

  4. @Yarik,

    Wow, 164 is crazy! That is, if there aren’t false positives. I mean, if you’re not dealing with column names that also contain “FROM”, like “FromAddress” or “ExcludeFromProcessing”, then yes, 164 is probably beyond any reasonable threshold.

    I’m glad you found something.

    Comment by Michael J. Swart — December 5, 2013 @ 9:40 am

  5. Wow. You just scared the stuffing out of me. I ran this code and had a procedure come up with an odor count of 114. The odd thing is that the proc is only 450 lines long, so I would consider it “medium complex”. After carefully looking at your code and mine, I realized that you are “simply” counting the occurrences of FROM within the code.

    By the way, I think the following code generates the same results but executes a bit quicker. The counting in this code is accomplished by using the replace function and calculating the difference in lengths. Ex: Replace('abc', 'a', ''). Original length = 3, new length = 2, (3 - 2) = 1, therefore 1 occurrence found.

    Select	OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
            OBJECT_NAME(object_id) As ObjectName, 
            (DataLength(definition) - DataLength(Replace(definition, N'From', ''))) / DataLength(N'From') As OdorCount
    From	sys.sql_modules
    Order By OdorCount DESC;

    Comment by George Mastros — December 5, 2013 @ 9:42 am

  6. Ouch! Didn’t update my comment. I’m not into digging too much into this sp as it’s legacy code and, as usual, there are other priorities. But after spending some minutes on the code, the OdorCount is now “just” 149. Still bad, but I can’t spend much time on it. By the way, the solution proposed by George Mastros is indeed faster and throws the same results.

    Comment by Yarik — December 5, 2013 @ 9:55 am

  7. Thanks so much, George. I’ve updated the post using your feedback.

    Comment by Michael J. Swart — December 5, 2013 @ 9:55 am

  8. Adding a column for an sp’s number of lines helps too:

    ,(LEN(definition) - LEN(REPLACE(definition, CHAR(10), ''))) AS LINES_OF_CODE,

    Comment by Yarik — December 5, 2013 @ 10:17 am
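
    For reference, here’s a sketch of how that column can slot into George’s query, so complexity and size show up side by side (just the two snippets above combined; nothing new beyond them):

    Select  OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
            OBJECT_NAME(object_id) As ObjectName, 
            (DataLength(definition) - DataLength(Replace(definition, N'FROM', ''))) / DataLength(N'FROM') As OdorCount,
            -- Yarik's column: number of line breaks in the module definition
            (LEN(definition) - LEN(REPLACE(definition, CHAR(10), ''))) As LINES_OF_CODE
    From    sys.sql_modules
    Order By OdorCount DESC;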

  9. It absolutely does, Yarik. It helps give a bit of context to the results. In my case, it helped me sort out complex procedures from procedures that are merely really big.

    Comment by Michael J. Swart — December 5, 2013 @ 10:23 am

  10. Another metric that came to mind would be to compare the # of FROMs to the # of JOINs. To me, knowing that ratio is also a good “smell”. 1-5 FROMs but 50 JOINs? Yeah… 🙂

    Comment by Andy Yun — December 5, 2013 @ 11:05 am
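
    A rough sketch of the FROM-to-JOIN ratio Andy describes, built on George’s counting trick (the variable, column and alias names are only illustrative):

    Declare @From NVarChar(30) = N'FROM';
    Declare @Join NVarChar(30) = N'JOIN';

    With Counts As
    (
        Select  OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
                OBJECT_NAME(object_id) As ObjectName, 
                (DataLength(definition) - DataLength(Replace(definition, @From, ''))) / DataLength(@From) As FromCount,
                (DataLength(definition) - DataLength(Replace(definition, @Join, ''))) / DataLength(@Join) As JoinCount
        From    sys.sql_modules
    )
    Select  SchemaName, ObjectName, FromCount, JoinCount,
            -- NULLIF avoids a divide-by-zero for modules with no FROM at all
            JoinCount * 1.0 / NULLIF(FromCount, 0) As JoinsPerFrom
    From    Counts
    Order By JoinsPerFrom DESC;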

  11. Oh yeah, nothing stinks like a query that’s getting too “Joiny”.

    Comment by Michael J. Swart — December 5, 2013 @ 11:08 am

  12. Top 10:
    180, 177, 150, 147, 144, 143, 142, 139, 136, 132

    Comment by Jeff Cheney — December 6, 2013 @ 3:16 pm

  13. Here is another version of the query. I work on a mapping application, so I have a lot of columns with names like LeftFromHouseNumber, RightFromHouseNumber, etc. All of these were getting added to the counts, so rose-smelling code was being reported as stinky socks.

    ;With Data As
    (
    Select	OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
            OBJECT_NAME(object_id) As ObjectName, 
            Replace(Replace(Replace(definition, Char(9), N' '), Char(13), N' '), Char(10), N' ') As Definition
    From	sys.sql_modules
    )
    Select	SchemaName, 
            ObjectName, 
            (DataLength(definition) - DataLength(Replace(definition, N' From ', ''))) / DataLength(N' From ') As Complexity
    From	Data
    Order By Complexity DESC

    This code is a bit slower than the code I posted earlier, but it is also a lot more useful to me, and I thought I would share it. Basically, this code replaces tabs, carriage returns, and line feeds with a space character. It then only counts occurrences of FROM that have a space before and after.

    The interesting thing about this is that I got a bug report for the 4th most complex procedure today (complexity = 71).

    Lastly… the only false positives that I can think of that would be included in the count would be queries that are commented out.

    Comment by George Mastros — December 6, 2013 @ 6:30 pm

  14. Wow George,
    That’s above and beyond. It certainly gives something closer to a true “FROM” count.
    You may also miss some FROMs (false negatives) when coders don’t rely on whitespace to separate tokens, e.g.:

    SELECT * FROM[USERS]

    But aside from uncommon whitespace habits and aside from code in comments (which is its own smell), it’s quite good at getting an accurate “FROM” count.

    Comment by Michael J. Swart — December 9, 2013 @ 10:17 am
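
    One rough way to catch those bracketed cases, sketched here as an assumption rather than a tested fix: keep George’s whitespace normalization, then also pad '[' and '(' with a leading space so FROM[USERS] and FROM(SELECT … still match the space-delimited search. Other habits may still slip through:

    With Data As
    (
        Select  OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
                OBJECT_NAME(object_id) As ObjectName, 
                -- normalize whitespace, then pad [ and ( so FROM[Users] and FROM(SELECT still match
                Replace(Replace(Replace(Replace(Replace(definition,
                    Char(9), N' '), Char(13), N' '), Char(10), N' '),
                    N'[', N' ['), N'(', N' (') As Definition
        From    sys.sql_modules
    )
    Select  SchemaName, 
            ObjectName, 
            (DataLength(Definition) - DataLength(Replace(Definition, N' From ', ''))) / DataLength(N' From ') As Complexity
    From    Data
    Order By Complexity DESC;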

  15. Did you ever run into function point analysis?
    http://en.wikipedia.org/wiki/Function_point

    SQL is in the set of languages with scoring data:

    http://www.qsm.com/resources/function-point-languages-table

    Comment by Joe Celko — December 17, 2013 @ 10:40 am

  16. Hi Joe,

    I have to admit that function point analysis is new to me. But it’s interesting: the Wikipedia article suggests that function points can be used to “incentivize developers to be more productive”. I’m not sold on the idea of measuring productivity with function points, because according to function point advocates:

    More function points == More functionality delivered == More productivity

    But according to me:

    Fewer function points == Less complex code == Better quality

    I think simpler is better here. In practice, I would never use these measures to evaluate developers. I do advocate using these measures as a tool for identifying code that may be overly complex. Since all these measures are imperfect anyway, I’ll likely stick with the simpler “FROM count” for now.

    Comment by Michael J. Swart — December 17, 2013 @ 11:27 am

  17. A hundred years ago I worked for a consulting/software development company that used function point counting to determine the cost of their projects. The more function points, the higher the cost. I’m not sure how effective it was, because that company closed their doors (they did have a pretty good run, though).

    Comment by George Mastros — December 17, 2013 @ 12:17 pm

  18. […] Michael J. Swart – Overly Complex Views, Procedures And Functions […]

    Pingback by SQL Server Radio: Show 8 – Works Great with NOLOCK — December 21, 2013 @ 4:29 pm

  19. […] so, this technique is not always appropriate. I’ve seen it work best on complicated queries (How can you if it’s complicated?). And I’ve seen this work best against large datasets (processing millions of rows for […]

    Pingback by SQL Simplicity Methods | Michael J. Swart — January 6, 2014 @ 9:19 am

  20. Hi,

    Might it make sense to add any or all of these counts: # of JOINs (as suggested above), # of parameters, and # of “control-of-flow” statements (IF, WHILE, FOR…)? Perhaps if all of these were included, one would have to determine a discount factor for double counting (i.e. there would be some correlation between # of parameters and # of control-of-flow statements).

    2nd question: Have people used function point analysis in combination with unit test counts to estimate code coverage and compute a C#-style % coverage?

    thx

    Comment by lorrin — August 20, 2014 @ 4:31 pm

  21. Ironically, when I ran the check, Adam’s sp_whoisactive came out an order of magnitude greater than any of my own routines, with a whopping " FROM " count of 150 using George’s routine above.
    Just going through your past posts, Michael – a treasure trove. Thank you.

    Comment by Alex Weatherall — November 28, 2018 @ 6:27 am
