Archive for June, 2008

Working On SQL Admin Hacks, Tips and Tricks

I haven't posted for a while because I have been working on SQL Admin Hacks, Tips and Tricks lately. it is still a work in progress but below is what is on the wiki currently. It is not yet categorized but we will do that once we get more of these hacks done. To see what it will look like when it is done take a look at the SQL Server Programming Hacks

Can you think of any admin stuff you would like to see? This is what we have right now

Find Primary Keys and Columns Used in SQL Server
Get The Domain Name Of Your SQL Server Machine With T-SQL
Grant Execute/SELECT Permissions For All User Defined Functions To A User
Grant Execute Permissions For All Stored Procedures To A User
Kill All Active Connections To A Database
SQL Server 2008: When Was The Server Last Started?
Check If Auto Update Statistics Is Enabled By Using DATABASEPROPERTY
Three Way To List All Databases On Your Server
Generate A List Of Object Types By Using OBJECTPROPERTY
How to find all the tables and views in a database
Find Out Server Roles For a SQL Server Login
Which Service Pack Is Installed On My SQL Server
Test SQL Server Login Permissions With SETUSER
Login failed for user 'sa'. Reason: Not associated with a trusted SQL Server connection. SQL 2000
Login failed for user 'sa'. Reason: Not associated with a trusted SQL Server connection. SQL 2005
Compare Tables With Tablediff
Find All Tables Without Triggers In SQL Server
Find All Tables With Triggers In SQL Server
Create Stored Procedures That Run At SQL Server Startup
Cycle The SQL Server Error Log
How to read sql server error messages
Use OBJECT_DEFINITION To Track SQL Server Stored Procedure Changes
SQL Compare Without The Price Tag
How To Get The Database Name For The Current User Process
How To Find Out Which Columns Have Defaults And What Those Default Values Are
Fixing Cannot add, update, or delete a job that originated from an MSX Server Error after renaming a server

My Latest Work

Sky Lynn Reed's 0th birthday is 5/15/2008 at 5:44pm, when she weighed 8 pounds, 0 ounces. A nice round binary number. Our first born! There are simply no words to describe the experience. We debated on whether we should have kids for a long time. After...(read more)

The Golden Rule of Data Manipulation

Introduction

There is a very simple rule when it comes to storing (and returning) data, which I see violated all the time, making life so much more complicated for everyone involved.  In case you haven't noticed, that's a common theme I discuss here on this blog -- different ways programmers make life more difficult for themselves, instead of simply following good practices and doing things the easy way.  This is yet another example of that situation.

The "Golden Rule of Data Manipulation" is a simple, but important rule that you should always follow when designing a database,  writing database code, or really writing any application code at all for that matter:

"It is always easier and more flexible to combine data elements rather than to break them apart"

In other words: Concatenation is easy. Parsing is hard.  Often, very hard -- or even impossible depending on the data.

Problems with Parsing

It is amazing how often I see people struggling with "difficult SQL problems" such as:
  • Working with CSV lists of values in a single column, such as "1,3,56,2"
  • Breaking out a FirstName/MiddleName/LastName/Suffix from a single "Name" column
  • Parsing address strings into City/State/ZIP, or Number/Street/Unit
  • Parsing Phone Numbers to get just an area code, or to take different phone formats and present them all uniformly
  • Figuring out how to calculate the Day,Month, and/or Year from different string values such as "23-Jan-08", "2008-02", "20070303", "03032007"
And on and on it goes....

Now, sometimes you inherit or import data that needs to be parsed -- that's a fact of life.  You've got to figure out how to do it, and the key in those cases is to accept that because the data itself is essentially random, nothing you can write will perfectly work 100% of the time on all of it.  Often, the best you can do is handle most of the data, and then do some manual clean up. 

Parsing strings can be a very difficult task for any programmer, and the challenge isn't writing the code, it's coming up with the algorithm (another common theme on this blog).  Consider my new favorite example of why parsing a single Name column into a First/Middle/Last is not as easy as it seems:

    Oscar De La Hoya

How would your algorithm parse that one?  Never mind prefixes such as "Dr." and suffixes such as "Jr."! 

Please don't interpret what I am saying as a programming challenge -- I understand that it is possible to write long code with a list of exceptions or rules and have that algorithm work pretty well in most cases.  The point is that writing that algorithm is a lot of work, running it will be inefficient, and it will never be exact because the data itself that is being processed is essentially random.  It's just like the old saying: "garbage in, garbage out".  Still one of my favorites, after all these years, and it still applies!

A Data Model that requires Parsing = A Poor Data Model

So, we need to accept that sometimes you've got to parse data like this.  And that's OK; it happens, it can be done, even if some manual work is often involved.

However, there's no excuse when you design your database, your SQL code, or applications so that free-form data must be parsed, when you can simply design it correctly in the first place and store your data already broken out into the smallest possible units with the correct data types.

If breaking out a contact's name into First, Last, Middle, etc is important to your application, then you should force the point of data entry to accept input broken out into those columns.  The same goes for phone numbers, addresses, and so on.  Any time you have the option of accepting  input as clean, short, raw, segments of data you should always do it.   Once you have data at that smaller resolution, it is trivial to combine it any way that you want for presentation, formatting, filtering, and so on.

It may seem like overkill to break out a phone number into 4 columns:

AreaCode
Exchange
Number
Extension
   
And, in fact, it might be more complicated than that if you need to deal with international phone numbers.  You may look at your tables, and your code, and even the UI that accepts these fields and think "that is way too precise and unnecessary, breaking out phone numbers like this sure makes things complicated!"

But by doing this, and only accepting user input that follows precise rules of what is allowed in these fields, and storing each of them in their own column, you can now easily and efficiently:
  1. Sort these numbers any way you want, without worrying about extra characters like parenthesis or dashes, or leading 1s, messing things up
  2. Filter quickly on an area code without the need to use LIKE, and again worrying about extra characters getting in the way
  3. Present the phone number quickly and easily any way you want without any parsing, be it as 123.123.1345 x123 or "(123) 123-1345 extension 123", or anything you want.
  4. Validate your phone numbers, ensuring you have all the necessary parts and they are the proper length, without worrying about parsing strings
Considering doing any of those things if your data is stored in random strings like:

1-123-124-1234
(123) 124-1234
123-124-1234, ex. 123
123.124.1234 x123
(123)124.1234 ext. 123
1 123 124 1234 123
   
and so on ...  Not so easy in that case, just as parsing simple "Name" columns into First/Last, or addresses into  Number/Street/Unit is not so easy as well. 

Again, this is not a programming challenge -- I am sure it can done. (In fact, phone numbers are generally the easiest because you can usually just ignore anything other than digits.)  Most of us have done it before.  But designing something in such a way that parsing is required to do simple filtering, sorting, or formatting, is a bad design

It's Not About the UI

As I wrote here, you should never think "I want to display phone numbers like 123.123.1234, so I should store them and return them that way."  You should always think "How can I break this down into small, concrete parts that are easily validated and easy to combine any way I want at any time?"

So, what if you need fine detail when storing addresses, but you don't want your UI to present Street Number, Street Name, Unit Type, Unit Number as different data entry fields for usability or aesthetic reasons?  That's fine, but that doesn't mean you should not set up your database properly.  Your UI can certainly still present that one single "Address" text box for the user to fill out, parse that text at data entry, show the user the parsed result in multiple fields, and ask "Please verify for your address" or something along those lines.   Then, if not, the user can tweak the results and save it.  If you do things along those lines, and focus on getting the data parsed and stored correctly at the earliest point possible, every other part of your code will be that much more efficient.

All of this applies not only to data storage, but to how data is returned and passed between tiers as well.  Again, if you just return separate columns to your client application, instead of focusing on making them "look nice" in your database code by returning nothing but long, "pre-formatted" strings, your client can simply concatenate and format those columns any way it needs.  And, different clients can format that same database output in different ways -- all without ever altering any database code! 

Conclusion

In short, remember that writing concatenation is easy, efficient, and exact.  Writing a parsing routine, on the other hand, is often none of those things.   You may not always be able to control the design of the data you are working with, but be sure that when you can, you do it right.  If you find yourself using lots of LIKE expressions, or string parsing for simple data retrieval operations, something is wrong.   Time to fix up your database and your code, store the parsed and validated data permanently, and make things easier and cleaner for everyone.

Whether you are designing a schema, writing a SELECT, or writing code in any other programming language, remember that the Golden Rule of Data Manipulation always applies.  Accept this rule, learn from it, and practice it, and you might be surprised to find that programming isn't quite as hard as you thought it was.



SQL Server 2008 Release Candidate 0 Available To The General Public For Download

SQL Server 2008 Release Candidate 0 has been made available to the general public.

There are 2 versions (and 3 flavors of each) ISO or DVD

Download it here: http://www.microsoft.com/downloads/details.aspx?FamilyId=35F53843-03F7-4ED5-8142-24A4C024CA05&displaylang=en

Don't forget to also visit the SQL Server 2008 Release Candidate 0 connect site here: https://connect.microsoft.com/SQLServer/content/content.aspx?ContentID=5395

365 Tips for Green IT

COMING SOON!  If you are interested in Green IT, this is a not-to-miss eBook.




eBook Tip Categories

  • Developing a Green IT Strategy
  • Governance
  • Lifecycle Management
  • Energy
  • Paper
  • Electronic Waste
  • Water

Register to receive our Newsletter (right) and we'll let you know when the eBook is out!


  • Sponsored Links

  •  

    June 2008
    M T W T F S S
    « May   Jul »
     1
    2345678
    9101112131415
    16171819202122
    23242526272829
    30  
  • .

    Copyright © 1996-2010 Answer My Query. All rights reserved.
    iDream theme by Templates Next | Powered by WordPress