So this has been a pet peeve of mine for a while: the lack of data normalization of ANY sort. People will argue about how many levels of data normalization you should go through to make your data good. I will hold my opinion to the end. The list below was stolen from Wikipedia.
-1NF: Chaos (no normalization/data standardization)*
UNF: Unnormalized Form
1NF: First Normal Form
2NF: Second Normal Form
3NF: Third Normal Form
4NF: Fourth Normal Form
5NF: Fifth Normal Form
6NF: Sixth Normal Form
Note: * This is a normal form that I have defined myself.
Data generally starts out in -1NF (aka Chaos), and if you have a good Data Architect, DBA, Data Modeller, etc., they will very quickly move you to 1NF, 2NF, etc. How far down the normal form list you should go depends on your data sets.
To keep my customer anonymous, I am going to talk about their -1NF data set in a format that can't be identified as theirs. In one of their tables (2,300 entries), they had a single column with 874 unique entries. That isn't bad in itself. But when I looked at those 800-odd entries, I saw that they hadn't normalized/standardized on anything. Let's say that column was street names. They would write “drive” as “drive”, “dr”, or “dr.”. So when I replaced all the dr/dr. with drive (I chose to spell it out for them), I got rid of roughly 200 entries in the unique listing. When I continued to normalize/standardize the data for that column, I had it under 134 unique entries, down from 874. Then I went on to the next column, and then the next, repeating the normalization/standardization of the data.
When we ported those changes back to their current data set (I was working on a dev copy), all of their staff were happier immediately. When they looked for things in the existing system, they were now able to find them. Things got even better as we migrated from UNF to 1NF, then to 2NF, etc. And NO, I have never taken anyone all the way down the NF scale.
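For anyone who wants to try the same kind of cleanup, here is a rough sketch in Python of the street-name example above. It is not the code I ran for the customer. The suffix mapping, the function name `standardize_street`, and the sample values are all made up for illustration, and lowercasing everything is my own assumption about what "standardized" should mean here.

```python
import re

# Hypothetical mapping of suffix variants to the spelled-out form.
# A real data set would need a much longer, reviewed list.
SUFFIX_MAP = {
    "dr": "drive",
    "st": "street",
    "ave": "avenue",
}

def standardize_street(name: str) -> str:
    """Lowercase, collapse whitespace, and spell out a trailing suffix."""
    # Trim, lowercase, and squeeze runs of whitespace into single spaces.
    words = re.sub(r"\s+", " ", name.strip().lower()).split(" ")
    # Only the last word is treated as a suffix; drop a trailing period first.
    last = words[-1].rstrip(".")
    words[-1] = SUFFIX_MAP.get(last, last)
    return " ".join(words)

# Made-up sample values standing in for the real column.
raw = ["Maple Drive", "maple dr", "Maple Dr.", "Oak  St", "oak street"]
cleaned = [standardize_street(s) for s in raw]

# Compare the number of distinct values before and after the cleanup.
unique_before = len(set(raw))
unique_after = len(set(cleaned))
print(unique_before, "->", unique_after)
```

Counting the distinct values before and after each pass is the same check I used to watch the unique listing shrink. Do this on a dev copy first, and eyeball the mapping before applying it, since a blind replace can merge values that really are different.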
So here is my opinion. Most customers' data is -1NF (Chaos), and I feel it needs to be normalized/standardized enough that the data set doesn't get in the way of their doing business. So any form of NF will do, excluding -1NF.