I have a database design question. I am dealing with a bunch of
standard content categories: people, addresses and companies. I don't
know whether I should have a table for each type of data and associate
addresses (via join tables) with people or companies or both; or
whether the companies table and the people tables should both include
address info. How far does one usually go to normalize these standard
types of data?
Sure, you could say that it depends on the specific business logic,
but I am curious if anyone out there has a way of normalizing this
type of data that has proven effective for most applications. I'm not
sure how our app will grow over time and therefore I want to
accommodate unforeseen requirements upfront without wasting untold
hours on far edge cases. Should I just bite the bullet and assume
we'll need a many-to-many relationship for each?
Any advice or suggested reading would be much appreciated.>> How far does one usually go to normalize these standard types of data?
First of all there are no "standard types" of data. Every piece of data one
deals with during logical modeling depends on the conceptual model and
business rule. Of course, you could find certain aspects of commonality
among models, but relying on them as certain patterns is often a bad
approach.
In general, normalization beyond 1NF is the process of reducing certain
types of redundancies and removing database bias. In the process certain
known problems like update/delete anomalies are elimination. Overall, this
leads to a better quality design. Therefore the answer to your question is
you go as far as you want to keep your database redundancy-free.
>> Sure, you could say that it depends on the specific business logic, but I
>> am curious if anyone out there has a way of normalizing this type of data
>> that has proven effective for most applications.
There are many who claim (universal models, b-patterns etc. for more, google
for David Hayes or Len Silverston) that this is possible, but many
recognized scholars and experts in the database field are not in favor of
such approaches.
>> I'm not sure how our app will grow over time and therefore I want to
>> accommodate unforeseen requirements upfront without wasting untold hours
>> on far edge cases. Should I just bite the bullet and assume we'll need a
>> many-to-many relationship for each?
Expecting the worst is often the wisest policy :-)
--
Anith|||For efficiency's sake it is common practice design such tables with the
address (and often contact) information of the entity (Business, Employee,
Customer, etc) contained within the main entity table.
"Chris" <christopher.b.lewis@.gmail.com> wrote in message
news:1191611766.490849.42420@.50g2000hsm.googlegroups.com...
>I have a database design question. I am dealing with a bunch of
> standard content categories: people, addresses and companies. I don't
> know whether I should have a table for each type of data and associate
> addresses (via join tables) with people or companies or both; or
> whether the companies table and the people tables should both include
> address info. How far does one usually go to normalize these standard
> types of data?
> Sure, you could say that it depends on the specific business logic,
> but I am curious if anyone out there has a way of normalizing this
> type of data that has proven effective for most applications. I'm not
> sure how our app will grow over time and therefore I want to
> accommodate unforeseen requirements upfront without wasting untold
> hours on far edge cases. Should I just bite the bullet and assume
> we'll need a many-to-many relationship for each?
> Any advice or suggested reading would be much appreciated.
>|||> For efficiency's sake it is common practice design such tables with the
> address (and often contact) information of the entity (Business, Employee,
> Customer, etc) contained within the main entity table.
But this quickly changes if the app requires separate addresses for mailing,
shipping, billing, physical, etc. And then when a company can have multiple
branches the schema becomes even more complex.
Same is true for an employee, or a customer, if multiple addresses are
possible then the main entity table is out of the question. Consider Amazon
or PayPal, where you can store as many addresses as you like. I'm not sure
what their upper bound is (and haven't tried), but I think I have 5 on
PayPal, and it's highly doubtful they have a table like this:
CREATE TABLE dbo.Customers
(
FirstName,
LastName,
PrimaryAddress1,
PrimaryAddress2,
PrimaryCity,
PrimaryRegion,
PrimaryZip,
SecondaryAddress1,
...,
SecondaryZip,
TertiaryAddress1,
...,
TertiaryZip,
/*
etc.
etc.
*/
);
A|||>> Same is true for an employee, or a customer, if multiple addresses are
>> possible then the main entity table is out of the question. Consider
>> Amazon or PayPal, where you can store as many addresses as you like.
That is almost an universal requirement these days with many financial
institutions too. While it is hard to generalize, one pattern for a commonly
occurring entity class like customer is to have 3 core m:n relationship
designations in the schema,
-- the core customer entity:
name, ssn, passport_nbr, drivers_license etc.
-- the locality:
address, state, zip, region, country etc.
-- the contact/communication information:
phone_nbr, email_address, icq/mkt/linkedin etc.
One book I recently read, Data Model Patterns by D. Hay emphasizes on this
approach. His line of thought is mostly on the Zachman Framework to derive
at this common pattern and provides some reasonable justifications to adopt
it as a baseline. The UDM folks also have a similar approach :
http://www.univdata.com/images/sep98752-fig6.gif
--
Anith|||>Should I just bite the bullet and assume we'll need a many-to-many
>relationship for each?
Yes. Some years back I fell into the "put the address in the base table"
trap.
People -< PeopleCompany >- Companies
| |
^ ^
PeopleAddresses CompanyAddresses
Y Y
| |
Addresses >-- AddressTypes
"Chris" <christopher.b.lewis@.gmail.com> wrote in message
news:1191611766.490849.42420@.50g2000hsm.googlegroups.com...
>I have a database design question. I am dealing with a bunch of
> standard content categories: people, addresses and companies. I don't
> know whether I should have a table for each type of data and associate
> addresses (via join tables) with people or companies or both; or
> whether the companies table and the people tables should both include
> address info. How far does one usually go to normalize these standard
> types of data?
> Sure, you could say that it depends on the specific business logic,
> but I am curious if anyone out there has a way of normalizing this
> type of data that has proven effective for most applications. I'm not
> sure how our app will grow over time and therefore I want to
> accommodate unforeseen requirements upfront without wasting untold
> hours on far edge cases. Should I just bite the bullet and assume
> we'll need a many-to-many relationship for each?
> Any advice or suggested reading would be much appreciated.
>|||On Fri, 05 Oct 2007 12:16:06 -0700, Chris
<christopher.b.lewis@.gmail.com> wrote:
>I have a database design question. I am dealing with a bunch of
>standard content categories: people, addresses and companies. I don't
>know whether I should have a table for each type of data and associate
>addresses (via join tables) with people or companies or both; or
>whether the companies table and the people tables should both include
>address info. How far does one usually go to normalize these standard
>types of data?
The more time you spend with databases, the more you tend to factor
things into 3NF or better as a default. Break out the addresses!
>Sure, you could say that it depends on the specific business logic,
>but I am curious if anyone out there has a way of normalizing this
>type of data that has proven effective for most applications. I'm not
>sure how our app will grow over time and therefore I want to
>accommodate unforeseen requirements upfront without wasting untold
>hours on far edge cases. Should I just bite the bullet and assume
>we'll need a many-to-many relationship for each?
>Any advice or suggested reading would be much appreciated.
Here's my advice: don't be afraid of the join.
Josh|||> Here's my advice: don't be afraid of the join.
Depends on what your objective is. Maximum flexibility or maximum
performance.
"JXStern" <JXSternChangeX2R@.gte.net> wrote in message
news:5dqgg3ho4gqhc31kbcg4m2u76snergnt9f@.4ax.com...
> On Fri, 05 Oct 2007 12:16:06 -0700, Chris
> <christopher.b.lewis@.gmail.com> wrote:
>>I have a database design question. I am dealing with a bunch of
>>standard content categories: people, addresses and companies. I don't
>>know whether I should have a table for each type of data and associate
>>addresses (via join tables) with people or companies or both; or
>>whether the companies table and the people tables should both include
>>address info. How far does one usually go to normalize these standard
>>types of data?
> The more time you spend with databases, the more you tend to factor
> things into 3NF or better as a default. Break out the addresses!
>
>>Sure, you could say that it depends on the specific business logic,
>>but I am curious if anyone out there has a way of normalizing this
>>type of data that has proven effective for most applications. I'm not
>>sure how our app will grow over time and therefore I want to
>>accommodate unforeseen requirements upfront without wasting untold
>>hours on far edge cases. Should I just bite the bullet and assume
>>we'll need a many-to-many relationship for each?
>>Any advice or suggested reading would be much appreciated.
> Here's my advice: don't be afraid of the join.
> Josh
>|||>> > Here's my advice: don't be afraid of the join.
>> Depends on what your objective is. Maximum flexibility or maximum
>> performance.
Except that one does not always have to sacrifice one to gain the other. And
a logical operation like join has little to do with either anyway.
Otherwise, anyone whose objective is maximum performance would be better of
avoiding joins altogether by having a single table :-)
--
Anith|||On Sun, 7 Oct 2007 19:52:32 -0500, "Anith Sen"
<anith@.bizdatasolutions.com> wrote:
>> > Here's my advice: don't be afraid of the join.
>> Depends on what your objective is. Maximum flexibility or maximum
>> performance.
>Except that one does not always have to sacrifice one to gain the other. And
>a logical operation like join has little to do with either anyway.
>Otherwise, anyone whose objective is maximum performance would be better of
>avoiding joins altogether by having a single table :-)
Well, by having a lot of redundant data pre-digested (joined and/or
aggregated) for common queries.
J.
No comments:
Post a Comment