Sunday, February 19, 2012

A Query

Hi All,

I am working on a project where I had to upload data from Excel
file into SQL Server.

Now the problem is that the data in Excel sheet can have any
data(garbage) ,the data entered may not have any significance to the
data already present in the database.

e.g I have an Author table
I have an Titles table

Now if I try to upload data present in Excel sheet which contains
Author name and title .It should show me all exceptions like
title is empty Author is not present in the Author Table etc.
My Solution

What I am doing is loading all the data into a temporary table checking
for not null , length of fields (using cursor) etc.
If null value exist then stop the upload

otherwise

One by one pick the data from the table (temporary) then insert into
another table (which can be used for updation purpose becuase not all
records are going to be incorrect) now this table has trigger for
insert
Now each column is checked for the integrity constraints [ referential
,check etc ]
With present Solution if my upload has 18 fields and probable
combination of 60 exceptions [ referential ,check etc ] and my upload
excel file has 20000 records It will take nearly 16 Hours and still
going.

Can you suggest me a better method the present method is killing my
application

With Warm Regards
JatinderYou don't need to validate row by row. Two inserts should do it:

/* the valid rows... */

INSERT INTO TargetTable (author_name, foo_name, bar_name, ...)
SELECT DISTINCT S.author_name, S.foo_name, S.bar_name, ...
FROM StagingTable AS S
JOIN Authors AS A
ON S.author_name = A.author_name
JOIN foo AS F
ON S.foo_name = F.foo_name
JOIN bar AS B
ON S.foo_name = B.foo_name
WHERE ... / * domain checks go here */

/* ... and the invaid ones */

INSERT INTO ExceptionTable (author_name, foo_name, bar_name, ...)
SELECT DISTINCT S.author_name, S.foo_name, S.bar_name, ...
FROM StagingTable AS S
LEFT JOIN Authors AS A
ON S.author_name = A.author_name
LEFT JOIN foo AS F
ON S.foo_name = F.foo_name
LEFT JOIN bar AS B
ON S.foo_name = B.foo_name
WHERE A.author_name IS NULL
OR F.foo_name IS NULL
OR B.bar_name IS NULL
OR ... / * domain checks go here */

--
David Portas
SQL Server MVP
--|||Thanks David,

This will surely help me solve the problem.
I was thinking of the SET BASED Solution but could not figure out one
..Thanks again

It is not be related to the post but can you suggest some ways to
improve performance of an ASP page.

With warm regards
Jatinder

No comments:

Post a Comment