Sunday, February 19, 2012

A Puzzling Parsing Problem

Greetings and happy new year, all!

I've once again run into a most perplexing problem and I'm hoping someone here can help.

I have a table with record numbers (col A) and a memo field of text format (col B). Contained in the memo field are social security numbers associated with that particular record number. What I need to do is somehow parse the text field and extract these numbers as a 1:1 relationship with the record number. Naturally all the memo entries are of different lengths as well.

So I need to take this:

A B
1 123123123987987987321321321
2 654654654

and turn it into this:

A B
1 123123123
1 987987987
1 321321321
2 654654654

I've gotten as far as using the LEN function to determine the size of each memo field, then dividing by 9 to give me a total number of "records" for that particular record number. My thought was to somehow use a cursor or something to parse the record continually until it reached the number of parses denoted by the total number of records I calculated for each line, but I'm not sure how to do it to return a result set as it appears above. I can manually parse it out 9 characters at a time into a new field using a pretty simple substring, but it will end up taking me the long way around the block.

Can anyone out there help?
Thanks in advance,
DirkYou can use something like the following, which is a udf split function for sql server - the change for you would be to base it on length not on a special character:

function (http://www.sqlmag.com/Articles/Index.cfm?ArticleID=21071)

Let me know if this helps.|||This looks promising, but I don't think I can use it. The article is for SQL2K, and we're still running 7 at the moment.

I'm fooling around with some of the concepts to see if I can somehow apply them though. Any other ideas would be most appreciated!

Dirk|||You know - I was going to ask you if you were using ss2k. Yes, that is a gotcha. You can still use the scripting but you will have to use a cursor.|||Thanks for letting me know I was on the right track!

Any additional info you can provide on the best way to create the cursor? Here are my thoughts:

1. Calculate the number of records in the field by using the LEN function and dividing by 9.

2. Using this number to determine how many times to loop through the field, creating a new line with the linking record number and the next 9 digits in the sequence, then decreasing it by one on each pass until it equals zero.

I'm assuming I can declare the number of records as a variable and do some sort of IF/ELSE, but I'm not sure how to make sure EACH line I write have the one linking record number? If I grab it as a variable each time the cursor loops, will it keep writing the same value until it hits the ELSE to trigger the fetch next line?

Dirk|||Cursors are generally bad for performance. If you can move the logic to a VB or PERL script, you may be better off. Especially if this table increases significantly in size.|||Cursors are the first choice for non-SQL programmer, because there are not any set operations in most languages.
But i already saw some select from system processes in Windows scripting file code.

--SourceTable, #ResultTable ,#LookUpTable

--INIT
create table #ResultTable(
NEWA int identity(1,1) primary key clustered
,A int not null
,B char(9) not null
)
create table #LookUpTable(
A int not null primary key nonclustered
,LENB int not null
)
insert #LookUpTable(A,LENB)
select A,LENB=len(B) from SourceTable
create clustered index IC_LookUpTable on #LookUpTable(LENB)
GO

--CHECK SourceTable
select A
from #LookUpTable
where ( (len(LENB) % 9) <> 0 ) or (len(LENB)=0)
GO

--SPLIT BY LOOP
set nocount on
declare @.CurPass smallint
declare @.ProcessedRows int
set @.CurPass=1
set @.ProcessedRows=1
while @.ProcessedRows>0 begin
insert #ResultTable(A,B)
select st.A,B=substring(st.B,(@.CurPass-1)*9+1 ,9)
from SourceTable st
join #LookUpTable lt on st.A=lt.A and lt.LENB>=(@.CurPass*9)
set @.ProcessedRows=@.@.rowcount
set @.CurPass=@.CurPass+1
end
set nocount off
GO

--SEE RESULT
select * from #ResultTable

--CHECK NEWA
if not exists(select * from #ResultTable group by A,B having count(*)>1)
select 'NEWA PK is not needed.'

--CLEARING
drop table #ResultTable
drop table #LookUpTable

I use lookup table, because I don't know how large is your table.|||This is not really a reply, but I saw the problem and the replies look very promissing.
I'm using ss2k, I have a table with an address column.
here is some example of the records under ADDRESS :

WILLOW CREEK PL
RED BARN DR
RED BARN DR
CARRINGTON DR
RENNER RD
EDMONTON CT
SPRINGBRANCH DR
HILLROSE DR
CEDAR RIDGE DR
LARTAN TRL
PRESIDENT GEORGE BUSH HWY

What I want to do is to write script that runs daily and parse the street names (RED BARN) and street types (Dr, PL , etc.. ) to 2 colums. As u can see there is no fixed length or fixed number of words ...etc ...
Any help would be really appreciated.
thnks|||You should probably start a new post...but anyway...

This won't work if your address don't end with a type...but it still might work for the majority...

good luck

USE Northwind
GO

CREATE TABLE myTable99(Col1 varchar(255), Col2 varchar(255), Col3 varchar(50))
GO

INSERT INTO myTable99(Col1)
SELECT 'WILLOW CREEK PL' UNION ALL
SELECT 'RED BARN DR' UNION ALL
SELECT 'RED BARN DR' UNION ALL
SELECT 'CARRINGTON DR' UNION ALL
SELECT 'RENNER RD' UNION ALL
SELECT 'EDMONTON CT' UNION ALL
SELECT 'SPRINGBRANCH DR' UNION ALL
SELECT 'HILLROSE DR' UNION ALL
SELECT 'CEDAR RIDGE DR' UNION ALL
SELECT 'LARTAN TRL' UNION ALL
SELECT 'PRESIDENT GEORGE BUSH HWY'
GO

SELECT * FROM myTable99
GO

UPDATE t
SET Col2 = REVERSE(SUBSTRING(REVERSE(Col1)
,(CHARINDEX(' ',REVERSE(Col1))+1)
,LEN(Col1)-CHARINDEX(' ',REVERSE(Col1))))
, Col3 = REVERSE(SUBSTRING(REVERSE(Col1)
,1
,(CHARINDEX(' ',REVERSE(Col1))-1)))
FROM myTable99 t
GO

SELECT * FROM myTable99
GO

DROP TABLE myTable99
GO

No comments:

Post a Comment