The following are some fine articles I found on Brads blog page.
http://bradsruminations.blogspot.com.au/2009/10/making-list-and-checking-it-twice.html
In my previous blog entry, I described in detail how you could use the FOR XML clause to create a column with a comma-separated list of entries acquired from rows in a table somewhere.
In this article, we will do the opposite… We will take a comma-separated list and “shred” the individual items into rows.
Let’s start by creating and populating a test table:
There is a handy .nodes() method which you can apply to an XML datatype that will shred it into relational data. So our first step is to convert our CSVList column into a valid XML representation of the data. We do that this way:
Now that we have an XML column, we can apply the .nodes() method to it. We will pass 'x' to the .nodes() method so that it knows the tag (<x></x>) to shred. And let’s take a look at that result:
But, as is often the case, you know there’s more. Just like I did in my previous blog entry, I’m going to do something to purposely trip things up. Let’s add a list of TV Networks:
We cannot CAST something as XML if it has an ampersand as part of the data, because XML will try to interpret it as an entity that must end in a semicolon:
One might be tempted to use multiple REPLACE() functions to convert all ampersands to & and less-thans to < and greater-thans to >, but who knows what others characters are out there that will create problems? We already have a tool at our disposal that will encode the characters for us… the FOR XML clause.
Now we can incorporate that CROSS APPLY into our query:
By the way, you may have noticed that the making and un-making (shredding) of these comma-delimited lists is like PIVOTing and UNPIVOTing. PIVOT converts rows to columns and UNPIVOT converts columns to rows. We have done something similar, except in making a list, we converted rows into a single column, and in shredding a list, we converted a single column into rows.
I’ll leave you with one final demonstration. Our #CSVLists table that we created had some comma-separated lists. What if we want to sort the items in the lists?
We would first shred the items into rows, and then, using the FOR XML logic we put together in my previous blog entry, we will re-assemble the lists and use an ORDER BY to sort them:
http://bradsruminations.blogspot.com.au/2009/10/making-list-and-checking-it-twice.html
In my previous blog entry, I described in detail how you could use the FOR XML clause to create a column with a comma-separated list of entries acquired from rows in a table somewhere.
In this article, we will do the opposite… We will take a comma-separated list and “shred” the individual items into rows.
Let’s start by creating and populating a test table:
create table #CSVLists ( ID int ,Description varchar(30) ,CSVList varchar(200) )insert #CSVLists select 1,'Stooges' ,'Moe,Larry,Curly' union all select 2,'Castaways','Gilligan,Skipper,Thurston,Lovey,' +'Ginger,Professor,MaryAnne' union all select 3,'Monkees' ,'Davy,Micky,Peter,Mike'(Okay, I know you Gilligan’s Island fanatics are chomping at the bit to tell me that the Skipper’s name was Jonas and the Professor’s name was Roy. I know that… Don’t bother e-mailing me about it… save a virtual tree).
There is a handy .nodes() method which you can apply to an XML datatype that will shred it into relational data. So our first step is to convert our CSVList column into a valid XML representation of the data. We do that this way:
select ID,Description ,XmlListfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2/* ID Description XmlList -- ----------- ----------------------------------------------------------- 1 Stooges <x>Moe</x><x>Larry</x><x>Curly</x> 2 Castaways <x>Gilligan</x><x>Skipper</x><x>Thurston</x><x>Lovey</x>... 3 Monkees <x>Davy</x><x>Micky</x><x>Peter</x><x>Mike</x> */The first CROSS APPLY surrounded each item in the CSVList with the tags <x></x>. And the second CROSS APPLY converted that into an actual XML datatype.
Now that we have an XML column, we can apply the .nodes() method to it. We will pass 'x' to the .nodes() method so that it knows the tag (<x></x>) to shred. And let’s take a look at that result:
select ID,Description ,XMLNodefrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)/* Msg 493, Level 16, State 1, Line 1 The column 'XmlNode' that was returned from the nodes() method cannot be used directly. It can only be used with one of the four xml data type methods, exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks. */Oops… This new XmlNode object we created is not something that we can actually look at or use directly. So, as the error message indicated, we will use the .value() method against that XmlNode object to pull out an actual VARCHAR(20) value:
select ID,Description ,ListItemfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode) cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/* ID Description ListItem -- ----------- --------- 1 Stooges Moe 1 Stooges Larry 1 Stooges Curly 2 Castaways Gilligan 2 Castaways Skipper 2 Castaways Thurston 2 Castaways Lovey 2 Castaways Ginger 2 Castaways Professor 2 Castaways MaryAnne 3 Monkees Davy 3 Monkees Micky 3 Monkees Peter 3 Monkees Mike */Wow, it works!
But, as is often the case, you know there’s more. Just like I did in my previous blog entry, I’m going to do something to purposely trip things up. Let’s add a list of TV Networks:
insert #CSVListsselect 4,'Networks','ABC,NBC,CBS,FX,TNT,A&E'And let’s try the same query:
select ID,Description ,ListItemfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode) cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/* ID Description ListItem -- ----------- --------- 1 Stooges Moe 1 Stooges Larry 1 Stooges Curly 2 Castaways Gilligan 2 Castaways Skipper 2 Castaways Thurston 2 Castaways Lovey 2 Castaways Ginger 2 Castaways Professor 2 Castaways MaryAnne 3 Monkees Davy 3 Monkees Micky 3 Monkees Peter 3 Monkees Mike Msg 9411, Level 16, State 1, Line 1 XML parsing: line 1, character 56, semicolon expected */Ouch! Once again, we’ve run into the problem with the way that XML encodes and decodes special characters like ampersand (&).
We cannot CAST something as XML if it has an ampersand as part of the data, because XML will try to interpret it as an entity that must end in a semicolon:
select Info=cast('<x>A&E</x>' as xml)/* Msg 9411, Level 16, State 1, Line 1 XML parsing: line 1, character 7, semicolon expected */In order for it to be accepted, we have to encode that ampersand as follows, so that XML will understand it to represent an actual ampersand character:
select Info=cast('<x>A&E</x>' as xml)/* Info -------------- <x>A&E</x> */And, as we learned in my previous blog entry, we can then take the .value() of that to un-encode it:
select Info=cast('<x>A&E</x>' as xml).value('.','varchar(10)')/* Info ---- A&E */So in our shredding query, we have to FIRST encode any special characters in our CSVList column, and THEN we can successfully CAST it as an XML column.
One might be tempted to use multiple REPLACE() functions to convert all ampersands to & and less-thans to < and greater-thans to >, but who knows what others characters are out there that will create problems? We already have a tool at our disposal that will encode the characters for us… the FOR XML clause.
select ID,Description ,XMLEncodedfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0/* ID Description XMLEncoded -- ----------- --------------------------------------------------------- 1 Stooges Moe,Larry,Curly 2 Castaways Gilligan,Skipper,Thurston,Lovey,Ginger,Professor,MaryAnne 3 Monkees Davy,Micky,Peter,Mike 4 Networks ABC,NBC,CBS,FX,TNT,A&E */We learned in my previous blog entry that we can eliminate tags being created by explicitly specifying a column a name of [*]. If we hadn’t done that, then we would end up with our lists surrounded by <CSVList></CSVList> tags. And we eliminated yet another tag surrounding our data by passing the empty string to FOR XML PATH. So our lists are really the same as they were originally, except for the special characters being automatically encoded for us.
Now we can incorporate that CROSS APPLY into our query:
select ID,Description ,ListItemfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0cross apply (select TagsList='<x>'+replace(XMLEncoded,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode) cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/* ID Description ListItem -- ----------- --------- 1 Stooges Moe 1 Stooges Larry 1 Stooges Curly 2 Castaways Gilligan 2 Castaways Skipper 2 Castaways Thurston 2 Castaways Lovey 2 Castaways Ginger 2 Castaways Professor 2 Castaways MaryAnne 3 Monkees Davy 3 Monkees Micky 3 Monkees Peter 3 Monkees Mike 4 Networks ABC 4 Networks NBC 4 Networks CBS 4 Networks FX 4 Networks TNT 4 Networks A&E */Ahhh… Success!
By the way, you may have noticed that the making and un-making (shredding) of these comma-delimited lists is like PIVOTing and UNPIVOTing. PIVOT converts rows to columns and UNPIVOT converts columns to rows. We have done something similar, except in making a list, we converted rows into a single column, and in shredding a list, we converted a single column into rows.
I’ll leave you with one final demonstration. Our #CSVLists table that we created had some comma-separated lists. What if we want to sort the items in the lists?
We would first shred the items into rows, and then, using the FOR XML logic we put together in my previous blog entry, we will re-assemble the lists and use an ORDER BY to sort them:
;with ShreddedLists as( select ID,ListItemfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0cross apply (select TagsList='<x>'+replace(XMLEncoded,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode) cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4)select ID,Description ,SortedCSVListfrom #CSVListscross apply (select CSVStringXML=(select ','+ListItemfrom ShreddedListswhere ID=#CSVLists.IDorder by ListItemfor xml path(''),type)) F1cross apply (select CSVString=CSVStringXML.value('.','varchar(max)')) F2cross apply (select SortedCSVList=stuff(CSVString,1,1,'')) F3/* ID Description SortedCSVList -- ----------- --------------------------------------------------------- 1 Stooges Curly,Larry,Moe 2 Castaways Gilligan,Ginger,Lovey,MaryAnne,Professor,Skipper,Thurston 3 Monkees Davy,Micky,Mike,Peter 4 Networks A&E,ABC,CBS,FX,NBC,TNT
Comments
Post a Comment