Method of grouping strings with XML

The following are some fine articles I found on  Brads blog page.
http://bradsruminations.blogspot.com.au/2009/10/making-list-and-checking-it-twice.html

In my previous blog entry, I described in detail how you could use the FOR XML clause to create a column with a comma-separated list of entries acquired from rows in a table somewhere.

In this article, we will do the opposite… We will take a comma-separated list and “shred” the individual items into rows.

Let’s start by creating and populating a test table:

create table #CSVLists (   ID          int  ,Description varchar(30)  ,CSVList     varchar(200)
)insert #CSVLists select 1,'Stooges'  ,'Moe,Larry,Curly'       union all select 2,'Castaways','Gilligan,Skipper,Thurston,Lovey,'                                     +'Ginger,Professor,MaryAnne'       union all select 3,'Monkees'  ,'Davy,Micky,Peter,Mike' 
(Okay, I know you Gilligan’s Island fanatics are chomping at the bit to tell me that the Skipper’s name was Jonas and the Professor’s name was Roy. I know that… Don’t bother e-mailing me about it… save a virtual tree).

There is a handy .nodes() method which you can apply to an XML datatype that will shred it into relational data. So our first step is to convert our CSVList column into a valid XML representation of the data. We do that this way:

select ID,Description      ,XmlListfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2/*
ID Description XmlList
-- ----------- -----------------------------------------------------------
 1 Stooges     <x>Moe</x><x>Larry</x><x>Curly</x>
 2 Castaways   <x>Gilligan</x><x>Skipper</x><x>Thurston</x><x>Lovey</x>...
 3 Monkees     <x>Davy</x><x>Micky</x><x>Peter</x><x>Mike</x>
*/ 
The first CROSS APPLY surrounded each item in the CSVList with the tags <x></x>. And the second CROSS APPLY converted that into an actual XML datatype.

Now that we have an XML column, we can apply the .nodes() method to it. We will pass 'x' to the .nodes() method so that it knows the tag (<x></x>) to shred. And let’s take a look at that result:

select ID,Description      ,XMLNodefrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)/*
Msg 493, Level 16, State 1, Line 1
The column 'XmlNode' that was returned from the nodes() method cannot be used 
directly. It can only be used with one of the four xml data type methods, 
exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.
*/
Oops… This new XmlNode object we created is not something that we can actually look at or use directly. So, as the error message indicated, we will use the .value() method against that XmlNode object to pull out an actual VARCHAR(20) value:

select ID,Description      ,ListItemfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)
cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/*
ID Description ListItem
-- ----------- ---------
 1 Stooges     Moe
 1 Stooges     Larry
 1 Stooges     Curly
 2 Castaways   Gilligan
 2 Castaways   Skipper
 2 Castaways   Thurston
 2 Castaways   Lovey
 2 Castaways   Ginger
 2 Castaways   Professor
 2 Castaways   MaryAnne
 3 Monkees     Davy
 3 Monkees     Micky
 3 Monkees     Peter
 3 Monkees     Mike
*/
Wow, it works!

But, as is often the case, you know there’s more. Just like I did in my previous blog entry, I’m going to do something to purposely trip things up. Let’s add a list of TV Networks:

insert #CSVListsselect 4,'Networks','ABC,NBC,CBS,FX,TNT,A&E'
And let’s try the same query:

select ID,Description      ,ListItemfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)
cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/*
ID Description ListItem
-- ----------- ---------
 1 Stooges     Moe
 1 Stooges     Larry
 1 Stooges     Curly
 2 Castaways   Gilligan
 2 Castaways   Skipper
 2 Castaways   Thurston
 2 Castaways   Lovey
 2 Castaways   Ginger
 2 Castaways   Professor
 2 Castaways   MaryAnne
 3 Monkees     Davy
 3 Monkees     Micky
 3 Monkees     Peter
 3 Monkees     Mike
Msg 9411, Level 16, State 1, Line 1
XML parsing: line 1, character 56, semicolon expected
*/
Ouch! Once again, we’ve run into the problem with the way that XML encodes and decodes special characters like ampersand (&).

We cannot CAST something as XML if it has an ampersand as part of the data, because XML will try to interpret it as an entity that must end in a semicolon:

select Info=cast('<x>A&E</x>' as xml)/*
Msg 9411, Level 16, State 1, Line 1
XML parsing: line 1, character 7, semicolon expected
*/
In order for it to be accepted, we have to encode that ampersand as follows, so that XML will understand it to represent an actual ampersand character:

select Info=cast('<x>A&amp;E</x>' as xml)/*
Info
--------------
<x>A&amp;E</x>
*/
And, as we learned in my previous blog entry, we can then take the .value() of that to un-encode it:

select Info=cast('<x>A&amp;E</x>' as xml).value('.','varchar(10)')/*
Info
----
A&E
*/
So in our shredding query, we have to FIRST encode any special characters in our CSVList column, and THEN we can successfully CAST it as an XML column.

One might be tempted to use multiple REPLACE() functions to convert all ampersands to &amp; and less-thans to &lt; and greater-thans to &gt;, but who knows what others characters are out there that will create problems? We already have a tool at our disposal that will encode the characters for us… the FOR XML clause.

select ID,Description      ,XMLEncodedfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0/*
ID Description XMLEncoded
-- ----------- ---------------------------------------------------------
 1 Stooges     Moe,Larry,Curly
 2 Castaways   Gilligan,Skipper,Thurston,Lovey,Ginger,Professor,MaryAnne
 3 Monkees     Davy,Micky,Peter,Mike
 4 Networks    ABC,NBC,CBS,FX,TNT,A&amp;E
*/
We learned in my previous blog entry that we can eliminate tags being created by explicitly specifying a column a name of [*]. If we hadn’t done that, then we would end up with our lists surrounded by <CSVList></CSVList> tags. And we eliminated yet another tag surrounding our data by passing the empty string to FOR XML PATH. So our lists are really the same as they were originally, except for the special characters being automatically encoded for us.

Now we can incorporate that CROSS APPLY into our query:

select ID,Description      ,ListItemfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0cross apply (select TagsList='<x>'+replace(XMLEncoded,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)
cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/*
ID Description ListItem
-- ----------- ---------
 1 Stooges     Moe
 1 Stooges     Larry
 1 Stooges     Curly
 2 Castaways   Gilligan
 2 Castaways   Skipper
 2 Castaways   Thurston
 2 Castaways   Lovey
 2 Castaways   Ginger
 2 Castaways   Professor
 2 Castaways   MaryAnne
 3 Monkees     Davy
 3 Monkees     Micky
 3 Monkees     Peter
 3 Monkees     Mike
 4 Networks    ABC
 4 Networks    NBC
 4 Networks    CBS
 4 Networks    FX
 4 Networks    TNT
 4 Networks    A&E
*/
Ahhh… Success!

By the way, you may have noticed that the making and un-making (shredding) of these comma-delimited lists is like PIVOTing and UNPIVOTing. PIVOT converts rows to columns and UNPIVOT converts columns to rows. We have done something similar, except in making a list, we converted rows into a single column, and in shredding a list, we converted a single column into rows.

I’ll leave you with one final demonstration. Our #CSVLists table that we created had some comma-separated lists. What if we want to sort the items in the lists?

We would first shred the items into rows, and then, using the FOR XML logic we put together in my previous blog entry, we will re-assemble the lists and use an ORDER BY to sort them:

;with ShreddedLists as(  select ID,ListItemfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0cross apply (select TagsList='<x>'+replace(XMLEncoded,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)  cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4)select ID,Description      ,SortedCSVListfrom #CSVListscross apply (select CSVStringXML=(select ','+ListItemfrom ShreddedListswhere ID=#CSVLists.IDorder by ListItemfor xml path(''),type)) F1cross apply (select CSVString=CSVStringXML.value('.','varchar(max)')) F2cross apply (select SortedCSVList=stuff(CSVString,1,1,'')) F3/*
ID Description SortedCSVList
-- ----------- ---------------------------------------------------------
 1 Stooges     Curly,Larry,Moe
 2 Castaways   Gilligan,Ginger,Lovey,MaryAnne,Professor,Skipper,Thurston
 3 Monkees     Davy,Micky,Mike,Peter
 4 Networks    A&E,ABC,CBS,FX,NBC,TNT

Popular posts from this blog

Using SSRS web services to render a report as a PDF

Integrating MIGS eGate Payment gateway using C# for Oz businesses (Server-Hosted Payment method)

Creating a Link Server to MYSQL from MSSQL