Skip to main content

Method of grouping strings with XML

The following are some fine articles I found on  Brads blog page.
http://bradsruminations.blogspot.com.au/2009/10/making-list-and-checking-it-twice.html

In my previous blog entry, I described in detail how you could use the FOR XML clause to create a column with a comma-separated list of entries acquired from rows in a table somewhere.

In this article, we will do the opposite… We will take a comma-separated list and “shred” the individual items into rows.

Let’s start by creating and populating a test table:

create table #CSVLists (   ID          int  ,Description varchar(30)  ,CSVList     varchar(200)
)insert #CSVLists select 1,'Stooges'  ,'Moe,Larry,Curly'       union all select 2,'Castaways','Gilligan,Skipper,Thurston,Lovey,'                                     +'Ginger,Professor,MaryAnne'       union all select 3,'Monkees'  ,'Davy,Micky,Peter,Mike' 
(Okay, I know you Gilligan’s Island fanatics are chomping at the bit to tell me that the Skipper’s name was Jonas and the Professor’s name was Roy. I know that… Don’t bother e-mailing me about it… save a virtual tree).

There is a handy .nodes() method which you can apply to an XML datatype that will shred it into relational data. So our first step is to convert our CSVList column into a valid XML representation of the data. We do that this way:

select ID,Description      ,XmlListfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2/*
ID Description XmlList
-- ----------- -----------------------------------------------------------
 1 Stooges     <x>Moe</x><x>Larry</x><x>Curly</x>
 2 Castaways   <x>Gilligan</x><x>Skipper</x><x>Thurston</x><x>Lovey</x>...
 3 Monkees     <x>Davy</x><x>Micky</x><x>Peter</x><x>Mike</x>
*/ 
The first CROSS APPLY surrounded each item in the CSVList with the tags <x></x>. And the second CROSS APPLY converted that into an actual XML datatype.

Now that we have an XML column, we can apply the .nodes() method to it. We will pass 'x' to the .nodes() method so that it knows the tag (<x></x>) to shred. And let’s take a look at that result:

select ID,Description      ,XMLNodefrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)/*
Msg 493, Level 16, State 1, Line 1
The column 'XmlNode' that was returned from the nodes() method cannot be used 
directly. It can only be used with one of the four xml data type methods, 
exist(), nodes(), query(), and value(), or in IS NULL and IS NOT NULL checks.
*/
Oops… This new XmlNode object we created is not something that we can actually look at or use directly. So, as the error message indicated, we will use the .value() method against that XmlNode object to pull out an actual VARCHAR(20) value:

select ID,Description      ,ListItemfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)
cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/*
ID Description ListItem
-- ----------- ---------
 1 Stooges     Moe
 1 Stooges     Larry
 1 Stooges     Curly
 2 Castaways   Gilligan
 2 Castaways   Skipper
 2 Castaways   Thurston
 2 Castaways   Lovey
 2 Castaways   Ginger
 2 Castaways   Professor
 2 Castaways   MaryAnne
 3 Monkees     Davy
 3 Monkees     Micky
 3 Monkees     Peter
 3 Monkees     Mike
*/
Wow, it works!

But, as is often the case, you know there’s more. Just like I did in my previous blog entry, I’m going to do something to purposely trip things up. Let’s add a list of TV Networks:

insert #CSVListsselect 4,'Networks','ABC,NBC,CBS,FX,TNT,A&E'
And let’s try the same query:

select ID,Description      ,ListItemfrom #CSVListscross apply (select TagsList='<x>'+replace(CSVList,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)
cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/*
ID Description ListItem
-- ----------- ---------
 1 Stooges     Moe
 1 Stooges     Larry
 1 Stooges     Curly
 2 Castaways   Gilligan
 2 Castaways   Skipper
 2 Castaways   Thurston
 2 Castaways   Lovey
 2 Castaways   Ginger
 2 Castaways   Professor
 2 Castaways   MaryAnne
 3 Monkees     Davy
 3 Monkees     Micky
 3 Monkees     Peter
 3 Monkees     Mike
Msg 9411, Level 16, State 1, Line 1
XML parsing: line 1, character 56, semicolon expected
*/
Ouch! Once again, we’ve run into the problem with the way that XML encodes and decodes special characters like ampersand (&).

We cannot CAST something as XML if it has an ampersand as part of the data, because XML will try to interpret it as an entity that must end in a semicolon:

select Info=cast('<x>A&E</x>' as xml)/*
Msg 9411, Level 16, State 1, Line 1
XML parsing: line 1, character 7, semicolon expected
*/
In order for it to be accepted, we have to encode that ampersand as follows, so that XML will understand it to represent an actual ampersand character:

select Info=cast('<x>A&amp;E</x>' as xml)/*
Info
--------------
<x>A&amp;E</x>
*/
And, as we learned in my previous blog entry, we can then take the .value() of that to un-encode it:

select Info=cast('<x>A&amp;E</x>' as xml).value('.','varchar(10)')/*
Info
----
A&E
*/
So in our shredding query, we have to FIRST encode any special characters in our CSVList column, and THEN we can successfully CAST it as an XML column.

One might be tempted to use multiple REPLACE() functions to convert all ampersands to &amp; and less-thans to &lt; and greater-thans to &gt;, but who knows what others characters are out there that will create problems? We already have a tool at our disposal that will encode the characters for us… the FOR XML clause.

select ID,Description      ,XMLEncodedfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0/*
ID Description XMLEncoded
-- ----------- ---------------------------------------------------------
 1 Stooges     Moe,Larry,Curly
 2 Castaways   Gilligan,Skipper,Thurston,Lovey,Ginger,Professor,MaryAnne
 3 Monkees     Davy,Micky,Peter,Mike
 4 Networks    ABC,NBC,CBS,FX,TNT,A&amp;E
*/
We learned in my previous blog entry that we can eliminate tags being created by explicitly specifying a column a name of [*]. If we hadn’t done that, then we would end up with our lists surrounded by <CSVList></CSVList> tags. And we eliminated yet another tag surrounding our data by passing the empty string to FOR XML PATH. So our lists are really the same as they were originally, except for the special characters being automatically encoded for us.

Now we can incorporate that CROSS APPLY into our query:

select ID,Description      ,ListItemfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0cross apply (select TagsList='<x>'+replace(XMLEncoded,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)
cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4/*
ID Description ListItem
-- ----------- ---------
 1 Stooges     Moe
 1 Stooges     Larry
 1 Stooges     Curly
 2 Castaways   Gilligan
 2 Castaways   Skipper
 2 Castaways   Thurston
 2 Castaways   Lovey
 2 Castaways   Ginger
 2 Castaways   Professor
 2 Castaways   MaryAnne
 3 Monkees     Davy
 3 Monkees     Micky
 3 Monkees     Peter
 3 Monkees     Mike
 4 Networks    ABC
 4 Networks    NBC
 4 Networks    CBS
 4 Networks    FX
 4 Networks    TNT
 4 Networks    A&E
*/
Ahhh… Success!

By the way, you may have noticed that the making and un-making (shredding) of these comma-delimited lists is like PIVOTing and UNPIVOTing. PIVOT converts rows to columns and UNPIVOT converts columns to rows. We have done something similar, except in making a list, we converted rows into a single column, and in shredding a list, we converted a single column into rows.

I’ll leave you with one final demonstration. Our #CSVLists table that we created had some comma-separated lists. What if we want to sort the items in the lists?

We would first shred the items into rows, and then, using the FOR XML logic we put together in my previous blog entry, we will re-assemble the lists and use an ORDER BY to sort them:

;with ShreddedLists as(  select ID,ListItemfrom #CSVListscross apply (select XMLEncoded=(select [*]=CSVList for xml path(''))) F0cross apply (select TagsList='<x>'+replace(XMLEncoded,',','</x><x>')+'</x>') F1cross apply (select XmlList=cast(TagsList as xml)) F2cross apply XmlList.nodes('x') F3(XmlNode)  cross apply (select ListItem=XmlNode.value('.','varchar(20)')) F4)select ID,Description      ,SortedCSVListfrom #CSVListscross apply (select CSVStringXML=(select ','+ListItemfrom ShreddedListswhere ID=#CSVLists.IDorder by ListItemfor xml path(''),type)) F1cross apply (select CSVString=CSVStringXML.value('.','varchar(max)')) F2cross apply (select SortedCSVList=stuff(CSVString,1,1,'')) F3/*
ID Description SortedCSVList
-- ----------- ---------------------------------------------------------
 1 Stooges     Curly,Larry,Moe
 2 Castaways   Gilligan,Ginger,Lovey,MaryAnne,Professor,Skipper,Thurston
 3 Monkees     Davy,Micky,Mike,Peter
 4 Networks    A&E,ABC,CBS,FX,NBC,TNT

Comments

Popular posts from this blog

Troubleshooting Database Mail: General steps

SQL Server 2008 R2 Other Versions SQL Server "Denali" SQL Server 2008 SQL Server 2005 Troubleshooting Database Mail involves checking the following general areas of the Database Mail system. These procedures are presented in a logical order, but can be evaluated in any order. To determine if Database Mail is enabled In SQL Server Management Studio, connect to an instance of SQL Server by using a query editor window, and then execute the following code: Copy sp_configure 'show advanced', 1; GO RECONFIGURE; GO sp_configure; GO In the results pane, confirm that the run_value for Database Mail XPs is set to 1 . If the run_value is not 1 , Database Mail is not enabled. Database Mail is not automatically enabled to reduce the number of features available for attack by a malicious user. For more information, see Understanding Surface Area Configuration . If you decide that it is appropriate to enable Database Mail, execute the fo...

Integrating MIGS eGate Payment gateway using C# for Oz businesses (Server-Hosted Payment method)

If you ever wanted to include the eGate payment gateway in your ASP.Net site and want to save yourself the time in creating the class, here’s how to do it. First things first you will need a merchant account with ANZ who will give you the following ·          Merchant Account ·          User name ·          password The details could be downloaded from the ANZ web site http://www.anz.com.au/australia/business/merchant/DownloadDevKit.asp I am doing this for an Aus base client thus the process might be different to your own country. Please check with the issuing bank for details. Have a read of the Merchant Admin guide. Login to the merchant account and create an operator ID, Login as the operator and obtain the access code. I will not going to the details as this is all documented in the admin guide. What you will need for the implementation would...

Using SSRS web services to render a report as a PDF

I have been looking around the net for some decent code which would explain how I could render a report, using SSRS 2008 web services as a PDF.   The need was to extract reports sitting on a SSRS 2008 server sitting on a NT domain on a trusted network, whereas my web server was sitting in a DMZ. Where the only communication allowed by the network admin was port 80. To do this you will need to use the SSRS2008   ReportExecution2005.asmx web service. This could be accesses using the following URL assuming your SSRS server was installed using the default settings. http://YourServerIP/reportserver/reportexecution2005.asmx?wsdl 1.        Create a user on your AD domain with the least amount of privileges (say ReportUser) 2.        Give this account browse access on the reporting server for the desired reports. 3.        To get this working in visual studio 2010 (I am using t...