XML manipulation

rated by 0 users
This post has 3 Replies | 3 Followers

Not Ranked
Posts 2
Ali Posted: 07-17-2012 9:15 AM

I’m trying to create a script that will read in an XML file, split one of the nodes to create two new ones and to export the result to a CSV file for use elsewhere.

I have managed the import and the split but when I try to export the CSV, the text from the new nodes.e. firstname and lastname is output correctly but text from the existing nodes (i.e. email) are output as System.Xml.XmlElement.  Is it possible to either convert all cdata elements to normal text before output or to obtain text instead of ‘System.Xml.XmlElement’ before exporting to CSV?

Script as follows:

$xmldata=[xml](get-content .\employee3.xml)

$accounts=$xmldata.staff.employee

foreach ($entry in $accounts)

{

$names = $entry.username."#cdata-section" -split ",";

        $lastname = $xmldata.CreateElement("lastname") #create xml element - lastname

        $lastname.set_InnerText($names[0])

        $entry.PrependChild($lastname)

        $firstname = $xmldata.CreateElement("firstname")

        $firstname.set_InnerText($names[1])

        $entry.PrependChild($firstname)

}

$accounts | select firstname, lastname, email | export-csv 'C:\psscripts\emp.csv'

XML file:
<staff branch="Hanover" Type="sales">
<employee>
<username>Jones, Peter</username>
<Email>peterjones@abc.com</Email>
</employee>
<employee>
<username>Walker, Johnny</username>
<Email>jwalker@def.com</Email>
</employee>
<employee>
<username>smith, john</username>
<Email>john.smith@test.com</Email>
</employee>
</staff>

Csv file contents:

firstname    lastname    Email
Peter          Jones        System.Xml.XmlElement
Johnny        Walker       System.Xml.XmlElement
john            smith        System.Xml.XmlElement

Top 25 Contributor
Posts 201
Microsoft MVP
Top Contributor

I think this is going to be inherently difficult because XML is hierarchical and CSV is flat. What you might be able to do is use a custom property to extract the text part of the email element.

$accounts | select firstname,lastname,@{name="Email";Expression={$_.email.innertext}}

I doubt I have the right property name, but I hope you get the idea. You'll need to expand that object to retrieve the text you want so that you have simple text that the CSV format can use.

Top 500 Contributor
Posts 8

I don't see any CDATA in your XML, so I'm a little confused why you're doing the ."#cdata-section" at all, but you should NOT be doing that.  If you have CDATA in your xml, use the .InnerText property (If that isn't exposed, as in PowerShell 1 sometimes, use the .get_InnerText() method).  

The problem comes when PowerShell gets too smart for it's own good and returns you a STRING sometimes, and sometimes an xml node. E.g.:

$xmldata = [Xml]@"
<staff branch="Hanover" Type="sales">
<employee>
<username><![CDATA[Jones, Peter]]></username>
<Email>peterjones@abc.com</Email>
</employee>
<employee>
<username>Walker, Johnny</username>
<Email>jwalker@def.com</Email>
</employee>
<employee>
<username>smith, john</username>
<Email>john.smith@test.com</Email>
</employee>
</staff>
"@

$xmldata.staff.employee[0].username.GetType() # "XmlElement" because there's CDATA text
$xmldata.staff.employee[1].username.GetType() # "String" because there's just regular text

To work around that, use .SelectSingleNode("username").InnerText:

$xmldata.staff.employee |
Select @{ Name = "FirstName"; Expression = { $_.selectSingleNode("username").InnerText.split(",")[-1].trim() } },
@{ Name = "LastName"; Expression = { $_.selectSingleNode("username").InnerText.split(",")[0].trim() } },
Email |
Export-Csv temp.csv

If that doesn't work for you as-is (it works for me on the sample text above, but if you have CDATA in your email addresses ... ), then you'll want to .InnerText the Email too:

$xmldata.staff.employee |
Select @{ Name = "FirstName"; Expression = { $_.selectSingleNode("username").InnerText.split(",")[-1].trim() } },
@{ Name = "LastName"; Expression = { $_.selectSingleNode("username").InnerText.split(",")[0].trim() } },
@{ Name = "Email"; Expression = { $_.selectSingleNode("Email").InnerText } } |
Export-Csv temp.csv

 

 

Not Ranked
Posts 2

Hi there,

Thanks for your help with this. Using a custom property as you suggested produces the correct output.

Best regards,

Ali

 

Page 1 of 1 (4 items) | RSS
Copyright 2012 PowerShell.com. All rights reserved.