Converting Website Data Into PowerShell Objects

Yesterday, I stumbled upon an excellent blog post written in 2007 by MOW, a good friend and PowerShell expert. In it, MOW demonstrates how to scrape raw data from an HTML page and convert it into PowerShell objects (http://thepowershellguy.com/blogs/posh/archive/2007/02/13/hey-powershell-how-popular-is-this-baby-name.aspx).

I loved that approach so much that I experimented with it a bit and refined his code.

The script now prompts for a decade (1880 - 2000) and downloads the page at www.ssa.gov that lists the most popular male and female names of that decade. It then parses the raw HTML content with a regular expression.
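To see the parsing step in isolation, here is a minimal sketch that runs the same regular expression against a small hand-written HTML fragment instead of the live page. The fragment is an assumption about the page layout (name, count, and percent in consecutive `width="15%"` cells), and its values are made-up placeholders, not real SSA data.

```powershell
# Hypothetical HTML fragment; layout assumed, values are placeholders (not SSA data)
$html = @'
<tr><td width="15%">NameA</td><td width="15%">500</td><td width="15%">2.5</td></tr>
<tr><td width="15%">NameB</td><td width="15%">400</td><td width="15%">2.0</td></tr>
'@

# Same pattern as the script: lazily capture everything between ="15%"> and </td>
$r = [regex]'="15%">(.*?)</td>'
$m = $r.Matches($html)

# Print each captured cell value, one per line
foreach ($match in $m) {
    $match.Groups[1].Value
}
```

Each match's first capture group holds one cell value, which is why the script below reads the matches three at a time.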

The extracted values are then converted into PowerShell objects, which you can analyze, filter, sort, and export with all the convenience PowerShell offers.
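As a minimal sketch of that conversion, assuming the cell values have already been extracted into a flat array of name/count/percent triples, the following builds one object per triple and then filters, sorts, and exports them. Note that it swaps in `[pscustomobject]` (PowerShell 3.0 and later) instead of the `'' | Select-Object` idiom used in the script; the `$cells` values and the `names.csv` file name are hypothetical placeholders.

```powershell
# Hypothetical flat list of cell values: name, count, percent, repeated
$cells = 'NameA', '500', '2.5', 'NameB', '400', '2.0', 'NameC', '300', '1.5', 'NameD', '200', '1.0'

# Group every three values into one object ([pscustomobject] requires PowerShell 3.0+)
$sex = 'male'
$list = for ($i = 0; $i -lt $cells.Count; $i += 3) {
    [pscustomobject]@{
        Name    = $cells[$i]
        Count   = [int]$cells[$i + 1]
        Percent = [double]$cells[$i + 2]
        Sex     = $sex
    }
    # Alternate between male and female, as the script below does
    $sex = if ($sex -eq 'male') { 'female' } else { 'male' }
}

# Filter and sort with ordinary pipeline cmdlets
$female = @($list | Where-Object { $_.Sex -eq 'female' } | Sort-Object Count -Descending)
$female

# Export to a CSV file in the temp folder (hypothetical file name)
$list | Export-Csv -Path (Join-Path ([IO.Path]::GetTempPath()) 'names.csv') -NoTypeInformation
```

Because the objects carry typed properties, `Sort-Object Count` sorts numerically rather than alphabetically.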

    $decade = Read-Host 'Enter decade (1880 - 2000)'

    Write-Progress "Connecting Web" "www.ssa.gov"
    $wc = New-Object System.Net.WebClient
    $nl = $wc.DownloadString("http://www.ssa.gov/OACT/babynames/decades/names$($decade)s.html")

    Write-Progress "Analyzing Data" "extracting..."
    # Capture the content of every table cell declared with width="15%"
    $r = [regex]'="15%">(.*?)</td>'
    $m = $r.Matches($nl)

    $list = @()
    $sex = "male"

    # Each record spans three consecutive matches: name, count, percent
    foreach ($i in 0..($m.Count - 1)) {

        $record = '' | Select-Object Name, Count, Percent, Sex
        $record.Name = $m[$i].Groups[1].Value
        if (!($i % 60)) {
            Write-Progress "Finding Names ($($i / 3))" $record.Name -PercentComplete ($i * 100 / $m.Count)
        }

        # $foreach is the loop's automatic enumerator; advancing it consumes
        # the count and percent cells so the next iteration starts at a name
        [void] $foreach.MoveNext()
        $record.Count = [int]($m[$foreach.Current].Groups[1].Value)
        [void] $foreach.MoveNext()
        $record.Percent = "{0:p4}" -f (([double]$m[$foreach.Current].Groups[1].Value) / 100)

        # Records alternate between the male and the female column
        $record.Sex = $sex
        if ($sex -eq 'male') { $sex = 'female' } else { $sex = 'male' }
        $list += $record
    }

    $top = 10   # assumption: the original row count was lost; adjust as needed
    $list | Select-Object -First $top
    '#' * 40
    $list | Sort-Object Count -Descending | Where-Object { $_.Sex -eq 'male' } | Select-Object -First $top

The example also demonstrates how to use Write-Progress to display status messages and progress bars. The output is a list of the top names in the chosen decade, followed by a list of male names only, sorted by count. Of course, you can elaborate on this.
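Write-Progress takes an activity, a status line, and an optional percentage. Here is a minimal stand-alone sketch of the same pattern; the loop, the item count, and the `Start-Sleep` call are arbitrary placeholders that only simulate work. It also calls `-Completed` to dismiss the bar explicitly, which the script above leaves to the end of the run.

```powershell
# Illustrative loop; Start-Sleep only simulates work
$items = 1..50
foreach ($n in $items) {
    # Integer percentage of items processed so far
    $percent = [int]($n * 100 / $items.Count)
    Write-Progress -Activity 'Processing items' -Status "Item $n of $($items.Count)" -PercentComplete $percent
    Start-Sleep -Milliseconds 20
}

# Remove the progress bar once the work is done
Write-Progress -Activity 'Processing items' -Completed
```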

MOW, great job! This shows how (relatively) easy it is to convert "unmanaged" raw HTML data into managed PowerShell objects.

Cheers

-Tobias


Posted Nov 26 2008, 02:58 AM by Tobias Weltner
Copyright 2011 PowerShell.com. All rights reserved.