Extracting Popular Names from Website


posted by Tobias Weltner
11-27-2008


# based on original work done by MOW
# please visit http://thepowershellguy.com/blogs/posh/archive/2007/02/13/hey-powershell-how-popular-is-this-baby-name.aspx

$decade = Read-Host 'Enter decade (1880 - 2000)'

Write-Progress "Connecting Web" "www.ssa.gov"
$wc = New-Object System.Net.WebClient
$nl = $wc.DownloadString("http://www.ssa.gov/OACT/babynames/decades/names$($decade)s.html")
Write-Progress "Analyzing Data" "extracting..."
# capture the text of every table cell that is 15% wide
$r = [regex]'="15%">(.*?)</td>'
$m = $r.Matches($nl)

$list = @()
$sex = "male"

# matches come in groups of three: name, count, percent
foreach ($i in 0..($m.Count - 1)) {
    $record = '' | Select-Object Name, Count, Percent, Sex
    $record.Name = $m[$i].Groups[1].Value
    if (!($i % 60)) {
        Write-Progress "Finding Names ($($i/3))" $record.Name -PercentComplete ($i * 100 / $m.Count)
    }
    # advance the loop enumerator to the count cell, then to the percent cell
    [void] $foreach.MoveNext()
    $record.Count = [int]($m[$foreach.Current].Groups[1].Value)
    [void] $foreach.MoveNext()
    $record.Percent = "{0:p4}" -f (([double]$m[$foreach.Current].Groups[1].Value) / 100)

    # records alternate between male and female
    $record.Sex = $sex
    if ($sex -eq 'male') { $sex = 'female' } else { $sex = 'male' }
    $list += $record
}

# the row count after -First is missing from the listing; 10 is an assumption
$top = 10
$list | Select-Object -First $top
'#' * 40
$list | Sort-Object Count -Descending | Where-Object { $_.Sex -eq 'male' } | Select-Object -First $top
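
The loop in the script consumes three regex matches per record by advancing the foreach enumerator by hand. A possible alternative, shown here only as a sketch, walks the matches in strides of three with a plain for loop. The sample markup and the names Alpha and Beta are made-up stand-ins for the downloaded page, not real SSA data, and stripping the thousands separator before the [int] cast is an extra precaution that the original script does not take.

```powershell
# Hypothetical sample markup standing in for the downloaded page (not real SSA figures).
$html = @'
<tr><td>1</td>
<td width="15%">Alpha</td><td width="15%">1,200</td><td width="15%">1.5</td>
<td width="15%">Beta</td><td width="15%">900</td><td width="15%">1.1</td></tr>
'@

# Same pattern as the script: capture the text of every 15%-wide cell.
$cells = [regex]::Matches($html, '="15%">(.*?)</td>')

$sex  = 'male'
$list = for ($i = 0; $i + 2 -lt $cells.Count; $i += 3) {
    # Three consecutive cells form one record: name, count, percent.
    New-Object PSObject -Property @{
        Name    = $cells[$i].Groups[1].Value
        Count   = [int]($cells[$i + 1].Groups[1].Value -replace ',', '')
        Percent = [double]$cells[$i + 2].Groups[1].Value / 100
        Sex     = $sex
    }
    # Records alternate between male and female, as in the script.
    $sex = if ($sex -eq 'male') { 'female' } else { 'male' }
}

$list | Format-Table Name, Count, Percent, Sex -AutoSize
```

Collecting the loop output directly into $list avoids rebuilding the array on every `+=`, and the loop condition skips any incomplete trailing group of cells.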
This script demonstrates how to extract raw HTML data from a website and turn it into rich PowerShell objects. It asks for a decade (1880 - 2000) and then lists the most popular male and female names for that decade. It also shows how to use Write-Progress to display status messages and progress bars. The script is based on original work by MOW (http://thepowershellguy.com/blogs/posh/archive/2007/02/13/hey-powershell-how-popular-is-this-baby-name.aspx).
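
The script passes whatever the user types straight into the download URL. A small guard in front of the download could reject bad input early. The sketch below assumes that only years ending in 0 between 1880 and 2000 are valid, which is inferred from the prompt text and the URL pattern rather than confirmed by the source. The function name Get-DecadeUrl is hypothetical.

```powershell
# Hypothetical helper: validate the decade and build the page URL used by the script.
function Get-DecadeUrl {
    param([string]$Decade)

    $year = 0
    if (-not [int]::TryParse($Decade, [ref]$year)) {
        throw "Decade '$Decade' is not a number."
    }
    # Assumption: valid decades are 1880, 1890, ... 2000.
    if ($year -lt 1880 -or $year -gt 2000 -or $year % 10 -ne 0) {
        throw "Decade '$Decade' must be a multiple of 10 between 1880 and 2000."
    }
    "http://www.ssa.gov/OACT/babynames/decades/names$($year)s.html"
}

Get-DecadeUrl '1950'
```

In the script, the result of Get-DecadeUrl could be passed to DownloadString in place of the inline URL string.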
Copyright 2011 PowerShell.com. All rights reserved.