Compare-Object is a very powerful cmdlet that can compare different result sets. The funny thing is: when you try Compare-Object in simple scenarios, everything works fine. Once you put it to work in production environments, it often fails. This article describes why that happens, what to watch out for, and how to configure Compare-Object correctly so it works for you.
Using Compare-Object To Find New Processes
PowerShell can easily compare result sets for you and filter out only those items that have changed. Let's say you'd like to know which processes have been started after a given point in time. How can you do that?
The workhorse doing the comparison is called Compare-Object. This cmdlet takes two result sets and automatically analyzes them. It then outputs only those items that are present in just one of the two result sets. To find out the processes started after a given point in time, you first create a base result set of all currently running processes like this:
PS> $shot1 = Get-Process
Now, whenever you want to see what has changed in your environment, create a second snapshot and compare both. Let's start notepad.exe really quick, then create another snapshot and compare both:
PS> notepad
PS> $shot2 = Get-Process
PS> Compare-Object $shot1 $shot2
InputObject SideIndicator
----------- -------------
System.Diagnostics.Process (notepad) =>
System.Diagnostics.Process (WmiPrvSE) =>
Compare-Object returns only those processes that exist in either $shot1 or $shot2 but not in both, and the SideIndicator property tells you in which result set the object was present. A SideIndicator of "=>" indicates that a process existed only in the second result set, so you know these must be new processes. Interestingly enough, when I launched notepad, Windows also launched a process called WmiPrvSE.
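The snapshot comparison above can be wrapped in a small helper. This is only a sketch: the function name Get-NewProcess is made up for illustration, and comparing on the Id property (rather than the default text representation) is my own assumption, chosen so that a second instance of an already running program is also detected. The -Property and -PassThru parameters it uses are covered later in this article.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the processes that
# exist only in the second snapshot.
function Get-NewProcess {
    param(
        [Parameter(Mandatory = $true)] [object[]] $Before,
        [Parameter(Mandatory = $true)] [object[]] $After
    )
    # Compare on the process ID and use -PassThru to get the original
    # objects back, then keep only the ones found in $After alone.
    Compare-Object -ReferenceObject $Before -DifferenceObject $After -Property Id -PassThru |
        Where-Object { $_.SideIndicator -eq '=>' }
}

$shot1 = Get-Process
# ... launch the programs you want to detect here ...
$shot2 = Get-Process
Get-NewProcess -Before $shot1 -After $shot2 | Select-Object Id, ProcessName
```

If nothing was started between the two snapshots, the last line simply returns nothing.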
Finding New Files And Folders
Wow, that's easy! Why not use this to find new files and folders added to a folder you'd like to monitor? To do that, you'd first generate a snapshot of that folder, then wait for content to be changed, and finally create another snapshot and compare contents. Let's take a look:
PS> $shot1 = Dir $home
PS> Set-Content $home\testfile1.txt "A new file"
PS> $shot2 = Dir $home
PS> Compare-Object $shot1 $shot2
InputObject SideIndicator
----------- -------------
testfile1.txt =>
It worked. It will not always work like a charm, though. There is an important caveat you need to know about: SyncWindow.
Adjusting SyncWindow
Whenever Compare-Object compares object sets, it uses a SyncWindow to resync both lists when there is no match. The default SyncWindow setting in PowerShell V1 is 5, so whenever the result sets have too many consecutive differences, the result will not be what you expect. Here is an example:
PS> $shot1 = 1..10
PS> $shot2 = 10..1
PS> Compare-Object $shot1 $shot2
PS> $shot1 = 1..15
PS> $shot2 = 15..1
PS> Compare-Object $shot1 $shot2
InputObject SideIndicator
----------- -------------
15 =>
1 <=
14 =>
2 <=
2 =>
1 =>
14 <=
15 <=
In the first part, Compare-Object compares two lists of numbers. Both lists contain the same numbers but in reverse order. The result is nothing, and that is correct since both sets contain the same numbers.
In the second part, there are 15 numbers in each set. This time, Compare-Object returns a bunch of nonsense information, claiming for example that the number 15 is present only in $shot2 and, further down, only in $shot1. Why?
The default SyncWindow is 5, so whenever there is no match, Compare-Object uses a delta of +/- 5 items to find the next matching item. When there are 10 elements in a set, a SyncWindow of 5 is sufficient to resync both lists (plus/minus 5 results in a maximum of 10 allowable consecutive differences). When there are 15 elements, SyncWindow would need to be at least 7 (the first comparison would be number 1 of $shot1 against number 15 of $shot2; with a SyncWindow of 7, Compare-Object would move 14 elements in $shot2 to find a match and would indeed find the matching number 1).
Fortunately, you can change the SyncWindow property using the parameter -syncWindow:
PS> $shot1 = 1..15
PS> $shot2 = 15..1
PS> Compare-Object $shot1 $shot2 -syncWindow 7
PS> Compare-Object $shot1 $shot2 -syncWindow 6
InputObject SideIndicator
----------- -------------
15 =>
1 <=
1 =>
15 <=
Now, what would the SyncWindow need to be with an array of 16 or 25 elements? Easy: Take the array size, divide it by two and there you go. For an array of 16 elements, the minimum SyncWindow needs to be 8, and for an array of 25 elements it needs to be 12.
PS> $shot1 = 1..5
PS> $shot2 = 5..1
PS> Compare-Object $shot1 $shot2
Here are a couple of things to note regarding SyncWindow:
- When SyncWindow is too low, Compare-Object returns false information and reports objects twice, once for each result set
- The default SyncWindow setting of 5 is sufficient only when you expect very small changes in your result sets
- To make sure you catch all matches, you would have to set SyncWindow to half of the number of expected differences. You can also set SyncWindow to a very large number like 1000 as a catch-all. This, however, may cause long delays and a lot of memory consumption
- In PowerShell V2, the default SyncWindow setting has been raised as a consequence of this
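The rule of thumb from this section (array size divided by two, rounded down) can be put into a tiny helper. Get-MinSyncWindow is a made-up name for illustration, not part of PowerShell, and the rule itself is the one derived above for fully reversed lists, so treat it as a sketch rather than a guarantee for arbitrary data.

```powershell
# Hypothetical helper: half the array size, rounded down.
# Dividing by 2.0 forces a floating-point division before Floor is applied.
function Get-MinSyncWindow {
    param([int] $Count)
    [int][math]::Floor($Count / 2.0)
}

Get-MinSyncWindow 15   # 7
Get-MinSyncWindow 16   # 8
Get-MinSyncWindow 25   # 12

# Use the calculated value instead of a hard-coded SyncWindow.
$shot1 = 1..25
$shot2 = 25..1
Compare-Object $shot1 $shot2 -SyncWindow (Get-MinSyncWindow $shot1.Count)
```

Calculating the value from the actual array size avoids both a window that is too small and an unnecessarily large catch-all.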
Picking Properties
Remember that everything in PowerShell is represented as an object, and objects have properties. If you don't specify any properties, Compare-Object picks the information to use for comparison automatically. This may not be what you want. Have a look:
PS> $shot1 = Dir $home
PS> Add-Content $home\testfile1.txt "Another line"
PS> $shot2 = Dir $home
PS> Compare-Object $shot1 $shot2
PS> Compare-Object $shot1 $shot2 -property Name, Length
Name Length SideIndicator
---- ------ -------------
testfile1.txt 26 =>
testfile1.txt 12 <=
This is actually a small modification of the earlier example. I create a folder snapshot, then I add a line to the file I created in the earlier example. Next, I create another snapshot and compare both. The result is: nothing. Why?
Because I did not pick an object property, Compare-Object simply looked at the file name. Since I only added a line to an existing file, no new file name was created.
To monitor file changes, I need to explicitly tell Compare-Object to compare both the Name and the Length property. Once I do that, I get back two results. The SideIndicator tells me that the file testfile1.txt was 12 bytes in the initial snapshot and now is 26 bytes.
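Comparing Name and Length will still miss an edit that leaves the file size unchanged. One possible extension, which is my own suggestion and not something shown in the examples above, is to add LastWriteTime to the property list. The sketch below uses hand-built objects with invented values instead of real files so it can be run anywhere; it assumes PowerShell 3.0 or later because of [pscustomobject].

```powershell
# Two hand-made "snapshots" of the same file (invented values for illustration).
# The content was edited, but the size stayed at 12 bytes.
$shot1 = [pscustomobject]@{ Name = 'testfile1.txt'; Length = 12; LastWriteTime = [datetime]'2009-01-09 11:34:58' }
$shot2 = [pscustomobject]@{ Name = 'testfile1.txt'; Length = 12; LastWriteTime = [datetime]'2009-01-09 11:38:17' }

# Name and Length are identical, so this reports no difference.
$bySize = @(Compare-Object $shot1 $shot2 -Property Name, Length)

# Adding LastWriteTime makes the edit visible as one "<=" and one "=>" entry.
$byTime = @(Compare-Object $shot1 $shot2 -Property Name, Length, LastWriteTime)

"Differences by Name, Length: $($bySize.Count)"
"Differences by Name, Length, LastWriteTime: $($byTime.Count)"
```

The more properties you list, the more kinds of change you catch, at the price of more entries to sift through.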
Working With Results
The results delivered by Compare-Object are custom objects returning the properties you selected as well as the SideIndicator property. You can filter and return only selected information. For example, if you'd like to filter the result to show only new elements (SideIndicator equals "=>"), use Where-Object like this:
PS> Compare-Object $shot1 $shot2 -property Name, Length | Where-Object { $_.SideIndicator -eq '=>' }
Name Length SideIndicator
---- ------ -------------
testfile1.txt 26 =>
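Because a changed file shows up twice (once per side) while a new or deleted file shows up only once, the results can also be condensed into one line per file. The following is a sketch under my own assumptions: Get-ChangeSummary is a made-up helper name, the sample objects are invented, and [pscustomobject] requires PowerShell 3.0 or later.

```powershell
# Hypothetical helper: condenses Compare-Object results (compared with
# -Property Name, ...) into one entry per name.
function Get-ChangeSummary {
    param([object[]] $Difference)
    $Difference | Group-Object -Property Name | ForEach-Object {
        # Collect the side indicators reported for this name.
        $sides = @($_.Group | ForEach-Object { $_.SideIndicator })
        $change = if ($sides -contains '=>' -and $sides -contains '<=') {
            'Changed'   # present on both sides with different values
        } elseif ($sides -contains '=>') {
            'Added'     # only in the second result set
        } else {
            'Removed'   # only in the first result set
        }
        [pscustomobject]@{ Name = $_.Name; Change = $change }
    }
}

# Invented sample data standing in for two folder snapshots.
$old = [pscustomobject]@{ Name = 'a.txt'; Length = 1 }, [pscustomobject]@{ Name = 'b.txt'; Length = 2 }
$new = [pscustomobject]@{ Name = 'a.txt'; Length = 5 }, [pscustomobject]@{ Name = 'c.txt'; Length = 3 }

$summary = Get-ChangeSummary (Compare-Object $old $new -Property Name, Length)
$summary
```

Here a.txt is reported as Changed, b.txt as Removed and c.txt as Added.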
You could also use the -passThru parameter to actually return the original objects compared by Compare-Object. When you do this, the SideIndicator property is appended to the original object and you can still use it for filtering:
PS> Compare-Object $shot1 $shot2 -property name, length | fl *
name : testfile.txt
length : 51
SideIndicator : =>
name : testfile.txt
length : 39
SideIndicator : <=
PS> Compare-Object $shot1 $shot2 -property name, length -passThru
Directory: Microsoft.PowerShell.Core\FileSystem::C:\Users\Tobias
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 1/9/2009 11:38 AM 51 testfile.txt
-a--- 1/9/2009 11:34 AM 39 testfile.txt
PS> Compare-Object $shot1 $shot2 -property name, length -passThru | fl *
PSPath : Microsoft.PowerShell.Core\FileSystem::C:\Users\Tobias\testfile.txt
PSParentPath : Microsoft.PowerShell.Core\FileSystem::C:\Users\Tobias
PSChildName : testfile.txt
PSDrive : C
PSProvider : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
SideIndicator : =>
Mode : -a---
Name : testfile.txt
Length : 51
DirectoryName : C:\Users\Tobias
Directory : C:\Users\Tobias
IsReadOnly : False
Exists : True
FullName : C:\Users\Tobias\testfile.txt
Extension : .txt
CreationTime : 1/9/2009 11:31:05 AM
CreationTimeUtc : 1/9/2009 10:31:05 AM
LastAccessTime : 1/9/2009 11:31:05 AM
LastAccessTimeUtc : 1/9/2009 10:31:05 AM
LastWriteTime : 1/9/2009 11:38:17 AM
LastWriteTimeUtc : 1/9/2009 10:38:17 AM
Attributes : Archive
PSPath : Microsoft.PowerShell.Core\FileSystem::C:\Users\Tobias\testfile.txt
PSParentPath : Microsoft.PowerShell.Core\FileSystem::C:\Users\Tobias
PSChildName : testfile.txt
PSDrive : C
PSProvider : Microsoft.PowerShell.Core\FileSystem
PSIsContainer : False
SideIndicator : <=
Mode : -a---
Name : testfile.txt
Length : 39
DirectoryName : C:\Users\Tobias
Directory : C:\Users\Tobias
IsReadOnly : False
Exists : True
FullName : C:\Users\Tobias\testfile.txt
Extension : .txt
CreationTime : 1/9/2009 11:31:05 AM
CreationTimeUtc : 1/9/2009 10:31:05 AM
LastAccessTime : 1/9/2009 11:31:05 AM
LastAccessTimeUtc : 1/9/2009 10:31:05 AM
LastWriteTime : 1/9/2009 11:34:58 AM
LastWriteTimeUtc : 1/9/2009 10:34:58 AM
Attributes : Archive
PS> Compare-Object $shot1 $shot2 -property name, length -passThru | Where-Object { $_.SideIndicator -eq '=>' }
Directory: Microsoft.PowerShell.Core\FileSystem::C:\Users\Tobias
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 1/9/2009 11:38 AM 51 testfile.txt
Persisting Comparison Information
All comparisons so far have taken place in memory because we created the result sets on the fly. What if I'd like to compare folder content against a predefined base set?
You can easily do that. Simply export the result sets to XML, then reload them and do the comparison. There are three important rules when you do that:
- Use Select-Object to select only the object properties you really need for your comparison prior to exporting objects to XML or else the XML will be very large
- The result sets you compare both need to be written to XML and re-imported. Do not compare an imported XML result set against a live result set because object types are different
- Specify the properties you want to compare when you use Compare-Object
Here is an example of persisting result sets. First, I create a snapshot of my $home folder and export it as XML. Since I am interested in new files and changed files, I only export Name and Length.
Next, whenever I need to, I can import the base folder set and compare it against the current folder content. To do that, I export the current folder content as XML, too, and reimport it so that both result sets consist of the same object type:
PS> Dir $home | Select-Object Name, Length | Export-Clixml $home\baseline.xml
PS> Add-Content $home\testfile.txt "Hello World"
PS> $shot1 = Import-Clixml $home\baseline.xml
PS> Dir $home | Select-Object Name, Length | Export-Clixml $home\temp.xml
PS> $shot2 = Import-Clixml $home\temp.xml
PS> Compare-Object $shot1 $shot2 -property Name, Length
Name Length SideIndicator
---- ------ -------------
baseline.xml 58150 =>
temp.xml 45056 =>
testfile.txt 39 =>
baseline.xml 8192 <=
temp.xml 58152 <=
testfile.txt 26 <=
Note that in this example, since I stored the XML files in the same folder I monitored, they also show up in the result set.
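One way to keep the XML files out of the results is to store them outside the monitored folder. The sketch below does that with two temporary folders; the folder names and the Save-Snapshot helper are made up for illustration and are not part of PowerShell.

```powershell
# Hypothetical locations: one folder to monitor, one separate folder for the XML files.
$root   = Join-Path ([IO.Path]::GetTempPath()) ("compare-demo-" + [guid]::NewGuid())
$folder = Join-Path $root 'monitored'
$store  = Join-Path $root 'snapshots'
New-Item -ItemType Directory -Path $folder, $store -Force | Out-Null

# Hypothetical helper: exports only the properties needed for the comparison.
function Save-Snapshot {
    param([string] $Folder, [string] $Path)
    Get-ChildItem $Folder | Select-Object Name, Length | Export-Clixml $Path
}

Set-Content (Join-Path $folder 'testfile.txt') 'A new file'
Save-Snapshot $folder (Join-Path $store 'baseline.xml')

Add-Content (Join-Path $folder 'testfile.txt') 'Hello World'
Save-Snapshot $folder (Join-Path $store 'current.xml')

# Both result sets are re-imported from XML, so the object types match.
$shot1 = Import-Clixml (Join-Path $store 'baseline.xml')
$shot2 = Import-Clixml (Join-Path $store 'current.xml')
$diff  = Compare-Object $shot1 $shot2 -Property Name, Length
$diff

# Clean up the temporary folders.
Remove-Item $root -Recurse -Force
```

With the XML files stored elsewhere, only testfile.txt appears in the result, once with its old and once with its new length.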
Summary
Compare-Object is a great way of comparing result sets. You just need to be careful to make sure:
- you are comparing the same object types (do not mix imported XML data with live data)
- the syncWindow is large enough to cover the number of expected differences
- you specify the properties you really want to compare
Use the SideIndicator property to filter the results so you only get the changes in one of the result sets. And use the -passThru parameter to get the real objects.
Cheerio
-Tobias
Posted Jan 09 2009, 01:47 AM by Tobias Weltner