This afternoon, my buddy Alexandar (a fellow PowerShell MVP, from Serbia) was reviewing another batch of PowerTips when he stumbled across some lines of code that looked a bit unintuitive to him, and he asked me to rephrase them. Sometimes there is more than one road to Rome, and there is nothing bad about having choices. Today I'd like to show how to pick the fastest road to Rome, and how you can speed up your PowerShell code tremendously by knowing a few easy design rules.
Understanding The PowerShell Pipeline
The PowerShell pipeline is a great feature, but many users aren't aware that the pipeline actually is a throttling mechanism designed to minimize memory consumption. When you read a large file line by line, for example, and process each line over the pipeline, only one line at a time needs to be stored in memory. That's why PowerShell can easily read and process huge log files.
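To make the one-at-a-time behavior visible, here is a small sketch (not part of the original tips). It assumes nothing beyond a generic list named $log, which is my own helper for recording the order of events. Each element passes through both stages before the next element is even produced:

```powershell
# Record the order in which two pipeline stages see each element.
# $log is a helper introduced only for this illustration.
$log = New-Object 'System.Collections.Generic.List[string]'

1..3 |
    ForEach-Object { $log.Add("stage 1: $_"); $_ } |
    ForEach-Object { $log.Add("stage 2: $_") }

# The entries are interleaved: stage 1: 1, stage 2: 1, stage 1: 2, stage 2: 2, ...
$log
```

If the pipeline collected all data first, you would see all "stage 1" entries before the first "stage 2" entry. Instead, the entries alternate.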
The downside is that speed is fueled by memory and CPU. By throttling memory use, the PowerShell pipeline trades speed for a small memory footprint. And the difference can be significant. Have a look:
PS> 1..100 | Get-Random
66
PS> 1..10000 | Get-Random
9746
PS> 1..1000000 | Get-Random
509967
All of these lines generate a range of numbers and then pick one random number from it. When you try this for yourself, you'll notice that the first two commands run almost instantaneously. The third line, however, takes forever. The overhead produced by the PowerShell pipeline accumulates with each element that needs to travel across it. So in the last command, one million numbers need to travel one by one to Get-Random, and that takes a lot of time.
In many scenarios, the great memory-saving aspect of the PowerShell pipeline is not important at all. In our current example, it definitely does not matter. So here you can speed things up easily by avoiding the pipeline and passing the data directly to the parameter that would otherwise receive it over the pipeline:
PS> Get-Random -InputObject (1..1000000)
770786
Notice how the very same operation takes only a fraction of a second instead of many seconds.
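If you'd rather see numbers than take my word for it, a rough sketch like this one times both variants with Measure-Command. The variable names are just my choice for the illustration, and the actual timings depend on your machine and PowerShell version:

```powershell
# Build the range once so that both measurements use identical input.
$numbers = 1..1000000

# Variant 1: every number travels across the pipeline individually.
$viaPipeline = (Measure-Command { $numbers | Get-Random }).TotalMilliseconds

# Variant 2: the whole array is handed to the parameter in one piece.
$viaParameter = (Measure-Command { Get-Random -InputObject $numbers }).TotalMilliseconds

"pipeline:  {0:N0} ms" -f $viaPipeline
"parameter: {0:N0} ms" -f $viaParameter
```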
Everyday Scenarios To Save Time
There are tons of everyday scenarios that you can speed up this way. Let's start with a very common one: discarding data. When you do not need data, you have the choice of piping it to Out-Null or assigning it to the special variable $null. Guess which way is faster? Test for yourself:
PS> (Measure-Command {1..100000 | Out-Null }).TotalMilliseconds
1114,3006
PS> (Measure-Command {$null = 1..100000 }).TotalMilliseconds
29,3192
PS> (Measure-Command {1..100000 > $null }).TotalMilliseconds
38,1542
PS> (Measure-Command {[void](1..100000) }).TotalMilliseconds
28,2378
All of these lines do the same thing: they dump data. The first approach uses the pipeline and takes over a second! Assigning the data to $null takes a lightning-fast 30ms. Holy Moly. Of course, the performance difference depends on how much data you need to dump. Often, it is much less than 100000 elements, so the performance gain becomes less impressive. However, just imagine this statement as part of a loop that runs a number of times. Each time the loop runs, the performance penalty adds up.
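Here is a minimal sketch of that loop situation. The ArrayList is my own example and not taken from the measurements above: its Add() method returns the index of the new element, which you typically want to throw away on every single iteration:

```powershell
$list = New-Object System.Collections.ArrayList

for ($i = 0; $i -lt 1000; $i++) {
    # Add() returns the index of the new element. Assigning the result
    # to $null discards it without involving the pipeline.
    $null = $list.Add("item $i")
}

$list.Count   # 1000
```

Replacing the assignment with "$list.Add(...) | Out-Null" would discard the same data, but it would pay the pipeline overhead a thousand times.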
Here is yet another common scenario: a loop. Some people use the pipeline to create loops:
PS> 1..100000 | ForEach-Object { "looping for the $_. time" }
On my machine, it takes approximately 7 seconds (on yours it may be much faster; the point is just the relative comparison).
The same loop can also be written without a pipeline, like this:
PS> for ($x=1; $x -le 100000; $x++) {
    "looping for the $x. time"
}
This one takes just 0.5 seconds. That's a heck of a noticeable speed increase.
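To compare the loop styles on your own machine, you could use a sketch like the following. I added the foreach statement as a third candidate; it is not part of the measurements above, and $count is just a name I picked. No timings are shown here because they vary too much between systems:

```powershell
$count = 100000

# Loop built on the pipeline.
$pipelineLoop = (Measure-Command {
    1..$count | ForEach-Object { "looping for the $_. time" }
}).TotalMilliseconds

# Classic for loop, no pipeline involved.
$forLoop = (Measure-Command {
    for ($x = 1; $x -le $count; $x++) { "looping for the $x. time" }
}).TotalMilliseconds

# foreach statement (not the ForEach-Object cmdlet), also without a pipeline.
$foreachLoop = (Measure-Command {
    foreach ($x in 1..$count) { "looping for the $x. time" }
}).TotalMilliseconds

"ForEach-Object: {0:N0} ms" -f $pipelineLoop
"for:            {0:N0} ms" -f $forLoop
"foreach:        {0:N0} ms" -f $foreachLoop
```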
The Pipeline Is Everywhere
Sometimes, it is not so apparent that PowerShell is using the pipeline. Here is some typical code:
PS> (Measure-Command {
    Get-Content $env:windir\windowsupdate.log |
        Foreach-Object { "reading line $_" }
}).TotalSeconds
1.2655871
Compare this to the version that avoids the pipeline:
PS> (Measure-Command {
    $all = Get-Content $env:windir\windowsupdate.log -ReadCount 0
    Foreach($line in $all) { "reading line $line" }
}).TotalSeconds
0.0848887
Again, more than 14 times faster. The thing to watch out for here is to use Get-Content with the parameter -ReadCount 0. Without it, Get-Content by default is optimized for the pipeline and emits each line as a separate object, which again causes a lot of work for the pipeline.
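What -ReadCount 0 changes becomes clearer when you count the objects that Get-Content actually emits. This sketch uses a small temporary file instead of windowsupdate.log so it runs anywhere; the file and the variable names are assumptions made just for the illustration:

```powershell
# Create a small five-line test file.
$path = [System.IO.Path]::GetTempFileName()
1..5 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Count how many objects travel across the pipeline in each mode.
$perLine = @(Get-Content -Path $path | ForEach-Object { 1 }).Count
$oneShot = @(Get-Content -Path $path -ReadCount 0 | ForEach-Object { 1 }).Count

"default:      $perLine pipeline objects"   # 5, one per line
"-ReadCount 0: $oneShot pipeline object"    # 1, a single array with all lines

Remove-Item -Path $path
```

With -ReadCount 0, the entire file arrives as one array, so the per-element pipeline overhead is paid only once.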
Conclusions
The PowerShell pipeline is a great feature! In the last example, thanks to the pipeline, just one line of a file needed to be stored in memory. The faster version without the pipeline, in contrast, needed to hold the entire text file in memory (variable $all). So this article is not about avoiding the pipeline. It is about using the PowerShell resources wisely.
You have seen that there is a tradeoff: speed or a small memory footprint. Use the pipeline if you must conserve memory. Avoid the pipeline if you need speed. And take another look at some of the initial examples: there are a lot of standard code scenarios where conserving memory really isn't an issue. So by not using the pipeline there, you can improve performance easily.
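The tradeoff does not have to be all or nothing. As a hedged sketch of a middle ground, -ReadCount also accepts values other than 0 and then sends that many lines at a time across the pipeline. The chunk size of 1000 and the temporary test file are arbitrary choices of mine:

```powershell
# Create a test file with 2500 lines.
$path = [System.IO.Path]::GetTempFileName()
1..2500 | ForEach-Object { "line $_" } | Set-Content -Path $path

$chunks = 0
$lines  = 0

# Each pipeline object is now an array of up to 1000 lines, so only one
# chunk needs to be held in memory at a time.
Get-Content -Path $path -ReadCount 1000 | ForEach-Object {
    $chunks++
    foreach ($line in $_) { $lines++ }
}

"chunks: $chunks, lines: $lines"   # chunks: 3, lines: 2500

Remove-Item -Path $path
```

The pipeline overhead is paid three times instead of 2500 times, while memory use stays limited to one chunk.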
Stay tuned...!
Tobias
Microsoft MVP PowerShell Germany
P.S.
If you live in Germany or other parts of Europe and your company would like to set up a truly great PowerShell training, just contact me! I regularly train mid- to large-size companies. Trainings are always a blast with tons of real-world-examples and solutions. Here's how to get in touch with me: tobias.weltner@scriptinternals.de
Posted May 01 2012, 11:35 PM by Tobias