Speeding Up Your Scripts!

Every now and then, users come up with questions like this one: "Why is Get-Content so slow? Reading in large text files can take forever!" It is a typical observation, and it points to a more general underlying question: "Why is PowerShell sometimes so slow?"

Easy answer: it is not! You just need to know a little bit about different script designs, and you may be able to improve the performance of your PowerShell scripts drastically, not just for Get-Content. So although you may have seen a lot of good suggestions for improving Get-Content's performance, tuning the cmdlet by itself will get you nowhere and provide no real speed improvement. What really makes your scripts fly is a good understanding of what actually causes delays, and which script design to use when tackling large amounts of data.

We'll look at this by first examining why Get-Content seems to be slow, and once we have that answer, breaking it down into general PowerShell best practices. At the end you'll know what to do to increase your script performance considerably.

Prerequisites: Measuring Commands

First of all, try not to mix things up. When examining performance, make sure you do not accidentally measure the time the console takes to display information. That's what Measure-Command is for: this cmdlet reports the time a script block took to execute, but it discards the script block's output, so display time never skews the result.

Time to get some real data:

PS> Measure-Command { Get-Content $env:windir\windowsupdate.log }


Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 239
Ticks : 2392047
TotalDays : 2,76857291666667E-06
TotalHours : 6,644575E-05
TotalMinutes : 0,003986745
TotalSeconds : 0,2392047
TotalMilliseconds : 239,2047

This line tells you how long it takes Get-Content to read the entire windowsupdate.log file, which is usually pretty large (as it logs all updates received and installed by Windows). Run this line a couple of times, because file I/O speeds can vary a bit depending on caching, how busy the drive is, and so on.
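
If you want to smooth out those variations, you can average several runs. Here is a minimal sketch (not from the original measurements; the run count of five is an arbitrary choice):

# Average several timing runs to smooth out file-system caching effects
$runs = 5
$total = 0
1..$runs | ForEach-Object {
    $total += (Measure-Command {
        Get-Content $env:windir\windowsupdate.log
    }).TotalMilliseconds
}
'Average: {0:N1} ms over {1} runs' -f ($total / $runs), $runs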

First: Speeding Up Get-Content

Get-Content is slow because it reads text files line by line and emits every single line to the PowerShell pipeline. You can change that by using the ReadCount parameter and asking Get-Content to deliver more than one line at a time. Check out the performance when you set ReadCount to 0, effectively asking Get-Content to deliver all lines in one string array:

PS> Measure-Command { Get-Content $env:windir\windowsupdate.log -ReadCount 0 }


Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 59
Ticks : 593053
TotalDays : 6,86403935185185E-07
TotalHours : 1,64736944444444E-05
TotalMinutes : 0,000988421666666667
TotalSeconds : 0,0593053
TotalMilliseconds : 59,3053

This simple parameter seems to speed things up massively: the same file was read roughly four times faster. Sounds good, but as it turns out, this alone won't help you a bit. Reading files is usually just one step of many in your PowerShell solutions, and your little tune-up will evaporate once you build on top of it. Let's find out why.

Speeding Up Your PowerShell Scripts

At the end of the day, you won't want to just read in text information. Most likely, you want to process that information, and to do that, you will most likely use the PowerShell pipeline. As it turns out, it is really the pipeline that is causing the delay. You may be able to squeeze out a couple of milliseconds by using -ReadCount, but that is not significant. Let's first measure a plain call to Get-Content, and then include -ReadCount 0 to prove the point. Note that with -ReadCount 0, Get-Content emits one large string array, so Foreach-Object { $_ } is added in the second test to unroll the array back into individual lines for Where-Object:

PS> Measure-Command { Get-Content $env:windir\windowsupdate.log |
      Where-Object { $_ -like '*successfully installed*' } }

Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 555
Ticks : 15552740
TotalDays : 1,80008564814815E-05
TotalHours : 0,000432020555555556
TotalMinutes : 0,0259212333333333
TotalSeconds : 1,555274
TotalMilliseconds : 1555,274

PS> Measure-Command { Get-Content $env:windir\windowsupdate.log -ReadCount 0 |
      Foreach-Object { $_ } | Where-Object { $_ -like '*successfully installed*' } }

Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 349
Ticks : 13490398
TotalDays : 1,56138865740741E-05
TotalHours : 0,000374733277777778
TotalMinutes : 0,0224839966666667
TotalSeconds : 1,3490398
TotalMilliseconds : 1349,0398

The same will happen when you resort to low-level .NET methods. These read text content lightning-fast, but once the data is processed by the PowerShell pipeline, you get the very same performance penalty.
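
To illustrate (a hedged sketch, not taken from the original measurements): [System.IO.File]::ReadAllLines() reads the file very quickly, yet piping its result into Where-Object brings the per-item pipeline cost right back:

PS> Measure-Command { [System.IO.File]::ReadAllLines("$env:windir\windowsupdate.log") |
      Where-Object { $_ -like '*successfully installed*' } }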

Optimizing Get-Content with -ReadCount really only makes sense when you do not plan to use the PowerShell pipeline to process the data. Let's be clear: the pipeline is a wonderful and useful concept with very low memory consumption, but when you need to process massive amounts of data and speed is an issue, you should avoid the pipeline and be aware that there are other, more suitable designs for that.
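
To see that tradeoff in numbers (a small illustration, assuming the log file sits at its usual path): with -ReadCount 0 the whole file is held in memory as one string array, whereas the plain pipeline streams one line at a time:

PS> $all = Get-Content $env:windir\windowsupdate.log -ReadCount 0
PS> '{0:N0} lines, roughly {1:N0} KB of text held in memory at once' -f $all.Count,
      (($all | Measure-Object -Property Length -Sum).Sum / 1kb)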

So to speed things up, this time we use -ReadCount 0 in conjunction with an old-fashioned foreach statement instead of the pipeline. For comparison, here is the pipeline version again, followed by the new foreach design:

PS> Measure-Command { Get-Content $env:windir\windowsupdate.log -ReadCount 0 |
      Foreach-Object { $_ } | Where-Object { $_ -like '*successfully installed*' } }

Days : 0
Hours : 0
Minutes : 0
Seconds : 1
Milliseconds : 358
Ticks : 13586396
TotalDays : 1,57249953703704E-05
TotalHours : 0,000377399888888889
TotalMinutes : 0,0226439933333333
TotalSeconds : 1,3586396
TotalMilliseconds : 1358,6396

PS> Measure-Command {
      foreach ($line in (Get-Content $env:windir\windowsupdate.log -ReadCount 0)) {
          if ($line -like '*successfully installed*') { $line }
      }
    }

Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 270
Ticks : 2706478
TotalDays : 3,13249768518519E-06
TotalHours : 7,51799444444444E-05
TotalMinutes : 0,00451079666666667
TotalSeconds : 0,2706478
TotalMilliseconds : 270,6478

Speeding up the task was not a matter of simply changing -ReadCount, but rather of employing a completely different script design. Here is what we did:

foreach ($line in (Get-Content $env:windir\windowsupdate.log -ReadCount 0)) {
    if ($line -like '*successfully installed*') {
        $line
    }
}

This piece of code is not using the PowerShell pipeline, thus avoiding the overhead and speed penalties associated with it. In this design, Get-Content runs at maximum speed. Feel free to replace the cmdlet with any super-fast direct .NET call you may know; the result will not improve much more. Get-Content is fast. It is the pipeline overhead that causes the delays.
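
If you want to verify that claim yourself, here is a sketch that swaps Get-Content for a raw .NET read ([System.IO.File]::ReadAllLines()) inside the same foreach design; the timing should land in the same ballpark as the -ReadCount 0 version above:

PS> Measure-Command {
      foreach ($line in [System.IO.File]::ReadAllLines("$env:windir\windowsupdate.log")) {
          if ($line -like '*successfully installed*') { $line }
      }
    }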

Summary

Reading large text files reveals a general design aspect of the PowerShell pipeline: it is slow because it carries considerable per-item overhead. That is not a bad thing. The pipeline produces readable code, is easy to use, and is very memory efficient. For many non-time-critical tasks it is the perfect approach. Yet when you find yourself waiting for results because your script is taking too long, chances are the pipeline design is the wrong approach for what you want to automate. Instead, take a look at the classic foreach construct to process data outside the pipeline. It is not just a historical leftover. It is a powerful alternative.

Have fun! See you next time around,

Tobias

Microsoft MVP PowerShell Germany

P.S.
If you live in Germany or other parts of Europe and your company would like to set up a truly great PowerShell training, just contact me! I regularly train mid- to large-size companies. Trainings are always a blast, with tons of real-world examples and solutions. Here's how to get in touch with me: tobias.weltner@scriptinternals.de

