Use
gc file.csv | ? {$_.trim() -ne "" } | set-content file_trimmed.csv
What is wrong with the original command (paraphrased from Delete all blank lines from a text file using PowerShell in Tim Curwick's PowerShell blog):
The parentheses around the Get-Content statement force it to finish loading the whole contents into an object before sending them down the pipeline. (If we are writing to a different file than we were reading from, we could speed up the command by eliminating the parentheses, thus allowing us to read from the one and write to the other simultaneously.)
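The difference between the two forms can be sketched as follows. This is only an illustration under stated assumptions: the temporary file names and the four sample lines are hypothetical and are not taken from the original question.

```powershell
# Hypothetical demo files in the temp folder (names are assumptions for illustration).
$source = Join-Path ([System.IO.Path]::GetTempPath()) 'demo_source.csv'
$target = Join-Path ([System.IO.Path]::GetTempPath()) 'demo_trimmed.csv'

# Sample input: two data lines, one empty line, one whitespace-only line.
Set-Content -Path $source -Value 'a,b', '', '   ', 'c,d'

# Streaming form: lines flow down the pipeline as they are read,
# so reading $source and writing $target overlap.
Get-Content $source |
    Where-Object { $_.Trim() -ne '' } |
    Set-Content $target

# In-place form: the parentheses make Get-Content finish reading first,
# which is needed when the output path is the same as the input path.
(Get-Content $source) |
    Where-Object { $_.Trim() -ne '' } |
    Set-Content $source
```

The parenthesized form is still the one to use when the file is rewritten in place, because without the parentheses Set-Content would try to open a file that Get-Content is still reading.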
The test script 1264263.ps1 measures only reading a large file and omits writing the result:
param (
    [Parameter()][string]$file = 'green_tripdata_2014-03.csv'
)
Push-Location 'D:\test'
#$file = 'green_tripdata_2014-03.csv'
# Report the file size in KiB; ${file} keeps the following colon out of the variable name.
"${file}: {0:N3} KiB" -f $((Get-Item $file).Length /1024 )
# Case 1: read only, no filtering.
' GC $file :' + ' {0:N7} sec' -f (Measure-Command {
    $y = Get-Content $file
}).TotalSeconds
Start-Sleep -Seconds 1
# Case 2: streaming pipeline, filter on the truthiness of the trimmed line.
' GC $file | ? {$_.trim()} :' + ' {0:N7} sec' -f (Measure-Command {
    $y = (Get-Content $file |
        Where-Object {$_.trim()}) #| Set-Content "$file2"
}).TotalSeconds
Start-Sleep -Seconds 1
# Case 3: streaming pipeline, explicit comparison with the empty string.
' GC $file | ? {$_.trim() -ne ""} :' + ' {0:N7} sec' -f (Measure-Command {
    $y = (Get-Content $file |
        Where-Object {$_.trim() -ne "" }) #| Set-Content "$file2"
}).TotalSeconds
Start-Sleep -Seconds 1
# Case 4: original form, parentheses force the whole file to load before filtering.
'(GC $file) | ? {$_.trim() -ne ""} :' + ' {0:N7} sec' -f (Measure-Command {
    $y = (Get-Content $file) |
        Where-Object {$_.trim() -ne ""} #| Set-Content "$file2"
}).TotalSeconds
Pop-Location
The output shows that the improved command (case #3) can run about 10 times faster than the original (case #4):
PS D:\PShell> D:\PShell\SU64263.ps1
green_tripdata_2014-03.csv: 197,355.560 KiB
GC $file : 27.4584778 sec
GC $file | ? {$_.trim()} : 59.2003851 sec
GC $file | ? {$_.trim() -ne ""} : 61.0429012 sec
(GC $file) | ? {$_.trim() -ne ""} : 615.8580773 sec
PS D:\PShell>