This perl script should do what you want: given a NUL-separated list of filenames (e.g. from find -print0), it prints a list of the most recently modified filenames, as long as the total size of those files does not exceed 1 GB (the default). You can specify the number of gigabytes for the maximum size on the command line - it can be any valid number, integer or floating point.
The NUL separator means this will work with any filenames, even ones that contain spaces or newlines.
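For instance, a fractional limit works because the script simply multiplies the argument out to bytes. A minimal illustration of that arithmetic, using a hypothetical value of 2.5:
my $gigs = 2.5;                            # e.g. passed as the first command-line argument
my $maxsize = $gigs * 1024 * 1024 * 1024;  # 2684354560 bytes, i.e. 2.5 GiB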
$ cat select-newest-one-gig.pl
#! /usr/bin/perl -0
use strict;
my $gigs = shift || 1;
my $maxsize = $gigs * 1024 * 1024 * 1024 ; # 1GB
my $total = 0;
# a hash to contain the list of input filenames and their modtimes
my %filemtimes=();
# hash to contain the list of input filenames and their sizes
my %filesizes=();
# a hash to contain a list of filenames to output.
# use a hash for this so we don't need to write a 'uniq' function.
my %outfiles=();
while (<>) {
    chomp;
    # 7th field of stat() is size in bytes.
    # 9th field of stat() is mtime in secs since epoch
    my ($size,$mtime) = (stat($_))[7,9];
    $filesizes{$_} = $size;
    $filemtimes{$_} = $mtime;
}
# iterate through the %filemtimes hash in reverse mtime order (newest first)
foreach (sort { $filemtimes{$b} <=> $filemtimes{$a} } keys %filemtimes) {
    my $size = $filesizes{$_};
    # add it to our list of filenames to print if it won't exceed $maxsize
    if (($size + $total) <= $maxsize) {
        $total += $size;
        $outfiles{$_}++;
    }
}
# now iterate through the %filesizes hash from smallest to largest,
# just in case we can squeeze in a few more files.
foreach (sort { $filesizes{$a} <=> $filesizes{$b} } keys %filesizes) {
    next if $outfiles{$_};   # skip files already selected in the first pass
    my $size = $filesizes{$_};
    if (($size + $total) < $maxsize) {
        $total += $size;
        $outfiles{$_}++;
    }
}
# now print our list of files. choose one of the following, for
# newline separated filenames or NUL-separated.
#print join("\n", sort keys %outfiles), "\n";
print join("\0", sort keys %outfiles), "\0";
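Note that the -0 on the shebang line is what makes <> read NUL-terminated records, and chomp then strips the trailing NUL rather than a newline. A roughly equivalent sketch without the switch, just to show the mechanism:
#! /usr/bin/perl
use strict;
$/ = "\0";          # set the input record separator to NUL
while (<>) {
    chomp;          # chomp removes whatever $/ is, here the trailing NUL
    print "read: $_\n";
}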
Save it as select-newest-one-gig.pl and make it executable with chmod +x.
Run it like this (e.g. for a maximum total size of 10 GB):
find /volume1/cctv/ -type f -iname '*.mp4' -print0 | \
./select-newest-one-gig.pl 10 > /tmp/files.list
This perl script could easily be modified to take one or more filename extensions (e.g. .mp4) as args, run find itself via a system()-style call, and iterate over that output instead of while (<>). It's probably simpler to just pipe the output of find into it, though - why reinvent the wheel?
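A rough sketch of that variation (untested; the search path and default extension are placeholders) - it uses a piped open rather than a bare system() call so the output of find can be read directly:
#! /usr/bin/perl -0
use strict;
# build an '-iname' clause for each extension given on the command line
my @exts = @ARGV ? @ARGV : ('.mp4');
my @name_args = map { ('-iname', "*$_", '-o') } @exts;
pop @name_args;                       # drop the trailing '-o'
# run find ourselves and read its NUL-separated output instead of stdin
open(my $find, '-|', 'find', '/volume1/cctv/', '-type', 'f',
     '(', @name_args, ')', '-print0') or die "can't run find: $!\n";
while (<$find>) {
    chomp;                            # -0 means records are NUL-terminated
    # ... stat() each file and apply the same size/mtime selection as above ...
}
close($find);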
The following perl script will list (or delete, if you uncomment the last line) files that exist in the rsync target directory but were not listed on stdin. It assumes NUL-separated input, so it is safe even with filenames containing newlines.
$ cat unlink-others.pl
#! /usr/bin/perl -0
use strict;
my @files=();
# first arg is target dir, with default
my $targetdir = shift || '/path/to/rsync/target/dir/';
while (<>) {
    chomp;
    s/^.*\///;   # strip path, keep only the basename
    push @files, quotemeta($_);
}
my $regexp=join("|",@files);
opendir(my $dh, $targetdir) || die "can't opendir $targetdir: $!\n";
my @delete = grep { ! /^($regexp)$/o && -f "$targetdir/$_" } readdir($dh);
closedir $dh;
print join(", ",@delete),"\n";
# uncomment the next line if you're sure it will only delete what you want
# unlink map { "$targetdir/$_" } @delete;
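To make the matching step concrete, here is a small illustration (with made-up filenames) of the regexp built from the quotemeta'd basenames; any plain file in the target directory whose name does not match it ends up on the delete list:
use strict;
my @files = map { quotemeta($_) } ('cam1 2016-01-01.mp4', 'cam2.mp4');
my $regexp = join("|", @files);
print "$regexp\n";   # prints: cam1\ 2016\-01\-01\.mp4|cam2\.mp4
# an entry such as 'old.mp4' does not match /^($regexp)$/ and would be listed for deletion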
Use it like this:
find /volume1/cctv/ -type f -iname '*.mp4' -print0 | \
./select-newest-one-gig.pl 10 > /tmp/files.list
rsync --from0 --files-from /tmp/files.list ... /path/to/rsync/target/dir/
./unlink-others.pl /path/to/rsync/target/dir/ < /tmp/files.list