How to collapse consecutive numbers into ranges?

Question



score 6

Given a sorted input file (or command output) that contains unique numbers, one per line, I would like to collapse every run of consecutive numbers into a range, such that

n
n+1
...
n+m

becomes

n,n+m

sample input:

2
3
9
10
11
12
24
28
29
33

expected output:

2,3
9,12
24
28,29
33
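To try the answers below as written, a test file can be created with printf. The numbers are the ones the answers themselves use (the Perl answer hard-codes the same list), and the name file is simply what several answers assume:

```shell
# One number per line, sorted and unique, as the question requires.
printf '%s\n' 2 3 9 10 11 12 24 28 29 33 > file
```

The expected result for this file is 2,3, 9,12, 24, 28,29 and 33, one per line.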

text-processing numeric-data

by don_crissti 19.09.2018 / 17:16

12 answers



score 5 · Answer 1

awk '
    function output() { print start (prev == start ? "" : ","prev) }
    NR == 1 {start = prev = $1; next}
    $1 > prev+1 {output(); start = $1}
    {prev = $1}
    END {output()}
'

score 4 · Answer 2

With dc, as a mental exercise (the input file is passed as the script's first argument, "$1"):

dc -f "$1" -e '
[ q ]sB
z d 0 =B sc sa z sb
[ Sa lb 1 - d sb 0 <Z ]sZ
lZx
[ 1 sk lf 1 =O lk 1 =M ]sS
[ li p c 0 d sk sf ]sO
[ 2 sf lh d sj li 1 + !=O ]sQ
[ li n [,] n lj p c 0 sf ]sM
[ 0 sk lh sj ]sN
[ 1 sk lj lh 1 - =N lk 1 =M ]sR
[ 1 sf lh si ]sP
[ La sh lc 1 - sc lf 2 =R lf 1 =Q lf 0 =P lc 0 !=A ]sA
lAx
lSx
'

score 3 · Answer 3

awk, with a different (more C-like) approach:

awk '{ do{ for(s=e=$1; (r=getline)>0 && $1<=e+1; e=$1); print s==e ? s : s","e }while(r>0) }' file

The same thing, even less grumpy:

awk 'BEGIN{
    for(r=getline; r>0;){
        for(s=e=$1; (r=getline)>0 && $1<=e+1; e=$1);
        print s==e ? s : s","e
    }
    exit -r
}' file
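The exit -r at the end relies on getline's return value: 1 after reading a record, 0 at end of input and -1 on error, so the program exits with 0 normally and with 1 if reading failed. A small sketch of those values (the nonexistent path is only a deliberate way to force a read error):

```shell
# Plain getline in BEGIN reads the main input; after the last record it returns 0.
printf '2\n3\n' | awk 'BEGIN { while ((r = getline) > 0) n++; print n, r }'
# prints: 2 0

# A read that fails returns -1, which "exit -r" turns into exit status 1.
awk 'BEGIN { r = (getline line < "/nonexistent/file"); exit -r }' ||
  echo "awk exited with status $?"
# prints: awk exited with status 1
```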

score 3 · Answer 4

Using a Perl substitution with eval (sorry for the obfuscation...):

perl -0pe 's/(\d+)\n(?=(\d+))/ $1+1==$2 ? "$1," : $& /ge; 
           s/,.*,/,/g' ex

The first substitution builds lines of consecutive integers separated by ","; the second substitution removes the numbers in the middle.

score 2 · Answer 5

Another awk approach (a variation of glenn's answer):

awk '
    function output() { print start (start != end? ","end : "") }
    # NR>1 stops an uninitialized end (numerically 0) from matching a first line of 0 or 1
    NR>1 && (end==$0-1 || end==$0) { end=$0; next }
    end!=""{ output() }
    { start=end=$0 }
END{ output() }' infile

score 2 · Answer 6

An alternative in awk:

<infile sort -nu | awk '
     { l=p=$1 }
     { while ( (r=getline) >= 0 ){
           if ( $1 == p+1 ) { p=$1;  continue };
           print ( l==p ? l : l","p );
           l=p=$1
           if(r==0){ break };
           }
       if (r == -1 ) { print "Unexpected error in reading file"; exit 1 }
     }
    '

On one line (without error checking):

<infile awk '{l=p=$1}{while((r=getline)>=0){if($1==p+1){p=$1;continue};print(l==p?l:l","p);l=p=$1;if(r==0){ break };}}'

With comments (and preprocessing of the file to guarantee a sorted list without duplicates):

<infile sort -nu | awk '

     { l=p=$1 }    ## Only on the first line. The loop will read all lines.

     ## read all lines while there is no error.
     { while ( (r=getline) >= 0 ){

           ## If present line ($1) follows previous line (p), continue.
           if ( $1 == p+1 ) { p=$1;  continue };

           ### Starting a new range ($1>p+1): print the previous range.
           print ( l==p ? l : l","p );

           ## Save values in the variables left (l) and previous (p).
           l=p=$1

           ## At the end of the file, break the loop.
           if(r==0){ break };

           }

       ## All lines have been processed or got an error.
       if (r == -1 ) { print "Unexpected error in reading file"; exit 1 }
     }
    '
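The sort -nu preprocessing matters because the loop only tests $1 == p+1: after a repeated number that test fails, so the current range would be split in two. -n compares numerically (a plain text sort puts 10 before 9) and -u drops duplicates, as this small check shows:

```shell
# Numeric, de-duplicated sort: prints 2, 9 and 10, one per line.
printf '%s\n' 10 9 9 2 | sort -nu
```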

score 2 · Answer 7

Yet another awk solution, similar to the others:

#!/usr/bin/awk -f

function output() {
    # This function is called when a completed range needs to be
    # outputted. It will use the global variables rstart and rend.

    if (rend != "")
        print rstart, rend
    else
        print rstart
}

# Output field separator is a comma.
BEGIN { OFS = "," }

# At the start, just set rstart and prev (the previous line's number) to
# the first number, then continue with the next line.
NR == 1 { rstart = prev = $0; next }

# Calculate the difference between this line and the previous. If it's
# 1, move the end of the current range here.
(diff = $0 - prev) == 1 { rend = $0 }

# If the difference is more than one, then we're onto a new range.
# Output the range that we were processing and reset rstart and rend.
diff > 1 {
    output()

    rstart = $0
    rend = ""
}

# Remember this line's number as prev before moving on to the next line.
{ prev = $0 }

# At the end, output the last range.
END { output() }

The rend variable isn't really needed, but I wanted to keep as much of the range logic as possible away from the output() function.

score 1 · Answer 8

How about

awk '
NR == 1 || $0 > LAST+1 {if (NR > 1)  print (PR != LAST)?"," LAST:""
                 printf "%s", $0
                 PR = $0
                }
                {LAST  = $0
                }
END             {print (PR != LAST)?"," LAST:""
                }
' file
2,3
9,12
24
28,29
33

score 1 · Answer 9

A Perl approach!

#!/usr/bin/env perl
use strict;
use warnings;

print ranges(2,3,9,10,11,12,24,28,29,33), "\n";

sub ranges {
    my @vals = @_;
    my $first = $vals[0];
    my $last;
    my @list;
    for my $i (0 .. (scalar(@vals)-2)) {
        if (($vals[$i+1] - $vals[$i]) != 1) {
            $last = $vals[$i];
            push @list, ($first == $last) ? $first : "$first,$last";
            $first = $vals[$i+1];
        }
    }
    $last = $vals[-1];
    push @list, ($first == $last) ? $first : "$first,$last";
    return join ("\n", @list);
}

score 1 · Answer 10

Ugly software-tools bash shell code, where file is the input file:

diff -y file <(seq $(head -n 1 file) $(tail -n 1 file))  |  cut -f1  | 
sed -En 'H;${x;s/([0-9]+)\n([0-9]+\n)*([0-9]+)/\1,\3/g;s/\n\n+/\n/g;s/^\n//p}'

Or with wdiff:

wdiff -12 file <(seq $(head -n 1 file) $(tail -n 1 file) ) | 
sed -En 'H;${x;s/([0-9]+)\n([0-9]+\n)*([0-9]+)/\1,\3/g;s/=+\n\n//g;s/^\n//p}'

How they work: seq makes a sequential list without gaps, running from the first to the last number in the input file (the file is already sorted, so those are its smallest and largest values), and diff does most of the work. The sed code is mostly just formatting, plus replacing the numbers in between with a comma.
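The sed stage can be tried on its own with hand-made input shaped like the cut -f1 output: the numbers present in the file, and blank lines where seq had numbers the file lacks. A sketch assuming GNU sed (for -E and \n in the pattern); the replacement keeps the first and last number of each run through the backreferences \1 and \3:

```shell
# H appends every line to the hold space; on the last line ($) the whole
# text is swapped in (x), runs are collapsed, and blank lines are squeezed.
printf '2\n3\n\n9\n10\n11\n12\n\n24\n' |
  sed -En 'H;${x;s/([0-9]+)\n([0-9]+\n)*([0-9]+)/\1,\3/g;s/\n\n+/\n/g;s/^\n//p}'
# prints:
# 2,3
# 9,12
# 24
```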

For a related problem, the inverse of this one, see: Finding gaps in sequential numbers

score 1 · Answer 11

A nice discussion from 2001 on perlmonks.org, adapted to read from STDIN or from files named on the command line (as Perl programs usually do):

#!/usr/bin/env perl
use strict;
use warnings;
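use 5.006;  # Perl 5.6 or later, for (??{ ... })

# Join the numbers with commas, rewrite each run of consecutive numbers
# as "first-last", then turn "-" into "," and "," into newlines.
sub num2range {
  local $_ = join ',' => @_;
  s/(?<!\d)(\d+)(?:,((??{$++1}))(?!\d))+/$1-$+/g;
  tr/-,/,\n/;
  return $_;
}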
my @list;
chomp(@list = <>);
my $range = num2range(@list);
print "$range\n";

score 0 · Answer 12

On a "Unix & Linux" site, a simple, readable, pure shell (bash) script seems more appropriate to me:

#!/bin/bash

inputfile=./input.txt

unset prev begin
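while IFS= read -r num ; do
    if [ "$prev" = "$((num-1))" ] ; then
        prev=$num
    else
        if [ "$begin" ] ; then
            [ "$begin" = "$prev" ] && echo "$prev" || echo "$begin,$prev"
        fi
        begin=$num
        prev=$num
    fi
done < "$inputfile"

# Print the last range, which the loop above never reaches.
if [ "$begin" ] ; then
    [ "$begin" = "$prev" ] && echo "$prev" || echo "$begin,$prev"
fi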