Sort date and time strings in 12-hour format

1

I have a log file with content like this:

11-12-2014 - 03:03:59AM lat = 41.990516; lon = -93.430704<br>
11-12-2014 - 03:05:15AM lat = 41.001546; lon = -93.443352<br>
11-12-2014 - 03:11:50AM lat = 42.039054; lon = -93.442001<br>
11-12-2014 - 12:08:03AM lat = 41.937911; lon = -93.369249<br>
11-12-2014 - 12:11:29AM lat = 41.949656; lon = -93.329133<br>
11-12-2014 - 12:23:02AM lat = 42.025385; lon = -93.347026<br>
11-12-2014 - 12:29:10AM lat = 41.033341; lon = -93.380586<br>
11-12-2014 - 12:38:08AM lat = 41.036720; lon = -93.436851<br>
11-12-2014 - 12:45:20AM lat = 41.998129; lon = -93.400943<br>
11-12-2014 - 12:53:36AM lat = 41.961489; lon = -93.414624<br>

How can I convert this to 24-hour time and sort it correctly?

    
by Robert Altman 12.11.2014 / 16:45

4 answers

0

With perl:

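$ perl -MTime::Piece -anle '
  # convert the 12-hour time in the third field to 24-hour format
  $F[2] = Time::Piece->strptime($F[2],"%r")->strftime("%H:%M:%S");
  # sort key: date as YYYY-MM-DD (input assumed to be MM-DD-YYYY), then time
  my ($m, $d, $y) = split /-/, $F[0];
  push @out, ["$y-$m-$d $F[2]", join(" ",@F)];
  END {
    print for map  { $_->[1] }
              sort { $a->[0] cmp $b->[0] } @out;
}' file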
11-12-2014 - 00:08:03 lat = 41.937911; lon = -93.369249<br>
11-12-2014 - 00:11:29 lat = 41.949656; lon = -93.329133<br>
11-12-2014 - 00:23:02 lat = 42.025385; lon = -93.347026<br>
11-12-2014 - 00:29:10 lat = 41.033341; lon = -93.380586<br>
11-12-2014 - 00:38:08 lat = 41.036720; lon = -93.436851<br>
11-12-2014 - 00:45:20 lat = 41.998129; lon = -93.400943<br>
11-12-2014 - 00:53:36 lat = 41.961489; lon = -93.414624<br>
11-12-2014 - 03:03:59 lat = 41.990516; lon = -93.430704<br>
11-12-2014 - 03:05:15 lat = 41.001546; lon = -93.443352<br>
11-12-2014 - 03:11:50 lat = 42.039054; lon = -93.442001<br>
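
For readers unfamiliar with %r, the conversion it performs on the time field can be sketched by hand. This is only an illustration under the assumption that every time looks like 03:03:59AM; the helper name to_24h is made up here and is not part of the answer above.

```python
def to_24h(t12):
    """Convert 'hh:mm:ssAM' or 'hh:mm:ssPM' to 24-hour 'HH:MM:SS'."""
    clock, meridiem = t12[:-2], t12[-2:].upper()
    hour, minute, second = clock.split(":")
    hour = int(hour) % 12      # 12 o'clock becomes 0 for both AM and PM
    if meridiem == "PM":
        hour += 12             # afternoon hours are shifted by 12
    return "%02d:%s:%s" % (hour, minute, second)

print(to_24h("12:08:03AM"))  # 00:08:03
print(to_24h("03:03:59AM"))  # 03:03:59
print(to_24h("04:12:03PM"))  # 16:12:03
```

The 12 o'clock case is why the raw log sorts wrongly as text: 12:08:03AM is really 00:08:03 and belongs before 03:03:59AM.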
    
answered 12.11.2014 / 17:30
0

You can do this with the GNU date command. It can parse strings and print the corresponding date:

$ date -d "11/11/2014 04:12:03PM"
Tue Nov 11 16:12:03 CET 2014

Note, however, that it does not like dash-separated dates such as DD-MM-YYYY:

$ date -d "11-11-2014"
date: invalid date ‘11-11-2014’

So first run sed on your file to replace the - in the dates with / . Then pipe the result through a while read loop to get each field into its own variable, convert, and sort:

$ sed 's#-#/#; s#-#/#' file | while read -r date _ hour rest; do
    # only the first two dashes (those in the date) are replaced,
    # so the minus sign of the longitude is left alone
    echo "$(date -d "$date $hour" +"%F - %R:%S") $rest"
  done | sort
2014-11-12 - 00:08:03 lat = 41.937911; lon = -93.369249<br>
2014-11-12 - 00:11:29 lat = 41.949656; lon = -93.329133<br>
2014-11-12 - 00:23:02 lat = 42.025385; lon = -93.347026<br>
2014-11-12 - 00:29:10 lat = 41.033341; lon = -93.380586<br>
2014-11-12 - 00:38:08 lat = 41.036720; lon = -93.436851<br>
2014-11-12 - 00:45:20 lat = 41.998129; lon = -93.400943<br>
2014-11-12 - 00:53:36 lat = 41.961489; lon = -93.414624<br>
2014-11-12 - 03:03:59 lat = 41.990516; lon = -93.430704<br>
2014-11-12 - 03:05:15 lat = 41.001546; lon = -93.443352<br>
2014-11-12 - 03:11:50 lat = 42.039054; lon = -93.442001<br>

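This works on your example, where the converted dates are printed as YYYY-MM-DD and therefore sort correctly as text. It breaks down if you keep the original month-first dates, since February (02) of a later year would then sort before November (11) of an earlier one. A trick that works regardless of the printed format is to prepend each date as seconds since the epoch, sort on that, and then remove it again: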

$ sed 's#-#/#; s#-#/#' file | while read -r date _ hour rest; do
  # prefix each line with its epoch time, followed by a tab
  printf "%s\t%s %s\n" "$(date -d "$date $hour" +"%s")" "$date - $hour" "$rest"
done | sort -n | cut -f 2-
11/12/2014 - 12:08:03AM lat = 41.937911; lon = -93.369249<br>
11/12/2014 - 12:11:29AM lat = 41.949656; lon = -93.329133<br>
11/12/2014 - 12:23:02AM lat = 42.025385; lon = -93.347026<br>
11/12/2014 - 12:29:10AM lat = 41.033341; lon = -93.380586<br>
11/12/2014 - 12:38:08AM lat = 41.036720; lon = -93.436851<br>
11/12/2014 - 12:45:20AM lat = 41.998129; lon = -93.400943<br>
11/12/2014 - 12:53:36AM lat = 41.961489; lon = -93.414624<br>
11/12/2014 - 03:03:59AM lat = 41.990516; lon = -93.430704<br>
11/12/2014 - 03:05:15AM lat = 41.001546; lon = -93.443352<br>
11/12/2014 - 03:11:50AM lat = 42.039054; lon = -93.442001<br>

Or, to print the dates in 24-hour format:

$ sed 's#-#/#; s#-#/#' file | while read -r date _ hour rest; do
    printf "%s\t%s %s\n" "$(date -d "$date $hour" +"%s")" \
    "$(date -d "$date $hour" +"%F - %R:%S")" "$rest"
  done | sort -n | cut -f 2-
2014-11-12 - 00:08:03 lat = 41.937911; lon = -93.369249<br>
2014-11-12 - 00:11:29 lat = 41.949656; lon = -93.329133<br>
2014-11-12 - 00:23:02 lat = 42.025385; lon = -93.347026<br>
2014-11-12 - 00:29:10 lat = 41.033341; lon = -93.380586<br>
2014-11-12 - 00:38:08 lat = 41.036720; lon = -93.436851<br>
2014-11-12 - 00:45:20 lat = 41.998129; lon = -93.400943<br>
2014-11-12 - 00:53:36 lat = 41.961489; lon = -93.414624<br>
2014-11-12 - 03:03:59 lat = 41.990516; lon = -93.430704<br>
2014-11-12 - 03:05:15 lat = 41.001546; lon = -93.443352<br>
2014-11-12 - 03:11:50 lat = 42.039054; lon = -93.442001<br>
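
The epoch-seconds trick (decorate each line, sort, strip the decoration) can also be sketched in Python. This is a rough illustration, not the answer's method: it assumes the stamp occupies the first 23 characters in the layout MM-DD-YYYY - hh:mm:ssAM, treats it as UTC, and relies on a locale where %p matches AM/PM. The function names are invented for the example.

```python
import calendar
import time

FMT = "%m-%d-%Y - %I:%M:%S%p"  # assumed layout: MM-DD-YYYY - hh:mm:ssAM

def epoch_seconds(line):
    """Seconds since the epoch for the stamp in the first 23 characters."""
    # Treating the stamp as UTC is an assumption; it only has to be consistent.
    return calendar.timegm(time.strptime(line[:23], FMT))

def sort_by_epoch(lines):
    decorated = ["%d\t%s" % (epoch_seconds(l), l) for l in lines]  # decorate
    decorated.sort(key=lambda s: int(s.split("\t", 1)[0]))         # numeric sort
    return [s.split("\t", 1)[1] for s in decorated]                # undecorate

sample = [
    "11-12-2014 - 03:03:59AM lat = 41.990516; lon = -93.430704",
    "11-12-1998 - 03:05:15AM lat = 41.001546; lon = -93.443352",
    "11-12-2014 - 12:08:03AM lat = 41.937911; lon = -93.369249",
]
for line in sort_by_epoch(sample):
    print(line)
```

The key is compared as a number on purpose: the 1998 stamp has nine digits and the 2014 stamps have ten, so comparing the prefixes as text would put 1998 last.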
    
answered 12.11.2014 / 17:07
0

With python and the dateutil module ( pip install python-dateutil ), you can sort directly on the datetime objects:

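#!/usr/bin/env python3
import sys
from dateutil.parser import parse

lines = []
with open(sys.argv[1]) as log:
    for line in log:
        # the first 24 characters hold the date, the time and a trailing space
        d, rest = line[:24], line[24:]
        lines.append((parse(d), rest))

for when, rest in sorted(lines):
    # rest already ends with a newline
    print(when, rest, end="")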

Run it with python program.py inputfile

This can be done without dateutil, but the advantage of using it is that you do not have to specify the input time format, as long as it is not ambiguous.
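
For comparison, a version without dateutil might look like the sketch below. It uses only the standard library and therefore has to spell out the input format; the layout MM-DD-YYYY - hh:mm:ssAM, the 23-character stamp width and the function names are assumptions made for this example, and %p depends on a locale that uses AM/PM.

```python
from datetime import datetime

FMT_IN = "%m-%d-%Y - %I:%M:%S%p"   # assumed input layout, 12-hour clock
FMT_OUT = "%m-%d-%Y - %H:%M:%S"    # same layout, 24-hour clock

def parse_line(line):
    """Split a log line into (datetime, remainder of the line)."""
    return datetime.strptime(line[:23], FMT_IN), line[23:]

def convert(lines):
    """Return the lines sorted by time, with 24-hour timestamps."""
    records = sorted((parse_line(l) for l in lines), key=lambda r: r[0])
    return [when.strftime(FMT_OUT) + rest for when, rest in records]

sample = [
    "11-12-2014 - 03:03:59AM lat = 41.990516; lon = -93.430704<br>\n",
    "11-12-2014 - 12:08:03AM lat = 41.937911; lon = -93.369249<br>\n",
]
print("".join(convert(sample)), end="")
```

To process a file, pass the open file object instead of the sample list, for example convert(open("inputfile")).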

    
answered 12.11.2014 / 17:28
0

I would use awk with sort:

awk '{cmd="date +%T -d " $3; if ((cmd | getline x) > 0) $3=x; close(cmd)} 1' logfile | sort -t- -n -k3,3 -k1,1 -k2,2

Let's first modify your log a little so that it has different dates:

11-12-2010 - 03:03:59AM lat = 41.990516; lon = -93.430704
11-12-1998 - 03:05:15AM lat = 41.001546; lon = -93.443352
11-12-2030 - 03:11:50AM lat = 42.039054; lon = -93.442001
11-12-2014 - 12:08:03AM lat = 41.937911; lon = -93.369249
11-12-2014 - 12:11:29AM lat = 41.949656; lon = -93.329133
11-11-2014 - 12:23:02AM lat = 42.025385; lon = -93.347026
11-12-2011 - 12:29:10AM lat = 41.033341; lon = -93.380586
11-12-2011 - 12:38:08AM lat = 41.036720; lon = -93.436851
10-12-2014 - 12:45:20AM lat = 41.998129; lon = -93.400943
11-12-2014 - 12:53:36AM lat = 41.961489; lon = -93.414624

The result would be:

11-12-1998 - 03:05:15 lat = 41.001546; lon = -93.443352
11-12-2010 - 03:03:59 lat = 41.990516; lon = -93.430704
11-12-2011 - 00:29:10 lat = 41.033341; lon = -93.380586
11-12-2011 - 00:38:08 lat = 41.036720; lon = -93.436851
10-12-2014 - 00:45:20 lat = 41.998129; lon = -93.400943
11-11-2014 - 00:23:02 lat = 42.025385; lon = -93.347026
11-12-2014 - 00:08:03 lat = 41.937911; lon = -93.369249
11-12-2014 - 00:11:29 lat = 41.949656; lon = -93.329133
11-12-2014 - 00:53:36 lat = 41.961489; lon = -93.414624
11-12-2030 - 03:11:50 lat = 42.039054; lon = -93.442001

The trick here is to use - as the field separator and sort first on the third field (year), then on the first (month), and then on the second (day). All of these keys are compared numerically (the -n option).
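
The same year, month, day ordering can be sketched in Python with a tuple as the sort key. This is only an illustration of the idea, assuming MM-DD-YYYY dates followed by " - "; the function name date_key is made up for the example.

```python
def date_key(line):
    """Return (year, month, day, rest) so that tuples compare chronologically."""
    date, _, rest = line.partition(" - ")
    month, day, year = (int(part) for part in date.split("-"))
    # rest starts with the 24-hour time, so it breaks ties within one day
    return (year, month, day, rest)

lines = [
    "11-12-2014 - 00:08:03 lat = 41.937911; lon = -93.369249",
    "11-12-1998 - 03:05:15 lat = 41.001546; lon = -93.443352",
    "10-12-2014 - 00:45:20 lat = 41.998129; lon = -93.400943",
    "11-11-2014 - 00:23:02 lat = 42.025385; lon = -93.347026",
]
for line in sorted(lines, key=date_key):
    print(line)
```

Tuples are compared element by element, which mirrors how sort consults -k3, then -k1, then -k2.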

    
answered 12.11.2014 / 17:15