Get unique log lines using grep and tail

1

I have the following log file. I want to extract the last 10 unique entries from this file. Is it possible to do this with grep and tail?

2016-04-18 10:13:11,925 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.7088348025036650 on 711b3fb7d875:80
2016-04-18 10:13:12,383 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9277403071419588 on 711b3fb7d875:80
2016-04-18 10:13:14,000 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5617050735043505 on 711b3fb7d875:80
2016-04-18 10:13:18,305 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.3502119403604215 on 711b3fb7d875:80
2016-04-18 10:13:25,571 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1448386101904803 on 711b3fb7d875:80
2016-04-18 10:13:42,529 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6017618280263232 on 711b3fb7d875:80
2016-04-18 10:21:20,257 (glastopf.glastopf) 150.70.188.165 requested GET / on 711b3fb7d875:80
2016-04-18 10:35:27,775 (glastopf.glastopf) 150.70.173.55 requested GET / on 711b3fb7d875:80
2016-04-18 10:44:21,799 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.8457383350172993 on 711b3fb7d875:80
2016-04-18 10:44:23,550 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2291251627482913 on 711b3fb7d875:80
2016-04-18 10:44:24,885 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9121516725350658 on 711b3fb7d875:80
2016-04-18 10:44:28,611 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6517709326810913 on 711b3fb7d875:80
2016-04-18 10:44:36,656 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.3339893597346100 on 711b3fb7d875:80
2016-04-18 10:44:52,579 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9653746532564283 on 711b3fb7d875:80
2016-04-18 11:07:15,576 (glastopf.glastopf) 204.12.196.236 requested GET / on 711b3fb7d875:80
2016-04-18 11:14:46,990 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6323574164650954 on 711b3fb7d875:80
2016-04-18 11:14:49,798 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1343994230148844 on 711b3fb7d875:80
2016-04-18 11:14:50,923 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2092851733275502 on 711b3fb7d875:80
2016-04-18 11:14:54,015 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6364011485956100 on 711b3fb7d875:80
2016-04-18 11:15:02,021 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2105667716533854 on 711b3fb7d875:80
2016-04-18 11:15:17,763 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5330510476532333 on 711b3fb7d875:80
2016-04-18 11:45:51,204 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.7162577798366348 on 711b3fb7d875:80
2016-04-18 11:45:51,456 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.4097472747050946 on 711b3fb7d875:80
2016-04-18 11:45:53,562 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0435891326571879 on 711b3fb7d875:80
2016-04-18 11:45:57,368 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9764200678378154 on 711b3fb7d875:80
2016-04-18 11:46:05,598 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.2539390798717596 on 711b3fb7d875:80
2016-04-18 11:53:59,103 (glastopf.glastopf) 150.70.173.9 requested GET / on 711b3fb7d875:80
2016-04-18 12:16:07,343 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0022258971071879 on 711b3fb7d875:80
2016-04-18 12:16:07,411 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.6543056525672964 on 711b3fb7d875:80
2016-04-18 12:16:09,210 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0771392409002968 on 711b3fb7d875:80
2016-04-18 12:16:21,475 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.4621648610735409 on 711b3fb7d875:80
2016-04-18 12:16:37,413 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1810763849106982 on 711b3fb7d875:80
2016-04-18 12:46:31,160 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.0759114015016254 on 711b3fb7d875:80
2016-04-18 12:46:33,023 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.9823929541441208 on 711b3fb7d875:80
2016-04-18 12:46:42,262 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.1670975464416704 on 711b3fb7d875:80
2016-04-18 12:46:44,977 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.3061602425336546 on 711b3fb7d875:80
2016-04-18 12:47:00,555 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5692431772822398 on 711b3fb7d875:80
2016-04-18 12:50:34,078 (glastopf.glastopf) 150.70.188.178 requested GET / on 711b3fb7d875:80

So, basically, I want the last 10 unique log entries, identified by unique IP addresses.

EDIT: Example of the last two unique entries:

2016-04-18 12:47:00,555 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5692431772822398 on 711b3fb7d875:80
2016-04-18 12:50:34,078 (glastopf.glastopf) 150.70.188.178 requested GET / on 711b3fb7d875:80
    
by firepro20 18.04.2016 / 15:58

2 answers

0

Using sort with a little help from tac:

sort -k4,4 file.log | tac | sort -uk4,4 | sort -k1,2

To get only the last 10 entries, pipe to tail -10 at the end:

sort -k4,4 file.log | tac | sort -uk4,4 | sort -k1,2 | tail -10
  • The -k option of sort lets us use a whitespace-separated field number as the sort key

  • tac reverses the lines of its input, i.e., the last line becomes the first and the first becomes the last; this is needed because sort -u keeps only the first entry of each group that compares equal under the sort key — here the lines are not fully identical, but they match on a specific field

Example:

$ sort -k4,4 file.log | tac | sort -uk4,4 | sort -k1,2
2016-04-18 10:21:20,257 (glastopf.glastopf) 150.70.188.165 requested GET / on 711b3fb7d875:80
2016-04-18 10:35:27,775 (glastopf.glastopf) 150.70.173.55 requested GET / on 711b3fb7d875:80
2016-04-18 11:07:15,576 (glastopf.glastopf) 204.12.196.236 requested GET / on 711b3fb7d875:80
2016-04-18 11:53:59,103 (glastopf.glastopf) 150.70.173.9 requested GET / on 711b3fb7d875:80
2016-04-18 12:47:00,555 (glastopf.glastopf) 115.239.248.245 requested GET http://zc.qq.com/cgi-bin/common/attr?id=260714&r=0.5692431772822398 on 711b3fb7d875:80
2016-04-18 12:50:34,078 (glastopf.glastopf) 150.70.188.178 requested GET / on 711b3fb7d875:80
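The role of tac in the pipeline can be seen on a minimal two-line log (a hypothetical demo.log; the stable behavior of sort -u here is that of GNU sort):

```shell
# Two entries for the same IP (field 4), oldest first
printf '%s\n' \
  '2016-04-18 10:00:00,000 (x) 1.2.3.4 requested GET /old on h:80' \
  '2016-04-18 11:00:00,000 (x) 1.2.3.4 requested GET /new on h:80' > demo.log

# sort -u keeps the first line of each equal-key group:
sort -k4,4 demo.log | sort -uk4,4        # the earlier /old line survives

# reversing with tac first makes the newest line win instead:
sort -k4,4 demo.log | tac | sort -uk4,4  # the later /new line survives
```

This is why the answer reverses the input before the unique pass and then re-sorts by timestamp to restore chronological order.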
    
by heemayl 18.04.2016 / 16:19
0

The uniq command can be used to eliminate consecutive lines that are identical, in whole or in part. By default it operates on whole lines: if a file contains several identical consecutive lines, uniq removes the duplicates.

$ cat foo.txt 
foo
foo
foo
bar
baz
baz
foo
foo
$ uniq foo.txt 
foo
bar
baz
foo

To remove all duplicate lines, even non-consecutive ones, it can be run after sort:

$ sort foo.txt | uniq
bar
baz
foo

Some flags can be used to consider only part of a line when determining duplicates. Here we want to compare only the IP addresses, which are in the fourth column, so first we need to tell uniq to ignore the first three columns; this is done with the -f flag. After that, we need to tell it to consider only the IP addresses. This part can be tricky, because we can only tell it to compare a fixed number of characters (with the -w flag), while IP addresses vary in length. Fortunately this is not a problem here, because every IP address is followed by requested, so even if the first characters of that word are included in the comparison, it has no effect on whether a line is correctly detected as a duplicate. In the end, applying uniq -f 3 -w 15 to the input seems to produce the desired result.

One additional thing to note is that when only part of each line is considered for duplicate detection, the lines within a group of "duplicates" need not be fully identical, so we must decide which one gets printed to the output. uniq prints the first one, but it can be made to print the last by first running the input through tac.
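Putting the pieces together, a minimal sketch on a hypothetical sample.log (note that uniq only collapses consecutive duplicates, so an IP that reappears after a different IP will still show up more than once):

```shell
# Sample log: two requests from one IP, then one from another
printf '%s\n' \
  '2016-04-18 10:13:11,925 (glastopf.glastopf) 115.239.248.245 requested GET /a on h:80' \
  '2016-04-18 10:13:12,383 (glastopf.glastopf) 115.239.248.245 requested GET /b on h:80' \
  '2016-04-18 10:21:20,257 (glastopf.glastopf) 150.70.188.165 requested GET / on h:80' > sample.log

# Reverse so the newest entry per IP comes first, drop consecutive
# duplicates on the IP field (skip 3 fields, compare 15 characters),
# restore chronological order, then keep the last 10 entries
tac sample.log | uniq -f 3 -w 15 | tac | tail -10
```

On this input the first 115.239.248.245 line (/a) is dropped and the later one (/b) is kept, along with the single 150.70.188.165 line.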

    
by fkraiem 21.04.2016 / 01:19