Grep for a string using a list of files as the source

2

I have a plain text file in which each line is a path to a file. Now I need to grep for a string in those files.

Is there a way to use this file as the "search source" for grep? Or do I have to copy and paste all the paths into bash?

Second: is there a way to give grep several different files as the search source on one line? Like

egrep --color -i "test" /tmp/1.txt, /tmp/2.txt ...?

    
asked by opHASnoNAME 22.02.2012 / 10:03

4 answers

3

I rewrote this answer because it raised a few questions whose answers were somewhat hazy. I hope this version clears some of the fog ...

Note: xargs is suited to passing positional parameters (args) to a program when there are so many arguments that they exceed the memory available for the command line ...

The notes are in the script.

#!/bin/bash

  rm -f "/tmp/file   "*

# Create some dummy test files and write their names to /tmp/list
  for x in {A..D} ;do 
      echo "text-$x" >"/tmp/file   $x"
      echo "/tmp/file   $x"
  done >"/tmp/list"

# Set up Quirk 1... with an escaped char \A in the file-name.
        # Replace one of the file-names in the list with a quirky but valid one.
          echo 'quirk 1. \A in filename' >'/tmp/file   \A'
          sed -i 's/A/\\A/'  "/tmp/list"

        # The next two lines show  that 
        #            'file   \A' is in the list      and DOES exist.
        #            'file   A'  is NOT in the list, but DOES exist.
        # Therefore, 'file   A'  should NOT produce a 'grep' match
          echo "Quirk 1. backslash in file-name"  
          echo "   ls:"; ls -1 '/tmp/file   '*A    |nl  
          echo " list:"; sed -n '/A/p' "/tmp/list" |nl
          echo "===================="

# Set up Quirk 2... with $D in the file name
        # Replace one of the file-names in the list with a quirky but valid one.
          echo 'quirk 2. $D in filename' >'/tmp/file   $D'
          sed -i 's/D/\$D/'  "/tmp/list"
          D='D' 
        # The next two lines show  that 
        #            'file   $D' is in the list      and DOES exist.
        #            'file   D'  is NOT in the list, but DOES exist.
          echo "Quirk 2. var \$D=$D in file-name"  
          echo "   ls:"; ls -1 '/tmp/file   '*D    |nl  
          echo " list:"; sed -n '/D/p' "/tmp/list" |nl
          echo "===================="

# The regex search pattern
  regex='(A|C|D)'

# Read lines of a file, and use them as positional parameters.
#  Note: 'protection' means protected from bash pre-processing. (eg path expansion) 
# ============================================================
  ###  
  echo 
  echo "========================================"
  echo "Passing parameters to 'grep' via 'xargs'"    
  echo "========================================"
  echo 
  ###
    echo "# Use 'xargs' with the assumption that every file name contains no meta characters."
    echo "# The result is that file names which contain meta characters, FAILS."   
    echo "# So it interprets '\A' as 'A' and whitespace as a delimiter!"
      <"/tmp/list" xargs  grep -E -H "$regex" 
      echo =====; echo "ERROR: All files in the sample list FAIL!" 
      echo =====; echo
  ###  
    echo "# Use xargs -I{} to avoid problems of whitespace in filenames"
    echo "# But the args are further interpreted by bash, as in escape '\' expansion."
    echo "# Bash still interprets xarg's '\A' as 'A' and so 'grep' processes the wrong file"
    echo "# However the -I{} does protect the $D from var expansion"
      <"/tmp/list" xargs -I{} grep -E -H "$regex" {}
      echo =====; echo "ERROR: The 1st line refers to 'file   A' which is NOT in the list!" 
      echo =====; echo
  ###  
    echo "# Use xargs -0 to avoid problems of whitespace in filenames"
    echo "# 'xargs -0' goes further with parameter protection than -I." 
    echo "# Quotes and backslash are not special (every character is taken literally)" 
      <"/tmp/list" tr '\n' '\0' |xargs -0 grep -E -H "$regex"
      echo ==; echo "OK"
      echo ==; echo
  ###
  echo "====================================="
  echo "Passing parameters directly to 'grep'"
  echo "====================================="
  echo
  ###
    echo "# Use 'grep' with the assumption that every file name contains no meta characters."
    echo "# The result is that file names which contain meta characters, FAILS."
      grep -E -H "$regex" $(cat "/tmp/list")
      echo =====; echo "ERROR: All files in the sample list FAIL!"
      echo =====; echo
  ###
    echo '# Set bash positional parameters "$1" "$2" ... "$n"'
    echo "# Note: destructive... original parameters are overwritten"
    echo '#   and, you may need to reset $IFS to its original value'
      IFS=$'\n'
      set $(cat "/tmp/list")
      grep -E "$regex" "$@"
      echo ==; echo "OK"
      echo ==; echo
  ###
    echo '# Set bash positional parameters "$1" "$2" ... "$n"'
    echo '# Note: non-destructive... original parameters are not overwritten'
    echo '# Variable set in the sub-shell are NOT accessible on return.'
    echo '# There is no need to reset $IFS'
      ( IFS=$'\n'
        set $(cat "/tmp/list")
        grep -E "$regex" "$@" )
      echo ==; echo "OK"
      echo ==; echo
  ###
    echo '# Using bash array elements "${list[0]}" "${list[1]}" ... "${list[n-1]}"'
    echo '# Note: you may need to reset $IFS to its original value'
      IFS=$'\n'
      list=($(cat "/tmp/list"))
      grep -E "$regex" "${list[@]}"
      echo ==; echo "OK"
      echo ==; echo
  ###

Here is the output

Quirk 1. backslash in file-name
   ls:
     1  /tmp/file   A
     2  /tmp/file   \A
 list:
     1  /tmp/file   \A
====================
Quirk 2. var $D=D in file-name
   ls:
     1  /tmp/file   D
     2  /tmp/file   $D
 list:
     1  /tmp/file   $D
====================

========================================
Passing parameters to 'grep' via 'xargs'
========================================

# Use 'xargs' with the assumption that every file name contains no meta characters.
# The result is that file names which contain meta characters, FAILS.
# So it interprets '\A' as 'A' and whitespace as a delimiter!
grep: /tmp/file: No such file or directory
grep: A: No such file or directory
grep: /tmp/file: No such file or directory
grep: B: No such file or directory
grep: /tmp/file: No such file or directory
grep: C: No such file or directory
grep: /tmp/file: No such file or directory
grep: $D: No such file or directory
=====
ERROR: All files in the sample list FAIL!
=====

# Use xargs -I{} to avoid problems of whitespace in filenames
# But the args are further interpreted by bash, as in escape '\' expansion.
# Bash still interprets xarg's '\A' as 'A' and so 'grep' processes the wrong file
# However the -I{} does protect the D from var expansion
/tmp/file   A:text-A
/tmp/file   C:text-C
/tmp/file   $D:quirk 2. $D in filename
=====
ERROR: The 1st line refers to 'file   A' which is NOT in the list!
=====

# Use xargs -0 to avoid problems of whitespace in filenames
# 'xargs -0' goes further with parameter protection than -I.
# Quotes and backslash are not special (every character is taken literally)
/tmp/file   \A:quirk 1. \A in filename
/tmp/file   C:text-C
/tmp/file   $D:quirk 2. $D in filename
==
OK
==

=====================================
Passing parameters directly to 'grep'
=====================================

# Use 'grep' with the assumption that every file name contains no meta characters.
# The result is that file names which contain meta characters, FAILS.
grep: /tmp/file: No such file or directory
grep: \A: No such file or directory
grep: /tmp/file: No such file or directory
grep: B: No such file or directory
grep: /tmp/file: No such file or directory
grep: C: No such file or directory
grep: /tmp/file: No such file or directory
grep: $D: No such file or directory
=====
ERROR: All files in the sample list FAIL!
=====

# Set bash positional parameters "$1" "$2" ... "$n"
# Note: destructive... original parameters are overwritten
#   and, you may need to reset $IFS to its original value
/tmp/file   \A:quirk 1. \A in filename
/tmp/file   C:text-C
/tmp/file   $D:quirk 2. $D in filename
==
OK
==

# Set bash positional parameters "$1" "$2" ... "$n"
# Note: non-destructive... original parameters are not overwritten
# Variable set in the sub-shell are NOT accessible on return.
# There is no need to reset $IFS
/tmp/file   \A:quirk 1. \A in filename
/tmp/file   C:text-C
/tmp/file   $D:quirk 2. $D in filename
==
OK
==

# Using bash array elements "${list[0]}" "${list[1]}" ... "${list[n-1]}"
# Note: you may need to reset $IFS to its original value
/tmp/file   \A:quirk 1. \A in filename
/tmp/file   C:text-C
/tmp/file   $D:quirk 2. $D in filename
==
OK
==
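
As an alternative to the array variant above, the list can probably be read with bash's mapfile builtin (bash 4 or later), which takes each line literally and needs no change to $IFS. This is only a sketch and not part of the original script; the temporary directory, file names and the pattern needle are made up for illustration. It also bears on the comments in the script: with plain xargs (no -0), it is xargs itself that treats quotes and backslashes specially, which is why reading the names literally avoids the problem.

```shell
#!/bin/bash
# Sketch: read a newline-separated list of paths into an array with mapfile,
# then hand the array to grep. All names below are illustrative assumptions.
tmpdir=$(mktemp -d)

# Three files with awkward but valid names (spaces, backslash, dollar sign).
printf '%s\n' 'needle in plain file'     > "$tmpdir/file   A"
printf '%s\n' 'needle in backslash file' > "$tmpdir/file   \\A"
printf '%s\n' 'needle in dollar file'    > "$tmpdir/file   \$D"

# The list names only the two quirky files, one path per line.
printf '%s\n' "$tmpdir/file   \\A" "$tmpdir/file   \$D" > "$tmpdir/list"

# mapfile -t stores each line as one array element, without the trailing
# newline and without any backslash or variable processing.
mapfile -t files < "$tmpdir/list"

# Quoting "${files[@]}" passes every element to grep as a single argument.
matches=$(grep -H -- 'needle' "${files[@]}")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Like the other variants, this cannot handle file names that contain a newline, and a very long list can still exceed the command-line length limit.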
    
answered 22.02.2012 / 10:34
4

e

fgrep <pattern> `cat file_list.txt`

Take care to use the correct quotes: backticks (`), not single quotes ('), if I have understood what you want to do.
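
The difference between the two kinds of quotes can be checked with a small sketch like the one below. It assumes file names without whitespace or wildcard characters, uses made-up files in a temporary directory, and uses grep -F, which behaves like fgrep. The $(...) form shown alongside is the modern spelling of backticks.

```shell
#!/bin/bash
# Sketch: backticks (command substitution) versus single quotes (literal text).
# File names and contents are illustrative assumptions.
tmpdir=$(mktemp -d)
printf '%s\n' 'hello world' > "$tmpdir/a.txt"
printf '%s\n' 'hello again' > "$tmpdir/b.txt"
printf '%s\n' "$tmpdir/a.txt" "$tmpdir/b.txt" > "$tmpdir/file_list.txt"

# Backticks run cat and substitute its output into the command line,
# so grep receives the two paths as arguments. -l lists matching files.
backtick_out=$(grep -F -l hello `cat "$tmpdir/file_list.txt"`)

# $(...) does the same thing and is easier to nest and read.
dollar_out=$(grep -F -l hello $(cat "$tmpdir/file_list.txt"))

# Single quotes pass the text through unchanged: grep looks for one file
# literally named "cat file_list.txt", which does not exist here.
quoted_status=0
grep -F -l hello 'cat file_list.txt' >/dev/null 2>&1 || quoted_status=$?

printf '%s\n' "$backtick_out"
printf 'single-quoted form exit status: %s\n' "$quoted_status"

rm -rf "$tmpdir"
```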

    
answered 22.02.2012 / 11:04
3

If you want to search several files at once, you can list them all at the end of the command line, separated by spaces.

grep -i test /path/to/file /some/other/file

You can use wildcard patterns.

grep -i test README ChangeLog *.txt

If you have a list of files, with one file name per line, then you have several possibilities. If there are no exotic characters in your file names, either of these will work:

grep -i test -- $(cat list_of_file_names.txt)
<list_of_file_names.txt xargs grep -i test -H --

The first command substitutes the output of the command cat list_of_file_names.txt into the command line. It will fail if any of the file names contains whitespace or shell wildcards (\[?*). It will also fail if the list is so large that it exceeds the command-line length limit (over 128kB on many systems). The second command will fail if any of the file names contains whitespace or any of \"'. It takes care of running grep several times if the command-line length limit requires it. The -H option ensures that grep always prints the matching file name, even if it happens to be called with a single file. The -- ensures that if the first file name begins with -, it is treated as a file name and not as an option.
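
The effect of -H and -- can be seen in a small sketch like the following, which uses a single made-up file whose name starts with a dash. The file name, its contents and the temporary directory are assumptions for illustration; the exact error produced when -- is omitted depends on the grep implementation, so only the exit status is recorded.

```shell
#!/bin/bash
# Sketch: what -H and -- change when xargs passes a single, dash-prefixed name.
# All names are illustrative assumptions.
tmpdir=$(mktemp -d)
printf '%s\n' 'a test line' > "$tmpdir/-dash.txt"
printf '%s\n' '-dash.txt'   > "$tmpdir/list_of_file_names.txt"
cd "$tmpdir" || exit 1

# Without -H, grep omits the file name because it was given only one file.
without_H=$(<list_of_file_names.txt xargs grep -i test --)

# With -H, the file name is printed even for a single file.
with_H=$(<list_of_file_names.txt xargs grep -i test -H --)

# Without --, grep tries to parse "-dash.txt" as options and fails,
# so xargs reports a non-zero exit status.
no_dashdash_status=0
<list_of_file_names.txt xargs grep -i test -H >/dev/null 2>&1 || no_dashdash_status=$?

printf 'without -H: %s\n' "$without_H"
printf 'with -H:    %s\n' "$with_H"
printf 'exit status without --: %s\n' "$no_dashdash_status"

cd / && rm -rf "$tmpdir"
```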

A safe way to handle file names that may contain any character other than newlines is to turn off splitting on whitespace other than newlines and to turn off globbing (wildcard expansion).

set -f; IFS='
'
grep -i test -- $(cat list_of_file_names.txt)
set +f; unset IFS
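
One possible refinement, sketched below, is to make the same settings inside a subshell so that they disappear automatically and any previous value of IFS is left untouched (unset IFS restores the default behaviour, not necessarily the earlier value). The helper name grep_from_list and the sample files are hypothetical and only serve the illustration.

```shell
#!/bin/bash
# Sketch: confine "set -f" and the newline-only IFS to a subshell.
# grep_from_list is a hypothetical helper; file names are made up.
tmpdir=$(mktemp -d)
printf '%s\n' 'test one' > "$tmpdir/with space.txt"
printf '%s\n' 'test two' > "$tmpdir/star*.txt"
printf '%s\n' "$tmpdir/with space.txt" "$tmpdir/star*.txt" \
    > "$tmpdir/list_of_file_names.txt"

# $1 = pattern, $2 = file holding one path per line.
grep_from_list() {
  (
    set -f          # no globbing: "*" in a name stays literal
    IFS='
'                   # split the list on newlines only
    grep -i -H -- "$1" $(cat "$2")
  )
}

result=$(grep_from_list test "$tmpdir/list_of_file_names.txt")
printf '%s\n' "$result"

# Globbing is still enabled out here, because set -f ran in the subshell.
case $- in *f*) echo "globbing disabled" ;; *) echo "globbing still enabled" ;; esac

rm -rf "$tmpdir"
```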
    
answered 23.02.2012 / 03:16
1
  1. cat filenames.txt | xargs grep <pattern>

  2. grep <pattern> filename1 filename2 filename*

answered 22.02.2012 / 12:01