grep / parse text

4

I need to parse drug names out of Medline abstracts. I was hoping to do this by combining the outputs of grep -wf and grep -owf with paste, but the outputs do not line up, because grep -owf prints one line of output for each match, even when several matches are on the same line.

Pattern file:

DrugA
DrugB
DrugC
DrugD

File to parse:

In our study, DrugA and DrugB were found to be effective.  DrugA was more effective than DrugB.
In our study, DrugC was found to be effective
In our study, DrugX was found to be effective

Desired output:

DrugA    In our study, DrugA and DrugB were found to be effective. DrugA was more effective.
DrugB    In our study, DrugA and DrugB were found to be effective. DrugA was more effective.
DrugC    In our study, DrugC was found to be effective
    
by Nasir 29.12.2016 / 22:06

3 answers

2

Not strictly grep alone, but this does the trick:

while IFS= read -r pattern; do
    grep "$pattern" input | awk -v drug="$pattern" 'BEGIN {OFS="\t"} { print drug,$0}'
done < "patterns"
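
A variation on the loop above, as a minimal sketch rather than a definitive version. It assumes whole-word, literal matching is wanted (the question used -w), so it adds -w and -F to grep. The function name tag_lines and the temporary file names are hypothetical, and -w and mktemp -d are assumed to be available (they are in GNU and BSD tools, but not required by POSIX).

```shell
#!/bin/sh
# Hypothetical helper: prefix every matching line with the pattern that matched it.
# $1 = pattern file (one drug name per line), $2 = text file to search.
tag_lines() {
    while IFS= read -r pattern; do
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$pattern" ] || continue
        # -w: whole words only, -F: treat the pattern as a fixed string,
        # -e: keeps a pattern starting with "-" from being read as an option.
        grep -wF -e "$pattern" "$2" |
            awk -v drug="$pattern" 'BEGIN { OFS = "\t" } { print drug, $0 }'
    done < "$1"
}

# Sample data from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' DrugA DrugB DrugC DrugD > "$tmp/patterns"
printf '%s\n' \
    'In our study, DrugA and DrugB were found to be effective.  DrugA was more effective than DrugB.' \
    'In our study, DrugC was found to be effective' \
    'In our study, DrugX was found to be effective' > "$tmp/input"

tag_lines "$tmp/patterns" "$tmp/input"
```

Each pattern costs one pass over the input, so this is only practical for a modest number of drug names. Note that awk -v interprets backslash escapes in the assigned value, which should not matter for plain drug names.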
    
by 29.12.2016 / 22:37
3

An awk way, perhaps?

awk '
    NR == FNR {
        a[$0] = 1
        n = length($0)
        w = n > w ? n : w
        next
    }
    {
        for (i in a)
            if ($0 ~ i)
                printf "%-*s %s\n", w, i, $0
    } 
' pattern_file.txt data_file.txt
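
A possible variant of the script above, sketched under a few assumptions: patterns are literal strings rather than regular expressions, the output should be tab-separated as in the first answer, and patterns should be tried in the order they appear in the pattern file (the order of for (i in a) is unspecified in awk). The function name tag_lines_awk and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: single awk pass, literal substring matching.
# $1 = pattern file (assumed non-empty), $2 = text file to search.
tag_lines_awk() {
    awk '
        # First file: remember each non-empty pattern in the order it was read.
        NR == FNR { if ($0 != "") pat[++n] = $0; next }
        # Second file: print "pattern<TAB>line" for every pattern found in the line.
        {
            for (i = 1; i <= n; i++)
                if (index($0, pat[i]))
                    printf "%s\t%s\n", pat[i], $0
        }
    ' "$1" "$2"
}

# Sample data from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' DrugA DrugB DrugC DrugD > "$tmp/pattern_file.txt"
printf '%s\n' \
    'In our study, DrugA and DrugB were found to be effective.  DrugA was more effective than DrugB.' \
    'In our study, DrugC was found to be effective' \
    'In our study, DrugX was found to be effective' > "$tmp/data_file.txt"

tag_lines_awk "$tmp/pattern_file.txt" "$tmp/data_file.txt"
```

Because index() looks for a plain substring, a pattern such as DrugA would also match inside a longer word like DrugAB; whole-word matching would need an extra boundary check.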
    
by 29.12.2016 / 22:41
2

A sed solution:

sed 's|.*|/&/{h;s/^/&\t/p;g}|' pattern_file | sed -nf - input
    
by 30.12.2016 / 08:55