sed: Replace an unknown number of patterns on the same line

Question



I'm trying to use sed to search for a certain 'primary' pattern that may occur on several lines, with each primary pattern followed by an unknown number of 'secondary' patterns.

The lines containing the pattern start with test(header_name), followed on the same line by an arbitrary number of strings. I want to move these strings onto their own lines so that each one is preceded by its own test(header_name).

For example, the original file (mytest.txt):

apples
test("Type1", "hat", "cat", "dog", "house");
bananas
oranges
test("Type2", "brown", "red", "green", "yellow", "purple", "orange");

I want this to become:

apples
test("Type1", "hat");
test("Type1", "cat");
test("Type1", "dog");
test("Type1", "house");
bananas
oranges
test("Type2", "brown");
test("Type2", "red");
test("Type2", "green");
test("Type2", "yellow");
test("Type2", "purple");
test("Type2", "orange");

This would be easy if the number of strings per line were known, but in this case it isn't fixed.

The clumsy way would be to do this:

while ( a line exists that starts with 'test' and contains more than two args)
do

   Keep the first and second args
   Move the rest of that line to a new line and add 'test(header)' to the front

done

But that is slow, especially if there are hundreds of strings.

Any ideas?

regex sed linux shell script

by Dan, 05.07.2011 / 17:00

2 answers


score 1 · Answer 1

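It's not pretty, but:

awk '
    # Only lines containing test( are rewritten
    /test\(/ {
        # Split on whitespace: a[1] is test("TypeN", and a[2..n] are the strings
        n = split($0, a)
        for (i = 2; i <= n; i++) {
            # Strip the trailing comma (or the final ");") from each string
            sub(/(,|\);)$/, "", a[i])
            printf("%s %s);\n", a[1], a[i])
        }
        next
    }
    # Every other line passes through unchanged
    {print}
' mytest.txt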
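
One limitation worth noting: split($0, a) with no separator splits on whitespace, so a string that contains a space (say "big hat", a made-up value) would be cut in two. Below is a minimal sketch of a variant that splits on the `", "` separator instead, assuming no string contains that exact sequence. The wrapper name split_args is hypothetical and only there to make the snippet easy to try.

```shell
# split_args: hypothetical wrapper; reads the named files, or stdin if none.
split_args() {
    awk '
        # Only rewrite test(...) lines that have at least one ", " separator
        /^test\(".*", "/ {
            # a[1] is test("TypeN  and a[2..n] are the bare strings
            n = split($0, a, /", "/)
            # The last piece still carries the closing "); so strip it
            sub(/"\);$/, "", a[n])
            for (i = 2; i <= n; i++)
                printf("%s\", \"%s\");\n", a[1], a[i])
            next
        }
        # Every other line passes through unchanged
        { print }
    ' "$@"
}

# Example with a string that contains a space:
printf '%s\n' 'test("Type1", "big hat", "cat");' | split_args
# prints:
#   test("Type1", "big hat");
#   test("Type1", "cat");
```

Lines that are indented, or that use a different spacing between arguments, would pass through untouched with this pattern, so it would need adjusting for such input.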

score 1 · Answer 2

OK, I found a solution using a while loop and sed. Yes, it's messy, but it's faster than the algorithm I posted earlier!

# Look for the first line that has more than two args
line_num=$(sed -n -e '/test("[^"]*", "[^"]*",/{=;q}' myfile.txt)

while [ "$line_num" != "" ]
do

    # Get the first argument
    first_arg=$(sed -n -e "$line_num"'s/test("\([^"]*\)".*/\1/p' myfile.txt)

    # Move every string to its own line, each prefixed with 'test("first_arg", '
    sed -i -e "$line_num"'s/", "/\ntest("'"$first_arg"'", "/g' myfile.txt

    # The leftover first line is no longer needed once all strings have moved
    sed -i -e "$line_num"'d' myfile.txt

    # Check for remaining lines with more than two args
    line_num=$(sed -n -e '/test("[^"]*", "[^"]*",/{=;q}' myfile.txt)

done

# Minor adjustments to the output (add close-quotation, close-bracket and semi-colon)
sed -i \
    -e 's/");//g' \
    -e 's/\(test("[^)]*\)/\1");/g' \
myfile.txt
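
For comparison, here is a single-pass sketch that is not from the answer above. It assumes GNU sed (for -E and for \n in the replacement) and relies on the P and D commands: each cycle peels one string off the front, prints the finished line, and reruns the script on the remainder without reading new input. The function name split_test_lines is hypothetical.

```shell
# split_test_lines: hypothetical helper; requires GNU sed.
split_test_lines() {
    sed -E '
        # Peel the first secondary string off into its own test(...) line
        s/^(test\("[^"]*", )("[^"]*"), (.*)$/\1\2);\n\1\3/
        # Print the pattern space up to the first newline
        P
        # Drop that part and restart the script on whatever is left
        D
    ' "$@"
}

# Writes the result to stdout; the input file is not modified.
printf '%s\n' 'test("Type1", "hat", "cat", "dog");' | split_test_lines
```

Since this reads the file once instead of rewriting it with sed -i on every loop iteration, it should scale better with hundreds of strings, though that has not been measured here.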