AWK
Using GNU awk or mawk:
$ awk '$1~"^"word{printf("--\n%s",$0)}' word='are' RS='--\n' infile
--
are you happy
--
are(you hungry
too
This sets the variable word to the word to match at the start of a record, and RS (the record separator) to '--' followed by a newline (\n). Then, for any record that starts with the word to match ($1~"^"word), it prints a formatted record: a leading '--' and a newline, followed by the exact record found.
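To try this out, here is a minimal sketch. The contents of infile are an assumption (the original input is not shown here): the sample is made up so that two of its four blocks start with are. The awk command itself is unchanged.

```shell
#!/bin/sh
# Hypothetical sample data; the real infile is not shown in the text above.
tmpdir=$(mktemp -d)
infile="$tmpdir/infile"
cat > "$infile" <<'EOF'
--
how are you
--
are you happy
--
are(you hungry
too
--
what now
EOF

# Records are split on "--" plus newline; a record is printed (with the
# separator put back in front) when its first field starts with the word.
result=$(awk '$1~"^"word{printf("--\n%s",$0)}' word='are' RS='--\n' "$infile")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With GNU awk or mawk this should print the same five lines as the output shown above. Note that "^"word is a prefix test on the first field, so a block starting with area would match as well.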
GREP
Using grep (GNU, for the -z option):
# -o prints only the matched blocks; with -z each match ends in a NUL byte
grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' infile
grep -Pzo -- '(?s)--\nare.*?(?=\n--|\Z)\n' infile
grep -Pzo -- '(?s)--\nare(?:(?!\n--).)*\n' infile
Description(s)
For the descriptions that follow, the PCRE option (?x) is used to add (many) explanatory comments (and spaces) inline with the actual (working) regex. If the comments (from each # up to the next newline) and most of the spaces are removed, the resulting string is still the same regex. This makes it possible to describe the regex in detail inside working code, which makes the code much easier to maintain.
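As a quick check of that claim, the sketch below runs option 1 twice, once in compact form and once under (?x) with spaces between the parts, and compares the results. It assumes GNU grep built with PCRE support (-P). The sample infile is made up, and -o is added so that only the matched text is printed; with -z each match ends in a NUL byte, which tr strips.

```shell
#!/bin/sh
# Hypothetical sample data; the real infile is not shown in the text above.
tmpdir=$(mktemp -d)
infile="$tmpdir/infile"
cat > "$infile" <<'EOF'
--
how are you
--
are you happy
--
are(you hungry
too
--
what now
EOF

# Compact form of option 1.
compact=$(grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' "$infile" | tr -d '\0')

# Same regex under (?x): whitespace outside character classes is ignored,
# so the parts can be spaced out without changing what is matched.
spaced=$(grep -Pzo -- '(?x) -- \n are (?: [^\n]* \n )+? (?= -- | \Z )' "$infile" | tr -d '\0')

printf '%s\n' "$compact"
if [ "$compact" = "$spaced" ]; then
    echo "same matches"
fi

rm -rf "$tmpdir"
```

The spaced pattern is kept on a single line on purpose: grep treats a newline inside a pattern as a separator between patterns, so the multi-line commented layouts below are meant for reading rather than for pasting into grep as they stand.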
Option 1 regex (?x)--\nare(?:[^\n]*\n)+?(?=--|\Z)
(?x) # match the remainder of the pattern with the following
# effective flags: x
# x modifier: extended. Spaces and text after a #
# in the pattern are ignored
-- # matches the characters -- literally (case sensitive)
\n # matches a line-feed (newline) character (ASCII 10)
are # matches the characters are literally (case sensitive)
(?: # Non-Capturing Group (?:[^\n]*\n)+?
[^\n] # matches non-newline characters
* # Quantifier — Matches between zero and unlimited times, as
# many times as possible, giving back as needed (greedy)
\n # matches a line-feed (newline) character (ASCII 10)
) # Close the Non-Capturing Group
+? # Quantifier — Matches between one and unlimited times, as
# few times as possible, expanding as needed (lazy)
# A repeated capturing group will only capture the last iteration.
# Put a capturing group around the repeated group to capture all
# iterations or use a non-capturing group instead if you're not
# interested in the data
(?= # Positive Lookahead (?=--|\Z)
# Assert that the Regex below matches
# 1st Alternative --
-- # matches the characters -- literally (case sensitive)
| # 2nd Alternative \Z
\Z # \Z asserts position at the end of the string, or before
# the line terminator right at the end of the
# string (if any)
) # Closing the lookahead.
Option 2 regex (?sx)--\nare.*?(?=\n--|\Z)\n
(?sx) # match the remainder of the pattern with the following eff. flags: sx
# s modifier: single line. Dot matches newline characters
# x modifier: extended. Spaces and text after a # in
# the pattern are ignored
-- # matches the characters -- literally (case sensitive)
\n # matches a line-feed (newline) character (ASCII 10)
are # matches the characters are literally (case sensitive)
.*? # matches any character
# Quantifier — Matches between zero and unlimited times,
# as few times as possible, expanding as needed (lazy).
(?= # Positive Lookahead (?=\n--|\Z)
# Assert that the Regex below matches
# 1st Alternative \n--
\n # matches a line-feed (newline) character (ASCII 10)
-- # matches the characters -- literally.
| # 2nd Alternative \Z
\Z # \Z asserts position at the end of the string, or
# before the line terminator right at
# the end of the string (if any)
) # Close the lookahead parenthesis.
\n # matches a line-feed (newline) character (ASCII 10)
Option 3 regex (?xs)--\nare(?:(?!\n--).)*\n
(?xs) # match the remainder of the pattern with the following eff. flags: xs
# modifier x : extended. Spaces and text after a # in the pattern are ignored
# modifier s : single line. Dot matches newline characters
-- # matches the characters -- literally (case sensitive)
\n # matches a line-feed (newline) character (ASCII 10)
are # matches the characters are literally (case sensitive)
(?: # Non-capturing group (?:(?!\n--).)
(?! # Negative Lookahead (?!\n--)
# Assert that the Regex below does not match
\n # matches a line-feed (newline) character (ASCII 10)
-- # matches the characters -- literally
) # Close Negative lookahead
. # matches any character
) # Close the Non-Capturing group.
* # Quantifier — Matches between zero and unlimited times, as many
# times as possible, giving back as needed (greedy)
\n # matches a line-feed (newline) character (ASCII 10)
sed
$ sed -nEe 'bend
:start ;N;/^--\nare/!b
:loop ;/^--$/!{p;n;bloop}
:end ;/^--$/bstart' infile