Update:
The best options would be:

    sed 's/^.\+\(\/\|\&\|\?\)v=\([^\&]*\).*/\2/'
    awk 'match($0,/((\/|&|\?)v=)([^&]*)/,x){print x[3]}'
    grep -Po '(?<=(\/|&|\?)v=)[^&]*'
    # i.e. match "/", "&" or "?", then "v="
RFC 3986 states:

    URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    query       = *( pchar / "/" / "?" )
    fragment    = *( pchar / "/" / "?" )
    pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="
    …
So, to be on the safe side, put

    | sed 's/#.*//' |

in front, to remove the #fragment part. That is,

    | sed 's/#.*//' | grep -Po '(?<=(\/|&|\?)v=)[^&]*'
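As a rough sketch of how the fragment-stripping step and the extraction fit together, the pipeline can be wrapped in a small shell function. The function name `extract_video_id` and the `#t=30&v=bogus` fragment in the demo URL are made up for illustration, and `grep -P` is assumed to be available (GNU grep built with PCRE support).

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the answer):
# print the value of the "v" parameter of the URL given as first argument.
# Step 1: sed drops everything from the first "#" on (the fragment).
# Step 2: grep prints only what follows "/v=", "&v=" or "?v=", up to the next "&".
extract_video_id() {
    printf '%s\n' "$1" |
        sed 's/#.*//' |
        grep -Po '(?<=[/&?]v=)[^&]*'
}

# The fragment below contains a second "v=", which would otherwise be printed too.
extract_video_id 'http://www.youtube.com/watch?v=qdRaf3-OEh4&feature=results_main#t=30&v=bogus'
# prints: qdRaf3-OEh4
```

The bracket expression `[/&?]` is just a shorter way of writing the alternation `(\/|&|\?)`; both have a fixed length, which the lookbehind requires.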
SED (2):
    echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' \
    | sed 's/^.\+\Wv=\([^\&]*\).*/\1/'

Explanation:
    s/…/…/
    s/THIS/WITH THIS/
    substitute/MATCH things and GROUP them in \( \)/WITH THIS/

The parts of the command:

    s/^.\+\Wv=\([^\&]*\).*/\1/

    s         substitute
    /         start of the match pattern
    ^         match from the beginning of the line
    .         match any character
    \+        one or more times (greedy)
    \W        a non-word character
    v=        literally match "v="
    \(        start of the match group
    [^\&]*    match any character BUT "&", as many as possible
    \)        end of the match group
    .*        match anything, as many times as possible,
              i.e. to the end of the line, since nothing else follows
    /         end of the match pattern
    \1        replace with group 1, i.e. the first \( \)
    /         end of the substitute command

Bracket expressions:

    [abc]     would match a OR b OR c
    [abc]*    would match a AND/OR b AND/OR c, as many times as possible
    [^abc]    would match anything BUT a, b or c

/\1/ replaces the ENTIRE match with match group number 1. That is everything between \( and \), which is anything but "&" after the literal string "v=", which in turn has a non-word character in front of it.

It also means that if nothing matches, nothing is substituted, and the line passes through unchanged.
Result: qdRaf3-OEh4
Note: If there is no match, the entire string is returned.
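If printing the whole line on a non-match is not wanted, one possible variation is to run sed with `-n` and add the `p` flag to the substitution, so that a line is printed only when the substitution actually happened. This is a sketch, not part of the original command; the function name `video_id_or_nothing` is made up, and GNU sed is assumed because of `\+` and `\W`.

```shell
#!/bin/sh
# Hypothetical helper: same pattern as the sed command above, but silent
# when the URL contains no "v=" preceded by a non-word character.
#   -n       do not print lines automatically
#   p flag   print the line only if the substitution was made
video_id_or_nothing() {
    printf '%s\n' "$1" | sed -n 's/^.\+\Wv=\([^\&]*\).*/\1/p'
}

video_id_or_nothing 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1'
# prints: qdRaf3-OEh4
video_id_or_nothing 'http://www.youtube.com/'
# prints nothing
```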
(G) AWK:
    echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' \
    | awk 'match($0,/(\Wv=)([^&]*)/,v){print v[2]}'

Result: qdRaf3-OEh4
Explanation:
In Awk, match(string, regexp) is a function that searches for the longest, leftmost match of regexp in string. Here I used an extension that comes with Gawk (see Awk, Gawk, Mawk, etc.) that puts the individual matches, i.e. what is between the parentheses, into an array of matches.
The pattern is much like the Perl/grep one below.
    match($0, /(\Wv=)([^&]*)/, v){print v[2]}

    match      built-in function
    $0         the entire input line ($1 would have been field 1, etc.,
               using the default whitespace delimiters)
    (\Wv=)     places \Wv= in group 1
    ([^&]*)    places [^&]* in group 2
    v          user-defined name of the array that receives the groups;
               it can then be used in print
    print      another built-in
    v[2]       2 is the index into v, i.e. the second group
               between ()'s in /…/
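Where Gawk's three-argument match() is not available, a similar result can probably be had with the two-argument form, which sets the built-in variables RSTART and RLENGTH. The following is a sketch under that assumption, not the answer's own method; the function name `video_id_awk` is made up, and the separator class `[/&?]` replaces `\W`, which is also a Gawk extension.

```shell
#!/bin/sh
# Hypothetical helper: extract the "v" value with two-argument match().
# match() sets RSTART (start of the match) and RLENGTH (its length).
# The match covers the separator, "v=", and the value, so the value itself
# starts 3 characters in and is 3 characters shorter than the whole match.
video_id_awk() {
    printf '%s\n' "$1" |
        awk 'match($0, "[/&?]v=[^&]*") { print substr($0, RSTART + 3, RLENGTH - 3) }'
}

video_id_awk 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6'
# prints: qdRaf3-OEh4
```

Because the action only runs when match() returns non-zero, a URL without a "v=" parameter produces no output.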
GREP (using Perl-compatible regular expressions):

    echo 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&list=PL4367CEDBC117AEC6&feature=results_main' | \
    grep -Po '(?<=\Wv=)[^&]*'

Result: qdRaf3-OEh4
Explanation:
    -P    Use Perl-compatible regular expressions.
    -o    Only print the match of the expression.
          That means: of our pattern, only print/return what it matches.
          If nothing matches, return nothing.

    (?<=\Wv=)[^&]*

    (?<=      What follows should be (immediately) preceded by this.
              Mnemonic: ? = "huh", < = to the left, = = equals
    \W        non-word character
    v=        literal "v="
    [ ]       any one of the characters inside [], in any order
    ^         negate the match, i.e. do not match
              (ONLY when it is FIRST between [])
    &         a literal "&" character
    *         greedy; as many times as possible

So: match the literal "v=" where "v" is preceded by a non-word character. Then match anything, as many times as possible, until we reach the end of the line or meet an "&". As a literal "&" cannot appear inside a key or value in a URL query, only between the key/value pairs, this should be OK.
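As an aside, PCRE also offers `\K`, which discards everything matched so far from the reported match. It can stand in for the lookbehind and, unlike a lookbehind, does not require the preceding part to have a fixed length. A minimal sketch, assuming a grep with `-P` support; the function name `video_id_grep` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: same idea as the lookbehind version, written with \K.
# "[/&?]v=" must match, but \K resets the start of the reported match,
# so -o prints only the value that follows.
video_id_grep() {
    printf '%s\n' "$1" | grep -Po '[/&?]v=\K[^&]*'
}

video_id_grep 'http://www.youtube.com/watch?v=qdRaf3-OEh4&playnext=1&feature=results_main'
# prints: qdRaf3-OEh4
```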