Fix
find /mnt/test -name "*.txt" -print0 -printf "%f\0" |
xargs -0 -n 2 bash -c 'shift $1; ./thulac < "$1" > "/mnt/tokenized/$2"' 2 1
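To try the fix without THULAC installed, here is a self-contained sketch. It assumes GNU find (for `-printf`), uses a scratch directory in place of /mnt/test and /mnt/tokenized, and swaps ./thulac for a hypothetical `fake_thulac` script that just upper-cases its input.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch stand-ins for /mnt/test and /mnt/tokenized (assumption, not from the answer).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/test" "$work/tokenized"
printf 'hello\n'  > "$work/test/test.txt"
printf 'world\n'  > "$work/test/test33.txt"
printf 'spaced\n' > "$work/test/my notes.txt"

# Hypothetical stand-in for ./thulac: reads stdin, writes upper-cased text to stdout.
cat > "$work/fake_thulac" <<'EOF'
#!/bin/sh
tr '[:lower:]' '[:upper:]'
EOF
chmod +x "$work/fake_thulac"

# Run from the scratch dir so ./fake_thulac and tokenized/ resolve like in the answer.
cd "$work"

# The fix: NUL after the full path (-print0) AND after the base name (%f\0),
# then exactly two items per bash invocation (-n 2).
find "$work/test" -name "*.txt" -print0 -printf "%f\0" |
  xargs -0 -n 2 bash -c 'shift $1; ./fake_thulac < "$1" > "tokenized/$2"' 2 1

ls tokenized
```

Quoting "$1" and "$2" inside the `bash -c` script is what keeps a name with spaces, such as `my notes.txt` here, in one piece.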
You also need to terminate the base name with a null delimiter (just as -print0 already does for the full path name), so that when it is time for xargs to split the null-delimited list, it can do so correctly.
Otherwise, the base name of one file gets merged with the full path name of the next file, which is the phenomenon you observed with multiple file names!
Then you need to feed 2 arguments at a time to bash; otherwise xargs passes it as many arguments as it is allowed to, but the script only uses the first two, so only the first file reaches ./thulac.
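To see what `-n 2` and the `shift $1 ... 2 1` idiom actually hand to the script, the sketch below replaces ./thulac with `printf` and feeds xargs a hand-made NUL-separated list. The items a, b, c, d are placeholders for path1, name1, path2, name2.

```shell
#!/usr/bin/env bash
set -euo pipefail

# -n 2: one bash per pair. The leading "2 1" become $0 and $1,
# and "shift $1" (i.e. shift 1) moves the pair into $1 and $2.
pairs=$(printf 'a\0b\0c\0d\0' |
  xargs -0 -n 2 bash -c 'shift $1; printf "[%s|%s]\n" "$1" "$2"' 2 1)
printf '%s\n' "$pairs"
# Prints:
# [a|b]
# [c|d]

# Without -n 2: a single bash receives all four items, but the script
# still reads only $1 and $2, so the second pair is silently dropped.
single=$(printf 'a\0b\0c\0d\0' |
  xargs -0 bash -c 'shift $1; printf "[%s|%s]\n" "$1" "$2"' 2 1)
printf '%s\n' "$single"
# Prints:
# [a|b]
```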
A better alternative is to dispense with xargs and do all the work in find since, as it stands, xargs handles only 2 arguments at a time, which takes away any advantage of using xargs.
In this version, we pass the full path name to bash and have bash compute the file name, instead of relying on find to do it. The {} is given twice so that the path lands in both $0 and $1; the script reads $1:
find /mnt/test -name "*.txt" -exec bash -c './thulac < "$1" > "/mnt/tokenized/${1##*/}"' {} {} \;
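A runnable sketch of the find-only version, under the same assumptions as before: GNU find, a scratch directory instead of /mnt, and a hypothetical `fake_thulac` in place of ./thulac. It also includes a variant that is not part of the original answer: `-exec ... {} +` passes many paths to one bash, which loops over them and so recovers the batching that xargs was supposed to provide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch stand-ins for /mnt/test and /mnt/tokenized (assumption, not from the answer).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/test" "$work/tokenized" "$work/tokenized_batched"
printf 'hello\n' > "$work/test/test.txt"
printf 'world\n' > "$work/test/test33.txt"

# Hypothetical stand-in for ./thulac: upper-cases stdin to stdout.
cat > "$work/fake_thulac" <<'EOF'
#!/bin/sh
tr '[:lower:]' '[:upper:]'
EOF
chmod +x "$work/fake_thulac"
cd "$work"

# ${1##*/} strips everything up to the last slash, leaving the base name.
p=/mnt/test/test.txt
printf '%s\n' "${p##*/}"   # prints: test.txt

# The answer's version: {} twice, so the path is both $0 and $1; bash derives the name.
find "$work/test" -name "*.txt" \
  -exec bash -c './fake_thulac < "$1" > "tokenized/${1##*/}"' {} {} \;

# Variant (not from the original answer): "_" fills $0, and "for f" walks $1, $2, ...
find "$work/test" -name "*.txt" \
  -exec bash -c 'for f; do ./fake_thulac < "$f" > "tokenized_batched/${f##*/}"; done' _ {} +
```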
Genesis of the problem
1. Good case when only 1 file is present
-print0 -printf '%f'
/mnt/test/test.txt\0test.txt
|-----------------|--------|
arg0 = /mnt/test/test.txt
arg1 = test.txt
bash -c 'thulac < $0 > /mnt/tokenized/$1'
thulac < /mnt/test/test.txt > /mnt/tokenized/test.txt
2. Error case when > 1 file is present
-print0 -printf '%f'
/mnt/test/test.txt\0test.txt/mnt/test/test33.txt\0test33.txt
|-----------------|-----------------------------|----------|
arg0 = /mnt/test/test.txt
arg1 = test.txt/mnt/test/test33.txt
arg2 = test33.txt
bash -c 'thulac < $0 > /mnt/tokenized/$1'
thulac < /mnt/test/test.txt > /mnt/tokenized/test.txt/mnt/test/test33.txt
We saw that the mix-up occurred because of the missing '\0' delimiter in -printf "%f".
So the correct way is:
find ... -print0 -printf "%f\0" | xargs ...
This ensures that the list is split at the right places and that the
sequence fullpath1\0file1\0fullpath2\0file2\0... is maintained.
Now coming to the xargs part, we write:
xargs -0 -n 2 bash -c '...' 2 1
Points to observe are the following:
a) -0 => the arguments to xargs are taken to be NUL-separated.
b) -n 2 => we feed 2 arguments at a time to bash from the total pool
   delivered to xargs by find.
c) 2 1 is a precaution so that the script's real arguments start at $1
   rather than $0: bash -c assigns the first (2) to $0 and the second (1)
   to $1, so 'shift $1' drops exactly one parameter and the path and file
   name supplied by xargs become $1 and $2. In your particular case, since
   you already know that $0 -> 1st arg and $1 -> 2nd arg, we could just as
   well have written what you did:
   find ... | xargs -0 -n 2 bash -c './thulac < $0 > /mnt/tokenized/$1'
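The merge in the error case can be reproduced by counting the NUL-separated items that xargs would receive. A sketch, assuming GNU find and a scratch directory standing in for /mnt/test: with two files, `-printf "%f"` yields 3 items (the middle one fused), while `-printf "%f\0"` yields the expected 4.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch stand-in for /mnt/test (assumption); two files are enough to trigger the merge.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/test.txt" "$dir/test33.txt"

# Turn each NUL into a newline and count records; awk also counts a final
# item that has no terminator, exactly as xargs -0 would.
count_items() { tr '\0' '\n' | awk 'END { print NR }'; }

# Show the fused item (line order depends on the directory listing).
find "$dir" -name "*.txt" -print0 -printf "%f" | tr '\0' '\n'
echo

broken=$(find "$dir" -name "*.txt" -print0 -printf "%f"   | count_items)
fixed=$(find "$dir" -name "*.txt" -print0 -printf "%f\0" | count_items)

printf 'without \\0: %s items, with \\0: %s items\n' "$broken" "$fixed"
# Prints: without \0: 3 items, with \0: 4 items
```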