Using join on two files fails to match lines on larger files

I am having a problem with a script that uses join to merge two files. Here are example input files and the output of the join command:

D:\work\BuildScriptsC>cat D:\temp\aaa.txt
hzapplications\adn\adn4\adn4density\adn4_idd_module.cpp,83
hzapplications\adn\adn4\adn4density\adn4dencalmodule.cpp,73
hzapplications\adn\adn4\adn4density\adn4denimagemodulerm.cpp,111
hzapplications\adn\adn4\adn4density\adn4denimagemodulert.cpp,202
hzapplications\adn\adn4\adn4density\adn4densityanqmodules.cpp,445
hzapplications\adn\adn4\adn4density\adn4densityappl.cpp,378
hzapplications\adn\adn4\adn4density\adn4densityappl.h,50
hzapplications\adn\adn4\adn4density\adn4densityevrmodules.cpp,272
hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,490
hzapplications\adn\adn4\adn4density\adn4densitykernel.h,65
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.cpp,209
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.h,70
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.cpp,218
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.h,70
hzapplications\adn\adn4\adn4density\adn4dphimodules.cpp,610
hzapplications\adn\adn4\adn4density\adn4dphimodulesrt.cpp,115
hzapplications\adn\adn4\adn4density\adn4rhomodulesrt.cpp,102

D:\work\BuildScriptsC>cat D:\temp\bbb.txt
hzapplications\activect\ptc\ictsx01\ictsx01_bootuptask.cpp,1
hzapplications\activeps\iola\acquisition\iola_acqmodule.cpp,4
hzapplications\activeps\iola\simulation\iola_simmodule.cpp,3
hzapplications\activeps\iolr\simulation\iolr_simmodule.cpp,1
hzapplications\activeps\iolr\task\iolr_poweron200vhitask.cpp,1
hzapplications\activeps\iolr\task\iolr_poweron200vlowtask.cpp,1
hzapplications\activeps\iolr\task\iolr_poweronnrlvtask.cpp,1
hzapplications\activeps\iolr\task\iolrtaskcommon.cpp,2
hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,1
hzapplications\adn\adn4\adn4equipment\adn4adseelem.cpp,1
hzapplications\adn\adn4\adn4equipment\adn4collar.cpp,1
hzapplications\adn\adn4\adn4equipment\adn4tool.cpp,2
hzapplications\adn\adn6c\adn6cequipment\adn6ccollar.cpp,1
hzapplications\adn\adn8\adn8equipment\adn8tool.cpp,1
hzapplications\adn\adn8\adn8neutron\adn8neutronkernel.cpp,1
hzapplications\adn\adn8d\adn8ddensity\adn8ddensitykernel.cpp,1
hzapplications\adn\adn8d\adn8dequipment\adn8dtool.cpp,1

D:\work\BuildScriptsC>join --ignore-case -1 1 -2 1 -t"," -o "1.1,1.2,2.2" -e "0" -a 1 D:\temp\aaa.txt D:\temp\bbb.txt
hzapplications\adn\adn4\adn4density\adn4_idd_module.cpp,83,0
hzapplications\adn\adn4\adn4density\adn4dencalmodule.cpp,73,0
hzapplications\adn\adn4\adn4density\adn4denimagemodulerm.cpp,111,0
hzapplications\adn\adn4\adn4density\adn4denimagemodulert.cpp,202,0
hzapplications\adn\adn4\adn4density\adn4densityanqmodules.cpp,445,0
hzapplications\adn\adn4\adn4density\adn4densityappl.cpp,378,0
hzapplications\adn\adn4\adn4density\adn4densityappl.h,50,0
hzapplications\adn\adn4\adn4density\adn4densityevrmodules.cpp,272,0
hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,490,0
hzapplications\adn\adn4\adn4density\adn4densitykernel.h,65,0
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.cpp,209,0
hzapplications\adn\adn4\adn4density\adn4densitysecimgmodule.h,70,0
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.cpp,218,0
hzapplications\adn\adn4\adn4density\adn4densitysecmodule.h,70,0
hzapplications\adn\adn4\adn4density\adn4dphimodules.cpp,610,0
hzapplications\adn\adn4\adn4density\adn4dphimodulesrt.cpp,115,0
hzapplications\adn\adn4\adn4density\adn4rhomodulesrt.cpp,102,0

D:\work\BuildScriptsC>

The expected output is for this particular line to be joined as follows:     hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,490,1

Any suggestions are welcome. I am using the UnxUtils package on Windows; this is the exact version:

D:\work\BuildScriptsC>join --version
join (GNU textutils) 2.0
Written by Mike Haertel.

Copyright (C) 1999 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
by mikelong 30.08.2012 / 08:10

1 answer

It turns out that --ignore-case is the problem. It has an effect even when the input contains no uppercase letters, because it treats every lowercase letter as its uppercase equivalent. That moves the letters to the other side of the characters that sit between the uppercase and lowercase ranges in ASCII order: [\]^_

In the normal sort order, iolrt comes after iolr_, but under --ignore-case the two are reversed. As far as join is concerned, bbb.txt is therefore no longer sorted, and lines that should match are missed.
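The swap can be reproduced outside of join. The following is a minimal Python sketch, not join's actual implementation: it assumes that --ignore-case amounts to comparing uppercased keys byte by byte, and it uses two file names taken from bbb.txt.

```python
# Two file names taken from bbb.txt in the question.
names = ["iolr_poweron200vhitask.cpp", "iolrtaskcommon.cpp"]

# Plain byte order: '_' (0x5F) sorts before 't' (0x74).
plain = sorted(names)

# Case-folded order (assumption: keys are compared in uppercase):
# 'T' (0x54) sorts before '_' (0x5F), so the pair swaps.
folded = sorted(names, key=str.upper)

print(plain)   # iolr_... first, then iolrt...
print(folded)  # iolrt... first, then iolr_...
```

The input files in the question follow the first order, while a case-insensitive comparison expects the second.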

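To produce the order that join --ignore-case expects, both input files need to be sorted with the sort command's -f option (in addition to -t, to use a comma as the field separator, and -k1,1 to sort on the first field only).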
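For illustration, here is a rough Python approximation of what sorting with -f, -t, and -k1,1 does to a few lines from bbb.txt. The helper name sort_fold_first_field is hypothetical, and the sketch assumes the key is simply the first comma-separated field converted to uppercase; the real sort command may break ties differently.

```python
def sort_fold_first_field(lines):
    """Order lines by their first comma-separated field, compared in uppercase.

    Hypothetical helper approximating `sort -f -t, -k1,1`.
    """
    return sorted(lines, key=lambda line: line.split(",", 1)[0].upper())


# Three lines from bbb.txt, in the order they appear in the question.
bbb = [
    r"hzapplications\activeps\iolr\task\iolr_poweron200vhitask.cpp,1",
    r"hzapplications\activeps\iolr\task\iolrtaskcommon.cpp,2",
    r"hzapplications\adn\adn4\adn4density\adn4densitykernel.cpp,1",
]

resorted = sort_fold_first_field(bbb)
for line in resorted:
    print(line)
```

After re-sorting, iolrtaskcommon.cpp precedes iolr_poweron200vhitask.cpp, which is the order a case-insensitive comparison expects, and the adn4densitykernel.cpp line still comes last.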

    
answered 31.08.2012 / 05:47