Merging columns from 200+ large files into one table

I have 200+ large files, each with exactly 1 column and 76 million lines. I want to start a newfile.txt and place the columns side by side (match line 1 of file 1 with line 1 of file 2 ... and keep appending up to line 1 of file 200), then repeat this for every line. I'm struggling with this. Any suggestions?

I tried the answers by Gilles and Glens here and here, but I can't figure out how to loop and repeatedly append tab-delimited columns to the output newfile.txt. I can only use methods that don't load the files into memory (the final file should be 120 GB+).

Thanks

    
by Ehi Akhirome 06.02.2018 / 21:06

1 answer

With some patience, and assuming exactly 200 files named "file{1..200}" with 76,000,000 lines each:

for ((line = 1; line <= 76000000; line++))
do
  # Read line number $line from each of file1 .. file200.
  fields=()
  for ((i = 1; i <= 200; i++))
  do
    fields+=("$(sed -n "${line}p" "file$i")")
  done
  # Print the 200 fields separated by tabs, then end the row with a newline.
  printf '%s' "${fields[0]}"
  printf '\t%s' "${fields[@]:1}"
  printf '\n'
done > newfile.txt
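
The loop above reruns sed on every file for every output line, which is where the patience comes in. As a possible alternative (a swapped-in technique, not part of the original answer), the standard paste utility reads all of its input files in parallel, one line at a time, and joins them with tabs by default, so it should not need to hold whole files in memory. The sketch below wraps it in a hypothetical helper named merge_columns; the helper name and its argument order are assumptions made for illustration only.

```shell
# merge_columns OUTPUT FILE...
# Hypothetical helper (name and argument order are assumptions):
# joins line N of every FILE with tabs and writes the rows to OUTPUT.
merge_columns() {
  out=$1
  shift
  # paste streams its inputs line by line; the default delimiter is a tab.
  paste "$@" > "$out"
}
```

Assuming the files really are named file1 through file200, a call in bash could look like `merge_columns newfile.txt file{1..200}`; brace expansion keeps the files in numeric order. All input files are open at the same time, so the open-file limit reported by `ulimit -n` has to be larger than the number of files.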
    
by 06.02.2018 / 23:03