How to merge two files based on a single column

I would like to merge two files. I have looked at previous questions and answers, but none of them produces my desired output.
There are two CSV files, file1.csv and file2.csv, and they are not the same size. The program should merge the files on column 1 and print everything from both files.

Input data:

file1.csv has 4 columns.

$ cat file1.csv
Contig_Spider_Gland_98_1_1,>Contig_Spider_Gland_98_1_1 [1169 - 963] (REVERSE SENSE),MQGHRRKLATPRQRAPRKERQRALLLRLQWRIGLQPCSRRNKSLDRKNIYWRYLVEYGSWKGRTHISDV,C# 
Contig_Spider_Gland_98_7_3,>Contig_Spider_Gland_98_17965_1 [90 - 278],MADVEKTSCCTETKECCKDETCCENGQGACHTGKEECKDTCHKKACGCKAGEDCKCSDGKCGC,CC#CC#CC#C#C#C#C#C#C#C#C#C# 

$ cat file2.csv
Contig_Spider_Gland_98_1_1, SignalP-4.1,     SIGNAL,  1,    22, 0.808,  YES
Contig_Spider_Gland_98_8_2, SignalP-4.1,    SIGNAL  1,  20, 0.877,  YES

Desired output:

Contig_Spider_Gland_98_1_1,>Contig_Spider_Gland_98_1_1 [1169 - 963] (REVERSE SENSE),MQGHRRKLATPRQRAPRKERQRALLLRLQWRIGLQPCSRRNKSLDRKNIYWRYLVEYGSWKGRTHISDV,C#,Contig_Spider_Gland_98_1_1, SignalP-4.1,     SIGNAL,  1,   22, 0.808,  YES
Contig_Spider_Gland_98_7_3,>Contig_Spider_Gland_98_17965_1 [90 - 278],MADVEKTSCCTETKECCKDETCCENGQGACHTGKEECKDTCHKKACGCKAGEDCKCSDGKCGC,CC#CC#CC#C#C#C#C#C#C#C#C#C#,no match

Thank you for your help.

    
by user138116, 12.10.2015 / 11:48

1 answer

Is this what you want?

$ join -t, -a 1 -o auto -e 'no match' file1.csv file2.csv
Contig_Spider_Gland_98_1_1,>Contig_Spider_Gland_98_1_1 [1169 - 963] (REVERSE SENSE),MQGHRRKLATPRQRAPRKERQRALLLRLQWRIGLQPCSRRNKSLDRKNIYWRYLVEYGSWKGRTHISDV,C# , SignalP-4.1,     SIGNAL,  1,    22, 0.808,  YES
Contig_Spider_Gland_98_7_3,>Contig_Spider_Gland_98_17965_1 [90 - 278],MADVEKTSCCTETKECCKDETCCENGQGACHTGKEECKDTCHKKACGCKAGEDCKCSDGKCGC,CC#CC#CC#C#C#C#C#C#C#C#C#C# ,no match,no match,no match,no match,no match,no match
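
Note that join expects both inputs to be sorted on the join field. The sample files above happen to be in order already, but real data may not be. The following is a minimal sketch of sorting first and then joining, assuming GNU coreutils (-o auto is a GNU extension) and using hypothetical, shortened sample rows rather than the files from the question:

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical, shortened sample data; the first field is the join key.
printf '%s\n' 'id_2,b,c' 'id_1,e,f' > file1.csv
printf '%s\n' 'id_1,x,y' 'id_3,u,v' > file2.csv

# join expects both inputs sorted on the join field, so sort on field 1 first.
# LC_ALL=C keeps the sort order and the order join expects consistent.
LC_ALL=C sort -t, -k1,1 file1.csv > file1.sorted.csv
LC_ALL=C sort -t, -k1,1 file2.csv > file2.sorted.csv

# -a 1 keeps unpairable lines from file 1; -e fills their missing fields.
result=$(LC_ALL=C join -t, -a 1 -o auto -e 'no match' file1.sorted.csv file2.sorted.csv)
printf '%s\n' "$result"
```

With these sample rows, id_1 is paired with its file2 fields, while id_2 is kept and padded with 'no match' once per missing field.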

And if the unmatched lines from file2 need to be printed as well:

$ join -t, -a 1 -a 2 -o auto -e 'no match' file1.csv file2.csv
Contig_Spider_Gland_98_1_1,>Contig_Spider_Gland_98_1_1 [1169 - 963] (REVERSE SENSE),MQGHRRKLATPRQRAPRKERQRALLLRLQWRIGLQPCSRRNKSLDRKNIYWRYLVEYGSWKGRTHISDV,C# , SignalP-4.1,     SIGNAL,  1,    22, 0.808,  YES
Contig_Spider_Gland_98_7_3,>Contig_Spider_Gland_98_17965_1 [90 - 278],MADVEKTSCCTETKECCKDETCCENGQGACHTGKEECKDTCHKKACGCKAGEDCKCSDGKCGC,CC#CC#CC#C#C#C#C#C#C#C#C#C# ,no match,no match,no match,no match,no match,no match
Contig_Spider_Gland_98_8_2,no match,no match,no match, SignalP-4.1,    SIGNAL  1,  20, 0.877,  YES,no match
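
The join output pads every missing field with 'no match', whereas the desired output in the question shows a single 'no match' per unmatched line and repeats the key from file2 on matched lines. One possible alternative, not part of the answer above, is a short awk script. This is only a sketch, assuming the key is always the first comma-separated field, that file2 is not empty, and again using hypothetical, shortened sample rows:

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical, shortened sample data; the first field is the key.
printf '%s\n' 'id_1,a,b' 'id_2,c,d' > file1.csv
printf '%s\n' 'id_1,x,y' 'id_3,u,v' > file2.csv

# First file read (NR == FNR, i.e. file2.csv): remember each line by its key.
# Second file read (file1.csv): print the line followed by the matching
# file2 line, or by a single 'no match' when the key is absent from file2.
merged=$(awk -F, '
    NR == FNR { seen[$1] = $0; next }
    { print $0 "," (($1 in seen) ? seen[$1] : "no match") }
' file2.csv file1.csv)
printf '%s\n' "$merged"
```

Unlike join, this approach does not require sorted input, but it only prints lines from file1 and holds all of file2 in memory.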
    
answered 12.10.2015 / 12:13