I used pdftotext on Linux to extract all the text from a multi-page PDF. Everything is fine, except that each page comes out with a different alignment, although this is not the case in the original PDF. Here is a sample from the first 3 pages:
Instructor First Number Students Who Number Students Who
Subject Course Section Instructor Last Name A B C D F
Name Completed the Class Dropped the Class
ACCT 201 01 Karin Hatheway Dial 56 6 19 9 16 2 5
ACCT 202 01 Karin Hatheway Dial 69 11 37 14 7 2 6
ACCT 205 01 Darryl Woolley 20 1 3 7 6 1 3
ACCT 205 02 Darryl Woolley 28 1 6 7 13 2
ACCT 205 03 Darryl Woolley 42 5 4 13 21 1 3
ACCT 205 04 Darryl Woolley 23 1 9 5 8 1
ACCT 205 05 Darryl Woolley 30 2 11 7 9 2 1
ACCT 205 06 Darryl Woolley 25 3 8 9 6 1 1
ACCT 275 01 Darryl Woolley 33 2 7 15 9 1 1
ACCT 310 01 Marla Kraut 16 1 1 6 7 2
ACCT 310 02 Marla Kraut 64 5 43 15 1
ACCT 310 03 Marla Kraut 72 3 11 47 10 3 1
ACCT 311 01 Karin Hatheway Dial 45 13 20 11 1
ACCT 311 02 Karin Hatheway Dial 25 10 12 3
ACCT 315 01 Jason Porter 26 6 5 8 6 1
ACCT 315 02 Jason Porter 29 1 6 10 5 7 1
ACCT 414 01 Teresa Gordon 22 1 6 6 9 1
ACCT 483 01 Glen Utzman 26 1 7 13 6
ACCT 486 01 Teresa Gordon 33 13 14 6
ACCT 492 01 Jason Wills 23 5 8 9 1
ACCT 515 01 Jeffrey Harkins 15 7 6 1
ACCT 561 01 Jason Porter 18 1 10 7 1
ADOL 526 13 Charles Gagel 21 2 19 1 1
ADOL 573 13 Martha Yopp 28 16 3 1
ADOL 574 01 Laura Holyoke 16 12 3 1
ADOL 574 11 Laura Holyoke 9 1 8 1
ADOL 574 13 Laura Holyoke 15 10 4 1
ADOL 600 13 Roger Scott 19 4 1
AERO 101 01 William Beauter 11 8 2 1
AERO 103 01 Sarah Babbitt 15 7 6 1 1
AERO 411 01 Sarah Babbitt 11 6 4 1
AERO 413 01 Sarah Babbitt 12 8 3 1
AGEC 101 01 Larry Van Tassell 36 1 20 15 1
AGEC 278 01 Larry Makus 21 1 2 6 8 5
AGEC 278 02 Larry Makus 18 5 10 2 1
AGEC 278 03 Larry Makus 17 1 2 7 5 2 1
AGEC 301 01 Christopher McIntosh 18 9 4 5
AGEC 356 01 Joseph Guenthner 23 15 6 2
AGEC 361 01 Ruby Stroschein 11 4 1 6
AGEC 411 01 Robert Haggerty 11 6 4 1
AGEC 413 01 Robert Spear 12 3 4 5 2 1
AGEC 415 01 Larry Van Tassell 11 10 1
AGEC 526 01 Scott Matulich 7 2 5
AGEC 527 01 Stephen Cooke 5 3 2
AGED 180 01 Lori Moore 23 1 14 5 1 3
AGED 351 01 Lou Riesenberg 11 4 6 1
AMST 301 01 Walter Hesford 26 14 8 3 1
ANTH 100 01 Mark Warner 104 15 31 31 21 8 12
ANTH 220 01 Fumiyasu Arakawa 138 4 48 53 19 10 8
ANTH 230 01 Robert Sappington 28 1 7 9 9 2 1
ANTH 251 01 Donald Tyler 36 1 10 14 8 1 3
ANTH 420 01 Laura Putsche 12 3 4 2 2
ANTH 422 01 Rodney Frey 13 11 2
ANTH 427 02 Virginia Babcock 13 1 2 6 4 1
ANTH 462 01 Laura Putsche 33 3 8 20 3 1
ARBC 101 01 Anisah El-Mansouri 14 1 8 5 1
ARCH 151 01 Randall Teal 150 8 72 40 13 6 19
ARCH 253 01 Roman Montoto 23 1 9 10 2 1
ARCH 253 02 Randall Teal 22 2 9 11 2
ARCH 253 03 Xiao Hu 23 2 11 12
ARCH 353 01 Matthew Brehm 16 7 7 1
ARCH 353 02 Dillon Ellefson 16 4 11 1
ARCH 353 03 Xiao Hu 10 4 6
ARCH 385 01 Anne Marshall 68 5 29 22 11 2 4
ARCH 404 04 Matthew Brehm 10 1 5 3 1
ARCH 453 01 Roman Montoto 10 5 4 1
ARCH 453 02 Anne Marshall 13 6 5 1
ARCH 463 01 Phillip Mead 63 1 26 31 5 1
ARCH 465 01 Kenneth Carper 51 1 8 26 12 3
ARCH 483 01 D. Reese 71 2 27 35 8
ARCH 504 02 Randall Teal 15 9 6
ARCH 504 03 Kevin Van Den Wymelenberg 6 3 1 1
ARCH 504 04 Frank Jacobus 12 1 8 4
ARCH 510 02 D. Reese 13 9 4
ARCH 510 04 Robert Thornton 9 7 1
ARCH 510 05 Roman Montoto 11 2 7 4
ARCH 553 01 Bruce Haglund 14 12 2
As you can see, the alignment changes on every page. What I would like to achieve is a single alignment for all pages. Is that possible? I tried expand -8 and sed with different patterns, but without success.
Thanks,
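
One possible way to get a single alignment is sketched below in Python. It is only a sketch under several assumptions: the text was produced with pdftotext -layout, so columns keep their horizontal positions and pages are separated by form-feed characters; every page contains the same set of columns; and within a page no column has a blank at the same character position in every row. The names column_spans and realign are hypothetical helpers, not part of any existing tool.

```python
def column_spans(lines):
    """Return (start, end) character ranges that contain text in at least one line."""
    width = max((len(line) for line in lines), default=0)
    used = [False] * width
    for line in lines:
        for i, ch in enumerate(line):
            if not ch.isspace():
                used[i] = True
    spans, start = [], None
    for i, flag in enumerate(used):
        if flag and start is None:
            start = i                      # a column begins here
        elif not flag and start is not None:
            spans.append((start, i))       # a gap that is blank in every line ends it
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def realign(text):
    """Re-pad every page of layout-preserved text to one shared set of column widths."""
    rows = []
    for page in text.split("\f"):          # assumption: form feed marks each page break
        lines = [l.expandtabs() for l in page.splitlines() if l.strip()]
        spans = column_spans(lines)        # column positions are detected per page
        for line in lines:
            # Slicing by position keeps empty cells as "" instead of shifting values left.
            rows.append([line[s:e].strip() for s, e in spans])
    ncols = max((len(r) for r in rows), default=0)
    widths = [max((len(r[i]) for r in rows if i < len(r)), default=0)
              for i in range(ncols)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )
```

Calling realign on the full text of the extracted file returns all rows padded to the same column widths, regardless of which page they came from. Multi-line headers that span several columns would likely confuse the column detection, so they may need to be removed or handled separately first.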