Cut the first two characters in the second column

I have a file with a list of US states and Canadian provinces, and it looks like this:

id,name,abbreviation,country,type,sort,status,occupied,notes,fips_state,assoc_press,standard_federal_region,census_region,census_region_name,census_division,census_division_name,circuit_court
"1","Alabama","AL","USA","state","10","current","occupied","","1","Ala.","IV","3","South","6","East South Central","11"
"2","Alaska","AK","USA","state","10","current","occupied","","2","Alaska","X","4","West","9","Pacific","9"
"3","Arizona","AZ","USA","state","10","current","occupied","","4","Ariz.","IX","4","West","8","Mountain","9"
"4","Arkansas","AR","USA","state","10","current","occupied","","5","Ark.","VI","3","South","7","West South Central","8"
"5","California","CA","USA","state","10","current","occupied","","6","Calif.","IX","4","West","9","Pacific","9"
"6","Colorado","CO","USA","state","10","current","occupied","","8","Colo.","VIII","4","West","8","Mountain","10"
"7","Connecticut","CT","USA","state","10","current","occupied","","9","Conn.","I","1","Northeast","1","New England","2"
"8","Delaware","DE","USA","state","10","current","occupied","","10","Del.","III","3","South","5","South Atlantic","3"
"9","Florida","FL","USA","state","10","current","occupied","","12","Fla.","IV","3","South","5","South Atlantic","11"
"10","Georgia","GA","USA","state","10","current","occupied","","13","Ga.","IV","3","South","5","South Atlantic","11"
"11","Hawaii","HI","USA","state","10","current","occupied","","15","Hawaii","IX","4","West","9","Pacific","9"
"12","Idaho","ID","USA","state","10","current","occupied","","16","Idaho","X","4","West","8","Mountain","9"
"13","Illinois","IL","USA","state","10","current","occupied","","17","Ill.","V","2","Midwest","3","East North Central","7"
"14","Indiana","IN","USA","state","10","current","occupied","","18","Ind.","V","2","Midwest","3","East North Central","7"
"15","Iowa","IA","USA","state","10","current","occupied","","19","Iowa","VII","2","Midwest","4","West North Central","8"
"16","Kansas","KS","USA","state","10","current","occupied","","20","Kan.","VII","2","Midwest","4","West North Central","10"
"17","Kentucky","KY","USA","state","10","current","occupied","","21","Ky.","IV","3","South","6","East South Central","6"
"18","Louisiana","LA","USA","state","10","current","occupied","","22","La.","VI","3","South","7","West South Central","5"
"19","Maine","ME","USA","state","10","current","occupied","","23","Maine","I","1","Northeast","1","New England","1"
"20","Maryland","MD","USA","state","10","current","occupied","","24","Md.","III","3","South","5","South Atlantic","4"
"21","Massachusetts","MA","USA","state","10","current","occupied","","25","Mass.","I","1","Northeast","1","New England","1"
"22","Michigan","MI","USA","state","10","current","occupied","","26","Mich.","V","2","Midwest","3","East North Central","6"
"23","Minnesota","MN","USA","state","10","current","occupied","","27","Minn.","V","2","Midwest","4","West North Central","8"
"24","Mississippi","MS","USA","state","10","current","occupied","","28","Miss.","IV","3","South","6","East South Central","5"
"25","Missouri","MO","USA","state","10","current","occupied","","29","Mo.","VII","2","Midwest","4","West North Central","8"
"26","Montana","MT","USA","state","10","current","occupied","","30","Mont.","VIII","4","West","8","Mountain","9"
"27","Nebraska","NE","USA","state","10","current","occupied","","31","Neb.","VII","2","Midwest","4","West North Central","8"
"28","Nevada","NV","USA","state","10","current","occupied","","32","Nev.","IX","4","West","8","Mountain","9"
"29","New Hampshire","NH","USA","state","10","current","occupied","","33","N.H.","I","1","Northeast","1","New England","1"
"30","New Jersey","NJ","USA","state","10","current","occupied","","34","N.J.","II","1","Northeast","2","Mid-Atlantic","3"
"31","New Mexico","NM","USA","state","10","current","occupied","","35","N.M.","VI","4","West","8","Mountain","10"
"32","New York","NY","USA","state","10","current","occupied","","36","N.Y.","II","1","Northeast","2","Mid-Atlantic","2"
"33","North Carolina","NC","USA","state","10","current","occupied","","37","N.C.","IV","3","South","5","South Atlantic","4"
"34","North Dakota","ND","USA","state","10","current","occupied","","38","N.D.","VIII","2","Midwest","4","West North Central","8"
"35","Ohio","OH","USA","state","10","current","occupied","","39","Ohio","V","2","Midwest","3","East North Central","6"
"36","Oklahoma","OK","USA","state","10","current","occupied","","40","Okla.","VI","3","South","7","West South Central","10"
"37","Oregon","OR","USA","state","10","current","occupied","","41","Ore.","X","4","West","9","Pacific","9"
"38","Pennsylvania","PA","USA","state","10","current","occupied","","42","Pa.","III","1","Northeast","2","Mid-Atlantic","3"
"39","Rhode Island","RI","USA","state","10","current","occupied","","44","R.I.","I","1","Northeast","1","New England","1"
"40","South Carolina","SC","USA","state","10","current","occupied","","45","S.C.","IV","3","South","5","South Atlantic","4"
"41","South Dakota","SD","USA","state","10","current","occupied","","46","S.D.","VIII","2","Midwest","4","West North Central","8"
"42","Tennessee","TN","USA","state","10","current","occupied","","47","Tenn.","IV","3","South","6","East South Central","6"
"43","Texas","TX","USA","state","10","current","occupied","","48","Texas","VI","3","South","7","West South Central","5"
"44","Utah","UT","USA","state","10","current","occupied","","49","Utah","VIII","4","West","8","Mountain","10"
"45","Vermont","VT","USA","state","10","current","occupied","","50","Vt.","I","1","Northeast","1","New England","2"
"46","Virginia","VA","USA","state","10","current","occupied","","51","Va.","III","3","South","5","South Atlantic","4"
"47","Washington","WA","USA","state","10","current","occupied","","53","Wash.","X","4","West","9","Pacific","9"
"48","West Virginia","WV","USA","state","10","current","occupied","","54","W.Va.","III","3","South","5","South Atlantic","4"
"49","Wisconsin","WI","USA","state","10","current","occupied","","55","Wis.","V","2","Midwest","3","East North Central","7"
"50","Wyoming","WY","USA","state","10","current","occupied","","56","Wyo.","VIII","4","West","8","Mountain","10"
"51","Washington DC","DC","USA","capitol","10","current","occupied","","11","","III","3","South","5","South Atlantic","D.C."
"60","Alberta","AB","Canada","province","30","current","occupied","","","","","","","","",""
"61","British Columbia","BC","Canada","province","30","current","occupied","","","","","","","","",""
"62","Manitoba","MB","Canada","province","30","current","occupied","","","","","","","","",""
"63","New Brunswick","NB","Canada","province","30","current","occupied","","","","","","","","",""
"64","Newfoundland and Labrador","NL","Canada","province","30","current","occupied","","","","","","","","",""
"65","Nova Scotia","NS","Canada","province","30","current","occupied","","","","","","","","",""
"66","Ontario","ON","Canada","province","30","current","occupied","","","","","","","","",""
"67","Prince Edward Island","PE","Canada","province","30","current","occupied","","","","","","","","",""
"68","Quebec","QC","Canada","province","30","current","occupied","","","","","","","","",""
"69","Saskatchewan","SK","Canada","province","30","current","occupied","","","","","","","","",""

and I would like to turn it into this:

name,country
Alabama,US
...
Wyoming,US
Alberta,Ca
Saskatchewan,Ca

First the US states, then the Canadian provinces.

My solution is this:

#!/bin/sh

cat north_america.csv | head -n1 | cut -d',' -f2,4 > title
cat north* | tail -n +2 | cut -d',' -f2,4 | tr -d '"' | sort -t','  -k 2  | head -n10 > Canada
cat north* | tail -n +2 | cut -d',' -f2,4 | tr -d '"' | sort -t','  -k 2  | tail -n +11  > USA

cat USA | rev | cut -c-1 --complement | rev > file1
cat Canada | rev | cut -c 1-4 --complement | rev > file2

cat title > states
cat file1 >> states
cat file2 >> states

My question is whether I can somehow "cut" the first two characters of the second column. Instead of 'head' and 'tail' I would use

cat north* | tail -n +2 | cut -d',' -f2,4 | tr -d '"' | sort -t','  -k2,2r >> states

and then I would run a "cut" command. But I don't know how to do that. I don't want to use head and tail and split the file into two files. I would like to simplify the approach.

I would appreciate any advice.

    
asked by Muffy 23.03.2017 / 17:42

1 answer

All you need for this is:

awk -F, -vOFS="," '{print $2,$4}' file 

The -F, sets the field separator to , and the -vOFS="," sets the output field separator to , . Then we simply print the second and fourth fields of each line. This works here because no field in the file contains a comma inside its quotes. On your example file, this returns:

$ awk -F, -vOFS="," '{print $2,$4}' file 
name,country
"Alabama","USA"
"Alaska","USA"
"Arizona","USA"
"Arkansas","USA"
"California","USA"
"Colorado","USA"
"Connecticut","USA"
"Delaware","USA"
"Florida","USA"
"Georgia","USA"
"Hawaii","USA"
"Idaho","USA"
"Illinois","USA"
"Indiana","USA"
"Iowa","USA"
"Kansas","USA"
"Kentucky","USA"
"Louisiana","USA"
"Maine","USA"
"Maryland","USA"
"Massachusetts","USA"
"Michigan","USA"
"Minnesota","USA"
"Mississippi","USA"
"Missouri","USA"
"Montana","USA"
"Nebraska","USA"
"Nevada","USA"
"New Hampshire","USA"
"New Jersey","USA"
"New Mexico","USA"
"New York","USA"
"North Carolina","USA"
"North Dakota","USA"
"Ohio","USA"
"Oklahoma","USA"
"Oregon","USA"
"Pennsylvania","USA"
"Rhode Island","USA"
"South Carolina","USA"
"South Dakota","USA"
"Tennessee","USA"
"Texas","USA"
"Utah","USA"
"Vermont","USA"
"Virginia","USA"
"Washington","USA"
"West Virginia","USA"
"Wisconsin","USA"
"Wyoming","USA"
"Washington DC","USA"
"Alberta","Canada"
"British Columbia","Canada"
"Manitoba","Canada"
"New Brunswick","Canada"
"Newfoundland and Labrador","Canada"
"Nova Scotia","Canada"
"Ontario","Canada"
"Prince Edward Island","Canada"
"Quebec","Canada"
"Saskatchewan","Canada"

To remove the quotes, you can pipe the output through tr :

awk -F, -vOFS="," '{print $2,$4}' file | tr -d \"
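
As an alternative sketch, the quotes could also be stripped inside awk itself with gsub, which would avoid the extra tr process. The three-line sample file below is a hypothetical stand-in for the real one, and the approach still assumes that no field contains an embedded comma:

```shell
#!/bin/sh
# Hypothetical sample standing in for the real file (first four columns only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,name,abbreviation,country
"1","Alabama","AL","USA"
"60","Alberta","AB","Canada"
EOF

# gsub with two arguments works on the whole record ($0) and removes every
# double quote; changing $0 makes awk split the record into fields again.
result=$(awk -F, -v OFS=, '{ gsub(/"/, ""); print $2, $4 }' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The header line has no quotes, so gsub leaves it untouched and it is printed as name,country.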

To get the output exactly as you show it (so without the " , with US instead of USA and Ca instead of Canada ), you can use (assuming GNU sed ):

awk -F, -vOFS="," '{print $2,$4}' file | sed 's/"//g; s/USA/US/; s/Canada/Ca/'

Or, if you don't have GNU sed :

awk -F, -vOFS="," '{print $2,$4}' file | sed -e 's/"//g' -e 's/USA/US/' -e 's/Canada/Ca/'
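
If the goal is literally to keep only the first two characters of the country column, rather than to replace known country names, a possible sketch is to use awk's substr on the fourth field. This is only an illustration under the same assumption as before (no commas inside quoted fields), and the sample file is hypothetical:

```shell
#!/bin/sh
# Hypothetical sample standing in for the real file (first four columns only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
id,name,abbreviation,country
"1","Alabama","AL","USA"
"60","Alberta","AB","Canada"
EOF

result=$(awk -F, -v OFS=, '
    { gsub(/"/, "") }                  # strip all double quotes from the record
    NR > 1 { $4 = substr($4, 1, 2) }   # keep two characters of country, skip the header
    { print $2, $4 }                   # name and (shortened) country
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

The NR > 1 condition leaves the header as country; without it the header would be shortened to co as well.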
    
answered 23.03.2017 / 17:54