sed, grep, or tr command that returns only the Latin characters from a UTF-8 file

2

I am working with the text of the 300 Tang poems, which unfortunately comes as a single file containing both the Chinese and the English. Since I am interested in "extracting" the English, I hope to use sed, grep, or tr to simply return all lines that contain Latin characters. So, for example, given this text:

051
七言古詩
李頎
聽安萬善吹觱篥歌

南山截竹為觱篥, 此樂本自龜茲出。 
流傳漢地曲轉奇, 涼州胡人為我吹; 
傍鄰聞者多歎息, 遠客思鄉皆淚垂。 
世人解聽不解賞, 長飆風中自來往。 
枯桑老柏寒颼飀, 九雛鳴鳳亂啾啾。 
龍吟虎嘯一時發, 萬籟百泉相與秋。 
忽然更作漁陽摻, 黃雲蕭條白日暗。 
變調如聞楊柳春, 上林繁花照眼新。 
歲夜高堂列明燭, 美酒一杯聲一曲。

Seven-character-ancient-verse
Li Qi
ON HEARING AN WANSHAN PLAY THE REED-PIPE

Bamboo from the southern hills was used to make this pipe. 
And its music, that was introduced from Persia first of all, 
Has taken on new magic through later use in China. 
And now the Tartar from Liangzhou, blowing it for me, 
Drawing a sigh from whosoever hears it, 
Is bringing to a wanderer's eyes homesick tears.... 
Many like to listen; but few understand. 
To and fro at will there's a long wind flying, 
Dry mulberry-trees, old cypresses, trembling in its chill. 
There are nine baby phoenixes, outcrying one another; 
A dragon and a tiger spring up at the same moment; 
Then in a hundred waterfalls ten thousand songs of autumn 
Are suddenly changing to The Yuyang Lament; 
And when yellow clouds grow thin and the white sun darkens, 
They are changing still again to Spring in the Willow Trees. 
Like Imperial Garden flowers, brightening the eye with beauty, 
Are the high-hall candles we have lighted this cold night, 
And with every cup of wine goes another round of music.

I would like a command that returns the line 051, skips the Chinese, and returns the line 'Seven-character-ancient-verse' and everything that follows it.

    
by ixtmixilix 29.05.2011 / 14:26

3 answers

6

Why not just:

grep -E '[a-zA-Z0-9]|^$' file.txt
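
For readers who prefer a script, the same filter can be sketched in Python. This is only an illustration of the pattern above, not part of the original answer; the function name `latin_lines` and the sample string are made up for the example.

```python
import re

# Same idea as the grep pattern: an ASCII letter or digit anywhere
# on the line, or a completely empty line.
KEEP = re.compile(r"[a-zA-Z0-9]|^$")


def latin_lines(text: str) -> list[str]:
    """Return the lines that contain an ASCII letter/digit, plus empty lines."""
    return [line for line in text.splitlines() if KEEP.search(line)]


# Hypothetical sample, shortened from the poem in the question.
sample = "051\n七言古詩\n李頎\n\nLi Qi\nBamboo from the southern hills"
print("\n".join(latin_lines(sample)))
```

Note that this keeps any line with at least one ASCII letter or digit, so a line mixing Chinese and English would be kept whole.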
    
by 29.05.2011 / 14:52
6

The following Perl command prints the lines that do not contain Chinese characters (Han script). -CIO tells perl that the input and the output are encoded in UTF-8.

perl -CIO -lne '/\p{Han}/ or print'
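
A rough Python equivalent, as a sketch only: the standard `re` module has no `\p{Han}` property, so the character class below approximates it with the main CJK ideograph blocks. The name `drop_han_lines` is hypothetical, and the ranges are an assumption that does not cover every Han code point.

```python
import re

# Approximation of \p{Han}: the main CJK ideograph blocks only (assumption).
HAN = re.compile(
    "["
    "\u3400-\u4dbf"            # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"            # CJK Unified Ideographs
    "\uf900-\ufaff"            # CJK Compatibility Ideographs
    "\U00020000-\U0002fa1f"    # supplementary-plane ideographs
    "]"
)


def drop_han_lines(text: str) -> list[str]:
    """Return the lines that contain no character from the ranges above."""
    return [line for line in text.splitlines() if not HAN.search(line)]


# Hypothetical sample, shortened from the poem in the question.
sample = "051\n李頎\n\nLi Qi\n南山截竹為觱篥, 此樂本自龜茲出。"
print("\n".join(drop_han_lines(sample)))
```

Unlike the letter-or-digit filter, this drops a line as soon as it contains a single Han character, and it keeps lines that consist only of punctuation or whitespace.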
    
by 30.05.2011 / 00:23
3

COMMAND

uconv -c -s -t ASCII <<\POEM

051
七言古詩
李頎
聽安萬善吹觱篥歌

南山截竹為觱篥, 此樂本自龜茲出。
流傳漢地曲轉奇, 涼州胡人為我吹;
傍鄰聞者多歎息, 遠客思鄉皆淚垂。
世人解聽不解賞, 長飆風中自來往。
枯桑老柏寒颼飀, 九雛鳴鳳亂啾啾。
龍吟虎嘯一時發, 萬籟百泉相與秋。
忽然更作漁陽摻, 黃雲蕭條白日暗。
變調如聞楊柳春, 上林繁花照眼新。
歲夜高堂列明燭, 美酒一杯聲一曲。

Seven-character-ancient-verse
Li Qi
ON HEARING AN WANSHAN PLAY THE REED-PIPE

Bamboo from the southern hills was used to make this pipe.
And its music, that was introduced from Persia first of all,
Has taken on new magic through later use in China.
And now the Tartar from Liangzhou, blowing it for me,
Drawing a sigh from whosoever hears it,
Is bringing to a wanderer's eyes homesick tears....
#END
POEM

OUTPUT

051














Seven-character-ancient-verse
Li Qi
ON HEARING AN WANSHAN PLAY THE REED-PIPE

Bamboo from the southern hills was used to make this pipe.
And its music, that was introduced from Persia first of all,
Has taken on new magic through later use in China.
And now the Tartar from Liangzhou, blowing it for me,
Drawing a sigh from whosoever hears it,
Is bringing to a wanderer's eyes homesick tears....
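
The conversion above works on characters rather than on lines: anything that cannot be represented in ASCII is dropped, which is why the Chinese lines come out empty instead of disappearing. A minimal Python sketch of the same idea, with a hypothetical function name `ascii_only`, assuming the input has already been decoded from UTF-8:

```python
def ascii_only(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    # errors="ignore" silently discards unencodable characters.
    return text.encode("ascii", errors="ignore").decode("ascii")


# Hypothetical sample, shortened from the poem in the question.
sample = "051\n李頎\nLi Qi"
print(ascii_only(sample))
```

Because line breaks are ASCII, the line structure is preserved, and any ASCII characters that happen to sit on a Chinese line (spaces, for instance) survive as well.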
    
by 07.05.2014 / 06:09