I have been trying to convert several very wide xlsx files (more than 200 columns) to CSV so that I can aggregate them into a single file for analysis. I have been using csvkit, but it hangs on these files.
The error message I get in verbose mode varies from run to run. Here is one:
C:\_analysis>in2csv -v 85910332_PE20160101_RECLMEXP.xlsx > test.csv
c:\python35\lib\site-packages\openpyxl\workbook\names\named_range.py:121: UserWarning: Discarded range with reserved name
  warnings.warn("Discarded range with reserved name")
Traceback (most recent call last):
  File "C:\Python35\Scripts\in2csv-script.py", line 9, in <module>
    load_entry_point('csvkit==0.9.1', 'console_scripts', 'in2csv')()
  File "c:\python35\lib\site-packages\csvkit\utilities\in2csv.py", line 82, in launch_new_instance
    utility.main()
  File "c:\python35\lib\site-packages\csvkit\utilities\in2csv.py", line 76, in main
    data = convert.convert(self.input_file, filetype, **kwargs)
  File "c:\python35\lib\site-packages\csvkit\convert\__init__.py", line 39, in convert
    return xlsx2csv(f, **kwargs)
  File "c:\python35\lib\site-packages\csvkit\convert\xlsx.py", line 66, in xlsx2csv
    value = c.value
  File "c:\python35\lib\site-packages\openpyxl\cell\read_only.py", line 107, in value
    if self.data_type == 'b':
KeyboardInterrupt
When I ran it again, the error was slightly different:
C:\_analysis>in2csv -v 85910332_PE20160101_RECLMEXP.xlsx > test.csv
c:\python35\lib\site-packages\openpyxl\workbook\names\named_range.py:121: UserWarning: Discarded range with reserved name
  warnings.warn("Discarded range with reserved name")
Traceback (most recent call last):
  File "C:\Python35\Scripts\in2csv-script.py", line 9, in <module>
    load_entry_point('csvkit==0.9.1', 'console_scripts', 'in2csv')()
  File "c:\python35\lib\site-packages\csvkit\utilities\in2csv.py", line 82, in launch_new_instance
    utility.main()
  File "c:\python35\lib\site-packages\csvkit\utilities\in2csv.py", line 76, in main
    data = convert.convert(self.input_file, filetype, **kwargs)
  File "c:\python35\lib\site-packages\csvkit\convert\__init__.py", line 39, in convert
    return xlsx2csv(f, **kwargs)
  File "c:\python35\lib\site-packages\csvkit\convert\xlsx.py", line 58, in xlsx2csv
    for i, row in enumerate(sheet.iter_rows()):
  File "c:\python35\lib\site-packages\openpyxl\worksheet\iter_worksheet.py", line 103, in get_squared_range
    for _event, element in p:
  File "c:\python35\lib\xml\etree\ElementTree.py", line 1290, in __next__
    for event in self._parser.read_events():
  File "c:\python35\lib\xml\etree\ElementTree.py", line 1257, in read_events
    index = self._index
KeyboardInterrupt
Any ideas?
I am running Windows 10, the source files come from Excel 2013, and I am using Python 3.5.1 with the following library versions: csvkit==0.9.1, jdcal==1.2, numpy==1.10.2, openpyxl==2.2.0b1, python-dateutil==2.2, six==1.10.0, SQLAlchemy==1.0.13, xlrd==1.0.0
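For context, the aggregation step I am aiming for once the conversion works could look roughly like the sketch below. It uses only the standard-library `csv` module and does not address the hang itself. The function name `merge_csv_files` is made up for illustration, and the sketch assumes every CSV has a header row, unique column names, UTF-8 encoding, and no row longer than its header.

```python
import csv


def merge_csv_files(csv_paths, out_path):
    """Concatenate CSV files into one file, using the union of their headers."""
    # First pass: read only the header row of each file and collect
    # every column name, keeping the order in which names first appear.
    fieldnames = []
    seen = set()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)

    # Second pass: stream rows one at a time so that no input file
    # has to be held in memory in full.
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        # restval fills in columns that a given input file does not have.
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
```

It would be called as `merge_csv_files(["a.csv", "b.csv"], "combined.csv")`, where the file names are placeholders. Taking the union of headers is a guess at what is needed here, since I have not said whether all of my files share the same columns.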