When a PDF file contains only a scanned image, is it just a JPG image inside a PDF container?

3

Many scanners can scan a page straight to a PDF file.

When this is done, is the PDF file really just a container holding a single image? Is that image typically a JPG image, a PDF image, or a proprietary format?

    
by RockPaperLizard 13.03.2016 / 07:14

1 answer

3

According to this link, no. The PDF rips the image apart and recreates it, sometimes using JPEG or JPEG2000 encoding.

A PDF file usually stores an image as a separate object (an XObject) which contains the raw binary data for the image.

It is important to appreciate that this is not usually an image in the sense of a Tif or a Jpg or a Png image – it is the binary data for the pixels, the colorspace used for the image, information about the Image. The image is ripped apart when the PDF is created and different PDF creation tools may store the same image in very different ways.

Sometimes the raw image data is adjusted to the required size needed for the page and sometimes it is not – in that case it is scaled up or down when it is drawn – different PDF creation tools create PDF files in very different ways.
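To make the scaling point concrete, here is a minimal sketch of how one might compute the effective resolution of an image once it is drawn on the page. It assumes the default PDF user-space unit of 1/72 inch; the function name `effective_dpi` and the example numbers are illustrative and do not come from the quoted article.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def effective_dpi(pixel_width, pixel_height, drawn_width_pt, drawn_height_pt):
    """Return (horizontal, vertical) pixels per inch for an image as drawn.

    pixel_*  : stored size of the image data, in pixels
    drawn_*  : size the image occupies on the page, in points
    """
    horizontal = pixel_width * POINTS_PER_INCH / drawn_width_pt
    vertical = pixel_height * POINTS_PER_INCH / drawn_height_pt
    return horizontal, vertical


# A 2550 x 3300 pixel scan drawn across a US Letter page (612 x 792 pt).
print(effective_dpi(2550, 3300, 612, 792))  # (300.0, 300.0)

# The same page area filled by a 1275 x 1650 pixel image is scaled up when drawn.
print(effective_dpi(1275, 1650, 612, 792))  # (150.0, 150.0)
```

The stored pixel dimensions alone therefore say nothing about print size; the drawing instructions in the page content decide that.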

The actual pixel data can be compressed, and one of the compression formats (DCTDecode) is the same as the one used in a JPEG (JPX is the same as Jpeg2000). If you save this data, it can be opened as a JPEG file, but it may need altering to include the colorspace data.
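As a rough illustration of "saving this data", the sketch below scans the raw bytes of a PDF for streams whose dictionary mentions /DCTDecode and returns their contents. It is deliberately naive and rests on several assumptions: the stream dictionary is not nested and contains no `>` characters, /Length is a direct integer rather than an indirect reference, and the object is not stored inside a compressed object stream. The function name `extract_dct_streams` is made up for this example; a real PDF library would handle these cases properly.

```python
import re

# Dictionary that mentions /DCTDecode, followed by the "stream" keyword.
_DICT_THEN_STREAM = re.compile(rb"<<([^>]*/DCTDecode[^>]*)>>\s*stream\r?\n")
# Direct integer /Length only; "/Length 12 0 R" (an indirect reference) is skipped.
_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_dct_streams(pdf_bytes):
    """Return the raw bytes of each DCTDecode stream that looks like a JPEG."""
    images = []
    for match in _DICT_THEN_STREAM.finditer(pdf_bytes):
        length = _LENGTH.search(match.group(1))
        if length is None:
            continue  # indirect or missing /Length: out of scope for this sketch
        start = match.end()
        data = pdf_bytes[start:start + int(length.group(1))]
        # JPEG data begins with the SOI marker FF D8; skip anything else,
        # for example data wrapped in an additional filter.
        if data.startswith(b"\xff\xd8"):
            images.append(data)
    return images
```

Each returned item could be written to a `.jpg` file with `open(path, "wb")`. As the quoted text warns, the result may still display with wrong colors if the colorspace information lives only in the PDF dictionary.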

This image is then drawn in the PDF contents stream... Some things which appear as an image to the eye may also be made up of multiple images or not even images at all!

All this means that if you want to extract images from a PDF, you need to assemble the image from all the raw data – it is not stored as a complete image file you can just rip out.
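For image data that is not JPEG-encoded, "assembling the image" means combining the pixel bytes with the width, height and colorspace taken from the image dictionary. The sketch below handles only the simplest case, as an assumption-laden example: 8 bits per component, DeviceGray or DeviceRGB, optionally compressed with FlateDecode (zlib), and no predictor or other decode parameters. The function name `raw_pixels_to_pnm` is hypothetical, and PGM/PPM is chosen as the output only because its header is trivial to write.

```python
import zlib


def raw_pixels_to_pnm(stream_data, width, height,
                      colorspace="DeviceRGB", compressed=True):
    """Wrap raw 8-bit samples in a PGM/PPM header so image viewers can open them."""
    channels = {"DeviceGray": 1, "DeviceRGB": 3}[colorspace]
    # FlateDecode streams are zlib-compressed; other filters are not handled here.
    pixels = zlib.decompress(stream_data) if compressed else stream_data
    expected = width * height * channels
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} bytes of samples, got {len(pixels)}")
    magic = b"P5" if channels == 1 else b"P6"  # P5 = grayscale, P6 = RGB
    header = magic + b"\n%d %d\n255\n" % (width, height)
    return header + pixels
```

Without the width, height and colorspace from the PDF dictionary, the same bytes cannot be interpreted, which is why the pixel stream alone is not a complete image file.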

    
by 14.03.2016 / 03:06