Como obter o tamanho da memória da GPU disponível para o OpenCL?

1

Como é possível obter o tamanho da memória em uma GPU, que está disponível para programas com o uso OpenCL para cálculos, como darktable ?

Eu sei de lspci , que fornece algumas informações gerais, mas não o que estou procurando.

$ sudo lspci -v -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao XT [Radeon R9 270X] (prog-if 00 [VGA controller])
    Subsystem: Gigabyte Technology Co., Ltd Device 227d
    Flags: bus master, fast devsel, latency 0, IRQ 49
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at fe780000 (64-bit, non-prefetchable) [size=256K]
    I/O ports at c000 [size=256]
    Expansion ROM at fe7c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: [58] Express Legacy Endpoint, MSI 00
    Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
    Capabilities: [150] Advanced Error Reporting
    Capabilities: [270] #19
    Capabilities: [2b0] Address Translation Service (ATS)
    Capabilities: [2c0] #13
    Capabilities: [2d0] #1b
    Kernel driver in use: fglrx_pci

Ele mostra 256 MB, o que não é realista e é muito pequeno (a GPU tem 4 GB de memória total), porque a darktable funciona bem com o OpenCL e requer no mínimo 768 MB.

Em seguida, há clinfo (pacote clinfo), que fornece o seguinte:

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 1.2 AMD-APP (1411.4)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa 


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:               2
  Device Type:                   CL_DEVICE_TYPE_GPU
  Device ID:                     4098
  Board name:                    AMD Radeon R9 200 Series
  Device Topology:               PCI[ B#1, D#0, F#0 ]
  Max compute units:                 20
  Max work items dimensions:             3
    Max work items[0]:               256
    Max work items[1]:               256
    Max work items[2]:               256
  Max work group size:               256
  Preferred vector width char:           4
  Preferred vector width short:          2
  Preferred vector width int:            1
  Preferred vector width long:           1
  Preferred vector width float:          1
  Preferred vector width double:         1
  Native vector width char:          4
  Native vector width short:             2
  Native vector width int:           1
  Native vector width long:          1
  Native vector width float:             1
  Native vector width double:            1
  Max clock frequency:               1100Mhz
  Address bits:                  32
  Max memory allocation:             1073741824
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                16384
  Max image 2D height:               16384
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           1024
  Alignment (bits) of base address:      2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     No
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    16384
  Global memory size:                3221225472
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Scratchpad
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     64
  Error correction support:          0
  Unified memory for Host and Device:        0
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             No
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Pitcairn
  Vendor:                    Advanced Micro Devices, Inc.
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (VM)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir 


  Device Type:                   CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Board name:                    
  Max compute units:                 2
  Max work items dimensions:             3
    Max work items[0]:               1024
    Max work items[1]:               1024
    Max work items[2]:               1024
  Max work group size:               1024
  Preferred vector width char:           16
  Preferred vector width short:          8
  Preferred vector width int:            4
  Preferred vector width long:           2
  Preferred vector width float:          4
  Preferred vector width double:         2
  Native vector width char:          16
  Native vector width short:             8
  Native vector width int:           4
  Native vector width long:          2
  Native vector width float:             4
  Native vector width double:            2
  Max clock frequency:               2664Mhz
  Address bits:                  64
  Max memory allocation:             2147483648
  Image support:                 Yes
  Max number of images read arguments:       128
  Max number of images write arguments:      8
  Max image 2D width:                8192
  Max image 2D height:               8192
  Max image 3D width:                2048
  Max image 3D height:               2048
  Max image 3D depth:                2048
  Max samplers within kernel:            16
  Max size of kernel argument:           4096
  Alignment (bits) of base address:      1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                  Yes
    Round to nearest even:           Yes
    Round to zero:               Yes
    Round to +ve and infinity:           Yes
    IEEE754-2008 fused multiply-add:         Yes
  Cache type:                    Read/Write
  Cache line size:               64
  Cache size:                    32768
  Global memory size:                6258630656
  Constant buffer size:              65536
  Max number of constant args:           8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:          0
  Unified memory for Host and Device:        1
  Profiling timer resolution:            1
  Device endianess:              Little
  Available:                     Yes
  Compiler available:                Yes
  Execution capabilities:                
    Execute OpenCL kernels:          Yes
    Execute native function:             Yes
  Queue properties:              
    Out-of-Order:                No
    Profiling :                  Yes
  Platform ID:                   0x00007fce5d932500
  Name:                      Intel(R) Core(TM)2 Duo CPU     E6750  @ 2.66GHz
  Vendor:                    GenuineIntel
  Device OpenCL C version:           OpenCL C 1.2 
  Driver version:                1411.4 (sse2)
  Profile:                   FULL_PROFILE
  Version:                   OpenCL 1.2 AMD-APP (1411.4)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm 

Existem alguns valores com memória em seus nomes, mas qual deles é a memória total disponível? Em qual unidade? O tamanho da memória global é de 512MB em bits e a alocação máxima de memória é de 256MB em bits. O tamanho da memória local pode ser de 4 GB em MB. O clinfo não possui manpage ou ajuda integrada com -h .

Como interpretar corretamente todos esses valores para obter a quantidade de memória disponível na GPU? Existem outros programas que eu possa usar?

Além disso: por que ainda não há tag para o OpenCL?

    
por sebix 27.02.2015 / 17:49

2 respostas

2

Você provavelmente já tem a resposta, mas a saída de clinfo está em bytes e não em bits. O tamanho da memória global é, portanto, de cerca de 3 GB, não 512 MB.

    
por 06.01.2016 / 18:10
1

Então você precisa de um utilitário linux-world-wide de propósito geral para scripts etc para obter essa informação? Temo que obter informações específicas não seja fácil. Eu não estou familiarizado com o pacote clinfo, eu presumo que você tenha que sudo o apt-get instalá-lo.

Porque, se não precisa ser geral, você poderia escrever um aplicativo OpenCL que obteria essa informação. Acredito que o OpenCL deva ter algum método para lhe dar tal informação, é apenas uma questão de escrever uma aplicação simples que inicialize o contexto OpenCL e printfs GPU_MEMORY (ou similar) para o console.

Quanto à tag OpenCL, acho que você teria mais sorte no link

    
por 27.02.2015 / 18:34