Erro ao copiar o arquivo para o HDFS no ecossistema hadoop

1

Ao emitir um comando para copiar o arquivo do sistema de arquivos local para o HDFS no terminal no Hadoop 3.0, ele mostra o erro

hadoop-3.0.0/hadoop2_data/hdfs/datanode': No such file or directory: 
'hdfs://localhost:9000/user/Amit/hadoop-3.0.0/hadoop2_data/hdfs/datanode.

No entanto, verifiquei que o diretório hadoop-3.0.0 / hadoop2_data / hdfs / datanode existe com os direitos de acesso apropriados. Tentei fazer o upload do arquivo do navegador da Web e ele mostra o seguinte erro.

"Couldn't find datanode to write file. Forbidden"

Por favor, ajude a resolver o problema. Anexando o core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/Amit/hadoop-3.0.0/hadoop2_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>file:/home/Amit/hadoop-3.0.0/hadoop2_data/hdfs/datanode</value>
</property>
</configuration>

Verifiquei o arquivo de log do Datanode no diretório de instalação do Hadoop e ele está mostrando a mensagem de sucesso como

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/tmp/hadoop-Amit/dfs/data
INFO org.apache.commons.beanutils.FluentPropertyBeanIntrospector: Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is DESKTOP-JIUFBOR.localdomain
INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 0
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /0.0.0.0:9866
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwidth is 10485760 bytes/s
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
INFO org.eclipse.jetty.util.log: Logging initialized @146677ms
INFO org.apache.hadoop.security.authentication.server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 49833
INFO org.eclipse.jetty.server.Server: jetty-9.3.19.v20170502
INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@3a0baae5{/logs,file:///home/Amit/hadoop-3.0.0/logs/,AVAILABLE}
INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@289710d9{/static,file:///home/Amit/hadoop-3.0.0/share/hadoop/hdfs/webapps/static/,AVAILABLE}
INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@3016fd5e{/,file:///home/Amit/hadoop-3.0.0/share/hadoop/hdfs/webapps/datanode/,AVAILABLE}{/datanode}
INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@178213b{HTTP/1.1,[http/1.1]}{localhost:49833}
INFO org.eclipse.jetty.server.Server: Started @151790ms
INFO org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:9864
INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = Amit
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup = supergroup
INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 1000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 9867
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC server at /0.0.0.0:9867
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received for nameservices: null
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices for nameservices: <default>
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000 starting to offer service
INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9867: starting
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode during handshakeBlock pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000
INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /tmp/hadoop-Amit/dfs/data/in_use.lock acquired by nodename [email protected]
INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1751678544-127.0.1.1-1518974872649
INFO org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for /tmp/hadoop-Amit/dfs/data/current/BP-1751678544-127.0.1.1-1518974872649
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=1436602813;bpid=BP-1751678544-127.0.1.1-1518974872649;lv=-57;nsInfo=lv=-64;cid=CID-b7086125-1e01-4cf4-94d0-f8b6b1d4db25;nsid=1436602813;c=1518974872649;bpid=BP-1751678544-127.0.1.1-1518974872649;dnuuid=f132f3ae-7f95-424d-b4d0-729602fc80dd
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added new volume: DS-ba9d49d2-87cb-4dff-ae80-d7f11382644f
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Added volume - [DISK]file:/tmp/hadoop-Amit/dfs/data, StorageType: DISK
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Registered FSDatasetState MBean
INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /tmp/hadoop-Amit/dfs/data
INFO org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker: Scheduled health check for volume /tmp/hadoop-Amit/dfs/data
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding block pool BP-1751678544-127.0.1.1-1518974872649
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-1751678544-127.0.1.1-1518974872649 on volume /tmp/hadoop-Amit/dfs/data...
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-1751678544-127.0.1.1-1518974872649 on /tmp/hadoop-Amit/dfs/data: 1552ms
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to scan all replicas for block pool BP-1751678544-127.0.1.1-1518974872649: 1597ms
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-1751678544-127.0.1.1-1518974872649 on volume /tmp/hadoop-Amit/dfs/data...
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice: Replica Cache file: /tmp/hadoop-Amit/dfs/data/current/BP-1751678544-127.0.1.1-1518974872649/current/replicas doesn't exist 
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1751678544-127.0.1.1-1518974872649 on volume /tmp/hadoop-Amit/dfs/data: 1ms
INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Total time to add all replicas to map: 4ms
INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-Amit/dfs/data, DS-ba9d49d2-87cb-4dff-ae80-d7f11382644f): no suitable block pools found to scan.  Waiting 1811581849 ms.
INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Periodic Directory Tree Verification scan starting at 2/18/18 8:15 PM with interval of 21600000ms
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1751678544-127.0.1.1-1518974872649 (Datanode Uuid f132f3ae-7f95-424d-b4d0-729602fc80dd) service to localhost/127.0.0.1:9000 beginning handshake with NN
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool Block pool BP-1751678544-127.0.1.1-1518974872649 (Datanode Uuid f132f3ae-7f95-424d-b4d0-729602fc80dd) service to localhost/127.0.0.1:9000 successfully registered with NN
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode localhost/127.0.0.1:9000 using BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of 10000msec Initial delay: 0msec; heartBeatInterval=3000
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Successfully sent block report 0xe646383a22bd4be5,  containing 1 storage report(s), of which we sent 1. The reports had 0 total blocks and used 1 RPC(s). This took 9 msec to generate and 834 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
    
por Amit Singla 18.02.2018 / 15:25

1 resposta

2

Resumo executivo:

Parece que você está tentando copiar os arquivos na pasta HDFS inexistente

Resposta detalhada:

Existe uma enorme diferença entre o HDFS file-system e o sistema de arquivos regular.

Sistema de arquivos HDFS - O Hadoop Distributed File System (HDFS) foi projetado para armazenar arquivos grandes em todos os computadores de forma confiável. um grande aglomerado.

O sistema de arquivos é distribuído entre várias máquinas e pode ser acessado apenas por comandos do HDFS (ou equivalente).

Assumindo que você estava usando /user/Amit/hadoop-3.0.0/hadoop2_data/hdfs/datanode como pasta de destino do HDFS - suspeito que a pasta não exista.

Você pode testar minha suposição ao executar os seguintes comandos:

  • Copie o arquivo para a pasta HDFS /tmp

    hadoop fs -put <LocalFileSystem_Path> /tmp
    
  • Copie o arquivo para a pasta padrão HDFS ( . )

    hadoop fs -put <LocalFileSystem_Path> .
    

Depois, você pode executar o comando ls (list files) - para ver se os arquivos estão lá:

  • Listar arquivos na pasta HDFS /tmp

    hadoop dfs -ls /tmp
    
  • Listar arquivos na pasta padrão HDFS ( . )

    hadoop dfs -ls .
    

Mais informações sobre - Shell do sistema de arquivos HDFS pode ser encontrado aqui :

Atualização:

Para verificar se a pasta HDFS /user/Amit realmente existe, você pode executar o seguinte comando:

hadoop dfs -ls /user/Amit

Se a pasta existir, você poderá copiar arquivos usando:

hadoop fs -put <LocalFileSystem_Path> /user/Amit

Se a pasta não existir, você precisará investigar o sistema de arquivos HDFS, usando, por exemplo:

hadoop dfs -ls /

Seguido da execução de ls nos subdiretórios.

Se você tiver as permissões relevantes, poderá criar subpastas, por exemplo, se a pasta /user/Amit existir, você poderá executar:

hdfs dfs -mkdir /user/Amit/newsubfolder

Note, você pode tentar usar a opção mkdir -p , que fará também as pastas pais (se você tiver as permissões necessárias) veja dois exemplos abaixo (eu acho que você tem as permissões necessárias para o primeiro comando ):

hdfs dfs -mkdir /tmp/Amit/fold1/fold2
hdfs dfs -mkdir /user/Amit/fold1/fold2
    
por Yaron 19.02.2018 / 08:49