How do I sync files to Amazon S3 with s3cmd, verify they were uploaded, and remove them locally?

0

I'm trying to use the Amazon S3 service to store my applications' logs. Running /usr/bin/s3cmd --help tells me what I need to know about uploading the files:

s3cmd --help
usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

options:
  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool.
  -c FILE, --config=FILE
                        Config file name. Defaults to
                        /home/valter.silva/.s3cfg
  --dump-config         Dump current configuration after parsing config files
                        and command line options and exit.
  -n, --dry-run         Only show what should be uploaded or downloaded but
                        don't actually do it. May still perform S3 requests to
                        get bucket listings and other information though (only
                        for file transfer commands)
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  --continue            Continue getting a partially downloaded file (only for
                        [get] command).
  --skip-existing       Skip over files that exist at the destination (only
                        for [get] and [sync] commands).
  -r, --recursive       Recursive upload, download or removal.
  --check-md5           Check MD5 sums when comparing files for [sync].
                        (default)
  --no-check-md5        Do not check MD5 sums when comparing files for [sync].
                        Only size will be compared. May significantly speed up
                        transfer but may also miss some changed files.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you
                        only.
  --acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID
                        Grant stated permission to a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --acl-revoke=PERMISSION:USER_CANONICAL_ID
                        Revoke stated permission for a given amazon user.
                        Permission is one of: read, write, read_acp,
                        write_acp, full_control, all
  --delete-removed      Delete remote objects with no corresponding local file
                        [sync]
  --no-delete-removed   Don't delete remote objects.
  -p, --preserve        Preserve filesystem attributes (mode, ownership,
                        timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded
                        from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular
                        expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --include=GLOB        Filenames and paths matching GLOB will be included
                        even if previously excluded by one of
                        --(r)exclude(-from) patterns
  --include-from=FILE   Read --include GLOBs from FILE
  --rinclude=REGEXP     Same as --include but uses REGEXP (regular expression)
                        instead of GLOB
  --rinclude-from=FILE  Read --rinclude REGEXPs from FILE
  --bucket-location=BUCKET_LOCATION
                        Datacentre to create bucket in. As of now the
                        datacenters are: US (default), EU, us-west-1, and ap-
                        southeast-1
  --reduced-redundancy, --rr
                        Store object with 'Reduced redundancy'. Lower per-GB
                        price. [put, cp, mv]
  --access-logging-target-prefix=LOG_TARGET_PREFIX
                        Target prefix for access logs (S3 URI) (for [cfmodify]
                        and [accesslog] commands)
  --no-access-logging   Disable access logging (for [cfmodify] and [accesslog]
                        commands)
  -m MIME/TYPE, --mime-type=MIME/TYPE
                        Default MIME-type to be set for objects stored.
  -M, --guess-mime-type
                        Guess MIME-type of files by their extension. Falls
                        back to default MIME-Type as specified by --mime-type
                        option
  --add-header=NAME:VALUE
                        Add a given HTTP header to the upload request. Can be
                        used multiple times. For instance set 'Expires' or
                        'Cache-Control' headers (or both) using this options
                        if you like.
  --encoding=ENCODING   Override autodetected terminal and filesystem encoding
                        (character set). Autodetected: UTF-8
  --verbatim            Use the S3 name as given on the command line. No pre-
                        processing, encoding, etc. Use with caution!
  --list-md5            Include MD5 sums in bucket listings (only for 'ls'
                        command).
  -H, --human-readable-sizes
                        Print sizes in human readable form (eg 1kB instead of
                        1234).
  --progress            Display progress meter (default on TTY).
  --no-progress         Don't display progress meter (default on non-TTY).
  --enable              Enable given CloudFront distribution (only for
                        [cfmodify] command)
  --disable             Disable given CloudFront distribution (only for
                        [cfmodify] command)
  --cf-add-cname=CNAME  Add given CNAME to a CloudFront distribution (only for
                        [cfcreate] and [cfmodify] commands)
  --cf-remove-cname=CNAME
                        Remove given CNAME from a CloudFront distribution
                        (only for [cfmodify] command)
  --cf-comment=COMMENT  Set COMMENT for a given CloudFront distribution (only
                        for [cfcreate] and [cfmodify] commands)
  --cf-default-root-object=DEFAULT_ROOT_OBJECT
                        Set the default root object to return when no object
                        is specified in the URL. Use a relative path, i.e.
                        default/index.html instead of /default/index.html or
                        s3://bucket/default/index.html (only for [cfcreate]
                        and [cfmodify] commands)
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (1.0.0) and exit.
  -F, --follow-symlinks
                        Follow symbolic links as if they are regular files

But it doesn't say how to check that a file was actually uploaded and then remove the uploaded files. Should I verify via MD5 and delete locally with some shell script?
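A minimal sketch of that idea, relying only on s3cmd's exit status (put returns non-zero when an upload fails; the bucket and paths below are placeholders):

#!/bin/bash
# Sketch only: trust s3cmd's exit status before deleting locally.
# s3://my-log-bucket/ and /var/log/myapp/ are placeholder names.
for f in /var/log/myapp/*.gz; do
    if s3cmd put "$f" "s3://my-log-bucket/$(basename "$f")"; then
        rm "$f"    # s3cmd reported success, so drop the local copy
    fi
done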

by Valter Silva 09.05.2013 / 15:07

3 answers

1

After some time I managed to put together a bash script that compares the md5sum of the files on S3 with my local ones and removes the local files that are already on Amazon S3:

#!/bin/bash
datacenter="amazon"
hostname=$(hostname)
path="backup/server245"

# Remote listing: with --list-md5 each line is "DATE TIME SIZE MD5 s3://URI"
s3=$(s3cmd ls --list-md5 -H s3://company-backup/company/"$datacenter"/"$hostname"/"$path"/)

# Keep only "MD5 filename" (strip everything up to the last / of the URI)
s3_list=$(echo "$s3" | awk '{print $4" "$5}' | sed 's= .*/= =')

# Local listing: md5sum prints "MD5  /path/to/file"; reduce it to "MD5 filename"
locally=$(md5sum /"$path"/*.gz)
locally_list=$(echo "$locally" | sed 's= .*/= =')
#echo "$locally_list";

IFS=$'\n'
for i in $locally_list
do
  #echo $i
  locally_hash=$(echo "$i" | awk '{print $1}')
  locally_file=$(echo "$i" | awk '{print $2}')

  for j in $s3_list
  do
    s3_hash=$(echo "$j" | awk '{print $1}')
    s3_file=$(echo "$j" | awk '{print $2}')

    # to avoid an empty match when the listing only has a hash for a folder
    if [[ $s3_hash != "" ]] && [[ $s3_file != "" ]]; then
      if [[ $s3_hash == $locally_hash ]] && [[ $s3_file == $locally_file ]]; then
        echo "### REMOVING ###"
        echo "$locally_file"
        #rm /"$path"/"$locally_file";
      fi
    fi
  done
done
unset IFS
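For context on the field numbers: a --list-md5 listing line looks roughly like the one below (the date, size, and hash are made up), so awk's $4 is the MD5 and $5 is the object URI, and the sed expression then cuts the URI down to the bare filename:

2013-05-09 15:07   1kB  d41d8cd98f00b204e9800998ecf8427e  s3://company-backup/company/amazon/host1/backup/server245/app.log.gz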
by 23.05.2013 / 18:49
1

FWIW, I needed to do something similar and wrote the following bash script. What it does is:

  1. gets a list of files in a directory older than $MINUTES minutes, using find
  2. uses lsof to determine whether the file is open (this may not be true if the file is held open by an editor)
  3. uses s3cmd to copy the file to an S3 bucket.
  4. compares the MD5 sums of the remote file on S3 and the local one. If they check out, it deletes the local file.


#!/bin/bash
MINUTES=60
TARGET_DIR="s3://AWSbucketname/subfolder/$(hostname -s)/"
LOCAL_DIR="/path/to/folder"
FILES=()

echo ""
echo "About to upload files in $LOCAL_DIR up to S3 folder:"
echo "    $TARGET_DIR"
echo "Then delete if MD5 sums line up."
echo "Starting in 5 seconds..."
sleep 5

cd "$LOCAL_DIR"

# Throw the list of files that the find command gets into an array
while IFS= read -d $'\0' -r file ; do
    FILES=("${FILES[@]}" "$file")
done < <(find "$LOCAL_DIR" -name \*.wav -mmin +"$MINUTES" -print0)

# echo "${FILES[@]}"   # DEBUG

for local_file in "${FILES[@]}"
do
    # Check that the file in question is not open.
    # lsof returns a non-zero exit status for a file not in use
    lsof "$local_file" > /dev/null 2>&1
    if test $? -ne 0 ; then
        echo ""
        echo "$local_file isn't open. Copying to S3..."
        s3cmd -p put "$local_file" "$TARGET_DIR"
        # s3cmd -n put "$local_file" "$TARGET_DIR" # DEBUG - dry-run

        ## Now attempt to delete if the MD5 sums check out:

        remote_file=${local_file##*/}
        md5sum_remote=$(s3cmd info "$TARGET_DIR$remote_file" | grep MD5 | awk '{print $3}')
        md5sum_local=$(md5sum "$local_file" | awk '{print $1}')
        if [[ "$md5sum_remote" == "$md5sum_local" ]]; then
            echo "$remote_file MD5 sum checks out. Deleting..."
            rm "$local_file"
        fi
    fi
done
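A side note on the array-building step: the read -d $'\0' loop is the classic way to collect NUL-delimited find output safely. On bash 4.4 and later the same thing can be written more compactly (an equivalent sketch, not part of the original answer):

# bash 4.4+: read NUL-delimited find output straight into an array
mapfile -d '' FILES < <(find "$LOCAL_DIR" -name '*.wav' -mmin +"$MINUTES" -print0)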
by 15.12.2014 / 14:42
0

From the official documentation:

--delete-after (Perform deletes after new uploads [sync])

or

--delete-after-fetch (Delete remote objects after fetching to local file (only for [get] and [sync] commands).)

if you want to sync from remote to local (see the sketch below).

link
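A rough usage sketch for both flags (bucket and paths are placeholders; --delete-after-fetch needs a reasonably recent s3cmd):

# local -> S3: upload new/changed files, then (because of --delete-after) remove
# remote objects that no longer exist locally only after the uploads finish
s3cmd sync --delete-removed --delete-after /var/log/myapp/ s3://my-log-bucket/myapp/

# S3 -> local: fetch objects and delete the remote copies once fetched
s3cmd sync --delete-after-fetch s3://my-log-bucket/myapp/ /var/log/myapp/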

by 10.08.2018 / 11:27