After some research, I came up with the following:
I created the file modify_ntp_config.sh, to be hosted on S3:
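#!/bin/bash
# Abort on errors (-e), inherit ERR traps (-E), and reject unset variables (-u).
set -eEu
# Config file to edit; defaults to example_config when no argument is given.
ntp_config_file="${1:-example_config}"
echo "Removing 'server 10.*' entries from \"$ntp_config_file\""
# Delete every line that starts with "server 10." (dot escaped, path quoted).
sudo sed -i -e '/^server 10\./d' "$ntp_config_file"
echo "Reinitialize ntp"
# Stop ntpd, set the clock once from time.nist.gov, then start ntpd again.
sudo service ntpd stop
sudo ntpdate -s time.nist.gov
sudo service ntpd start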
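The deletion pattern can be checked on a scratch file before it touches the real /etc/ntp.conf. The sketch below is only an illustration: the server entries are made up, and the helper name strip_10_servers is hypothetical. It uses an anchored pattern, ^server 10\. with the dot escaped, which is stricter than a bare server 10.* (the latter also matches addresses such as 100.64.0.1). It assumes GNU sed, as found on the EMR nodes.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: remove lines starting with "server 10." from a file.
strip_10_servers() {
  local file="$1"
  sed -i -e '/^server 10\./d' "$file"
}

# Scratch file with made-up entries (not taken from a real cluster).
scratch="$(mktemp)"
trap 'rm -f "$scratch"' EXIT
cat > "$scratch" <<'EOF'
server 10.0.0.1 iburst
server 10.12.0.2
server 100.64.0.1
server 0.pool.ntp.org
EOF

strip_10_servers "$scratch"

# Show what is left after the deletion.
cat "$scratch"
```

Only the two 10.x entries should disappear; the 100.64.0.1 and pool entries stay in place.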
I copied this file to S3:
$ aws s3 cp /var/tmp/modify_ntp_config.sh \
s3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh
And then, using aws-tools:
aws emr create-cluster --name "..." [...cluster create options ...] \
--steps \
Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh","/etc/ntp.conf"]
This results in the following log output (copied from S3 to local disk):
$ aws s3 cp --recursive s3://<s3-bucket-name>/log/<cluster-id>/steps/<step-id>/ /var/tmp/5HKO7
download: s3://[...]/stdout.gz to ../../var/tmp/5HKO7/stdout.gz
download: s3://[...]/stderr.gz to ../../var/tmp/5HKO7/stderr.gz
download: s3://[...]/controller.gz to ../../var/tmp/5HKO7/controller.gz
$ zcat /var/tmp/5HKO7/stdout.gz
Downloading 's3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh' to '/mnt/var/lib/hadoop/steps/[...]/.'
Removing 'server 10.*' entries from "/etc/ntp.conf"
Reinitialize ntp
Shutting down ntpd: [ OK ]
Starting ntpd: [ OK ]
$ zcat /var/tmp/5HKO7/stderr.gz
Command exiting with ret '0'
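If you want to script this check instead of reading the logs by hand, something along these lines might work. It is a minimal sketch that assumes the stderr log always ends with a "Command exiting with ret '<code>'" line like the one above; the helper name step_exited_cleanly is hypothetical, and the demo runs on a made-up log file rather than a real download.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: succeed if the gzipped stderr log reports exit status 0.
step_exited_cleanly() {
  local stderr_gz="$1"
  gzip -dc "$stderr_gz" | grep -q "Command exiting with ret '0'"
}

# Demo on a made-up log file that mimics the line shown above.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
printf "Command exiting with ret '0'\n" | gzip > "$demo_dir/stderr.gz"

if step_exited_cleanly "$demo_dir/stderr.gz"; then
  echo "step succeeded"
else
  echo "step failed"
fi
```

On a real run you would point the helper at the downloaded file, for example /var/tmp/5HKO7/stderr.gz.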
NOTE: Another way would be to run it on an already running EMR cluster using aws emr add-steps.
$ aws emr add-steps --cluster-id "j-<emr_cluster_id>" \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://<s3-bucket-name>/data/scripts/modify_ntp_config.sh","/etc/ntp.conf"]
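Both commands pass the same --steps shorthand, and the line continuations inside it are easy to get wrong (a stray space after a comma splits the argument). One option is to assemble the string in a small helper first. This is just a sketch: build_step_arg is a hypothetical name, and the region and bucket values are placeholders, not values from my setup.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: assemble the --steps shorthand from its parts.
# Arguments: region, S3 URI of the script, path of the config file to edit.
build_step_arg() {
  local region="$1" script_uri="$2" config_path="$3"
  printf 'Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s,%s]\n' \
    "$region" "$script_uri" "$config_path"
}

# Placeholder values; substitute your own region and bucket.
step_arg="$(build_step_arg "us-east-1" \
  "s3://my-bucket/data/scripts/modify_ntp_config.sh" "/etc/ntp.conf")"

# Print the assembled argument so it can be inspected before use.
echo "$step_arg"
```

The result can then be passed as a single quoted argument, e.g. aws emr add-steps --cluster-id "j-<emr_cluster_id>" --steps "$step_arg".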