Download a S3-served website whose index listing is exposed
02 October 2022
Tweet Follow @hazula
Tweet Follow @hazula
Suppose there’s a website you’d like to mirror locally that you can tell is served via an AWS S3 website. Here’s a way to download it if the S3 index is exposed.
If the site is http://flaws.cloud/, and it is served by an S3
bucket named flaws.cloud
, then you can visit the S3 bucket at
http://flaws.cloud.s3.amazonaws.com/.
1) Create a new directory, let’s call it flaws-cloud
, and cd
into it.
2) Create a script run.sh
:
cat << 'EOF' > run.sh
#!/bin/sh
BUCKET_NAME="flaws.cloud" exec ./download-s3-site.sh
EOF
chmod u+x run.sh
and make it executable chmod u+x run.sh
.
3) In the same directory, paste the script below as ‘download-s3-site.sh
’ and make it executable.
4) You can then download the site by running ./run.sh
.
I separated the configuration (run.sh) from the actual script (download-s3-site.sh) just to make it more easily reusable for me.
cat << 'EOF' > download-s3-site.sh
#!/bin/bash
set -eou pipefail
if [ -z "${BUCKET_NAME:-}" ]; then echo "[FATAL] BUCKET_NAME not set" >&2 ; exit 1; fi
OVERWRITE="${OVERWRITE:-false}"
DEBUG="${DEBUG:-false}"
info() { echo "[INFO] $*" >&2 ; }
debug() { if [ "$DEBUG" = "true" ]; then echo "[DEBUG] $*" >&2 ; fi ; }
download() {
local remote_filename local_filename
remote_filename="$1"
local_filename="$2"
if [ "$OVERWRITE" = "true" ] || [ ! -e "${local_filename}" ] ; then
curl \
--insecure \
--silent --show-error \
--location \
--create-dirs \
"https://${BUCKET_NAME}.s3.amazonaws.com${remote_filename}" \
--output "${local_filename}"
else
debug "Already downloaded ${local_filename}"
fi
}
download_file() {
local remote_filename
remote_filename="$1"
download "/${remote_filename}" "${remote_filename}"
}
download_index() {
download "" index.xml
}
download_files() {
ruby -rrexml/document -e 'REXML::Document.new(STDIN.read).root.each_element("//Key") {|elem| puts elem.text }' < index.xml \
| while read -r remote_filename; do
download_file "$remote_filename"
done
}
main() {
download_index
download_files
}
main
EOF
chmod u+x download-s3-site.sh
./run.sh
blog comments powered by Disqus