As always with Open Source, you have a number of different options: the one I'll show you today is a userspace filesystem (built on FUSE) called s3fs. The project is quite active and its maintainers actually support their users :)
This how-to assumes that you already have an AWS account with EC2 instances and S3 buckets. If not, go to the AWS website and get started :)
As usual, I'll be using a vanilla Ubuntu 12.04 instance. Let's 'ssh' into it and install what we need.
First, let's grab the s3fs sources (at the time of writing, the latest version is 1.71):
$ wget http://s3fs.googlecode.com/files/s3fs-1.71.tar.gz
Then, let's add all the development packages required to build it. I'm starting from a fresh instance, so this is really needed. If your VM is already populated with all these packages, you can skip this step.
$ sudo apt-get install make gcc g++ pkg-config libfuse-dev libcurl4-openssl-dev libxml2-dev
Now, let's build s3fs:
$ tar xvfz s3fs-1.71.tar.gz
$ cd s3fs-1.71
$ ./configure
$ make
$ sudo make install
The next step creates a file storing the AWS keys (insert your own!) that s3fs uses to authenticate its S3 requests. Make sure there is no trailing character after the secret key:
$ echo ACCESS_KEY_ID:SECRET_ACCESS_KEY > ~/.passwd-s3fs
$ chmod 400 ~/.passwd-s3fs
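Side note, not covered in this walkthrough: s3fs also accepts per-bucket credentials in the same file, one line per bucket, with the bucket name as a prefix. The bucket names and keys below are placeholders:

```
myBucket:ACCESS_KEY_ID:SECRET_ACCESS_KEY
otherBucket:OTHER_ACCESS_KEY_ID:OTHER_SECRET_ACCESS_KEY
```

This is handy if you plan to mount more than one bucket from the same instance.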
The last step is to create a mount point with the right ownership, as well as a cache directory:
$ sudo mkdir /mnt/s3
$ sudo chown ubuntu:ubuntu /mnt/s3
$ mkdir ~/cache
We are now ready to mount our bucket (again, use your own bucket name):
$ id
uid=1000(ubuntu) gid=1000(ubuntu)
$ s3fs -o uid=1000,gid=1000,use_cache=/home/ubuntu/cache myBucket /mnt/s3
$ mount
output removed for brevity
s3fs on /mnt/s3 type fuse.s3fs (rw,nosuid,nodev,user=ubuntu)
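If you want the mount to survive a reboot, s3fs can also be driven from /etc/fstab. A sketch (adjust the bucket name, mount point, uid/gid and cache path to your own setup):

```
s3fs#myBucket /mnt/s3 fuse _netdev,uid=1000,gid=1000,use_cache=/home/ubuntu/cache 0 0
```

The `_netdev` option tells the system to wait for the network before attempting the mount.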
This worked. Let's copy a file and see what kind of performance we can get:
$ time cp /mnt/s3/6MegabyteFile .
real 0m0.572s
user 0m0.000s
sys 0m0.012s
About 10 megabytes per second :-/ S3 is not famous for its speed, and this is more evidence of it. In this example, the EC2 instance and the S3 bucket are located in the same region (eu-west). I'm not sure I want to find out what happens when they're not!
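For the record, here's the back-of-the-envelope arithmetic behind that figure, using the file size reported by the cache listing (6,196,315 bytes) and the measured 0.572 seconds:

```shell
# bytes / seconds / bytes-per-MiB = throughput in MiB/s
awk 'BEGIN { printf "%.1f\n", 6196315 / 0.572 / 1048576 }'
# prints 10.3
```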
All the more reason to check that the cache works, then:
$ ls -l /home/ubuntu/cache/myBucket
-rw-r--r-- 1 ubuntu ubuntu 6196315 Jul 31 17:39 6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.027s
user 0m0.000s
sys 0m0.020s
Yes, it does. Let's give it another shot:
$ rm /home/ubuntu/cache/myBucket/6MegabyteFile
$ time cp -f /mnt/s3/6MegabyteFile .
real 0m0.787s
user 0m0.000s
sys 0m0.012s
$ sudo umount /mnt/s3/
Being able to access your buckets locally is a great feature for testing, debugging and maybe some simple production use cases. I've always been a sucker for filesystems ("Everything Is A File", remember?) and this one is definitely cool. Try it out!