Tuesday, January 30, 2018

A quick guide to running deep learning applications on Amazon AWS virtual server supercomputer

Deep learning is serious business and if you want to work on sizeable problems you're going to need more hardware than you probably have at home. Here is how to use Amazon Web Services in order to be able to upload your code and datasets to an online server which will run it for you and then let you download the results. It's basically a virtual computer that you use over the Internet by sending commands.

First of all, when I got started I was initially following this video on how to use AWS:
https://youtu.be/vTAMSm06baA?t=1476

AWS Educate

If you're a student then you can take advantage of AWS Educate where if you are accepted you will receive 100 free hours of runtime to get started. 100 hours in deep learning will not last long but it will allow you to experiment and get your bearings with AWS without worrying about the bill too much. It will also be a few days before you get a reply. Here's the link:
https://aws.amazon.com/education/awseducate/

Get the applications you need

If you're on Windows, then before creating your server you should install WinSCP and PuTTY in order to be able to upload your data and run your programs. If you're not on Windows then you can use the terminal with ssh to do this.

Create an account

Start by visiting this link and making an account:
https://aws.amazon.com/

You need to provide your credit card details before starting. You will be charged automatically every month so make sure to keep an eye on your usage as you pay for the amount of time you let the server run per hour.

Enter the AWS console

Next go into the AWS console which is where you get to manage everything that has to do with virtual servers:
https://console.aws.amazon.com/

Note that there is the name of a city at the top such as "Ohio". This is to say where you want your servers to be in. Amazon has servers all around the world and you might prefer one region over another, mostly to reduce latency. Different regions also have different prices, so that might take priority in your decision. Ohio and W. Virginia seem to be the cheapest. See here for more information:
https://www.concurrencylabs.com/blog/choose-your-aws-region-wisely/

Keep in mind that each region has its own interface so that if you reserve a server in one region, you can only configure your server when that region is selected. You will not see a list of all your servers in any region. So make sure you remember which regions you use.

After having chosen a region, next go to Services and click on EC2:
https://console.aws.amazon.com/ec2/

Create a virtual server

Click on the big blue button called "Launch Instance" in order to create your virtual server. You can now see a list of AMIs which are preconfigured virtual servers that are specialized for some kind of task, such as deep learning. You're going to copy an instance of one of these and then upload your files to it. Look for the AMI called "Deep Learning AMI (Ubuntu)" which contains a bunch of deep learning libraries together with CUDA drivers. Click on "Select".

This is where you choose the computing power you want. The better it is, the more expensive. If you just want to see how it works then choose a free one which says "Free tier eligible". If you want to get down to business then choose one of the "GPU Compute" instances such as "p2.xlarge" which has 12GB of GPU memory. The "p2.xlarge" costs about 90 cents per hour (which starts from the moment you create the instance, not when you start running your programs so it also includes the time it takes to upload your data).

If this is your first time creating a powerful instance then you will first need to ask Amazon to let you use it (this is something they do to avoid people hogging all the resources). Under "Regarding" choose "Service Limit Increase", under "Limit Type" choose "EC2 Instances", and under "Region" choose the region you selected previously. You also have to say something about how you'll use it. After being granted access you can then continue from where we left off.

After ticking the check box next to your selected computer power, click on "Next: Configure Instance Details".

Leave this next step with default settings. Click on "Next: Add Storage".

This is where you choose your storage space. You will need at least 50GB of space in order to store the deep learning AMI but you will need additional space for your datasets and results. Keep in mind that you have to pay per GB per month for storage. If you need frequent access to the hard drive (such as loading minibatches from disk) then you'll need to use an SSD drive which costs about 10 cents per GB per month. Otherwise you can use a magnetic drive which costs about 5 cents per GB per month.

Next click on "Next: Add Tags". This is where you make up some informative tags for your virtual server in order to differentiate it from other virtual servers. You can do something like "Name=Deep learning". If you only have one server then this is not important. Click on "Next: Configure Security Group".

This is where you create some firewall rules to make sure that only your computer has access to the virtual server, even if someone else knows the password. It might be the case that this doesn't work for you and you won't be able to connect at all even from your IP address in which case choose "Anywhere" as a source which will not make any restricts based on IP. Click "Review and Launch".

As soon as you click on "Launch" at the bottom the clock starts ticking and you will start being billed. If you've got the AWS Educate package then you will have 100 free hours but they will start being used up. You can stop an instance any time you want but you will be billed for one hour as soon as it starts, even if you stop it immediately.

If this is your first time using a server in the selected region (the place you want your server to be in) then you will be asked to create a cryptographic private key which is a file that will be used instead of a password to connect to the virtual server. If you're on Windows then you need to use a program that comes with PuTTY called PuTTYgen which converts the .pem file that Amazon sends you to a .ppk file that PuTTY can use. Follow the section called "To convert your private key" in the following link to do this:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html?icmpid=docs_ec2_console

Connecting to your virtual server

Now that we have our server ready we need to connect to it, upload our stuff, and get it to run. We're also on the clock so we need to hurry. Start by visiting your list of server instances, which can be found in the side bar when you're in the EC2 dashboard:
https://console.aws.amazon.com/ec2/v2/home#Instances:sort=instanceId

Your server should be running. You can stop it by right clicking it and under "Instance state" choosing "Stop". This will stop it from charging you every hour but you will be charged for at least one hour as soon as you start.

Clicking on a server and then clicking on "Connect" will give you information to access the server using PuTTY or ssh.

If you're using Windows, you connect to your server using PuTTY, which you can find out how using this guide:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html?icmpid=docs_ec2_console
After connecting using PuTTY you can transfer the configuration to WinSCP by opening WinSCP, going on Tools, and choosing "Import sites". Now you can connect using WinSCP in order to upload and download files as well.

If you're not using Windows then you should use the terminal and follow this guide instead:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html
You can upload stuff from Linux by using FileZilla or by using the normal directory explorer and clicking CTRL+L to enter an FTP location.

Upload all the files you need including datasets and scripts. You can zip the files before uploading them and then unzip them on the server using the "unzip" Linux command.

DO NOT INSTALL LIBRARIES WITH PIP YET! See the next section first.

Using your virtual server

Finally we are close to start running our scripts. But we still need to do two more things first.

First of all, look at the top of the PuTTY or terminal which tells you how to activate different Anaconda environments. These are Python environments with different libraries available which are connected to CUDA so that you will be able to run on the GPU. In PuTTY or terminal enter "source activate <environment name>". Remember this line to use later.

Now you can install anything you want using pip.

Secondly as soon as you close PuTTY or terminal all running processes will stop (but the instance will still be running so you'll still be paying). What you need to do is to use the application called "screen" which will keep everything running on its own. See this link:
https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/

Now you'll need to activate the environment again because screen creates a new session which is disconnected from the previous one.

Hoorray!

You're done! You can now start using your server. When you're ready, make sure to stop it and you can even terminate it but that will completely delete the server with all data so only do it when you're really ready, otherwise you'll be wasting money reuploading and rerunning everything.

You can see your current bill by clicking on Services and going on Bill. This will show you your current month's bill as well as previous bills as well. You can ever get a daily break down and forecasts.
https://console.aws.amazon.com/billing/home