A Shiny New Python Data Science Sandbox in 30 Minutes Or Less

This post gives beginners a full walkthrough to go from nothing to a fully functional Linux/Python/pandas/scikit-learn environment with Jupyter as a front end. For exploratory work, I really like this stack. My native OS is Windows, but since we’re using VMs, I would imagine the setup for OS X is very similar and probably won’t need any modification (other than the steps for configuring the VM). If you have a solid internet connection, we should be able to get this all done in under 30 minutes startiiiinnnnnng NOW…

1. Download an Ubuntu Desktop version of your choice. I like 14.04. 16.04 is probably great too.

2. Download VMware Workstation Player if you’re on Windows, or whatever you use for virtual machines if you’re on Mac (VirtualBox is a common option). If you’re on Linux, two thumbs up: you’re totally rad and you don’t need to worry about this step, and you probably know how to do all this already and why are you reading this?


(If you’re on VirtualBox, you’ll need to sort out your own settings for this part. If anyone wants to write a walkthrough for that I’ll be happy to post it here)

3. Create a new VM from the Ubuntu .iso downloaded in the first step using whatever settings are appropriate for your machine. The following screenshots will walk you through the settings I use, but you’ll likely want to adjust the CPU, RAM, and disk based on whatever machine you’re working on:

Create a new VM:

Select the .iso file:

Set your name, username, and password:

Set the VM name and file location:

Set the disk size and whether or not it’s split into multiple files on the host system:

Customize the hardware settings:


Go through the Ubuntu installation (defaults are fine):

All done, ready to rock…

4. Ubuntu updates and utilities. First, you probably want to install VMware Tools so that you can cut and paste between host and guest, and so that the Ubuntu desktop resizes automatically when you resize the VMware window. Also, right off the bat you’ll see that the Software Updater has a list of updates that can be installed. You can update everything, nothing, or go through them and cherry-pick the ones you want.

Note: I always use Python 2, which is installed by default. If you’re dead set on using Python 3, I’m sure this will all work out just fine. If it doesn’t, and you figure out any necessary deviations, please leave a comment!

Ok, at this point you’re hopefully up and running on Ubuntu in a VM, so next we’ll install the environment. This will all be done from the Terminal…

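1. Download the Anaconda installer (Anaconda2-4.0.0-Linux-x86_64.sh for Python 2) and run it with bash. It is a shell script, so there is nothing to unzip. The default values are fine, except for the final prompt about prepending the Anaconda install directory to the PATH: choose yes for that.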

(For Python 3, use “Anaconda3-4.0.0-Linux-x86_64.sh” instead.)

2. Update anaconda
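A minimal sketch of the update, assuming the installer put conda on your PATH (the --yes flag only skips the confirmation prompt, so drop it if you would rather review the changes first):

```shell
# Update the conda package manager first, then the anaconda metapackage.
for pkg in conda anaconda; do
    conda update --yes "$pkg" \
        || echo "could not update $pkg (is conda on your PATH?)" >&2
done
```

If conda is not found, opening a new terminal usually picks up the PATH change made by the installer.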

3. Create an environment. Anaconda environments are great. You can have a completely different set of packages and versions in each environment, so that the dependencies of the different projects you work on never conflict.

In the create command, the name you assign the environment is followed by a list of the packages you want installed. This is the standard list I use, but you can include anything else you’d like.
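A minimal sketch of such a command. The environment name ds_sandbox and the package list are illustrative assumptions, not necessarily the author's own choices:

```shell
# Hypothetical environment name and package list; adjust both to taste.
ENV_NAME="ds_sandbox"
PACKAGES="python=2.7 numpy pandas scikit-learn matplotlib jupyter"

# $PACKAGES is left unquoted on purpose so each package becomes its own argument.
conda create --yes --name "$ENV_NAME" $PACKAGES \
    || echo "conda create failed (is conda on your PATH?)" >&2
```

For a Python 3 environment, swapping python=2.7 for a python=3.x pin should be the only change needed.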

4. Activate the new environment

If you list your environments (for example with conda info --envs), you’ll see two at this point: the root and the environment you just created. Eventually you’ll likely have lots of them. The asterisk shows which one is currently active. To switch to the new environment we just created, use source activate followed by the environment name.
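Put together, listing and switching might look like the sketch below. The name ds_sandbox is a placeholder assumption, and source activate is the syntax for the conda version that ships with Anaconda 4.0.0; on conda 4.4 and later the equivalent is conda activate:

```shell
# Hypothetical environment name; use whatever you passed to conda create.
ENV_NAME="ds_sandbox"

# List every environment; the asterisk marks the active one.
conda info --envs || echo "conda not found; check your PATH" >&2

# Switch to the new environment.
source activate "$ENV_NAME" || echo "could not activate $ENV_NAME" >&2
```

Running the listing again after activating should show the asterisk next to the new environment.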

When you’re done with this environment, you can either switch to a different one using the previous command with a different environment name, or alternatively call source deactivate to go back to the root environment.

5. You’re ready to rock! Start up Jupyter by running jupyter notebook from the terminal, create a new notebook, and you’re ready to start data science-ing.
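Before opening a notebook, it can be worth confirming that the main libraries import cleanly in the active environment. A minimal sketch: check_pkg is a hypothetical helper name, and the package list is an assumption to adjust to whatever you installed (scikit-learn is imported as sklearn):

```shell
# Report whether a Python package can be imported in the current environment.
check_pkg() {
    if python -c "import $1" 2>/dev/null; then
        echo "$1: OK"
    else
        echo "$1: missing"
    fi
}

# Assumed package list; edit to match your environment.
for pkg in numpy pandas sklearn; do
    check_pkg "$pkg"
done
```

A missing entry usually means the package was left out when the environment was created, or that a different environment is active.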


How fast was that? Any problems? Add a comment and we’ll see if we can patch any holes in these setup steps.

2017-01-30T11:40:40+00:00 | June 24th, 2016 | 5 Comments
  • Nice post. Windows is such a pain! Setting up a development environment is much more difficult than it should be. I actually moved from Windows to Linux a few years ago and have never looked back.

    I’d recommend that people just make the move if they are involved with programming in any way, but there are valid reasons why some people have to, or prefer to, use Windows, so for those who fit that category, this could be a good alternative.

  • ultimateposeur

    Not sure what advantages this would have over an Anaconda installation in Windows itself. https://www.continuum.io/downloads#_windows

  • Another option is to use Vagrant to automate those steps.
    You can get a similar workspace by installing Vagrant and typing these lines:

    vagrant box add continuumio/anaconda2
    mkdir anaconda2
    cd anaconda2
    vagrant init continuumio/anaconda2
    vagrant up
    vagrant ssh
    conda info

    More info at https://docs.continuum.io/anaconda/vagrant
    Vagrant can be downloaded for free here: https://www.vagrantup.com/

    • Never used it before but Vagrant looks like an excellent option too. Thanks for the heads up!

  • Many people who are new to data science are Windows users who don’t want to change their main OS, in which case tutorials like this are extremely helpful. This is a really quick and easy tutorial to get Ubuntu running on your Windows machine.