
One of the major learning goals of this class is for you to be comfortable managing all the software and settings required for you to do data science on your own computer. Why deal with all the headaches of setting up your own environment, you may ask? Why not just use a cloud platform like Google Colab or a virtual machine with everything already set up?

Getting data science tools installed and working together is, for better or worse, a pretty core part of the day-to-day life of data scientists, and learning how to troubleshoot problems quickly is an important skill for being productive in the profession. But it is a skill that takes time and energy to learn, and so in most classes - which want to focus on teaching topics like statistical analysis or programming concepts - instructors choose to provide students with clean, ready-to-use environments so everyone can focus on those topics. For example, if the MIDS Python Bootcamp included a module on setting up Python environments instead of providing you with a clean virtual machine, you’d probably end up learning ~25% less programming!

But the problem with this approach is that if every course you take pursues this strategy, you may find that you don’t feel empowered to go do data science yourself when those clean VMs are taken away at the end of the semester. Moreover, it means you may not know enough about how data science tools work to debug problems on your own when they come up.

So in this course, we’re going to address environment setup head-on. That will probably mean you’ll get a little annoyed at the fragility of many of these tools, and you may get frustrated spending hours trying to find a setting that got set wrong (though we’ll try to minimize these experiences!), but try to think of this time not as wasted, but instead as part of your data science education!

What We’ll Be Setting Up

To set ourselves up for this course (and hopefully our careers!), we’ll need to set up the following things:

Python and the conda package manager: This is a Python-centric course, so the first thing we’ll need to do is install Python and a robust, data-science-appropriate package manager.

