Why Python is better than R for data science careers
New data scientists are all faced with a hugely important question: should I learn Python or R?
The question is so important because it takes many hundreds of hours to learn your first programming language. It’s impractical to try to learn both, especially when you’re just starting your career.
So which should you pick?
Based on my experience, I believe Python is the better choice for a career in data science, especially if you’re just getting started.
I’ll give you four reasons why I think Python is the better choice for your career, but I also want to be clear that I don’t think R is a bad choice.
Choosing R won’t negatively affect your job opportunities, and depending on your team, you might even be required to learn it. In fact, Facebook uses R for the analytic components of its internal survey tools, and the language is supported across all of our data science infrastructure.
That said, I believe that if you learn Python first, you will more quickly become productive as a practicing data scientist, and you’ll be better able to contribute to your team in important areas outside of statistical modeling.
So learning Python will enable you to deliver more impact for your company, and your career will benefit more as a result.
Reason #1: you’ll probably have to learn Python anyway
Most companies require their data scientists to do more than predictive modeling (i.e., machine learning). At the least, you’ll probably be required to maintain the data pipelines that feed your models, and those data pipelines will likely be built in Python.
The industry standard for pipelines today is the Python-based Airflow, and at Facebook we use an internal Python tool that’s substantially the same.
In fact, I’d estimate 100% of our data scientists at Facebook use Python every week, while maybe 10% or so actively use R.
So it’s probably more efficient for you to choose Python: while you can likely avoid using R once you’ve landed a job, you’re unlikely to be able to avoid using Python.
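To give a sense of what maintaining a data pipeline looks like in code, here is a minimal sketch of the idea behind tools like Airflow: tasks that run in dependency order. It uses only the Python standard library (graphlib, Python 3.9+) rather than Airflow itself, and the task names, sample rows, and run_pipeline helper are all hypothetical.

```python
from graphlib import TopologicalSorter


def extract():
    # Stand-in for reading raw rows from a source system (made-up data).
    return [{"user": "a", "clicks": "3"}, {"user": "b", "clicks": "5"}]


def transform(rows):
    # Convert the string counts into integers.
    return [{"user": r["user"], "clicks": int(r["clicks"])} for r in rows]


def load(rows):
    # Stand-in for writing to a table; here we just return a total.
    return sum(r["clicks"] for r in rows)


# Each task maps to (function, list of upstream tasks it depends on).
TASKS = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_pipeline(tasks):
    # Build a graph of task -> predecessors and run tasks in a valid order,
    # passing each task the results of its upstream tasks.
    graph = {name: set(upstream) for name, (_, upstream) in tasks.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, upstream = tasks[name]
        results[name] = func(*(results[u] for u in upstream))
    return results


if __name__ == "__main__":
    print(run_pipeline(TASKS)["load"])
```

A real scheduler adds retries, scheduling, and monitoring on top of this, but the core job of keeping tasks and their dependencies correct is ordinary Python.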
Reason #2: Python is easier to learn
The amount of time it takes to become employable is extremely important, especially if you’re self-studying outside of college.
Python has a strong reputation for being easy to learn. Having learnt both Python and R (although much more deeply in Python), I think that Python’s reputation is well deserved.
The benefits of Python’s ease of learning are especially apparent when you start using language features beyond statistical modeling. Those features include packaging your projects for distribution, developing command line interfaces, and modeling your data structures with ORMs like SQLAlchemy, among others.
Using Python will make it easier to become proficient with those features, and your career will benefit as a result.
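As one illustration of those features, here is a rough sketch of a command line interface built with argparse from the standard library. The tool name "score" and its arguments are made up for the example, and the function only parses arguments rather than scoring anything.

```python
import argparse


def build_parser():
    # Hypothetical CLI for scoring a CSV file; names are illustrative only.
    parser = argparse.ArgumentParser(
        prog="score",
        description="Score a CSV file with a saved model (hypothetical tool).",
    )
    parser.add_argument("input_path", help="CSV file to score")
    parser.add_argument(
        "--threshold", type=float, default=0.5, help="decision threshold"
    )
    parser.add_argument(
        "--verbose", action="store_true", help="print progress messages"
    )
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv, so the same function works
    # both from a shell and when called directly with a list of strings.
    args = build_parser().parse_args(argv)
    if args.verbose:
        print(f"Scoring {args.input_path} at threshold {args.threshold}")
    return args
```

Calling main(["data.csv", "--threshold", "0.7"]) returns the parsed arguments, which makes the interface easy to try out without opening a shell.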
Reason #3: Python has a larger community
Python is one of the most popular programming languages in the world, with a huge community on sites like Stack Overflow, Kaggle, and even Medium.
So when you inevitably encounter an issue you can’t solve on your own, you’ll be more likely to find people who have encountered it before you, asked for help, and received a solution.
That means you’ll spend less time debugging a compatibility issue with your system, and more time delivering code that drives impact for your company.
Reason #4: Deploying your models is easier with Python
Finally, you’ll likely reach the point in your career where you want to make your models available in real time to end users. To solve that problem you’ll need to build a REST-based web app, which is far easier to do with Python.
In fact, Python has some of the most popular web app frameworks in the world, namely Django and Flask. Your company’s internal deployment tools are much more likely to support those frameworks, and relatively unlikely to support R.
The popularity of those frameworks also means they are well supported by Platform-as-a-Service providers like Heroku, Amazon Lightsail, and many others. You’ll be able to publish your personal projects online for a fraction of the effort that it would take to deploy the same projects in R.
Best of all, if you’re lucky enough that your company uses a Python framework for its own products, learning Python means you’ll be dangerous enough to wire up your own in-app tracking. Being able to autonomously capture more features for your models will have a dramatic effect on the impact you can deliver.
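To make the REST point concrete, here is a minimal sketch of a prediction endpoint. It is written as a plain WSGI app using only the standard library, as a stand-in for what Flask or Django would handle for you. The /predict route, the feature names, and the hard-coded weights are all assumptions for illustration; a real service would load a trained model instead.

```python
import json
from wsgiref.simple_server import make_server


def predict(features):
    # Placeholder "model": a hypothetical linear score, not a trained model.
    weights = {"age": 0.02, "visits": 0.1}
    return sum(weights.get(k, 0.0) * float(v) for k, v in features.items())


def app(environ, start_response):
    # Only POST /predict is handled; everything else gets a 404.
    is_post = environ.get("REQUEST_METHOD") == "POST"
    if not is_post or environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']

    # Read the JSON request body and score it.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    features = json.loads(environ["wsgi.input"].read(length) or b"{}")
    body = json.dumps({"score": predict(features)}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(port=8000):
    # Development server only; blocks until interrupted.
    with make_server("", port, app) as server:
        server.serve_forever()
```

Running serve() and posting a JSON object of features to /predict returns a JSON score. A framework like Flask replaces most of the request-handling code above with a decorated function, which is a large part of why deployment tends to be easier in Python.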
Of course, all decisions have trade-offs, and choosing to learn Python instead of R is no different. Even though I believe Python is the better choice for a data science career, it’s worth considering the downsides that come with it.
For me, the largest downside is that Python doesn’t have a tool equivalent to RStudio. The most comparable tool for Python is the Jupyter notebook, but I personally believe RStudio is better because of its data exploration capabilities.
R is also highly popular in the academic community, so the documentation for packages in R is much more likely to reference the academic research directly. That documentation can be very useful for data scientists who work on the “cutting edge” of research.
But I don’t believe that the lack of an RStudio equivalent is enough to negate the relative strengths of Python. Careers in academic data science are also much rarer, which makes the research-related strengths of R less relevant for the majority of data scientists.
So despite the strengths of R, I believe your career will benefit more if you choose to learn Python instead.
Finally, I think it’s worth mentioning again that I don’t think learning R is a bad choice, just that Python is more likely a better choice for your career. Depending on the specifics of your situation, it could make more sense for you to learn R instead.
Regardless of which language you choose, you shouldn’t feel that you can’t ever change your mind. All programming languages have a lot more similarities than differences: learning your second language is much easier than learning your first.
In fact, I chose to learn R first myself! So it’s hard for me to warn too strongly against R, even though I would now recommend Python as the better choice for your career.