Do I need to be a programmer for a career in Data Science?
Priya, a budding data scientist, was upset when she was bombarded with programming-related questions in her recent job interview. “I spent the last two years working on various modelling techniques, but now I am being asked questions about Python? I would like to build my career in data science and not in application development,” she said, with genuine doubts about her choice. The interview was held by a leading unicorn in the startup space, and she had reasons to believe this practice was not an exception. After all, her friends had similar experiences in other setups.
This is a common dilemma faced by folks who are beginning their careers. What should young data scientists focus on — understanding the nuances of algorithms or faster application of them using the tools? Some of the veterans see this as an “analytics vs technology” question. However, this article agrees to disagree with this concept. We will soon discover the truth as we progress through the article. How should you build a career in data science?
Analytics evolved from a shy goose, a decade back, to an assertive elephant. The tools of the past are irrelevant now. Some of the tools lost market share, their demise worthy of case studies in B-schools. However, if we are to predict its future or build a career in this field, there are some significant lessons it offers.
The Journey of Analytics
A decade back, analytics primarily was relegated to generating risk scorecards and designing campaigns. Analytical companies were built around these services.
Their teams would typically work on SAS, use statistical models, and the output will be some sort of score -risk, propensity, churn etc. Its primary role was to support business functions. Banks used various models to understand customer risk, churn etc. Retailers were active in their campaigns in the early days of adoption patterns.
And then “Business Intelligence” happened. What we saw was a plethora of BI tools addressing various needs of the business. The focus was primarily in various ways of efficient visualizations. Cognos, Business Objects, etc. were the rulers of the day.
But the real change to the nature of Analytics happened with the advent of Big Data. So, what changed with Big data? Was the data not collected at this scale, earlier? What is so “big” about big data? The answer lies more in the underlying hardware and software that allows us to make sense of big data. While data (structured and unstructured) existed for some time before this, the tools to comb through the big data weren’t ready.
Now, in its new role, analytics is no more just about algorithmic complexity. It needs the ability to address the scale. Businesses wanted to understand the “marketed value” of this newfound big data. This is where analytics started courting programming. One might have the best models, but they are of no use unless you trim and extract clean data out of zillions of GBs of data.
This also coincided with the advent of SaaS (Software as a service) and PaaS (Platform as a service). This made computing power more and more affordable.
By now, there is an abundance of data clubbed with economical and viable computing resources to process that data. The natural question was – What can be done with this huge data? Can we perform real-time analytics? Can the algorithmic learning be automated? Can we build models to imitate human logic? That’s where Machine Learning and Artificial Intelligence started becoming more relevant.
What then is machine learning? Well, to each his own. In its more restrictive definition, it limits itself to situations where there is some level of feedback-based learning. But again, the consensus here is to include most forms of analytical techniques into it.
While the traditional analytics need a basic level of expertise in statistics, you can perform most of your advanced NLP, Computer vision etc. without any knowledge of their details. This is made possible by the ML APIs of Amazon/Google. For example, a 10th grader can run facial recognition on a few images, with little or no knowledge of Analytics. Some of the veteran’s question if this is real analytics. Whether you agree with them or not, it is here to stay.
The Need for Programming
Imagine a scenario where your statistical model output needs to be integrated with ERP systems, to enable the line manager to consume the output, or even better, to interact with it. Or a scenario where the inputs given to your optimization model change in real-time, and model reruns. As we see more and more business scenarios, it is becoming increasingly evident that embedded analytical solutions are the way forward. the way analytical solutions interact with the larger ecosystem is getting the spotlight. This is where the programming comes into the picture.
For an analytics solution to be scalable, technology is crucial. While Priya might be jolted by this revelation, she is one of the lucky ones to realize this early in the career. Now she has the tools to redraw her career.