Ever since Randy Zwitch first introduced RSiteCatalyst, an Adobe Analytics R package, to the world, the Digital Analytics industry has been fascinated with programmatic data analysis. For a time, it was the topic to be seen discussing at conferences: thought leaders blogged about it, and experts opined on it on their favorite podcasts.
Yet, as trendy as analysis in R became within the digital analytics community, the increase in digital analytics maturity that Randy cited as a primary reason for creating the package never seemed to materialize. Many recognized experts questioned the value of programmatic data analysis, saying they couldn't identify a need for it in their daily work as digital analysts: "We will stick with Excel, thank you very much."
“Digital Analysts, for as much as they talked about it, weren’t actually making use of programmatic data analysis tools at all!”
Here we are at the start of 2016, and the Digital Analytics industry has made very little progress on the initiative Randy set in motion back in 2013 to adopt better analysis tools, with the possible exception of Ben Gaines at Adobe and his work on the Analysis Workspace tool.
So why is programmatic data analysis such a critical skill for digital analysts to add to their toolset?
- Increasing Data Volumes: The simple answer is that Excel has very real limits on data size; a single worksheet tops out at 1,048,576 rows. As digital analysts, we need to move on quickly from basing our insights on a single source of click-stream data. As we bring richer datasets into our analysis, we will quickly discover that Excel is simply not cut out for large datasets. And by large, I don't mean massive: a single year of sales transaction data can easily bring Excel to a halt. If nothing else, this reason alone should convince you that programmatic analysis is a critical skill to have.
- It's repeatable: One of the major issues with using Excel as a data analysis tool is the inherent lack of structure a spreadsheet introduces. You can drop data wherever you want and write formulas that differ from cell to cell, with little or no record of how a specific result was reached. If you can't remember how you got there, how can you be sure the analysis is correct? Programmatic tools such as R and Python take a much more structured, and more scientific, approach to data analysis. Analysis code runs top to bottom with every step written down, giving the analyst a repeatable structure that builds confidence in the results.
- An 'execute-explore' workflow: Tools like Excel are great for dashboards and reporting, but as analysis tools they tend to produce paralysis rather than insight. Programmatic analysis tools, by comparison, encourage higher-level thinking: you start to see everything as collections of data and the operations and functions that act on them. The structure of programmatic analysis also encourages analysts to take a more exploratory approach to their analytics workflow, whereas Excel tends to nudge them toward building reports.
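To make the "repeatable" point concrete, here is a minimal sketch of a top-to-bottom analysis script in Python, the language of the book recommended at the end of this post. The sample data, column names, and function names are all hypothetical, and the script sticks to the standard library; pandas would express the same steps more concisely. The point is that every step from raw export to final metric is written down and can be rerun on next month's data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: daily sessions and orders per marketing channel.
# In a real analysis this would be read from an analytics or sales export file.
RAW_CSV = """date,channel,sessions,orders
2016-01-01,search,1200,36
2016-01-01,email,300,15
2016-01-02,search,1100,33
2016-01-02,email,350,14
"""


def load_rows(text):
    """Step 1: parse the raw CSV export into dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "date": r["date"],
            "channel": r["channel"],
            "sessions": int(r["sessions"]),
            "orders": int(r["orders"]),
        }
        for r in reader
    ]


def totals_by_channel(rows):
    """Step 2: sum sessions and orders for each channel."""
    totals = defaultdict(lambda: {"sessions": 0, "orders": 0})
    for r in rows:
        totals[r["channel"]]["sessions"] += r["sessions"]
        totals[r["channel"]]["orders"] += r["orders"]
    return dict(totals)


def conversion_rates(totals):
    """Step 3: derive the conversion rate (orders / sessions) per channel."""
    return {ch: t["orders"] / t["sessions"] for ch, t in totals.items()}


if __name__ == "__main__":
    # Running the whole file reproduces the result from the raw data every time.
    rates = conversion_rates(totals_by_channel(load_rows(RAW_CSV)))
    for channel, rate in sorted(rates.items()):
        print(f"{channel}: {rate:.2%}")
```

Because each step is a named function, a colleague (or you, six months from now) can read exactly how a number was produced, which is the auditability a spreadsheet's scattered formulas lack.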
The reality is that when an industry is as hot as digital analytics has been for the last 10 years, it can be difficult to push yourself to stay at the top of your game, because it's so easy to get by (and, honestly, to be very successful) by sticking with the status quo. However, you owe it to your employer, your clients, and most importantly yourself to keep learning, and not just learning, but diligently investing in new knowledge and skills.
If you are interested in learning more about programmatic analysis, there are many fantastic online courses, tons of great blogs, and plenty of easy-to-follow books. I personally recommend the book Python for Data Analysis.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
by Wes McKinney