The amount of data in our digital universe is skyrocketing, and experts believe it will double in size every two years until at least 2020. Yet this data seems to be drowning organizations, and 80% of all data projects are currently failing. This means that organizations that successfully use their data hold a major competitive advantage. But that advantage won’t last, and eventually everyone will be expected to have broad data literacy skills.
The State of Data Today
Knowing how to utilize data talent within an organization has been a major pain point for businesses. This new problem is posed by the vast, uncharted waters of data and the ever-expanding tech industry. The momentum behind data will not slow anytime soon, as population growth and innovation continue at an incredible pace. Through innovation, the price of all the technology (storage, connectivity, computation) required to do data science has plummeted since its first commercial introduction in 1956, when IBM’s RAMAC boasted 5MB of storage for a bargain $30K a month.
So with access to all this new data and the tools necessary to utilize it, why are 80% of current data projects failing? There are three primary causes. First, many organizations lack a data culture (see below). Second, data is often compiled in separate departmental silos, with each department having a different baseline for what its data is and how to use it. Third, statistics has historically been one of the most poorly taught subjects in higher ed, leading directly to the struggles many face when interpreting data. While data skills will become a commodity over the next few years (like typing), the 20% of organizations currently succeeding with data hold a significant advantage over their competitors.
Data Scientists & Their Tools
The term “Unicorn” was coined to describe the Modern Data Scientist who is a perfect blend of technical skills like database work and programming, and soft skills like domain expertise, communication, and visualization. The name “Unicorn” is fitting because although they do exist, this type of Data Scientist is incredibly hard to find. It’s common for hiring managers to chase these Unicorns, and a lot of current data failure is the result of poor hiring practices or improperly utilized talent. Data science is a team effort that should be divided into four positions, requiring specialization and constant cooperation between team members:
- Data Detectives: These are your primary researchers. They incorporate traditional market research platforms like surveys (Qualtrics), qualitative interviews, and academic research tools like EBSCO/LexisNexis. This position requires a deep understanding of your market space, your competitors, and how to get data that, once processed and visualized (by others), will help decision-makers and stakeholders with their strategy.
- Data Guardians: These are the members of your team responsible for centralizing all the data. This role typically involves building a database from scratch (Architecting), as well as knowing what all the stored data means and protecting it from internal and external threats (Stewardship). The tools Data Guardians use are mostly SQL-based and require a lot of hardware, most of which is increasingly cloud-based.
- Data Wranglers: Typically programmers who automate processes like pulling data from APIs and databases, scraping public information, and running advanced analytical models. Their skill set, also known as a “stack,” usually includes technologies like Linux, Apache Server, and C++, among others, and if they’re stats savvy they’ll also use R, Python, Spark, and Hadoop.
- Data Storytellers: This is currently the most in-demand type of Data Scientist. Data Storytellers can verbally and visually translate complex data and analytics into simple information through storytelling. This role is critical. Without someone who’s able to convey your team’s findings to decision-makers, all the analysis you’ve done is useless. The Data Storyteller’s tools include Tableau, Power BI, PowerPoint, and other visualization/presentation programs.
Data Science Teams & Their Cost
Everyone knows Data Science isn’t cheap. When it comes to hiring practices, there are significant differences between businesses that generate $10 million or more in annual revenue and those that don’t. The right data science team for you also depends on factors like the technical nature of your product, stage of product life cycle, B2B vs. B2C, state of competition, and of course revenue. In all cases, your Data Science team should report to a C-level executive, regardless of annual revenue and organization size. Excessive middle management will hurt your Data Science team’s chance of making an impact on your organization.
For organizations that have less than $10 million in revenue and already have a lot of well-organized, accessible data, the first hires you consider should be a Data Wrangler ($130k) and a Data Storyteller ($90k). The Data Wrangler will combine deep programming skill with a knowledge of statistics to obtain the data you’re looking for, while the Data Storyteller will present this information with clear, actionable insights. If your organization hasn’t built up much data yet, start by hiring a Data Guardian ($85k) to architect the system, test for security, and establish protocols. Then, hire a Data Storyteller to start getting a general understanding of what your data is saying. Finally, add a Data Wrangler to start bringing more streams of information into your organization’s data pool. Your team’s total annual cost including training, tools, and overhead should add up to around $403k (based on average salaries). The Data Science team has the potential to be one of the most impactful in your organization because the insights it discovers can help improve every other part of the business.
If your organization exceeds $10 million in revenue, your building process will be slightly different. Instead of trying to continually scale up the same team, build teams of 5 people that can be replicated for scale. This 5-member team should include a Data Detective ($60k), a Data Guardian ($85k), a Data Storyteller ($90k), and two Data Wranglers ($130k each). After one team reaches its work capacity, build another and put a managing director over them. One director should be able to oversee around 5 of these teams, and each individual team can typically work on 2-3 different projects simultaneously. Each team’s annual cost (not counting management) including tools, training, and overhead should amount to somewhere near $666k. A very common word of advice from current data scientists is to focus on smaller teams where strong collaboration can fully utilize specialization. The “Two Pizza Rule” created by Amazon founder Jeff Bezos is an excellent policy for your data team to follow. This rule advises against creating teams or holding meetings so large that two pizzas won’t feed everyone in the room, and it is widely credited with improving productivity.
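To see how the quoted totals relate to the salaries above, here is a minimal Python sketch of the arithmetic. The salary figures and the $403k and $666k totals come from this article. The dictionary and function names are hypothetical, and the sketch assumes the smaller team ends up with three hires (Guardian, Storyteller, Wrangler), which the article implies but does not state outright.

```python
# Average salaries quoted in this article, in thousands of USD.
SALARY_K = {
    "Data Detective": 60,
    "Data Guardian": 85,
    "Data Storyteller": 90,
    "Data Wrangler": 130,
}

# Team compositions: role -> headcount.
# Assumption: the under-$10M team is the three hires described in the article.
SMALL_ORG_TEAM = {"Data Guardian": 1, "Data Storyteller": 1, "Data Wrangler": 1}
LARGE_ORG_TEAM = {
    "Data Detective": 1,
    "Data Guardian": 1,
    "Data Storyteller": 1,
    "Data Wrangler": 2,
}


def base_salary_k(team):
    """Sum of salaries for a team, in thousands of USD."""
    return sum(SALARY_K[role] * count for role, count in team.items())


def implied_overhead_k(team, quoted_total_k):
    """Part of the quoted total not explained by salaries alone."""
    return quoted_total_k - base_salary_k(team)


for label, team, quoted_k in [
    ("under $10M revenue", SMALL_ORG_TEAM, 403),
    ("over $10M revenue", LARGE_ORG_TEAM, 666),
]:
    base = base_salary_k(team)
    extra = implied_overhead_k(team, quoted_k)
    print(
        f"{label}: salaries ${base}k, tools/training/overhead ${extra}k "
        f"({extra / base:.0%} on top of salaries)"
    )
```

Under these assumptions, salaries alone come to $305k and $495k, so roughly a third on top of salaries goes to tools, training, and overhead in both cases.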
Building A Data Culture
Hiring is the clear first step in building a data culture, but it’s important to realize your work is not done yet. A well-functioning data culture should combine people with software, perform fast and agile sprint cycles, have all team members possess a descriptive understanding of project conclusions, have clear strategic knowledge in analyzing data, and use clear visualizations. Building a data culture starts at the top, and leaders need to understand the different ways to define data and know what approaches to take when utilizing it.
Data is typically defined by:
- Origin – Where it came from: Survey, Observation, or Experiment. Today, Observational Data usually comes from machine-to-machine or human-to-machine interactions.
- Totality – Sample vs. Census
- Scope – Time Series vs. Cross-Sectional
- Measurement – Nominal, Ordinal, Interval, or Ratio
- Freshness – Saliency & Shelf Life of Data
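One way to make these five dimensions concrete is to record them for each dataset your team tracks. The Python sketch below is only an illustration: the class names, the `shelf_life_days` field (a hypothetical way to quantify freshness), and the example dataset are all assumptions, not an established standard.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    SURVEY = "survey"
    OBSERVATION = "observation"
    EXPERIMENT = "experiment"


class Totality(Enum):
    SAMPLE = "sample"
    CENSUS = "census"


class Scope(Enum):
    TIME_SERIES = "time series"
    CROSS_SECTIONAL = "cross-sectional"


class Measurement(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


@dataclass(frozen=True)
class DatasetProfile:
    """Describes one dataset along the five dimensions listed above."""

    name: str
    origin: Origin
    totality: Totality
    scope: Scope
    measurement: Measurement
    shelf_life_days: int  # hypothetical stand-in for "freshness"

    def is_stale(self, age_days: int) -> bool:
        """True once the data is older than its assumed shelf life."""
        return age_days > self.shelf_life_days

    def supports_mean(self) -> bool:
        """Averages are only meaningful for interval and ratio data."""
        return self.measurement in (Measurement.INTERVAL, Measurement.RATIO)


# Hypothetical example: a satisfaction survey scored on an ordinal scale.
survey = DatasetProfile(
    name="customer_satisfaction_survey",
    origin=Origin.SURVEY,
    totality=Totality.SAMPLE,
    scope=Scope.CROSS_SECTIONAL,
    measurement=Measurement.ORDINAL,
    shelf_life_days=180,
)
print(survey.is_stale(age_days=365))  # True
print(survey.supports_mean())         # False
```

Writing the profile down this way gives every department the same baseline for what a dataset is before anyone starts analyzing it.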
A common misconception about data is that it has to be big to be useful. Utilizing “small” data can have a huge impact on your organization. Start with qualitative data, then use what you’ve learned to find larger quantitative data. To utilize all the data at your disposal, break down departmental silos and start connecting insights across data sources regardless of their size. It’s also important to realize that data is not infallible, and is meant to complement your intuition, not replace it. This is especially true for repetitive tasks that can be automated for better efficiency.
Finally, a mandatory part of creating a data culture at any organization is re-education. Simply hiring a data team won’t bring the results you’re hoping for unless the rest of your employees are involved in this culture change as well. There are many different ways to accomplish this, but there are three common first steps: establishing a baseline through activities like “Lunch and Learns,” diving deeper by hosting seminars with the employees who could utilize deeper data support, and having decision-makers establish protocols while providing a central environment and broad access.