Understanding the different roles in Big Data Organizations

The most important aspect of Big Data is the people involved. While many organisations plan to turn their data into value, they sometimes spend too much time on the 'data' side of the equation and not enough on the 'people' side. In the short time that data science has been part of professional enterprises, a number of new roles have formed that are essential to the success of Big Data. Each of these roles contributes to the Big Data team of the Big Data Centre of Excellence described in one of our previous posts.

Big Data Analyst

The Big Data analyst is a role that involves acquiring, processing and summarising information from Big Data sets in order to discover business value. Compared with data scientists, data analysts tend to be generalists.

Big Data analysts are expected to know R, Python, HTML, SQL, C++ and JavaScript. They need to be more than a little familiar with data retrieval and storage systems, data visualisation, data warehousing using ETL tools, Hadoop-based analytics and Business Intelligence concepts. These persistent and passionate data miners usually have a strong background in mathematics, statistics, machine learning and programming.

Big Data analysts are involved in data crunching and data visualisation. When stakeholders request data insights, analysts query the relevant databases to answer them. They are also responsible for scraped data, managing it and assuring its quality. Finally, they have to interpret the data and communicate their findings effectively.

Big Data Scientist

The Big Data scientist is a role that involves developing and deploying algorithms and statistical models that use Big Data sets to predict future outcomes and thereby provide business value. In recent years, the data scientist role has grown tremendously in popularity, and demand for it is significant.

The Big Data scientist is a senior role that requires a deep understanding of algorithms and data processing operations. People in this role are expected to be experts in R, SAS, Python, SQL, MATLAB, Hive, Pig and Spark. Data scientists typically hold advanced degrees in quantitative subjects such as statistics and mathematics, and are proficient in Big Data technologies and analytical tools.

The role of a data scientist is not only about data crunching. It is about understanding business challenges, deriving valuable, actionable insights from the data and communicating those findings to the business. The role also calls for the creative thinking and problem-solving skills needed to design, develop and deploy algorithms that extract value from Big Data.

Big Data Engineer

The Big Data engineer is a role that designs, builds and manages the underlying IT infrastructure required to obtain value from Big Data sets. Data engineers ensure that an enterprise's Big Data ecosystem runs smoothly so that data analysts and data scientists can carry out their analysis.

Big Data engineers are computer engineers who must know Pig, Hadoop, MapReduce, Hive, MySQL, Cassandra, MongoDB, NoSQL, SQL, data streaming and programming. Data engineers have to be proficient in R, Python, Ruby, C++, Perl, Java, SAS, SPSS and MATLAB. Other must-have skills include knowledge of ETL tools, data APIs, data modelling and data warehousing solutions. They are typically not expected to know analytics or machine learning.

Big Data engineers develop, construct, test and maintain highly scalable data management systems. Unlike data scientists, who take an exploratory and iterative path to a solution, data engineers follow a more linear path. Data engineers improve existing systems by integrating newer data management technologies, and they develop custom analytics applications and software components. They collect and store data, perform real-time or batch processing, and serve the results to data scientists for analysis via an API.

Other Big Data roles

Because the domain of Big Data is growing rapidly, many more Big Data roles exist, for example Machine Learning Engineer, MIS Reporting Executive and Big Data Solutions Specialist. Most of these roles require expertise in a specific Big Data platform or tool. However, the roles essential to operating any Big Data Centre of Excellence can be summarised by the three discussed above.