1.4 Big Data Characteristics

The term Big Data is generally used to indicate a data set that is ‘massive’ in size, and therefore difficult to store, process and analyse with traditional computing resources. The definition of Big Data presented in paragraph 1.1 leaves open the question of how the term ‘massive’ should be defined. In other words, what elements make Big Data truly Big?

Although there is no universally accepted answer to this question, it is common practice to define Big Data through a number of key characteristics.[i] The most widely accepted characteristics of Big Data are denoted by the 4V model, which is depicted in figure 4.[ii] The 4V model describes the nature and necessity of Big Data through four core properties of massive data sets: volume, velocity, variety and value.

  1. Volume – The volume of data refers to the size of the data sets that need to be analysed and processed, which now frequently run to terabytes or petabytes. The sheer volume of the data requires processing technologies that are distinct from traditional storage and processing capabilities. Typically, the volume characteristic indicates that the data set of interest is too large to process with a regular laptop or desktop processor, and that specific Big Data technology is required to run the analysis. An example of a high-volume data set would be all credit card transactions processed in a single day by a particular credit card company.
  2. Velocity – Velocity refers to the speed with which data is generated as well as the speed with which the data can be analysed or processed. That data is generated at high speed is evident from the amount of content that is posted on social media platforms in a single minute. In many cases, the value of Big Data lies not only in the ability of companies to analyse large data sets, but also in their ability to do so within an acceptable time frame. Particularly with time-sensitive information, such as stock price information, the ability to process and analyse data quickly can provide a competitive advantage.
  3. Variety – Variety refers to the different types of data collected through sensors, smartphones or social media. Most of these data-generating devices capture data in several different formats. A smartphone, for example, could store messages as text files, photos in JPEG format and videos in MP4 format. The ability to analyse different data formats is essential to obtaining value from data sets. The variety of data types frequently requires distinct processing capabilities and specialist algorithms. A classification of different data types will be further discussed in chapter 1.6.
  4. Value – Value is perhaps the most important characteristic of Big Data. A data set needs to contain value for the organization or individual that analyses the data. Value is realised by discovering patterns or information that can lead to actionable insights. The different ways in which organizations can retrieve value from Big Data sets are discussed in chapter 2.

Data that is characterized by high volume, velocity, variety and value must be processed with advanced technological solutions to reveal meaningful information. Consequently, data that meets these criteria is considered Big Data.
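The three measurable characteristics can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` structure, the `profile` function and the sample values are hypothetical and do not come from a real data set. Value is deliberately left out, because it depends on the question an organization wants to answer rather than on a property that can be computed from the data alone.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    """One captured data item (hypothetical structure, for illustration only)."""
    timestamp: datetime
    fmt: str          # file format, e.g. "txt", "jpeg", "mp4"
    size_bytes: int


def profile(records):
    """Summarise a non-empty sample along volume, velocity and variety."""
    total_bytes = sum(r.size_bytes for r in records)
    times = [r.timestamp for r in records]
    span_seconds = (max(times) - min(times)).total_seconds()
    # Guard against a zero-length window (one record or identical timestamps).
    rate = len(records) / span_seconds if span_seconds > 0 else float(len(records))
    return {
        "volume_bytes": total_bytes,                        # volume
        "velocity_records_per_second": rate,                # velocity
        "variety_formats": sorted({r.fmt for r in records}),  # variety
    }


# Made-up sample: four items captured by a smartphone within four seconds.
sample = [
    Record(datetime(2024, 1, 1, 12, 0, 0), "txt", 2_000),
    Record(datetime(2024, 1, 1, 12, 0, 1), "jpeg", 3_000_000),
    Record(datetime(2024, 1, 1, 12, 0, 2), "mp4", 45_000_000),
    Record(datetime(2024, 1, 1, 12, 0, 4), "txt", 1_500),
]
summary = profile(sample)
print(summary)
```

A sample this small is easily handled on a laptop. The same three measures only indicate Big Data once the totals reach a scale that traditional computing resources cannot process within an acceptable time frame.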

Figure 4: The 4V model of characteristics of Big Data

[i] Zikopoulos, P.C., Deroos, D. and Parasuraman, K., 2013. Harness the power of big data: The IBM big data platform. McGraw-Hill.

[ii] Hashem, I.A.T., Yaqoob, I., Anuar, N.B., Mokhtar, S., Gani, A. and Khan, S.U., 2015. The rise of “big data” on cloud computing: Review and open research issues. Information Systems, 47, pp. 98-115.