
Different data types

In the previous article, we identified the four different forms of pattern identification in data sets and their differences. Today’s discussion will focus on different types of data structures.

In computer science, a data structure is a particular way of organizing and storing data in a computer so that it can be accessed and modified efficiently. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.

For the analysis of data, it is important to understand that there are three common types of data structures:

  1. Structured data
  2. Unstructured data
  3. Semi-structured data

Structured data

Structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyze. Structured data conforms to a tabular format, with relationships between the different rows and columns. Common examples of structured data are Excel files and SQL databases. Each of these has structured rows and columns that can be sorted.
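To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQL database. The "customers" table, its columns and its values are hypothetical, chosen only to show that once the schema is fixed in advance, sorting the rows is a single query.

```python
import sqlite3

# In-memory database; the "customers" table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")

# The schema is defined up front: every row has the same columns with fixed types.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, country, revenue) VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", "NL", 1200.0),
        (2, "Globex", "US", 3400.0),
        (3, "Initech", "UK", 800.0),
    ],
)

# Because the structure is known, sorting by any column is straightforward.
rows = conn.execute(
    "SELECT name, revenue FROM customers ORDER BY revenue DESC"
).fetchall()

for name, revenue in rows:
    print(name, revenue)
```

The same table could just as well live in a spreadsheet; what makes it structured is the shared set of columns, not the tool.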

Unstructured data

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may also contain data such as dates, numbers, and facts. This results in irregularities and ambiguities that make it more difficult to interpret with traditional programs than data stored in structured databases. Common examples of unstructured data include audio files, video files, and NoSQL databases.
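The irregularities mentioned above show up as soon as we try to pull facts out of free text. The sketch below uses Python's re module on a hypothetical customer note; the regular expressions are our own assumptions about which formats might appear, since the text itself carries no schema. Note that the same kind of fact (a date) appears in two different formats.

```python
import re

# A hypothetical free-text note: nothing marks which part is a date or a quantity.
note = (
    "Spoke with the client on 2023-03-14. They ordered 250 units; "
    "delivery is expected around 14/04/2023."
)

# We have to guess the formats ourselves; these patterns are assumptions.
iso_dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", note)      # e.g. 2023-03-14
other_dates = re.findall(r"\b\d{2}/\d{2}/\d{4}\b", note)    # e.g. 14/04/2023 (day or month first?)
quantities = re.findall(r"\b(\d+) units\b", note)           # only catches "<number> units"

print(iso_dates)
print(other_dates)
print(quantities)
```

Any note written slightly differently (for example "two hundred and fifty units") would slip past these patterns, which is exactly why unstructured data calls for more specialized techniques.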

Semi-structured data

Semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. For this reason, it is also known as a self-describing structure. Examples of semi-structured data include JSON and XML.
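As an illustration, the minimal sketch below parses two hypothetical JSON records with Python's built-in json module. Each value is labelled by its key (the self-describing part), and the second record nests an "address" object (a hierarchy), yet the two records do not share a fixed set of fields the way rows in a table would.

```python
import json

# Two hypothetical records: keys describe each value, but the records have different fields.
raw = """
[
  {"id": 1, "name": "Acme", "tags": ["b2b", "europe"]},
  {"id": 2, "name": "Globex", "address": {"city": "Springfield", "country": "US"}}
]
"""

records = json.loads(raw)

cities = []
for record in records:
    # Fields are optional, so we read the nested city defensively with .get().
    city = record.get("address", {}).get("city", "unknown")
    cities.append(city)
    print(record["id"], record["name"], city)
```

Because there is no fixed schema, code that reads semi-structured data has to handle missing or extra fields explicitly, which is the main practical difference from querying a relational table.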

Metadata

A final category of data is metadata. From a technical point of view, this is not a separate data structure, but it is one of the most important elements for Big Data analysis and Big Data solutions. Metadata is data about data: it provides additional information about a specific set of data. In a set of photographs, for example, metadata could describe when and where the photos were taken. The metadata then provides fields for dates and locations which, by themselves, can be considered structured data. For this reason, metadata is frequently used by Big Data solutions for initial analysis.
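The photograph example can be sketched in a few lines. The PhotoMetadata class, file names, dates and locations below are all hypothetical; in practice cameras typically store such fields as EXIF tags inside the image file, and reading those is out of scope here. The point is that a first-pass question can be answered from the structured metadata alone, without analyzing a single pixel.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PhotoMetadata:
    # Hypothetical metadata fields; the image content itself is not modelled.
    filename: str
    taken_on: date
    location: str


# Hypothetical metadata for a small photo collection.
photos = [
    PhotoMetadata("IMG_001.jpg", date(2023, 7, 1), "Amsterdam"),
    PhotoMetadata("IMG_002.jpg", date(2023, 7, 3), "Paris"),
    PhotoMetadata("IMG_003.jpg", date(2023, 8, 12), "Amsterdam"),
]

# Initial analysis using metadata only: which photos were taken in Amsterdam in July?
july_amsterdam = [
    p.filename
    for p in photos
    if p.location == "Amsterdam" and p.taken_on.month == 7
]
print(july_amsterdam)  # ['IMG_001.jpg']
```

Only the photos selected this way would then need the more expensive analysis of their unstructured image content.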

Most ‘traditional’ data analytics techniques (including most Business Intelligence solutions) can process structured data. Processing unstructured or semi-structured data, however, is much more complex and requires distinct techniques for analysis.

Figure 1: The four different types of data

To learn more about Big Data, visit our Big Data Knowledge Base. For more information, contact us at info@bigdataframework.org.