Rudyard Kipling, in his poem "I Keep Six Honest Serving-Men", wrote:
I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.
We can use these to generate a list of questions. For example, what problem are we trying to solve? Why do we want to solve it? How do we go about solving it? Where do we obtain the information or data? Who owns the data? When do we need the solution?
In part one of this writeup, I talked about your vision, and now you have a set of questions you want answered. First, you have to determine whether you want your data to give insight into the past, the present, the future, or all three.
Next, we will walk backwards from our vision and see how we can achieve it.
THE DATA SOURCES
Where does your data come from? For organisations, data comes from within the organisation (internal) or from outside the organisation (external).
Internal data sources
As a business, you may or may not be digital. If you are not, this may be an excellent time to start thinking about going digital. To be data-driven, you need to capture all your business processes and events digitally. These events could be your sales and purchases with their dates, times and locations, process captures and lots more.
Within your organisation, data can be found in the following systems: Customer Relationship Management (CRM), customer care systems, Enterprise Resource Planning (ERP), supply chain management, billing and invoicing, lead management or sales force automation, campaign management, human resources, medical records, product management, content management, web analytics, process monitoring, fault monitoring, ticketing and workflow management, telematics, and machine data processing.
External data sources
These are data that your company does not generate. They can be reference data owned by an external company, or data about processes related to your organisation, such as your account data from your bank or sales-related data from suppliers or partners.
- The London Stock Exchange (LSE) and New York Stock Exchange (NYSE) are organised marketplaces for issuing securities and then trading those securities via their members. Banks, brokers or dealers may obtain reference data from the exchange for the securities traded on it. This data is then used to make informed decisions.
- A pizza delivery service will have an internal system that captures its orders and availability, and will obtain data from external companies like Deliveroo, which deliver the food to the customer.
Single source of truth
Most large organisations already capture their business processes digitally, and some use a hybrid of digital and manual capture. However, within some organisations, there is no single source of truth. Each department operates in a silo, and there is no consensus on the data. Organisations should standardise their data across all divisions.
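As a minimal sketch of what standardising across divisions can look like, the Python snippet below maps records from two hypothetical departments (sales and billing) onto one shared set of field names. The department names, field names and mapping are illustrative assumptions, not a prescribed schema.

```python
# Hypothetical field mappings: each department's own column names
# are translated to one agreed (canonical) set of names.
FIELD_MAPS = {
    "sales": {"cust_id": "customer_id", "amt": "amount", "dt": "date"},
    "billing": {"CustomerNo": "customer_id", "Total": "amount", "InvoiceDate": "date"},
}


def standardise(record: dict, department: str) -> dict:
    """Rename a department's fields to the canonical names."""
    mapping = FIELD_MAPS[department]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# The same sale, as each (made-up) department records it.
sales_row = {"cust_id": 42, "amt": 19.99, "dt": "2021-03-01"}
billing_row = {"CustomerNo": 42, "Total": 19.99, "InvoiceDate": "2021-03-01"}

# After standardising, both rows compare as equal.
same_sale = standardise(sales_row, "sales") == standardise(billing_row, "billing")
print(same_sale)
```

Once every department's data is expressed in the same terms, it becomes possible to compare and reconcile records that previously lived in separate silos.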
Timeliness of Data
It is essential to know when the data should be available and when it becomes stale. For example, the Bank of England raising interest rates in 2007 will not help a business make a decision now. It will be helpful, however, if I want to analyse historical data to find patterns or develop a business model.
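One way to make staleness concrete is to attach a maximum age to each use case and check data against it. The sketch below is only an illustration: the function name `is_stale`, the seven-day threshold and the dates are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def is_stale(captured_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True when the data is older than the use case allows."""
    return now - captured_at > max_age


now = datetime(2021, 6, 1)           # hypothetical "today"
old_reading = datetime(2007, 6, 1)   # placeholder date in 2007
yesterday = now - timedelta(days=1)

# Assumed threshold for a day-to-day decision.
max_age = timedelta(days=7)

old_is_stale = is_stale(old_reading, max_age, now)
recent_is_stale = is_stale(yesterday, max_age, now)
print(old_is_stale, recent_is_stale)
```

The same 2007 data point would pass the check for a historical analysis simply by choosing a much larger `max_age`, which is the point: staleness depends on the use case, not on the data alone.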
Type and Frequency of Data
It is essential to note the type of data you are capturing. Is it event-driven data, or is it process-driven data? What is the frequency of the data, and how often do you wish to capture it for your analysis? A question you should ask yourself is: when does this data become irrelevant for my use case?
- Event – Weather data is relevant to you if you are in the hospitality business. To make confident decisions, like planning and scheduling business assets and resources, the data needs to cover the present and the near future. Last week's weather data is not relevant for the decisions you will make today. It may be relevant for the analysis of historical events.
- Event – The COVID-19 pandemic triggered a strong risk-off sentiment amongst investors and traders. They were able to make real-time decisions (albeit panicked ones).
- Process – A customer orders some items from an e-commerce platform. The platform captures this information in real time and enhances the user experience by using a recommender engine based on other historical data. This sale is relevant both now and in the future.
- Process – A user submits a complaint about a product. In resolving the root cause, you may want to know if this is an issue with your supplier or a one-off.
Variety of Data
Digital data can be generated by humans and machines in structured, semi-structured and unstructured formats.
What is structured data?
Structured data is data that conforms to a predefined data model. Structured data are typically captured and stored in relational database management systems like Oracle Database, Microsoft SQL Server, IBM DB2, and MySQL. These systems organise data into tables with fixed columns, where each row (tuple) is one record, and the data types are also specified in the data model.
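To illustrate what a predefined data model means in practice, here is a small sketch using SQLite through Python's built-in `sqlite3` module. SQLite stands in for the larger database systems only because it needs no setup; the `sales` table and its columns are made up for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The data model is declared up front: column names, types and rules.
conn.execute(
    """
    CREATE TABLE sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL    NOT NULL,
        sold_at     TEXT    NOT NULL
    )
    """
)

conn.execute(
    "INSERT INTO sales (customer_id, amount, sold_at) VALUES (?, ?, ?)",
    (42, 19.99, "2021-03-01T10:15:00"),
)

# A row that breaks the model (missing amount) is rejected by the database.
try:
    conn.execute(
        "INSERT INTO sales (customer_id, amount, sold_at) VALUES (?, ?, ?)",
        (43, None, "2021-03-01T10:20:00"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT customer_id, amount FROM sales").fetchall()
print(rows, rejected)
```

Because the model is enforced at write time, anything read back from the table can be relied on to have the declared shape.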
What is Unstructured Data?
Unstructured data have no predefined format and can be captured in a variety of ways, for example as text or video. There is no agreed format or data validation.
What is semi-structured Data?
Semi-structured data has some organising structure, such as tags or key-value pairs, but does not conform to a fixed data model the way structured data does.
- Structured – Relational databases and data warehouses characterised by predefined data models.
- Semi-structured – tweets, text formats like XML and JSON, and data held in NoSQL databases
- Unstructured – social media data, emails, audio, text files, videos, satellite imagery, digital surveillance, sensor data
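As a rough illustration of the semi-structured case, the sketch below parses two JSON records with Python's built-in `json` module. The records and their fields are invented for the example; the point is that each record describes its own structure, and the two do not share the same set of fields.

```python
import json

# Two made-up records: both are valid JSON, but their fields differ.
raw_records = [
    '{"user": "ada", "text": "Great pizza!", "hashtags": ["food"]}',
    '{"user": "bob", "text": "Late delivery", "location": {"city": "London"}}',
]

records = [json.loads(line) for line in raw_records]

summary = []
for record in records:
    # Optional fields are read with defaults, since no schema guarantees them.
    city = record.get("location", {}).get("city", "unknown")
    tags = record.get("hashtags", [])
    summary.append((record["user"], city, tags))

print(summary)
```

Unlike a relational table, nothing stops a new record from arriving with extra or missing fields, so the reading code has to decide how to handle each case.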
We now know what data we require and how frequently to capture it. Next, we want to discuss the available tools for storing this data. It is good practice to keep operational and analytical systems separate, because analytical workloads are usually processing-intensive.
In part three, I will be discussing the data tools.