An architect’s dream is an engineer’s nightmare! As you may already know, having a grand vision for your analytics can be someone else’s headache to deliver. Some of the problems you will be looking to solve may be very challenging, and once your data is in production you must also set up monitoring. Delivering an analytics platform used to be a daunting undertaking; with cloud technology, many once-impossible tasks are now attainable.
Becoming data-driven enables you to gain insight into people, businesses and things by combining your organisation’s data (customer records, transactional systems, operational systems, expert knowledge) with external events. External data can include social media, image data, weather and other event data. One of the driving forces for creating analytics is enabling your organisation to make informed, data-based decisions that drive growth and productivity.
In the previous write-ups (part 1, part 2 and part 3), we took the following steps: identify your analytics requirements, source the data, and choose and create your data platform. The great thing about having your data in place is that you can now explore it and decide how to use it.
Your data is ready for the next step. You have all your internal and external data, but you still want more meaningful insights into your business to enable cost-saving and revenue-generating decisions.
Examples of problems that analytics address:
Who are your top customers, and what are they buying? What is our customer attrition rate? What is our share of the market? How well are our products doing compared to the competition? How long until the next machine component breaks down? There is terrible weather forecast for tomorrow; how will that affect your staff and your flights? Which planes will be affected?
Using analytics, we can choose which systems to put in place for your business operations.
There are four main types of analytics, which I talk about next.
So what is descriptive analytics? These are your time-bound standard reports that explain what events have happened and why. They also answer questions like when, or how many times, an event happened. Dashboards, standard reports, ad-hoc reports or any simple report are all forms of descriptive analytics. Reports can contain numbers, text and charts, and can include drill-downs and alerts. A simple report most people have is a bank statement, which shows incoming and outgoing cash, interest and fees. Other examples are your household bills; your electricity bill, for instance, tells you the amount of energy you have used over a given period. Most organisations are at this stage and have data warehouses and business intelligence solutions.
The example reports above are reports that a company sends out to its customers. Each customer sees information related to them and them only. The companies sending out these reports, on the other hand, have a bigger picture: an aggregated view across all clients, which they can use to make more efficient business decisions.
Let us take an airline as an example. We want to decide whether we should continue to fly from a remote airport. We need a report that gives us the passenger numbers over the last year, the revenue generated, and the costs associated with the trips, both stopovers and direct flights.
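A descriptive report like this is usually just an aggregation over the raw records. Here is a minimal sketch with pandas; the column names and figures are invented for illustration:

```python
import pandas as pd

# Invented flight-level data; in practice this would come from the data platform
flights = pd.DataFrame({
    "route":      ["remote", "remote", "remote", "main"],
    "passengers": [42, 38, 55, 180],
    "revenue":    [6300.0, 5700.0, 8250.0, 27000.0],
    "cost":       [9000.0, 9000.0, 9500.0, 15000.0],
})

# Aggregate per route: yearly totals and the resulting profit
report = flights.groupby("route")[["passengers", "revenue", "cost"]].sum()
report["profit"] = report["revenue"] - report["cost"]
print(report)
```

A report like this immediately shows whether the remote route is covering its costs.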
Predictive analytics is a branch of advanced analytics that uses historical data and statistical algorithms to predict future trends. Here we answer the questions, “What will happen and why? What is the best action to take?”. To predict a value from new data, you need to prepare the data (data preprocessing), build the model using historical training data, and validate your model. Typical preprocessing steps include:
- Data cleaning and transformation
- Outlier detection and treatment
- Missing values
- Dimension reduction
- Data reduction
- Feature selection
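The steps above can be sketched with scikit-learn. This is a minimal illustration, assuming tabular numeric data; the toy values, the median imputer and the standard scaler are my choices, not prescriptions:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small toy feature matrix (age, income) with a missing value
X = np.array([
    [25.0, 50_000.0],
    [32.0, np.nan],      # missing income
    [47.0, 64_000.0],
    [51.0, 120_000.0],
])

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
])
X_clean = prep.fit_transform(X)
print(X_clean.shape)  # (4, 2)
```

Outlier treatment, dimension reduction and feature selection would be added as further pipeline steps in the same way.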
Several models are available for predicting and classifying your dependent variable. Before building and selecting your model, you need to choose features relevant to it. The model you select will depend on the type of dependent variable you wish to predict. For example, if your dependent variable is binary (1/0), you may want to look at a logistic regression model.
Types of Model
1. Regression models
- Linear Regression
- Multiple Regression
- Logistic Regression
- Multiple Logistic Regression
2. Tree-based models
- Classification trees
- Regression trees
- Bagging, Boosting, Random Forest
3. Neural Networks
- Recurrent neural networks
- Artificial neural networks
- Convolutional neural networks
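To make the binary case concrete, here is a sketch of logistic regression on an invented churn problem (the features, values and the churn framing are all assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy problem: predict churn (1/0) from monthly spend and tenure in months
X = np.array([[10, 1], [12, 2], [90, 30], [85, 28],
              [15, 3], [95, 25], [11, 2], [88, 31]])
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])  # 1 = churned

model = LogisticRegression().fit(X, y)
print(model.predict([[14, 2], [92, 29]]))  # predicts churn (1), then no churn (0)
```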
Choosing a valid model is an essential step in your analytics. This is where you use out-of-sample data to check that the model gives good results. For example, you may wish to check for overfitting, where you get excellent training results but poor test results.
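The overfitting check can be illustrated by comparing training and test accuracy. In this sketch, an unpruned decision tree is deliberately fitted to noisy synthetic data so it memorises the training set:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Target depends only on the first feature, plus a lot of noise
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # unpruned
print("train:", tree.score(X_train, y_train))  # 1.0 - the tree memorised the noise
print("test: ", tree.score(X_test, y_test))    # noticeably lower on unseen data
```

A large gap between the two scores is the classic overfitting signal; pruning or regularisation would narrow it.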
As the name suggests, prescriptive analytics prescribes a course of action. Your organisation can build and optimise models, subject to constraints, to support decision making. Some problems require yes/no decisions; these are binary problems. The models involved are similar to predictive models.
What is Cluster Analysis? Cluster analysis is a way of grouping elements so that elements in the same group are more similar to each other than to those in other groups. Clustering is used in market analysis to find customer segments to which a company can market its products, and it is also used for data reduction.
Example of Cluster Analysis: a familiar example of segmentation is generational segments, often used to market products based on the customer’s generation:
- The Greatest Generation: born between 1901 and 1924
- The Silent Generation: born between 1924 and 1945
- Baby Boomers: born between 1946 and 1964
- Generation X: born between 1965 and 1980
- Generation Y (Millennials): born between 1981 and 2000
- Generation Z: born after 1995
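The segmentation idea can be sketched with k-means clustering on customer birth years. The data is invented, and the algorithm discovers its own three segments rather than the named generations above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented customer birth years, one feature per customer
birth_years = np.array([[1950], [1955], [1948],
                        [1972], [1975], [1970],
                        [1995], [1998], [2001]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(birth_years)
print(km.labels_)  # each customer is assigned to one of three segments
```

In a real segmentation you would cluster on many features (spend, frequency, channel), not just age.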
What is Principal Component Analysis (PCA)? Principal component analysis is an unsupervised learning technique, often used to reduce the dimensionality of your data.
Example: if you have one hundred variables in your dataset, principal component analysis can be used to find a smaller number of derived variables (components) that explain, say, 90% of the variance in your data.
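A sketch of that idea with scikit-learn, where random correlated data stands in for the hundred variables (the five underlying factors are an assumption of the toy setup):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))                        # 5 true underlying factors
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 100))  # 100 observed variables

pca = PCA(n_components=0.90)  # keep components explaining 90% of the variance
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # far fewer than 100 columns
```

Passing a float between 0 and 1 as `n_components` tells scikit-learn to keep just enough components to reach that fraction of explained variance.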
What is Optimisation? In general, optimisation maximises or minimises a specific output. For example, an organisation wishes to invest £1,000,000 across 20 stocks. The stocks can be chosen in any combination, but the total cannot exceed the investment, and the objective is to maximise profit. Optimisation problems are usually nonlinear and multimodal, with complex constraints.
Constraints: with our models, we can specify constraints. For example, I want to invest £100,000, with 40% of the money in technology stocks and 10% in industrial stocks.
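A linear-programming sketch of that constrained allocation with `scipy.optimize.linprog`. The five stocks and their expected returns are invented, and a real portfolio optimisation would also model risk, not just return:

```python
import numpy as np
from scipy.optimize import linprog

# Invented expected annual returns for five stocks;
# stocks 0-1 are technology, stock 2 is industrial.
returns = np.array([0.08, 0.12, 0.05, 0.09, 0.07])
budget = 100_000

# linprog minimises, so negate the returns to maximise expected profit
c = -returns

A_eq = [
    [1, 1, 1, 1, 1],  # spend the whole budget
    [1, 1, 0, 0, 0],  # 40% in technology stocks
    [0, 0, 1, 0, 0],  # 10% in industrial stocks
]
b_eq = [budget, 0.4 * budget, 0.1 * budget]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, budget)] * 5)
allocation, profit = res.x, -res.fun
print(allocation, profit)
```

The solver puts each constrained slice of the budget into the highest-return stock allowed by that constraint.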
With cognitive analytics, we are looking at reasoning, learning and natural language processing. Here the system is constantly learning and refining its analysis.
Latent Dirichlet Allocation (LDA) – an unsupervised technique which classifies documents into different topics.
Sentiment analysis can be framed as either a classification problem or a regression problem.
- Attention model – uses an encoder (RNN), all of the encoder hidden states and an attention decoder.
- Sequence-to-sequence model – uses an encoder (RNN), a context vector and a decoder (RNN).
- Long short-term memory (LSTM) – a type of RNN, generally used for classification and prediction models.
- Gated recurrent unit (GRU) – another type of RNN.
- Acoustic model – helps turn a sound signal into a phonetic representation.
- Language model – captures the grammar, words and sentence structure of a language.
- Hidden Markov Model – uses transition probabilities and emission probabilities.
Algorithmic trading allows you to take real-time data from social media, live broadcasts, economic news and other signals and take a position in the market.
Let us look at what happens. Sentiment analysis is used to determine the nature of tweets from Twitter; here we want to determine whether the sentiment is positive or negative for a market instrument (e.g. TESLA).
A live broadcast is converted from speech to text and similarly analysed for positive or negative sentiment. The output from the sentiment analysis, together with other inputs, is consumed by your model, and the final step automatically executes an action by taking a position in the market. This is cognitive analytics.
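As a toy illustration of the sentiment step only, here is a tiny lexicon-based scorer. The lexicon and example tweets are made up; production systems use trained models, not hand-written word lists:

```python
# Hand-picked illustrative lexicon - not a real sentiment resource
POSITIVE = {"beat", "record", "surge", "strong", "up"}
NEGATIVE = {"recall", "miss", "drop", "weak", "down"}

def sentiment(text: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral)."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)

tweets = [
    "TSLA deliveries beat estimates, record quarter",
    "Another recall, shares drop",
]
print([sentiment(t) for t in tweets])  # [1, -1]
```

In the full pipeline, this +1/−1 signal would be one of several inputs to the model that decides whether to take a position.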
In conclusion, there are many excellent reasons why an organisation should become data-driven. Once built, all your analytics will have to be deployed into the production environment and made available to users as self-service, alerts, reports or other outputs.
Having the platform and all the analytics is not enough for your organisation to be data-driven. You also need a data-driven culture in the organisation, from how data is collected to how it is stored, and that culture has to be driven by senior leadership.
Blei, D., Ng, A. and Jordan, M., 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3. Available at: <https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf> [Accessed 26 September 2021].
Cho, K. et al., 2014. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Available at: <https://arxiv.org/pdf/1406.1078.pdf> [Accessed 26 September 2021].