Advanced Machine Learning, Data Mining, and Online Advertising Services
In this post we explain why online algorithms matter for computing KPI statistics efficiently when observations arrive as a real-time stream.
There are many scenarios where a company collects a large amount of data from its products (e.g. users' interactions with a game), such as the time each user spends in the game or the amount of money each user spends on in-app items. Most of the time, companies need to compute some basic KPIs (key performance indicators) and display them on a dashboard so that the executive, development, sales, and marketing teams can get first-hand insight into how the product is performing over time and focus their time and energy on effective strategies for improving the main KPIs (e.g. users' average LTV).
Thus, computing basic and advanced statistics on selected metrics, such as means and standard deviations, in real time becomes a basic need. One way to compute statistics on important KPIs is to store all the data and rerun the required computation on it every once in a while to refresh the numbers. This naive computation takes at least n steps per refresh when the size of the data is n, and it requires keeping all n observations around.
However, there are more efficient techniques for updating the statistics of a stream of observations, and they save both time and memory. We can compute metrics recursively and incrementally, using every newly observed sample to update the current state of the KPI's statistic. For instance, to compute the mean of a sequence with an online approach, one implements the following formula, where $x_n$ is the n-th observation and $\mu_n$ is the mean of the first n observations:

$$\mu_n = \mu_{n-1} + \frac{x_n - \mu_{n-1}}{n}$$
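As a minimal sketch of this incremental update in Python: the function name `running_mean` and the sample values are made up for illustration and are not taken from the original script.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, one result per observation."""
    mean = 0.0
    for n, x in enumerate(stream, start=1):
        # Incremental update: mu_n = mu_{n-1} + (x_n - mu_{n-1}) / n
        mean += (x - mean) / n
        yield mean


# Hypothetical session lengths in minutes, arriving one at a time.
session_minutes = [12.0, 7.5, 30.0, 3.5]
means = list(running_mean(session_minutes))
```

Each update touches only the previous mean and the observation count, so nothing from the earlier part of the stream has to be stored.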
Using the recurrence above, one can update the expectation of a given KPI in constant time as each new observation arrives. This helps minimize space and time complexity and consequently saves money on servers. One can also use online algorithms to compute the standard deviation of a random variable, updating it incrementally as each observation arrives.
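One common way to keep a running standard deviation is Welford's algorithm, which tracks the sum of squared deviations from the current mean. The sketch below assumes that method, which may differ from the formula the post originally showed; the class name `RunningStats` and the sample purchase amounts are hypothetical.

```python
import math


class RunningStats:
    """Track count, mean, and variance of a stream in O(1) memory (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # same recurrence as the online mean
        self.m2 += delta * (x - self.mean)  # uses both old and new mean

    @property
    def variance(self):
        # Population variance; divide by (n - 1) instead for the sample variance.
        return self.m2 / self.n if self.n > 0 else 0.0

    @property
    def std(self):
        return math.sqrt(self.variance)


# Hypothetical in-app purchase amounts arriving as a stream.
stats = RunningStats()
for purchase in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(purchase)
```

This form is usually preferred over subtracting the squared mean from the mean of squares, because that subtraction can lose precision when the variance is small relative to the mean.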
See "online algorithm" for a simple Python script demonstrating how to use online algorithms to compute the mean and standard deviation of a sequence of data. We have also discussed Multi-Armed Bandit algorithms as another example of online techniques, in that case for the purpose of optimization.