|

Python for business analyst: customer retention – cohort analysis

If we have a table Registration for a new customer registration and a table of sales called Basket, then the task to do is calculate customer retention of the newly acquired customers. It means that we should calculate the number of purchases of the newly registered customers in the month of registration = 0-th month, the first month after and 2, 3, 4-9 months after. This calculation allows the business analyst to present customer acquisition process in the form of cohort. Here is a sample of such a presentation. Dataset used in the post is built based on Tablue samples downloaded from here or there.

Cohort Data

In order to track how users behave over time or how the same behavior differs for different perspectives, cohort analysis helps to compare these dimentions / time they were acquired and the retention of those users over time.

Users for cohort analysis can be divided in two ways:

  • Acquisition Cohorts: divide users by when they signed up first to your shop. According to your analysis you might break down your cohorts by the day, the week or the month they purchased, and track daily, weekly or monthly how then continue with your store. In this case, you are measuring the retention of these users.
  • Behavioral Cohorts: divide users by the behaviors they have or have not taken within a given time period. These could be any number of discrete actions that a user can perform like clicking links, action points wisit product carts or put products to the carts, or browse of different ways, etc. – any combination of actions and events. In this case, an analysis groups users who did certain actions within a specified timeframe – say, within first x days or after y days, weeks or months. You can then monitor how long different state of user behaviour is active after they perform certain actions. It is importn as a certain moment of trouth.

Dataset

Registration data

Orders

The first thing we need to do is summarising order data according key [Order ID], [Order Date], Customer ID] to a customers which were buying in certain dates:

SELECT [Order ID]
      ,[Order Date]
	  ,[Customer ID]
      ,count(distinct [Product ID]) product_count
      ,sum([Sales]) sales_value

  FROM [mietwood].[dbo].[ss_Orders]
  group by [Order ID]
      ,[Order Date]
	  ,[Customer ID]

If we count customers accrding registration date year and mont we got registration statistics.

Now the question is how they were buying the months followin registration. We can go with code below. We calculate number of customers which has bought in the month of registration and in the following months, where m3 means 3 and more.

with t1 as (
     SELECT 
	  [Customer ID]
      ,EOMONTH([Order Date]) Order_Date_month
	  ,COUNT(distinct [Order ID]) orders_count
	
      ,count(distinct [Product ID]) products_count
      ,sum([Sales]) sales_value
  FROM [mietwood].[dbo].[ss_Orders] where [Order Date] <= DATEFROMPARTS(2015,12,31)
  group by [Customer ID], EOMONTH([Order Date]) 
  )
select 
  eomonth(a.[Registration date]) reg_date_month --, t1.Order_Date_month
  ,count(distinct a.[Customer ID]) Cust_registered
  ,sum( iif( Order_Date_month = EOMONTH(a.[Registration date]) ,1,0)) m0_Cust_count
  ,sum( iif( Order_Date_month = dateadd(month,1,EOMONTH(a.[Registration date])),1,0)) m1_Cust_count
  ,sum( iif( Order_Date_month = dateadd(month,2,EOMONTH(a.[Registration date])),1,0)) m2_Cust_count
  ,sum( iif( Order_Date_month >= dateadd(month,3,EOMONTH(a.[Registration date])),1,0)) m2_Cust_count
from [mietwood].[dbo].[ss_Registration] a
   left join t1 on a.[Customer ID] = t1.[Customer ID] where [Registration date] <= DATEFROMPARTS(2015,12,31)
group by eomonth(a.[Registration date])
order by 1,2

So, what we can see. Crearly there is no many customers buying every month, but they return to the shop in camming months. Lets brake it for all month for Power BI analysis. You can do it using following code.

with t1 as (
     SELECT 
	  [Customer ID]
      ,EOMONTH([Order Date]) Order_Date_month
	  ,COUNT(distinct [Order ID]) orders_count
	
      ,count(distinct [Product ID]) products_count
      ,sum([Sales]) sales_value
  FROM [mietwood].[dbo].[ss_Orders] where [Order Date] <= DATEFROMPARTS(2015,12,31)
  group by [Customer ID], EOMONTH([Order Date]) 
  )
select 
  eomonth(a.[Registration date]) reg_date_month, t1.Order_Date_month, datediff(month, eomonth(a.[Registration date]),t1.Order_Date_month) Months_from_reg
  ,count(distinct t1.[Customer ID]) Cust_buying

from [mietwood].[dbo].[ss_Registration] a
   left join t1 on a.[Customer ID] = t1.[Customer ID] where [Registration date] <= DATEFROMPARTS(2015,12,31)
group by eomonth(a.[Registration date]), t1.Order_Date_month
order by 1,2

The results of the query can be analysed in Excel

Finally every quoter of the year 2014 the company have aquired about 150 customers. They all have bought in the month of registration (by definition), but 10% of them are continue purchase during live time with the company. So, the retention rate is 10%.

Customer retention in Power BI

We ca present customer retention data in Power BI in form of triangular table after calculating customers acquired in month 0 and then buying in the following months 1-12 as presented in right panel.

We can present it also in form of stocked bars where the X axis presents the year and month of customer acquisition and the color in stocked bars represents income from customers in following month after acquisition giving finally the amount of total income from customer or from period of acquisition.

To calculate triangular table in PBI additional calculation table should be created using formula from menu modelling/new table as presented below:

This table should be used as the column creator in Matrix visual, where rows, as date of registration should be taken from table created base on the select presented before.

To calculate measure following formula can be ustilised:

Customer retention cust = 
var CurentMonthAfter = SELECTEDVALUE(Registration_months_after[Following_months])
var CustRegDate = SELECTEDVALUE(Registration_purchase_orders[dt_reg_eom]) return

    CALCULATE(DISTINCTCOUNT(Registration_purchase_orders[CustomerIdEx]),
        FILTER(Registration_purchase_orders,
            (Registration_purchase_orders[Order_date_eom] = EOMONTH(CustRegDate,CurentMonthAfter))
            && (Registration_purchase_orders[ErpValueNetto] > 0)
        )
    )

To calculate % of retain customers can be used following code

Customer retention proc = 
var CurentMonthAfter = SELECTEDVALUE(Registration_months_after[Following_months])
var CustRegDate = SELECTEDVALUE(Registration_purchase_orders[dt_reg_eom]) return
DIVIDE(
    CALCULATE(DISTINCTCOUNT(Registration_purchase_orders[CustomerIdEx]),
        FILTER(Registration_purchase_orders,
            (Registration_purchase_orders[Order_date_eom] = EOMONTH(CustRegDate,CurentMonthAfter)) 
            && (Registration_purchase_orders[ErpValueNetto] > 0)
        )
    ), CALCULATE(DISTINCTCOUNT(Registration_purchase[CustomerIdEx]),FILTER(Registration_purchase,eomonth(Registration_purchase[dt_reg],0)=CustRegDate))
)

Happy coding.

Apendix

What Chat GPT assistend adivses

Cohort analysis is a useful tool for businesses to understand customer behavior over time and identify trends or patterns among different groups of customers.

Here are the steps involved in performing a cohort analysis for customer retention:

  1. Define your cohorts: Start by segmenting your customers into groups or cohorts based on a specific criterion, such as the month or year they made their first purchase. This allows you to compare how customers from different cohorts behave over time.
  2. Calculate retention rate: For each cohort, calculate the percentage of customers who make repeat purchases in subsequent time periods. For example, you can calculate the percentage of customers who return to make a purchase in the second month after their initial purchase, then in the third month, and so on.
  3. Visualize the retention curve: Create a retention curve, also known as a cohort analysis chart, to visualize the retention rates of each cohort over time. This can be done using tools like Excel, Google Sheets, or specialized data visualization software.
  4. Identify trends and patterns: Analyze the retention curve to identify any trends or patterns among your different cohorts. For example, you may find that customers in certain cohorts have higher or lower retention rates compared to others. This can help you identify key factors or actions that contribute to customer retention.
  5. Take action: Based on your findings, you can develop strategies to improve customer retention. This could involve implementing targeted marketing campaigns, enhancing customer experiences, or improving product/service offerings.

Remember that cohort analysis is an ongoing process, so it’s important to regularly track and update your analysis as new cohorts are formed.

Here’s an example of how you can approach the analysis using the given tables (R (Date of registration, Customer Id) and P (Date of purchase, Customer Id, value of purchase)):

  1. Define cohorts: To start, you can define cohorts based on the month or year of customer registration. This means grouping customers who registered within the same time period. For example, you can group all customers who registered in January 2021 together.
  2. Calculate retention rate: For each cohort, calculate the percentage of customers who made a repeat purchase in subsequent time periods. Calculate retention rates for different time periods after registration, such as the second month, third month, and so on. To calculate retention rates, you’ll need to track if a customer made a purchase in a specific time period after their registration.
SELECT
  registration_month,
  purchase_month,
  COUNT(p.customer_id) / COUNT(DISTINCT r.customer_id) AS retention_rate
FROM
  R r
LEFT JOIN
  P p ON r.customer_id = p.customer_id
    AND p.date_of_purchase >= r.date_of_registration
    AND p.date_of_purchase < DATE_ADD(r.date_of_registration, INTERVAL 2 MONTH) 
       -- Adjust the interval as needed
GROUP BY
  registration_month,
  purchase_month 

It looke simmilar what we have made before.

If anything helpful please leave a comment.

Similar Posts