Python for business analytics – RFM analysis
We will practice Python for business analytics on customer segmentation. RFM is a popular customer segmentation technique used in business analytics and marketing. It stands for Recency, Frequency, and Monetary Value, three key metrics used to analyze and segment customers based on their behavior and value to the business.
Here’s a breakdown of each component of RFM:
- Recency:
  - Recency refers to the amount of time that has passed since a customer’s last interaction or purchase with a business.
  - Customers who have more recently interacted with or made a purchase from the business are generally considered more engaged or active.
- Frequency:
  - Frequency measures how often a customer engages with or makes purchases from the business within a specific timeframe.
  - Customers who make frequent purchases or interact with the business more often are usually considered loyal or of higher value.
- Monetary Value:
  - Monetary Value, also known as the revenue/value contribution from the customer, represents the total amount of money a customer has spent or generated for the business.
  - Customers who have made high-value purchases or have a higher overall spending tend to be more valuable to the business.
To apply RFM segmentation, businesses typically assign numerical values to each component (Recency, Frequency, Monetary Value) on a scale (e.g., from 1 to 5) or divide the values into different segments. For example:
- Recency: Customers who made a purchase within the last 30 days might be given a higher score (e.g., 5), while those who haven’t made a purchase within the last 90 days might be given a lower score (e.g., 2).
- Frequency: Customers who have made multiple purchases within a given time frame may receive a higher score (e.g., 4), whereas those who rarely make purchases might be assigned a lower score (e.g., 2).
- Monetary Value: Customers who spend more money in total might be given a higher score (e.g., 5), while those who spend less would receive a lower score (e.g., 2).
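The scoring described above can be sketched with quantile bins. This is a minimal illustration, not the method used later in this article: the DataFrame `rfm`, its column names, and all values are hypothetical, and the 1 to 5 scale is produced with `pd.qcut` on ranks so that each score covers roughly a fifth of the customers.

```python
import pandas as pd

# Hypothetical per-customer summary; all values are made up for illustration.
rfm = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "days_since_last": [5, 12, 20, 33, 45, 60, 75, 90, 120, 200],
    "n_orders": [12, 9, 7, 6, 5, 4, 3, 2, 1, 1],
    "total_spent": [950.0, 720.0, 610.0, 400.0, 380.0, 250.0, 180.0, 90.0, 60.0, 25.0],
})

# Rank first so that tied values cannot produce duplicate bin edges in qcut.
# Recency: fewer days since the last purchase is better, so the labels run 5 -> 1.
rfm["R"] = pd.qcut(rfm["days_since_last"].rank(method="first"), 5,
                   labels=[5, 4, 3, 2, 1]).astype(int)
# Frequency and Monetary: larger is better, so the labels run 1 -> 5.
rfm["F"] = pd.qcut(rfm["n_orders"].rank(method="first"), 5,
                   labels=[1, 2, 3, 4, 5]).astype(int)
rfm["M"] = pd.qcut(rfm["total_spent"].rank(method="first"), 5,
                   labels=[1, 2, 3, 4, 5]).astype(int)

print(rfm[["customer_id", "R", "F", "M"]])
```

Fixed thresholds (such as 30 or 90 days) are an equally valid choice; quantiles simply avoid having to pick cut-offs by hand.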
After assigning scores or segments to each RFM component, businesses can combine them to create RFM segments. The resulting segments help identify different customer groups, such as:
- Champions: High-value customers who have made recent purchases frequently.
- Sleeping/Lapsed: Customers who were once frequent buyers but haven’t made a purchase in a while.
- Promising: Recently engaged customers who have not spent much yet.
- At-risk: Customers who used to be frequent and high-value but haven’t made a recent purchase.
- etc.
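As a rough sketch of how combined scores could be turned into the segment names above, the function below maps 1 to 5 scores to a label. The function name `rfm_segment` and every threshold in it are hypothetical choices for illustration; real cut-offs depend on the business.

```python
def rfm_segment(r: int, f: int, m: int) -> str:
    """Map 1-5 R, F, M scores to a segment name (illustrative thresholds only)."""
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"          # recent, frequent, high value
    if r <= 2 and f >= 4 and m >= 4:
        return "At-risk"            # was frequent and high value, not seen recently
    if r <= 2 and f >= 3:
        return "Sleeping/Lapsed"    # was a fairly frequent buyer, not seen recently
    if r >= 4 and m <= 2:
        return "Promising"          # recent, but has not spent much yet
    return "Other"


for scores in [(5, 5, 5), (1, 5, 4), (2, 3, 2), (5, 1, 1), (3, 3, 3)]:
    print(scores, "->", rfm_segment(*scores))
```

The order of the checks matters: the stricter "At-risk" rule is tested before the broader "Sleeping/Lapsed" rule.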
By segmenting customers based on RFM, businesses can tailor their marketing strategies and efforts to each segment’s unique characteristics and needs. This enables businesses to understand customer behavior, prioritize marketing initiatives, and implement personalized approaches to maximize customer engagement, loyalty, and revenue. This article is part of our series on Python for business analytics.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#import plotly.express as px
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
pd.options.display.float_format = '{:,.4f}'.format
def p25 (x): return np.percentile(x, q=25)
def p50 (x): return np.percentile(x, q=50)
def p75 (x): return np.percentile(x, q=75)
def p90 (x): return np.percentile(x, q=90)
def p95 (x): return np.percentile(x, q=95)
def p99 (x): return np.percentile(x, q=99)
xls_file = 'e-com_dataset_mp_c3.xlsx'
print('\nreading order_items')
dfxx = pd.read_excel(xls_file, sheet_name="order_items")
# dfxx.sample(5000).to_excel('e-com_dataset_mp_c3-5000.xlsx')
print(dfxx.info())
print(dfxx.columns)
Python for business analytics – datetime processing
The code below derives several date-related columns for each row of the DataFrame. From the ‘order_date’ column it extracts the ISO year, the month, the ISO week, and the day of the week. It also creates ‘date_dt’ with only the date component, ‘year_week’ as a label combining the year and week number, and ‘year_week_num’ as a running week index. Finally, it multiplies ‘price’ by ‘quantity’ for each row and stores the rounded result in ‘value_row’. These columns support analysis and segmentation along different time dimensions.
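# for datetime processing
# ISO calendar year and week, so the year and the week number stay consistent around New Year
dfxx['year'] = dfxx['order_date'].dt.isocalendar().year
dfxx['month'] = dfxx['order_date'].dt.month
dfxx['week'] = dfxx['order_date'].dt.isocalendar().week
# ISO weekday: 1 = Monday ... 7 = Sunday
dfxx['weekday'] = dfxx['order_date'].dt.isocalendar().day
dfxx['date_dt'] = dfxx['order_date'].dt.date
# label such as 'y2017-05'
dfxx['year_week'] = dfxx.apply(lambda r: 'y'+str(r.year)+'-'+str(r.week).rjust(2, "0"), axis=1)
# running week index: 2017 is treated as the first year, weeks of any other year are offset by 52
dfxx['year_week_num'] = dfxx.apply(lambda r: r.week if r.year == 2017 else 52 + r.week, axis=1)
# value of the order line = price * quantity, rounded to 2 decimals
dfxx['value_row'] = dfxx.apply(lambda r: round(r.price*r.quantity,2), axis=1)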
# value over time
dfxx_yw = dfxx.groupby(['year_week_num'])['value_row'].sum().reset_index()
dfxx_yw.info()
dfxx_yw.sort_values('year_week_num',ascending=True).plot(kind='bar', x='year_week_num', y='value_row', figsize=(15, 10))
plt.style.use('default')
plt.scatter( x=dfxx_yw['year_week_num'], y=dfxx_yw['value_row'], s=5 )
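The running week index used above hard-codes 2017 as the first year. If the data could start in another year or span more than two years, one possible alternative (a sketch only, using a hypothetical `orders` DataFrame with made-up dates) is to count whole weeks from the Monday of the earliest order:

```python
import pandas as pd

# Hypothetical order dates spanning more than one year boundary.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2017-12-25", "2017-12-31", "2018-01-01", "2018-01-08", "2019-01-07"]
    )
})

# Monday of each order's week: drop the time part, then step back by the weekday (Mon=0).
week_start = (orders["order_date"].dt.normalize()
              - pd.to_timedelta(orders["order_date"].dt.weekday, unit="D"))

# Whole weeks elapsed since the earliest week in the data (first week has index 0).
orders["week_index"] = (week_start - week_start.min()).dt.days // 7

print(orders)
```

This index starts at 0 rather than at the ISO week number, so it is not numerically identical to `year_week_num`, but it preserves the ordering and spacing of weeks.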

# repeating customers
dfxx_rc = dfxx.groupby(['year_week_num','customer_id'])['value_row'].sum().reset_index()
dfxx_rc.info()
dfxx_rep_cust = dfxx_rc.groupby('customer_id').agg({
'year_week_num': ['count','max'],
'value_row': 'sum'
}).reset_index()
dfxx_rep_cust.columns
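# Recency: last week index with a purchase (a larger value means a more recent purchase)
dfxx_rep_cust['Recency'] = dfxx_rep_cust[('year_week_num','max')]
# Frequency: number of distinct weeks with at least one purchase
dfxx_rep_cust['Frequency'] = dfxx_rep_cust[('year_week_num','count')]
# Monetary: log10 of the total spend; the log compresses very large values
dfxx_rep_cust['Monetary'] = np.log10(dfxx_rep_cust[('value_row', 'sum')])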
dfxx_rep_cust.info()
dfxx_rep_cust['Monetary'].sort_values().plot(kind='hist')
dfxx_rep_cust['Recency'].sort_values().plot(kind='hist')
dfxx_rep_cust['Frequency'].sort_values().plot(kind='hist')
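The aggregation above measures Recency as the last active week number. A common alternative is to measure it in days before a snapshot date. The sketch below assumes a hypothetical `orders` DataFrame with made-up values, and a snapshot date of one day after the last order, which is itself an assumption.

```python
import pandas as pd

# Hypothetical order lines; column names follow the article, values are made up.
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "order_date": pd.to_datetime(
        ["2018-01-05", "2018-03-01", "2017-11-20", "2018-02-10", "2018-02-10", "2018-03-09"]
    ),
    "value_row": [20.0, 35.0, 120.0, 15.0, 5.0, 40.0],
})

# Snapshot date: one day after the last order in the data (an assumption).
snapshot = orders["order_date"].max() + pd.Timedelta(days=1)

# Calendar day of each order, so several lines on one day count once.
orders["order_day"] = orders["order_date"].dt.normalize()

rfm_days = orders.groupby("customer_id").agg(
    last_order=("order_date", "max"),
    frequency=("order_day", "nunique"),
    monetary=("value_row", "sum"),
).reset_index()

# Days between the snapshot and the customer's last order.
rfm_days["recency_days"] = (snapshot - rfm_days["last_order"]).dt.days

print(rfm_days)
```

Note the direction: here a smaller `recency_days` means a more recent customer, which is the opposite of the week-number Recency used in the article's code.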



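# same order in columns
# .copy() makes dfxx_rfm an independent DataFrame, so later assignments do not raise SettingWithCopyWarning
dfxx_rfm = dfxx_rep_cust[[( 'customer_id', ''),
( 'Recency', ''),
( 'Frequency', ''),
( 'Monetary', '')]].copy()
dfxx_rfm.columns = ['cid','rec','frec','mon']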
To normalize the columns ‘rec’, ‘frec’, and ‘mon’ in the DataFrame dfxx_rfm for RFM clustering, you can use MinMaxScaler from the sklearn.preprocessing module, as in the snippet below. MinMaxScaler rescales each selected column to a fixed range, by default 0 to 1, mapping the column’s minimum to 0 and its maximum to 1. With all three columns on the same scale, no single feature dominates the distance calculations used in clustering.
from sklearn.preprocessing import MinMaxScaler
# Create a MinMaxScaler object
scaler = MinMaxScaler()
# Select the columns to be normalized
columns_to_normalize = ['rec', 'frec', 'mon']
# Apply the MinMaxScaler to the selected columns
dfxx_rfm[columns_to_normalize] = scaler.fit_transform(dfxx_rfm[columns_to_normalize])
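To make the scaling step concrete, here is a small sketch on made-up numbers. It computes min-max scaling by hand, (x - min) / (max - min), and checks that `MinMaxScaler` with default settings gives the same result.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values for a single feature (one column, three rows).
x = np.array([[2.0], [4.0], [10.0]])

# Manual min-max scaling, column by column: (x - min) / (max - min).
manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# The same transformation with scikit-learn's default feature_range of (0, 1).
scaled = MinMaxScaler().fit_transform(x)

print(manual.ravel())               # 0.0, 0.25, 1.0
print(np.allclose(manual, scaled))  # True
```

Because the minimum and maximum define the range, a single extreme value can squeeze all other values toward 0; taking a logarithm of the monetary value before scaling reduces that effect.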
In the code below, you first define the number of clusters you want (in this case, 3). Then, you create an instance of the KMeans algorithm with that number of clusters. Next, you fit the model on the normalized data using the fit method, which assigns each data point to a cluster. The resulting cluster labels are available as kmeans.labels_. Finally, you add the cluster labels to the original DataFrame as a new column named ‘cluster’.
Now, you can analyze the clusters and label customers as ‘good,’ ‘middle-class,’ or ‘tail’ based on their assigned cluster. You may want to inspect the characteristics and behavior of customers within each cluster to define appropriate labels for your specific business context.
from sklearn.cluster import KMeans
# Specify the number of clusters
num_clusters = 3
# Create a K-means clustering object (fixed random_state so the labels are reproducible between runs)
kmeans = KMeans(n_clusters=num_clusters, random_state=42)
# Fit the model on the normalized RFM columns
kmeans.fit(dfxx_rfm[['rec', 'frec', 'mon']])
# Get the cluster labels
cluster_labels = kmeans.labels_
# Add the cluster labels to both DataFrames (their rows are in the same order)
dfxx_rfm['cluster'] = cluster_labels
dfxx_rep_cust['cluster'] = cluster_labels
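The number of clusters above is fixed at 3. One way to sanity-check that choice, sketched here on synthetic data rather than on the article's dataset, is to compare the inertia and the silhouette score for several values of k. The array `X` is a made-up stand-in for normalized RFM features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for normalized RFM features: three tight blobs in the unit cube.
rng = np.random.default_rng(0)
centers = np.array([[0.2, 0.2, 0.2], [0.5, 0.8, 0.5], [0.9, 0.3, 0.8]])
X = np.vstack([c + 0.03 * rng.standard_normal((50, 3)) for c in centers])

results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: within-cluster sum of squared distances (always falls as k grows)
    # silhouette: how well separated the clusters are (higher is better)
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(f"k={k}  inertia={results[k][0]:.3f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest silhouette:", best_k)
```

On real customer data the clusters are rarely this clean, so these metrics are a guide rather than a rule; a business reason for three tiers can be just as valid.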
Data analysis
import matplotlib.pyplot as plt
# Assume you have the clustered data with cluster labels in a DataFrame called dfxx_rfm
# Create a scatter plot of the clustered data
plt.scatter(dfxx_rfm['frec'], dfxx_rfm['mon'], c=dfxx_rfm['cluster'], cmap='viridis')
plt.xlabel('Frequency')
plt.ylabel('Monetary Value')
plt.title('RFM Clustering')
plt.show()
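Alongside the scatter plot, a numeric profile of each cluster can help decide which cluster deserves the ‘good’, ‘middle-class’, or ‘tail’ label. The sketch below uses a hypothetical DataFrame `demo` with made-up values and ranks clusters by the sum of their mean features, which is one simple heuristic among many. It relies on ‘rec’ being larger for more recent customers, as in this article.

```python
import pandas as pd

# Hypothetical clustered RFM table using the article's column names; values are made up.
demo = pd.DataFrame({
    "rec":     [0.9, 0.8, 0.5, 0.4, 0.1, 0.2],
    "frec":    [0.7, 0.9, 0.3, 0.4, 0.0, 0.1],
    "mon":     [0.8, 0.9, 0.5, 0.4, 0.2, 0.1],
    "cluster": [2, 2, 0, 0, 1, 1],
})

# Mean of each feature per cluster, plus the number of customers in the cluster.
profile = demo.groupby("cluster")[["rec", "frec", "mon"]].mean()
profile["size"] = demo.groupby("cluster").size()

# K-means label numbers are arbitrary, so rank the clusters by a combined score
# (sum of the mean features, best first) and name them in that order.
order = profile[["rec", "frec", "mon"]].sum(axis=1).sort_values(ascending=False).index
names = dict(zip(order, ["good", "middle-class", "tail"]))
demo["segment"] = demo["cluster"].map(names)

print(profile)
print(demo)
```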


The dependency between frequency and monetary value, as well as recency and monetary value, carries significant business meaning in the context of customer behavior and business analytics. Let’s discuss each dependency separately:
- Dependency between Frequency and Monetary Value:
  - Understanding the relationship between the frequency of customer purchases and their monetary value allows businesses to identify patterns and segment customers based on their buying behavior.
  - Higher frequency coupled with higher monetary value indicates customers who are both loyal and high spending, potentially representing high-value or VIP customers.
  - On the other hand, low frequency with high monetary value may indicate customers who make infrequent but high-value purchases, possibly requiring different engagement strategies targeted at increasing their purchase frequency.
- Dependency between Recency and Monetary Value:
  - The connection between recency and monetary value helps businesses evaluate the impact of recency on customer spending and loyalty.
  - Recent purchases combined with high monetary value indicate high engagement and loyalty, as the recent activity suggests continued interest in the business and potential for future purchases.
  - Conversely, customers with high monetary value whose last purchase was long ago (a low value of the week-based recency used in this article) may need re-engagement strategies: they may be lapsed or dormant customers who can potentially be reactivated with targeted marketing efforts.
By examining these dependencies, businesses can make informed decisions about customer segmentation, marketing strategies, and resource allocation. For example, identifying and focusing on the high-frequency, high-value customer segment can help drive revenue growth and customer retention. Similarly, employing targeted campaigns to re-engage customers with high monetary value but low recency can potentially boost their activity and extend their customer lifetime value. Understanding the dependencies between these variables allows businesses to tailor their approaches towards different customer segments, ultimately driving overall business success.
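These dependencies can also be examined numerically. The sketch below, on a hypothetical DataFrame `demo` with made-up values, prints rank correlations between the three features and flags re-engagement candidates: customers with high monetary value but a low ‘rec’ (last seen long ago). The 0.75 and 0.25 quantile cut-offs are arbitrary illustrative choices.

```python
import pandas as pd

# Hypothetical normalized RFM rows (article column names, made-up values).
demo = pd.DataFrame({
    "cid":  [1, 2, 3, 4, 5, 6],
    "rec":  [0.95, 0.90, 0.10, 0.50, 0.15, 0.80],
    "frec": [0.80, 0.60, 0.70, 0.30, 0.05, 0.10],
    "mon":  [0.90, 0.70, 0.85, 0.40, 0.20, 0.15],
})

# Spearman rank correlation is less sensitive to skewed values than Pearson correlation.
print(demo[["rec", "frec", "mon"]].corr(method="spearman"))

# Re-engagement candidates: top quarter by monetary value, bottom quarter by recency.
high_value = demo["mon"] >= demo["mon"].quantile(0.75)
lapsed = demo["rec"] <= demo["rec"].quantile(0.25)
reengage = demo.loc[high_value & lapsed, "cid"].tolist()

print("re-engagement candidates:", reengage)
```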
# We can also try a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Extract the three dimensions
x = dfxx_rfm['rec']
y = dfxx_rfm['frec']
z = dfxx_rfm['mon']
# Color-code the points based on the cluster labels
c = dfxx_rfm['cluster']
# Create the scatter plot
scatter = ax.scatter(x, y, z, c=c, cmap='viridis')
# Set labels for each axis
ax.set_xlabel('Recency')
ax.set_ylabel('Frequency')
ax.set_zlabel('Monetary Value')
# Add a colorbar indicating the cluster colors
cbar = fig.colorbar(scatter)
# Show the plot
plt.show()

RFM analysis is a powerful and widely used technique for managing customer value and company profitability. Here’s a summary of its main benefits and applications:
- Customer Segmentation: RFM analysis allows businesses to segment customers based on their recency, frequency, and monetary value scores. This segmentation helps identify different customer groups such as high-value customers, loyal customers, new customers, or at-risk customers. By understanding these segments, businesses can tailor their marketing strategies, customer experience, and communication to improve customer satisfaction and retention.
- Targeted Marketing Campaigns: RFM analysis provides valuable insights for targeted marketing campaigns. For example, it helps identify customers who haven’t made a purchase recently but have a high monetary value. Such customers may be targeted with personalized offers or incentives to encourage repeat purchases and re-engage them with the business.
- Customer Retention and Loyalty: RFM analysis helps identify and focus on high-value and loyal customers. By understanding their behaviors and preferences, businesses can implement loyalty programs, exclusive offers, or personalized experiences to strengthen customer loyalty, increase customer retention, and ultimately enhance profitability.
- Profitability Management: RFM analysis enables businesses to allocate resources more efficiently. For instance, by identifying and prioritizing high-value customers, businesses can optimize marketing spend, customer service efforts, and cross-selling opportunities towards the customers who provide the most significant contribution to overall profitability.
- Decision-Making and Strategy Formulation: RFM analysis provides valuable insights for strategic decision-making. It helps businesses understand the value and potential of different customer segments, identify opportunities for growth, and formulate effective business strategies to maximize profitability.
Overall, RFM analysis offers a data-driven approach to effectively manage customer value and profitability. By leveraging customer behavioral data, businesses can make informed decisions, improve customer relationships, enhance profitability, and ultimately achieve long-term success.
