- What is A/B testing?
- Where can A/B testing be used?
- Key performance indicators (KPIs)
- Meditation app
- Dataset 1: User demographics
- Dataset 2: User actions
- KPI: Conversion Rate
- Joining the demographic and subscription data
# Import pandas
import pandas as pd
# Load the customer_data
customer_data = pd.read_csv("customer_data.csv")
# Load the app_purchases
app_purchases = pd.read_csv("inapp_purchases.csv")
# Print the columns of customer data
print(customer_data.columns)
# Print the columns of app_purchases
print(app_purchases.columns)
# Merge on the 'uid' field
uid_combined_data = app_purchases.merge(customer_data, on=['uid'], how='inner')
# Examine the results
print(uid_combined_data.head())
print(len(uid_combined_data))
# Merge on the 'uid' and 'date' field
uid_date_combined_data = app_purchases.merge(customer_data, on=['uid', 'date'], how='inner')
# Examine the results
print(uid_date_combined_data.head())
print(len(uid_date_combined_data))
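The two merges above can return very different row counts. The sketch below reproduces that effect on toy data. The frames `customers` and `purchases`, their columns, and their values are illustrative assumptions, not the course datasets.

```python
import pandas as pd

# Toy stand-ins for the two datasets (hypothetical columns and values).
customers = pd.DataFrame({
    "uid": [1, 2, 3],
    # Assumed here to hold the registration date
    "date": pd.to_datetime(["2017-01-01", "2017-02-01", "2017-03-01"]),
    "device": ["and", "iOS", "and"],
})
purchases = pd.DataFrame({
    "uid": [1, 1, 2],
    # Assumed here to hold the purchase date
    "date": pd.to_datetime(["2017-01-01", "2017-01-15", "2017-02-10"]),
    "price": [499, 199, 299],
})

# Merging on 'uid' alone keeps every purchase made by a known user.
# The clashing 'date' columns receive the default '_x' / '_y' suffixes.
by_uid = purchases.merge(customers, on=["uid"], how="inner")

# Merging on 'uid' and 'date' keeps only rows where both keys match,
# i.e. purchases made on the same date stored in the customer table.
by_uid_date = purchases.merge(customers, on=["uid", "date"], how="inner")

print(len(by_uid), len(by_uid_date))
print(by_uid.columns.tolist())
```

If the shared `date` column means different things in each table, merging on it silently drops most rows, so it may be worth checking the lengths as done here before computing any KPI.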
- KPIs
- 1. Methods for calculating KPIs
- 2. Grouping Data: .groupby()
- 3. Aggregating data - mean price paid per group
- 4. Aggregate data: .agg()
# purchase_data is assumed to be the merged purchase and demographics DataFrame
# Calculate the mean purchase price
purchase_price_mean = purchase_data.price.agg('mean')
# Examine the output
print(purchase_price_mean)
# Calculate the mean and median purchase price
purchase_price_summary = purchase_data.price.agg(['mean', 'median'])
# Examine the output
print(purchase_price_summary)
# Calculate the mean and median of price and age
purchase_summary = purchase_data.agg({'price': ['mean', 'median'], 'age': ['mean', 'median']})
# Examine the output
print(purchase_summary)
# Group the data
grouped_purchase_data = purchase_data.groupby(by=['device', 'gender'])
# Aggregate the data
purchase_summary = grouped_purchase_data.agg({'price': ['mean', 'median', 'std']})
# Examine the results
print(purchase_summary)
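Passing a dict of lists to `.agg()` yields two-level column labels such as `('price', 'mean')`. The sketch below shows that shape on toy data and, as one possible alternative, pandas named aggregation, which produces flat column names. The frame `purchases` and the output names `price_mean` and `price_median` are hypothetical.

```python
import pandas as pd

# Toy purchase data (hypothetical values).
purchases = pd.DataFrame({
    "device": ["and", "and", "iOS", "iOS"],
    "gender": ["F", "F", "M", "M"],
    "price": [100, 300, 200, 400],
})

# Dict-of-lists aggregation: columns become a MultiIndex of (column, statistic).
summary = purchases.groupby(["device", "gender"]).agg(
    {"price": ["mean", "median", "std"]}
)
print(summary)

# Named aggregation: each keyword is an output column name,
# each value is a (source column, function) pair.
flat = purchases.groupby(["device", "gender"], as_index=False).agg(
    price_mean=("price", "mean"),
    price_median=("price", "median"),
)
print(flat)
```

The flat form tends to be easier to merge or plot later, while the MultiIndex form is more compact when many statistics are requested for many columns.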
# timedelta is needed for the date arithmetic below
from datetime import timedelta
# current_date is assumed to be a datetime (e.g. a pd.Timestamp) defined earlier
# Compute max_purchase_date
max_purchase_date = current_date - timedelta(days=28)
# Filter to only include users who registered before our max date
purchase_data_filt = purchase_data[purchase_data.reg_date < max_purchase_date]
# Filter to contain only purchases within the first 28 days of registration
purchase_data_filt = purchase_data_filt[
    purchase_data_filt.date <= purchase_data_filt.reg_date + timedelta(days=28)
]
# Output the mean price paid per purchase
print(purchase_data_filt.price.mean())
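The two-step filter above can be traced on a handful of rows. This is a minimal sketch with made-up data. The value of `current_date` and the contents of `purchases` are assumptions chosen only to exercise each branch of the filter.

```python
from datetime import timedelta

import pandas as pd

# Hypothetical "today" used only for this illustration.
current_date = pd.Timestamp("2018-03-17")

# Toy purchases: one row per purchase, with the buyer's registration date.
purchases = pd.DataFrame({
    "uid": [1, 1, 2, 3],
    "reg_date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-10", "2018-03-10"]),
    "date": pd.to_datetime(
        ["2018-01-05", "2018-02-20", "2018-01-20", "2018-03-12"]),
    "price": [100, 500, 300, 900],
})

# Step 1: keep users who have had a full 28 days since registering.
max_purchase_date = current_date - timedelta(days=28)
eligible = purchases[purchases.reg_date < max_purchase_date]

# Step 2: keep only purchases made within 28 days of registration.
first_28 = eligible[eligible.date <= eligible.reg_date + timedelta(days=28)]

print(first_28)
print(first_28.price.mean())
```

User 3 is dropped in step 1 because they registered too recently, and user 1's second purchase is dropped in step 2 because it falls outside the 28-day window. Without step 1, recent users would bias the KPI, since their 28-day window has not finished.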
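# np.where needs NumPy
import numpy as np
# Set the max registration date to be 28 days before today
max_reg_date = current_date - timedelta(days=28)
# Find the month 1 values: keep the price for purchases made within 28 days of
# registration by users who registered before max_reg_date, NaN otherwise
month1 = np.where((purchase_data.reg_date < max_reg_date) &
                  (purchase_data.date < purchase_data.reg_date + timedelta(days=28)),
                  purchase_data.price,
                  np.nan)  # np.NaN was removed in NumPy 2.0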
# Update the value in the DataFrame
purchase_data['month1'] = month1
# Group the data by gender and device
purchase_data_upd = purchase_data.groupby(by=['gender', 'device'], as_index=False)
# Aggregate the month1 and price data
purchase_summary = purchase_data_upd.agg(
    {'month1': ['mean', 'median'],
     'price': ['mean', 'median']})
# Examine the results
print(purchase_summary)
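The `month1` column works because pandas aggregations skip `NaN` by default, so rows outside the window do not pull the mean toward zero. The sketch below illustrates this on toy data. The frame `purchases` and the precomputed flag `in_month1` are hypothetical stand-ins for the date conditions used above.

```python
import numpy as np
import pandas as pd

# Toy data; 'in_month1' stands in for the registration and purchase date checks.
purchases = pd.DataFrame({
    "gender": ["F", "F", "M"],
    "device": ["and", "and", "iOS"],
    "price": [100, 500, 300],
    "in_month1": [True, False, True],
})

# Keep the price where the condition holds, NaN elsewhere.
purchases["month1"] = np.where(purchases.in_month1, purchases.price, np.nan)

# Aggregate both columns side by side; NaN rows are ignored for 'month1'.
summary = purchases.groupby(["gender", "device"]).agg(
    {"month1": ["mean", "median"], "price": ["mean", "median"]}
)
print(summary)
```

For the `('F', 'and')` group, the `month1` mean uses only the 100 purchase, while the overall `price` mean uses both rows. Filling with 0 instead of `NaN` would have averaged the excluded purchase in as a zero, which is usually not what a first-month KPI intends.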