
The Love-Hate Relationship of every Analyst | Impact Measurement

  • Writer: Kirtish Badwe
  • Dec 31, 2025
  • 6 min read


One of the most common questions that Analysts get is

What is the impact of that change we made?

In the case of Product Analytics, this question comes up even more frequently.


Today's products are not static by any means. They change continuously. Some of the best products, like Instagram, churn out features at a breakneck pace! While the feature list of these products is long, so is the feature graveyard!


The Feature Graveyard is what I call those changes that were rolled back because they did not work well.

  • That new Onboarding Flow which screwed up retention instead of improving it

  • The new personalised section which led to no improvement in conversion but increased the model training costs multifold


The list is endless. Many such features are rolled back, but they do provide some important learnings. This Try --> Fail --> Learn --> Repeat loop is exactly what makes great products. Companies that are able to master this process improve their offering drastically!


As they say,

Iteration leads to Perfection

But in most of today's products, there are so many things affecting user behaviour at once that, without a proper measurement setup, impact measurement becomes a nightmare.


So, how do companies conclude whether a feature is a success or not? Let's dig a little deeper.







Consider Instagram as an example. The Instagram Feed is one of its primary offerings. It is such a complex product that multiple teams would be working on the Feed simultaneously:

  • Some teams would work on making it more engaging

  • Some teams would work to reduce loading time

  • Some would work on Advertising on the feed


... and so on.


Now, if the Tech team makes some change which improves the loading time, and the Product team also makes some changes to the Feed algorithm, how can they measure the impact of their respective change?


Is the engagement improving due to the algorithm changes or lower loading times?


These are the questions which Analysts have to deal with and answer. It becomes crucial to get these answers right, as the future product roadmap may depend on the results of these changes.


So how do we measure the impact of changes properly? The following are some of the techniques used for impact measurement.




Pre-Post Analysis


This is the simplest of the impact measurement frameworks. It measures the target metric before and after the feature live date. The observed change in the metric is considered the impact of the feature.


Pros

  • Fairly simple to implement and measure

  • Works well when the target metric is predictable


Cons

  • Difficult to attribute impact when multiple changes go live simultaneously

  • If the target metric is noisy, it is difficult to measure the impact

  • Sensitive to time period of analysis

    • If the Pre and Post periods are far apart, other global/seasonal factors can corrupt the data

  • The user distribution in the Pre and Post periods may change, which will also corrupt our data


In most real-world cases, Pre/Post analysis might not provide an exact impact estimate.
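To make the computation concrete, here is a minimal sketch in Python with pandas. The daily_metrics table, column names, launch date and numbers are all hypothetical placeholders for illustration, not data from any real product:

```python
import pandas as pd

# Hypothetical daily metrics: one row per day with the target metric (e.g. DAU)
daily_metrics = pd.DataFrame({
    "date": pd.date_range("2025-11-01", periods=60, freq="D"),
    "dau": [100_000 + i * 150 for i in range(60)],  # placeholder values
})

launch_date = pd.Timestamp("2025-12-01")  # assumed feature live date
window = pd.Timedelta(days=14)            # compare 14 days before vs 14 days after

pre = daily_metrics[(daily_metrics["date"] >= launch_date - window)
                    & (daily_metrics["date"] < launch_date)]
post = daily_metrics[(daily_metrics["date"] >= launch_date)
                     & (daily_metrics["date"] < launch_date + window)]

pre_avg, post_avg = pre["dau"].mean(), post["dau"].mean()
lift = (post_avg - pre_avg) / pre_avg
print(f"Pre avg: {pre_avg:,.0f} | Post avg: {post_avg:,.0f} | Lift: {lift:+.1%}")
```

Note how sensitive this is to the choice of `window`: widening or shifting it changes the estimate, which is exactly the "time period" weakness listed above.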





Randomised Controlled Trials


This is the gold standard of testing and impact measurement. In Product or Marketing terminology, it is also known as A/B Testing.


The idea is fairly simple -

  • Randomly divide the user base into 2 parts.

  • Show the feature to Part A (Target cohort)

  • Do not show the feature to Part B (Control cohort)

  • Now, measure the metric impact on Part A vs Part B


This simple design ensures that there are no pre-existing biases in our measurement. Since the users are randomly divided, the chances of ending up with biased cohorts are limited. Similarly, as the measurement happens over the same time period for both cohorts, external/seasonal effects do not distort our measurements.
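As an illustration of the splitting step, here is a minimal sketch of deterministic assignment. This is one common way to implement a 50/50 split (hashing the user ID with an experiment name), not necessarily how any specific product does it; the experiment name and user IDs are hypothetical:

```python
import hashlib

def assign_cohort(user_id: str, experiment: str = "new_feed_ranker") -> str:
    """Deterministically assign a user to Target (A) or Control (B).

    Hashing user_id together with the experiment name keeps a user's
    assignment stable across sessions and independent across experiments.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # a number in 0..99
    return "A" if bucket < 50 else "B"      # 50/50 split

# Example: assign a few hypothetical users
for uid in ["user_001", "user_002", "user_003"]:
    print(uid, assign_cohort(uid))
```

Because the assignment is a pure function of the user ID, a user never flips between Target and Control mid-experiment, which is one of the implementation requirements discussed below.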


Pros

  • Clean impact measurement is possible

  • Random assignment removes any biases between Target and Control Cohort

  • The required experiment duration and sample size can be quantified upfront

  • Can run multiple experiments simultaneously, which increases the speed of learning



Cons

  • Fairly complex implementation

    • The functionality of showing / not showing the feature needs to be built into the product

    • Random assignment needs to be truly random

    • A user's cohort (Target / Control) should not change for the duration of the experiment

  • A/B Testing might not be possible in all experiments

    • Showing different pricing to different cohorts can lead to user backlash

    • Some experiments, specifically in the pharmaceutical industry, might not allow us to get a control cohort (Not providing a potential life saving drug to patients might not be possible)

  • Correct interpretation of results requires advanced statistics knowledge (significance testing, p-values)
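For example, comparing a conversion rate between the Target and Control cohorts usually comes down to a two-proportion z-test. Here is a minimal sketch using statsmodels; the conversion counts and cohort sizes are made up:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: conversions and cohort sizes for Target (A) and Control (B)
conversions = [5_200, 4_900]       # users who converted in A and B
cohort_size = [100_000, 100_000]   # users exposed in A and B

z_stat, p_value = proportions_ztest(count=conversions, nobs=cohort_size)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")

# A common (but not universal) convention: call the result significant at p < 0.05
if p_value < 0.05:
    print("The difference between A and B is statistically significant.")
else:
    print("We cannot rule out that the difference is just noise.")
```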







Quasi-Experiments (Causal Inference)

As the name suggests, this methodology tries to mimic A/B Testing without actually running those tests. The whole crux of A/B Tests is the creation of an unbiased control cohort. But, as mentioned above, creating this control cohort is not always possible.

This is where Quasi-Experimentation becomes helpful. Using a few techniques, we create a synthetic control cohort, which is then used as a comparable cohort to measure the impact.




Difference in Differences


This technique is an advanced version of Pre/Post Analysis.


  • We break down our user base into

    • Those who have used the feature

    • Those who have NOT used the feature

  • Then, we carry out a simple Pre-Post analysis separately for each cohort, i.e. compute the lift in the target metric from the Pre to the Post period

  • After this, we take the difference between the lifts of the two cohorts. The idea is that users who have interacted with the feature will show a higher lift than users who have not, and the gap between the two lifts is the feature's impact (see the numeric sketch at the end of this section).


Fairly simple, right? But just like Pre/Post analysis, there are a few drawbacks to this method


  • If multiple things are affecting our target variable, the Pre/Post comparison might not provide the correct impact

  • There is an inherent selection bias in this method. More engaged users are the ones most likely to interact with the feature, whereas low-engagement users might not interact with it at all. And a low-engagement user may show no lift in the Pre/Post measurement simply because of that low engagement, not because of whether they interacted with the feature or not.
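To make the arithmetic from the steps above concrete, here is a minimal numeric sketch; all numbers are made up and the metric (sessions per user) is just an example:

```python
# Hypothetical average sessions per user, per cohort and period
pre_used, post_used = 10.0, 13.0   # users who used the feature
pre_not,  post_not  = 9.0,  10.5   # users who did NOT use the feature

lift_used = post_used - pre_used   # 3.0  -> feature + everything else
lift_not  = post_not - pre_not     # 1.5  -> everything else only

did_estimate = lift_used - lift_not  # 1.5 sessions/user attributed to the feature
print(f"Difference-in-Differences estimate: {did_estimate:.1f} sessions/user")
```

The second cohort's lift acts as the baseline for "everything else that changed in the same period", which is what plain Pre/Post analysis cannot separate out.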




Matching


In this method as well, we break down the users into Used the Feature / Not Used the Feature. But, for every user who has used the feature, we try to find an exact "twin" user who has not used the feature. (We try to match such users, hence the name Matching.)

Then we compute the difference in the metric between each matched pair and average out these differences across all pairs.
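Here is a minimal sketch of one-to-one matching using scikit-learn's nearest-neighbour search. The users table, its columns (age, past_sessions, used_feature, outcome) and values are purely illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical user table: matching attributes, feature-usage flag, outcome metric
rng = np.random.default_rng(42)
users = pd.DataFrame({
    "age": rng.integers(18, 60, size=1_000),
    "past_sessions": rng.poisson(20, size=1_000),
    "used_feature": rng.integers(0, 2, size=1_000),
    "outcome": rng.normal(10, 3, size=1_000),   # e.g. sessions after launch
})

treated = users[users["used_feature"] == 1]
control = users[users["used_feature"] == 0]
features = ["age", "past_sessions"]   # in practice, standardise these first

# For each treated user, find the closest non-user "twin" on the matching attributes
nn = NearestNeighbors(n_neighbors=1).fit(control[features])
_, idx = nn.kneighbors(treated[features])
matched_control = control.iloc[idx.ravel()]

# Average treated-vs-twin difference is the estimated impact
impact = (treated["outcome"].values - matched_control["outcome"].values).mean()
print(f"Estimated impact: {impact:.2f}")
```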


Pros

  • Clean impact measurement is possible

  • Easier to implement as compared to A/B Testing



Cons


  • Measurement is complicated if there are multiple factors which need to be matched among users (e.g. Matching users with similar Age, Engagement, Revenue potential, experience etc)

  • Exact matches might not be available, and significant pre-existing biases can remain in our cohorts since they are not randomly created

  • No matches can be found for extreme users, i.e. heavily engaged or barely engaged users





Propensity Score Matching (PSM)

This is a slightly more advanced technique for implementing Matching. The Propensity Score can be interpreted as the probability of a user interacting with the feature, given their attributes.

Users are bucketed based on their propensity score, and then, for each bucket, the metric movement is measured for Feature-Interacted vs Feature-Not-Interacted users. The score itself is calculated using simple classification algorithms like Logistic Regression.
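Here is a minimal sketch of propensity-score bucketing, with a made-up users table similar to the one in the Matching sketch: a Logistic Regression predicts the probability of using the feature, users are split into propensity buckets, and the metric is compared within each bucket. Column names and data are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical user table: attributes, feature-usage flag, outcome metric
rng = np.random.default_rng(0)
users = pd.DataFrame({
    "age": rng.integers(18, 60, size=5_000),
    "past_sessions": rng.poisson(20, size=5_000),
    "used_feature": rng.integers(0, 2, size=5_000),
    "outcome": rng.normal(10, 3, size=5_000),
})

# 1. Propensity score: probability of interacting with the feature, given attributes
X = users[["age", "past_sessions"]]
model = LogisticRegression(max_iter=1000).fit(X, users["used_feature"])
users["propensity"] = model.predict_proba(X)[:, 1]

# 2. Bucket users by propensity score (here, 5 equal-sized buckets)
users["bucket"] = pd.qcut(users["propensity"], q=5, labels=False, duplicates="drop")

# 3. Within each bucket, compare the metric for interacted vs not-interacted users
per_bucket = (users.groupby(["bucket", "used_feature"])["outcome"]
                    .mean().unstack("used_feature"))
per_bucket["diff"] = per_bucket[1] - per_bucket[0]
print(per_bucket)
print(f"Average within-bucket impact: {per_bucket['diff'].mean():.2f}")
```

Comparing users only within the same propensity bucket is what reduces the feature-interaction bias mentioned below.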



Pros:

  • Reduces the Feature Interaction bias using PSM buckets

  • Can provide decent measurement if the count of users is high enough.



Cons

  • Similar to matching, the risk of NOT finding a match exists. This is especially true if user behaviour is influenced by multiple factors

  • Heavy computation needed to calculate the Propensity score and then match the users.







Conclusion

The overall crux of impact measurement is creating an unbiased Control cohort, and then comparing its metrics with those of our Target cohort. The job of an Analytics professional is to carry out a like-for-like comparison.


RCTs, or A/B tests, remain the gold standard of testing and impact measurement. But advanced Causal Inference techniques can also provide directional insights into the impact.


Each of these techniques deserves multiple dedicated articles explaining the logic and implementation process, which we'll definitely publish.


Until then....keep learning!
