Popular pop music artists
Where does the age data come from? As part of the enrollment process for most music services, the user is asked for a few pieces of demographic data, including gender and year-of-birth. As an example, here’s a typical user-enrollment screen from a popular music subscription service:
Is this age data any good? The first thing we need to do before we get too far with analyzing the age data is to get a feel for how accurate it is. If new users are entering random values for their date of birth then we won’t be able to use the listener’s age for anything useful. For this study, I looked at the age data submitted by several million listeners. This histogram shows the relative number of users by year of birth.
The first thing I notice is the curve has the shape one would expect. The number of listeners in each age bucket increases as the listener gets younger until around age 21 or so, at which points it drops off rapidly. The shape of the curve aligns with the data from this study by EMI in 2011 that shows the penetration of music streaming service by age demographic. This is a good indicator that our age data is an accurate representation of reality.
However, there are a few anomalies in the age data. There are unexpected peaks at each decade – likely due to people rounding their birth year to the nearest decade. A very small percentage (0.01 %) indicate that they are over 120 years old, which is quite unlikely. Despite this noise, the age data looks to be a valid and fairly accurate representation, in the aggregate, of the age of listeners. We should be able to use this data to understand how age impacts listening.