Making Daily Song Plays Datasets For Music Labels


Brian Femiano


Advance RSVP is available to Primary Entry badge types only. Walk-ups may be accepted on a first come, first served basis.

Every day we generate and deliver a dataset to each of the major record labels. Each dataset describes, in structured detail, every song spin on Pandora that day for which the label holds rights. This includes the user who played the song, the title, the lead artist, how the play was triggered within the app, the exact timestamp of the play, and other valuable information. Generating and delivering these event-level datasets involves many intricate, sequential pipeline steps. Until now, delivering these massive files every 24 hours had not been possible at any point in the last 5-10 years. This workshop will cover exactly how we automated every step of the workflow, as well as how we generate the datasets and validate them using Apache Spark and Hive.
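As a rough illustration of the per-label split and validation described above, here is a minimal sketch in plain Python rather than Spark or Hive, so it runs without a cluster. The record layout, the field names (`user_id`, `trigger`, `played_at`, `label`), and the three validation rules are assumptions made for illustration; they are not taken from the actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Spin:
    """One song play event. All field names are hypothetical."""
    user_id: str
    title: str
    lead_artist: str
    trigger: str          # how the play was started in the app
    played_at: datetime   # timezone-aware timestamp of the play
    label: str            # label assumed to hold rights to the song


def utc_day(ts: datetime) -> date:
    """Calendar day of a timezone-aware timestamp, in UTC."""
    return ts.astimezone(timezone.utc).date()


def partition_by_label(spins, day: date) -> dict:
    """Group the spins that occurred on `day` by rights-holding label."""
    by_label = defaultdict(list)
    for spin in spins:
        if utc_day(spin.played_at) == day:
            by_label[spin.label].append(spin)
    return dict(by_label)


def validate(dataset, day: date) -> list:
    """Return a list of problems found in one label's dataset."""
    errors = []
    seen = set()
    for i, spin in enumerate(dataset):
        # Rule 1 (assumed): identifying fields must be non-empty.
        if not (spin.user_id and spin.title and spin.lead_artist):
            errors.append(f"row {i}: missing required field")
        # Rule 2 (assumed): every event must fall on the delivery day.
        if utc_day(spin.played_at) != day:
            errors.append(f"row {i}: timestamp outside {day}")
        # Rule 3 (assumed): the same play must not appear twice.
        key = (spin.user_id, spin.title, spin.lead_artist, spin.played_at)
        if key in seen:
            errors.append(f"row {i}: duplicate event")
        seen.add(key)
    return errors
```

A daily run under these assumptions would call `partition_by_label` once on the day's events, then run `validate` on each label's list and deliver only the datasets that come back with no errors.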

Programming descriptions are generated by participants and do not necessarily reflect the opinions of SXSW.

Primary Entry: Platinum Badge, Interactive Badge
Format: Workshop
Event Type: Session
Level: Intermediate
Online: Slides