Pulse Plus

PhonePe recently released the Pulse repo, built from their payment data. It was hard to get an overview of the data without some transformation: the data is nested eight levels deep and spread across 2000+ files, with multiple files holding data for similar purposes. That makes command-line aggregate queries for data exploration impractical and any analysis hard. So I created an SQLite database of the data using the Python library sqlite-utils. The SQLite database holds the aggregated data and top data in 5 tables: aggregated_user, aggregated_user_device, aggregated_transaction, top_user, top_transaction. [Read More]
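
To make the idea of flattening many nested JSON files into one queryable table concrete, here is a minimal sketch. It uses Python's standard sqlite3 module rather than sqlite-utils, and the directory layout (`<year>/<quarter>.json`), the JSON field names, and the column names are all assumptions made up for illustration; they are not the actual Pulse layout or the schema used in the post.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def write_sample(root: Path) -> None:
    """Create a tiny, made-up tree of JSON files (hypothetical layout)."""
    sample = {
        "2021/1.json": [
            {"name": "Recharge", "count": 10, "amount": 100.0},
            {"name": "Merchant", "count": 5, "amount": 50.5},
        ],
        "2021/2.json": [
            {"name": "Recharge", "count": 20, "amount": 200.0},
        ],
    }
    for rel_path, items in sample.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"data": items}))


def load_rows(root: Path):
    """Yield one flat row per item, taking year and quarter from the path."""
    for path in sorted(root.glob("*/*.json")):
        year = int(path.parent.name)
        quarter = int(path.stem)
        payload = json.loads(path.read_text())
        for item in payload["data"]:
            yield (year, quarter, item["name"], item["count"], item["amount"])


def build_db(root: Path, conn: sqlite3.Connection) -> int:
    """Insert every flattened row into a single table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aggregated_transaction ("
        "year INTEGER, quarter INTEGER, name TEXT, "
        "txn_count INTEGER, amount REAL)"
    )
    rows = list(load_rows(root))
    conn.executemany(
        "INSERT INTO aggregated_transaction VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    write_sample(root)
    conn = sqlite3.connect(":memory:")
    build_db(root, conn)
    # One aggregate query now covers every file at once.
    query = (
        "SELECT name, SUM(txn_count) FROM aggregated_transaction "
        "GROUP BY name ORDER BY name"
    )
    for name, total in conn.execute(query):
        print(name, total)
```

Once the rows sit in one table, a single GROUP BY replaces walking the directory tree by hand, which is the main payoff of the transformation.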

jut - render jupyter notebook in the terminal

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. This definition is copied from the official website. It’s becoming common to use Jupyter notebooks to write books, do data analysis, run reproducible experiments, etc. The file a notebook produces follows a JSON Schema. [Read More]
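
Because a notebook file is plain JSON, a few lines of Python are enough to list its cells in a terminal. The sketch below is not how jut is implemented; it only assumes the version 4 notebook layout, where a top-level `cells` list holds objects with a `cell_type` and a `source` that is either a string or a list of strings. The function names and the sample notebook are made up for illustration.

```python
import json


def cell_text(cell: dict) -> str:
    """Return a cell's source as one string (it may be stored as a list)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def render(notebook: dict, limit=None) -> str:
    """Render the first `limit` cells (all cells if None) as plain text."""
    lines = []
    cells = notebook.get("cells", [])[:limit]
    for index, cell in enumerate(cells, start=1):
        kind = cell.get("cell_type", "unknown")
        lines.append(f"--- cell {index} [{kind}] ---")
        lines.append(cell_text(cell))
    return "\n".join(lines)


# A tiny hand-written notebook, serialized and parsed like a real .ipynb file.
SAMPLE = json.loads(json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some text"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "print('hello')"},
    ],
}))

if __name__ == "__main__":
    print(render(SAMPLE))
```

For a real file, replace `SAMPLE` with `json.load(open(path))`, where `path` points at an .ipynb file.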