Sign inGet started
← Back to all guides

How to Analyze Ecommerce Purchases: A Step-by-Step Guide

By Deepnote team

Updated on November 23, 2023

Learn to process and analyze San Francisco city employee salary data using Pandas. This tutorial covers data manipulation techniques, visualization, and insights extraction.

In this tutorial, we'll explore a dataset of fictional ecommerce purchases, showcasing how to effectively utilize Python and Pandas for data analysis. Our dataset is a compilation of simulated transactions on Amazon. Remember, the data is fabricated and may not align with real-world scenarios.

Step 1: Setting Up Your Environment

Firstly, import the Pandas library and load the dataset into a DataFrame. We'll name our DataFrame ecom for easy reference.

import pandas as pd
ecom = pd.read_csv('Ecommerce Purchases')

Step 2: Getting to Know Your Data

Before diving into analysis, it's crucial to understand your dataset's structure. Use ecom.head() to view the first few rows.

ecom.head()

This will display the top entries in your DataFrame, giving you a glimpse of the various columns and types of data available.


Address	Lot	AM or PM	Browser Info	Company	Credit Card	CC Exp Date	CC Security Code	CC Provider	Email	Job	IP Address	Language	Purchase Price
0	16629 Pace Camp Apt. 448\nAlexisborough, NE 77...	46 in	PM	Opera/9.56.(X11; Linux x86_64; sl-SI) Presto/2...	Martinez-Herman	6011929061123406	02/20	900	JCB 16 digit	pdunlap@yahoo.com	Scientist, product/process development	149.146.147.205	el	98.14
1	9374 Jasmine Spurs Suite 508\nSouth John, TN 8...	28 rn	PM	Opera/8.93.(Windows 98; Win 9x 4.90; en-US) Pr...	Fletcher, Richards and Whitaker	3337758169645356	11/18	561	Mastercard	anthony41@reed.com	Drilling engineer	15.160.41.51	fr	70.73
2	Unit 0065 Box 5052\nDPO AP 27450	94 vE	PM	Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ...	Simpson, Williams and Pham	675957666125	08/19	699	JCB 16 digit	amymiller@morales-harrison.com	Customer service manager	132.207.160.22	de	0.95
3	7780 Julia Fords\nNew Stacy, WA 45798	36 vm	PM	Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0 ...	Williams, Marshall and Buchanan	6011578504430710	02/24	384	Discover	brent16@olson-robinson.info	Drilling engineer	30.250.74.19	es	78.04
4	23012 Munoz Drive Suite 337\nNew Cynthia, TX 5...	20 IE	AM	Opera/9.58.(X11; Linux x86_64; it-IT) Presto/2...	Brown, Watson and Andrews	6011456623207998	10/25	678	Diners Club / Carte Blanche	christopherwright@gmail.com	Fine artist	24.140.33.94	es	77.82

Step 3: Basic Statistics and Information

To get a sense of the scale of your dataset, use ecom.info() to view the number of entries, columns, and data types.

ecom.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
Address             10000 non-null object
Lot                 10000 non-null object
AM or PM            10000 non-null object
Browser Info        10000 non-null object
Company             10000 non-null object
Credit Card         10000 non-null int64
CC Exp Date         10000 non-null object
CC Security Code    10000 non-null int64
CC Provider         10000 non-null object
Email               10000 non-null object
Job                 10000 non-null object
IP Address          10000 non-null object
Language            10000 non-null object
Purchase Price      10000 non-null float64
dtypes: float64(1), int64(2), object(11)
memory usage: 1.1+ MB

Step 4: Calculating Average Purchase Price

To find the average purchase price, apply the mean() function to the Purchase Price column.

ecom['Purchase Price'].mean()
50.34730200000025

Step 5: Identifying Price Extremes

Determine the highest and lowest purchase prices with the max() and min() functions.

highest_price = ecom['Purchase Price'].max()
lowest_price = ecom['Purchase Price'].min()

Step 6: Language Preferences

To see how many customers prefer English, filter the DataFrame by the Language column.

english_speakers = ecom[ecom['Language']=='en'].count()

Step 7: Job Title Analysis

If you're interested in how many customers are lawyers, filter by the Job column.

lawyers_count = ecom[ecom['Job'] == 'Lawyer'].count()
30

Step 8: Purchase Times

You can use value_counts() to see how many purchases were made in the AM vs. PM.

purchase_times = ecom['AM or PM'].value_counts()
purchase_times
PM    5068
AM    4932
Name: AM or PM, dtype: int64

Step 9: Common Job Titles

Identify the most common job titles using value_counts() on the Job column.

common_jobs = ecom['Job'].value_counts().head(5)
common_jobs
Interior and spatial designer    31
Lawyer                           30
Social researcher                28
Purchasing manager               27
Designer, jewellery              27
Name: Job, dtype: int64

Step 10: Specific Transaction Details

To find details about a specific transaction, filter by a unique identifier, like Lot.

specific_purchase = ecom[ecom['Lot']=='90 WT']['Purchase Price']
specific_purchase
513    75.1
Name: Purchase Price, dtype: float64

Step 11: Finding Email by Credit Card Number

If you need to find a user's email based on their credit card number:

user_email = ecom[ecom["Credit Card"] == 4926535242672853]['Email']
user_email
1234    bondellen@williams-garza.com
Name: Email, dtype: object

Step 12: High-value Transactions

To analyze high-value transactions with a specific credit card provider:

high_value_transactions = ecom[(ecom['CC Provider']=='American Express') & (ecom['Purchase Price']>95)].count()
high_value_transactions
Address             39
Lot                 39
AM or PM            39
Browser Info        39
Company             39
Credit Card         39
CC Exp Date         39
CC Security Code    39
CC Provider         39
Email               39
Job                 39
IP Address          39
Language            39
Purchase Price      39
dtype: int64

Step 13: Credit Card Expiration Analysis

For credit cards expiring in a specific year, use a lambda function to filter.

expiring_cards_2025 = sum(ecom['CC Exp Date'].apply(lambda x: x[3:]) == '25')
expiring_cards_2025
1033

Step 14: Popular Email Providers

Finally, to determine the most popular email providers among your customers:

email_providers = ecom['Email'].apply(lambda x: x.split('@')[1]).value_counts().head(5)
email_providers
hotmail.com     1638
yahoo.com       1616
gmail.com       1605
smith.com         42
williams.com      37
Name: Email, dtype: int64

Summary

In this tutorial, we've explored various methods to analyze an ecommerce dataset using Python and Pandas. We covered basic DataFrame operations, statistical analysis, and data filtering techniques, demonstrating how to extract meaningful insights from an ecommerce dataset. These techniques are fundamental in data science and can be applied to a wide range of datasets for insightful analysis.

Footer

Product

  • Integrations
  • Pricing
  • Documentation
  • Changelog
  • Security

Company

Comparisons

Resources

  • Privacy
  • Terms

© Deepnote