How to Summarize Data with Pandas, Python

How to Summarize Data with Pandas, Python. Are you starting to learn how to analyze data using Python Pandas? If yes, this post is for you. We will go over different functions used to summarize data contained in a pandas dataframe.

For demonstration purposes, I used the Supermarket Sales data set from Kaggle. I downloaded the file and saved it in my local drive.

#import library
import pandas as pd
#import file
ss = pd.read_csv('supermarket_sales.csv')
#preview data
ss.head()

Image for post — Supermarket Sales dataframe

info(): provides a concise summary of a dataframe.

I use this method every time I am working with pandas especially when doing data cleaning. It shows you all the information you need to know about your dataframe like: record counts, column names, data types, index range , and memory usage.

By looking at the summary provided for ss.info() below we can observe:

record count is 1000
composed of 17 columns
Column names can be updated to eliminate white spaces
Data types included are object, float64 and int
Date and Time columns are dtype = object, should be updated to corresponding datetime formats.
There are no missing values on any of the records

2. describe(): Generates descriptive statistics that will provide visibility of the dispersion and shape of a dataset’s distribution. It excludes NaN values. It can be used for dataframe or a specific series.

By default, the results are for numeric types, but you can use the parameter include to show the stats for some or all the data types in the dataframe.

You can exclude specific data types from a dataframe as well with the exclude parameter:

By default, the percentiles returned by this function are the 25th, 50th and 75th. If you wish to change the percentiles in the results, you can add those by using the parameter percentiles. Values must be between 0 and 1. This applies for both dataframe and series:

3. value_counts(): returnscounts of unique values for the specified series. NaN values are excluded by default.

By default, the results are listed in descending order, but you can change it by using parameter ascending = True

You can get the frequency (instead of count) for each value by setting the normalize = True

4. nunique(): Count distinct observations. Can be used for a dataframe or a series. By default, it exclude the NaN values.

If you need to include the NaN values, use the parameter dropna= False

Since this dataset does not contains any NaN values, let us create a dataframe for this example:

d = {'A': [1, 2, None], 'B': [3, 4, 2], 'C': [3, None, None]}
df = pd.DataFrame(d)
df

Now see how the dropna parameter set to False changes the results:

5. sum(): Return the sum of the values for the requested axis. You can use it for both dataframe and series.

You can specify to apply the function only to numeric types by using the parameter: numeric_only = True

Another useful parameter is min_count where you can set the minimum number of records that must exists to apply the function.

Let’s create another dataframe to demo how this parameter works:

d = {'A': [1, 2, None, 5, 8], 'B': [3, 4, 2, 4, 5], 
     'C': [3, None, None, 3, 4]}
df = pd.DataFrame(d)
df

6. count(): Return number of non-NA/null observations.

Can be applied to both dataframe and series:

Use the parameter numeric_only = True to apply functions to only series with numeric data types.

Need Help with Researchers or Data Analysts, Lets Help you with Data Analysis & Result Interpretation for your Project, Thesis or Dissertation?

We are Experts in SPSS, EVIEWS, AMOS, STATA, R, and Python

7. Min and Max, Mean and Median:

a. min(): Return the minimum value

b. max(): Return the maximum value

c.mean(): Return the mean of the values

d. median(): Return the median of the values

These functions can be applied to both dataframe and series:

Summary

In this post we went over some functions to get summarized data from a pandas dataframe. We used .info() to get information about the structure and format of our dataframe. We used .describe() to get descriptive statistic about the data. Then we showed how to get the counts per columns with .counts(); values grouped and their counts with .value_counts(), how many unique values a column has with .nunique() and finally functions to obtain the maximum and minimum values with .max() and .min() as well as the median and mean values with .median() and .mean(). Thanks for reading!

Get Help with Data Analysis, Research, Thesis, Dissertation and Assignments.

Help with any types of dissertations

Undergraduate dissertation

Choose from hundreds of experts who can assist you in completing your undergraduate dissertation! Prices start at $10 per page, with potential discounts for longer orders or extended deadlines

Master’s dissertation

If you are in the process of completing a Master’s degree, we can provide you with an experienced writer to finish your dissertation. We strive to offer a quick turnaround on tailored papers at an affordable price, starting at $10.30 per page.

Ph.D. or doctoral dissertation

Hire from among our most skilled experts to save your time and ease your workload. Prices for Ph.D. assistance start at $10.60 per page.

Dissertation Writing Help

Dissertation Writing
Dissertation Proposal Writing
Dissertation Objectives Writing
Dissertation Literature Review
Dissertation Methodology
Dissertation Data Analysis

Need Our Services?

Thesis Writing Help

Thesis Proposal Writing
Thesis Problem Statement
Thesis Introduction Writing
Thesis Literature Review
Data Analysis for Thesis
Thesis Methodology Writing
Thesis Data analysis & Interpratation
Thesis Discussions Writing

Editing & Proofreading Services

Dissertation Proofreading & Editing
Thesis Proofreading & Editing
APA Formatting & References
Harvard Formating & Reference
MLA Format & Reference
Professional Formatting Services
Grammar and Structure Checking

How to Summarize Data with Pandas, Python

Summary

Get Help with Data Analysis, Research, Thesis, Dissertation and Assignments.

Help with any types of dissertations

Undergraduate dissertation

Master’s dissertation

Ph.D. or doctoral dissertation

Hire from among our most skilled experts to save your time and ease your workload. Prices for Ph.D. assistance start at $10.60 per page.

Dissertation Writing Help

Need Our Services?

Thesis Writing Help

Editing & Proofreading Services

Stuck with Your Research or Data Analysis Project?

Let Our Experts Help You:

Whatsapp Us:

Email Us:

We Make Sense out of your Data

CONTACT US

NAVIGATION

PRIVACY & TOS

How to Summarize Data with Pandas, Python

Summary

Get Help with Data Analysis, Research, Thesis, Dissertation and Assignments.

Help with any types of dissertations

Undergraduate dissertation

Master’s dissertation

Ph.D. or doctoral dissertation

Hire from among our most skilled experts to save your time and ease your workload. Prices for Ph.D. assistance start at $10.60 per page.

Dissertation Writing Help

Need Our Services?

Thesis Writing Help

Editing & Proofreading Services

Stuck with Your Research or Data Analysis Project?Let Our Experts Help You:

Whatsapp Us:

Email Us:

We Make Sense out of your Data

CONTACT US

NAVIGATION

PRIVACY & TOS

Stuck with Your Research or Data Analysis Project?

Let Our Experts Help You: