Matplotlib Histograms


Introduction

In this lab, you will learn how to create and customize histograms using Matplotlib, one of the most popular data visualization libraries in Python. A histogram is a powerful tool for visualizing the distribution of a numerical dataset. It groups numbers into ranges (or "bins") and displays the frequency of data points falling into each bin.

You will go through the following steps:

  1. Generate sample data using NumPy.
  2. Create a basic histogram.
  3. Customize the number of bins.
  4. Change the color and edge style of the histogram bars.
  5. Normalize the histogram to show probability density.

By the end of this lab, you will be able to generate informative and visually appealing histograms for your data analysis projects. All plots will be saved as image files, which you can view directly in the LabEx WebIDE.

This is a Guided Lab, which provides step-by-step instructions to help you learn and practice. Follow the instructions carefully to complete each step and gain hands-on experience. Historical data shows that this is a beginner-level lab with an 84% completion rate. It has received a 100% positive review rate from learners.

Generate sample data using numpy.random

In this step, you will generate a set of sample data that we can use to plot a histogram. We will use the NumPy library, which is a fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.

We will use the numpy.random.normal() function to generate data that follows a normal (or Gaussian) distribution. This is a common type of data distribution found in many real-world scenarios.

First, open the main.py file from the file explorer on the left side of the WebIDE. Then, add the following code to it. This code will import the numpy library and generate 1000 random numbers with a mean of 0 and a standard deviation of 1.

import numpy as np

# Generate 1000 data points from a normal distribution
# with a mean (loc) of 0 and a standard deviation (scale) of 1.
data = np.random.normal(loc=0, scale=1, size=1000)

print("Sample data generated successfully.")

To run the script, open a terminal in the WebIDE (Terminal -> New Terminal) and execute the following command. Your working directory is already /home/labex/project.

python3 main.py

You will see a confirmation message in the terminal.

Sample data generated successfully.

The data variable in your script now holds an array of 1000 numbers, ready for visualization in the next step.
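Because the script does not set a seed, np.random.normal() returns different numbers on every run, so the plots in later steps will vary slightly from one run to the next. If you want repeatable results, one option is NumPy's Generator API. The sketch below is an optional variation, not part of the lab's main.py, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# Optional: a seeded generator makes the "random" data repeatable.
# The seed 42 is arbitrary; any integer works.
rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)

# Quick sanity checks on the generated array
print(data.shape)              # (1000,)
print(round(data.mean(), 2))   # close to 0
print(round(data.std(), 2))    # close to 1
```

Running this twice prints the same mean and standard deviation both times, which makes it easier to compare plots produced by different runs.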

Plot histogram using plt.hist(data)

In this step, you will create your first histogram. We will use the matplotlib.pyplot module, which provides a simple interface for creating plots. It is conventionally imported with the alias plt.

The core function for creating a histogram is plt.hist(). At its simplest, it takes a single argument: the array of data you want to plot.

Because the lab environment is non-interactive (there is no display on which to open a plot window), plt.show() cannot display the plot. Instead, we save the plot to a file using plt.savefig().
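If you run a similar script on another machine without a display, it can help to select a file-only backend explicitly. The sketch below assumes Matplotlib's built-in Agg backend. The file name backend_check.png and the temporary directory are placeholders used only for this illustration, and the LabEx environment may already be configured this way.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # select a file-only backend before importing pyplot
import matplotlib.pyplot as plt

# Hypothetical output location used only for this illustration
out_path = os.path.join(tempfile.gettempdir(), "backend_check.png")

plt.hist([1, 2, 2, 3, 3, 3])
plt.savefig(out_path)  # write the current figure to disk
plt.close()            # release the figure once it has been saved

print(os.path.exists(out_path))
```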

Update your main.py file with the following code. It adds the Matplotlib plotting logic to the data generation code from the previous step.

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create a histogram
plt.hist(data)

# Save the plot to a file
plt.savefig('/home/labex/project/histogram.png')

print("Basic histogram saved to histogram.png")

Now, run the script again from the terminal:

python3 main.py

You should see the following output:

Basic histogram saved to histogram.png

A new file named histogram.png will appear in the file explorer on the left. Double-click it to open and view your first histogram. It will show the frequency distribution of the random data you generated.

[Figure: Histogram]
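plt.hist() also returns the numbers behind the picture, which is handy when you want to check what was drawn. The sketch below is an optional side experiment rather than part of main.py. It assumes the Agg backend and a seeded generator (seed 0, chosen arbitrarily) so that it runs anywhere and gives repeatable data.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).normal(loc=0, scale=1, size=1000)

# plt.hist() returns the bar heights, the bin edges, and the bar objects
counts, bin_edges, patches = plt.hist(data)
plt.close()

print(len(counts))         # number of bins (10 unless configured otherwise)
print(len(bin_edges))      # always one more edge than bins
print(int(counts.sum()))   # every data point lands in exactly one bin
```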

Set number of bins using bins parameter

In this step, you will learn how to control the granularity of your histogram by setting the number of bins. A "bin" is an interval that represents a range of data. The number of bins can significantly affect how the distribution is interpreted. Too few bins can hide important details, while too many can create a noisy plot.

Matplotlib's plt.hist() function has a bins parameter that allows you to specify the number of bins. By default, Matplotlib uses 10 bins, but you will often want to adjust this to suit your data.

Let's modify the code to create a histogram with 30 bins. We will also save it to a new file, histogram_bins.png, to compare it with the previous plot.

Update your main.py file as follows:

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create a histogram with 30 bins
plt.hist(data, bins=30)

# Save the plot to a new file
plt.savefig('/home/labex/project/histogram_bins.png')

print("Histogram with 30 bins saved to histogram_bins.png")

Run the script from the terminal:

python3 main.py

The output will be:

Histogram with 30 bins saved to histogram_bins.png

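Now, find histogram_bins.png in the file explorer and open it. Compare it with the first histogram. The bars are narrower, giving a more detailed view of the data's distribution. Because the script draws new random numbers on every run, the exact bar heights will differ slightly between the two images, but the overall bell shape stays the same.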

[Figure: Histogram with 30 bins]
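Besides an integer, the bins parameter also accepts a sequence of explicit bin edges or the name of a binning rule such as "auto". The sketch below uses np.histogram(), which accepts the same three forms, so the results can be inspected as plain numbers. The edge values and the seed are arbitrary choices made for this illustration.

```python
import numpy as np

data = np.random.default_rng(0).normal(loc=0, scale=1, size=1000)

# 1. An integer: that many equal-width bins across the data range
counts_int, edges_int = np.histogram(data, bins=30)

# 2. A sequence: explicit bin edges (16 bins of width 0.5 from -4 to 4);
#    values outside the outermost edges are not counted
edges = np.arange(-4, 4.5, 0.5)
counts_seq, edges_seq = np.histogram(data, bins=edges)

# 3. A string: let NumPy choose the bin count with a named rule
counts_auto, edges_auto = np.histogram(data, bins="auto")

print(len(counts_int), len(counts_seq), len(counts_auto))
```

Explicit edges are useful when several histograms must share exactly the same bins, while a named rule is a reasonable starting point when you have no prior idea of a good bin count.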

Customize histogram color and edgecolor

In this step, you will customize the visual appearance of the histogram. A well-styled plot is easier to read and more professional. The plt.hist() function offers several parameters for styling, including color for the bar fill and edgecolor for the bar borders.

Let's change the bar color to a light blue and add black borders to make each bin stand out more clearly.

Modify your main.py file to include these new parameters. We will save this customized plot to histogram_color.png.

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create a histogram with 30 bins, custom color, and edgecolor
plt.hist(data, bins=30, color='skyblue', edgecolor='black')

# Save the plot to a new file
plt.savefig('/home/labex/project/histogram_color.png')

print("Styled histogram saved to histogram_color.png")

Execute the script in the terminal:

python3 main.py

You will see this message:

Styled histogram saved to histogram_color.png

Open the newly created histogram_color.png file. You will see a much more polished histogram with light blue bars and distinct black outlines.

[Figure: Styled histogram]
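Colors can be given as names or as hex codes, and plt.hist() forwards further keyword arguments to the bars it draws. The sketch below is an optional variation on the lab's styling. The linewidth and alpha values are arbitrary choices for illustration, and the Agg backend and seed are assumptions that keep the example self-contained.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import to_hex

data = np.random.default_rng(0).normal(loc=0, scale=1, size=1000)

counts, bin_edges, patches = plt.hist(
    data,
    bins=30,
    color="#87ceeb",     # hex code for the named color 'skyblue'
    edgecolor="black",
    linewidth=1.2,       # thickness of the bar outlines
    alpha=0.8,           # 0 is fully transparent, 1 is fully opaque
)
plt.close()

print(to_hex("skyblue"))            # #87ceeb
print(patches[0].get_linewidth())   # 1.2
```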

Normalize histogram using density=True

In this step, you will learn how to create a normalized histogram. By default, a histogram's y-axis shows the count of data points in each bin. Sometimes it is more useful to view the distribution as a probability density. In a normalized histogram, each bar's height is its count divided by the product of the total number of data points and the bin width, so that the total area of all bars equals 1.

This is achieved by setting the density parameter to True in the plt.hist() function. It's also good practice to add labels and a title to your plot to make it self-explanatory.
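To see what the normalization does numerically, you can compute it by hand and compare. The sketch below uses np.histogram(), which applies the same density rule, so no plot is needed. The seed is an arbitrary choice that makes the data repeatable.

```python
import numpy as np

data = np.random.default_rng(0).normal(loc=0, scale=1, size=1000)

counts, edges = np.histogram(data, bins=30)
density, _ = np.histogram(data, bins=30, density=True)

# Density = count / (total count * bin width)
widths = np.diff(edges)
manual = counts / (counts.sum() * widths)

print(np.allclose(density, manual))                 # True
print(np.isclose((density * widths).sum(), 1.0))    # True
```

Note that it is the total area (height times width, summed over all bars) that equals 1, not the sum of the bar heights.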

Let's update the script to create a normalized histogram and add descriptive labels.

import numpy as np
import matplotlib.pyplot as plt

# Generate sample data
data = np.random.normal(loc=0, scale=1, size=1000)

# Create a normalized histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black', density=True)

# Add title and labels
plt.title('Normalized Histogram of Sample Data')
plt.xlabel('Value')
plt.ylabel('Probability Density')

# Save the plot to a new file
plt.savefig('/home/labex/project/histogram_normalized.png')

print("Normalized histogram saved to histogram_normalized.png")

Run the final version of your script:

python3 main.py

The output will be:

Normalized histogram saved to histogram_normalized.png

Open histogram_normalized.png. Notice that the y-axis values are now much smaller. They represent probability density, not raw counts. The overall shape of the distribution remains the same, but the scale is now standardized, which is useful for comparing distributions of different-sized datasets.

[Figure: Normalized histogram]
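One way to check that the y-axis really shows a density is to draw the theoretical curve on top of the bars. The sketch below overlays the standard normal probability density function, computed directly with NumPy, on a normalized histogram. The file name histogram_with_pdf.png, the temporary directory, the line color, and the seed are choices made for this illustration and are not part of the lab.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).normal(loc=0, scale=1, size=1000)

plt.hist(data, bins=30, color="skyblue", edgecolor="black", density=True)

# Standard normal PDF: exp(-x^2 / 2) / sqrt(2 * pi)
x = np.linspace(-4, 4, 201)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
plt.plot(x, pdf, color="darkred", label="Standard normal PDF")
plt.legend()

# Hypothetical output location used only for this illustration
out_path = os.path.join(tempfile.gettempdir(), "histogram_with_pdf.png")
plt.savefig(out_path)
plt.close()
```

With density=True the bars and the curve share the same scale, so the bars should roughly follow the curve. With raw counts on the y-axis, the curve would sit almost flat along the bottom of the plot.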

Summary

Congratulations on completing this lab! You have learned the essential skills for creating and customizing histograms with Matplotlib in Python.

In this lab, you have:

  • Generated sample data using numpy.random.normal().
  • Plotted a basic histogram with plt.hist().
  • Controlled the number of bins using the bins parameter.
  • Styled your histogram with the color and edgecolor parameters.
  • Created a normalized probability density histogram using density=True.
  • Added a title and labels to your plot for better context.

Histograms are a fundamental tool in data exploration and analysis. The techniques you've learned here will enable you to effectively visualize the distribution of your own datasets. Feel free to continue experimenting with other parameters and plot types in Matplotlib.