When it comes to data science, data visualization is just as important as data analysis. Sometimes we understand the data even better by looking at it through plots and figures, and we can analyze it more effectively once we connect with it through a visual representation. This is not something I'm making up on my own; these are the words of engineers working in the data science domain.
Python meets the need for data visualization too, as it provides a wide range of libraries for the job, making the task much easier for data scientists. In this post, we'll give you an overview of one such data visualization library in Python: Matplotlib. So let's get going without any delay!
It's a Python-based plotting library whose interface was modeled on MATLAB (another programming language), which explains the resemblance between the two. It is one of the oldest plotting libraries in Python, and that shows in its plotting styles. However, it is still under development and its styles have been refreshed over time. Some other libraries are also built on top of Matplotlib and add their own styling to plots and graphs.
Start with its installation:
Run the following in Python:
import matplotlib
If it raises an error, you may need to install the library on your system using
pip install matplotlib
in the command prompt or, if you're using the Anaconda distribution,
conda install matplotlib
in the Anaconda prompt. Congrats, you're good to go now!
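As a quick sanity check after installing, a minimal sketch like the one below prints the installed version. It only assumes that matplotlib is importable in the Python environment you are running.

```python
import matplotlib

# If this import succeeds, the library is installed for this interpreter.
# Printing the version is a simple way to confirm which release you have.
print(matplotlib.__version__)
```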
Importing “Matplotlib” in your code:
The shorthand plt is conventionally used when importing matplotlib, just as np is used for NumPy and pd for pandas. Run the following code to import matplotlib:
import matplotlib.pyplot as plt
Here, pyplot is the module that contains the plotting functions and the styling parameters used to draw any figure/bar/graph with matplotlib.
Also, if you are coding in an IPython notebook, running
%matplotlib inline
just after the import will display the plots inside your notebook. Otherwise, you have to call
plt.show()
after building each plot to see it.
Jargon used in Matplotlib:
There are only two terms you need to know before diving into this library. As explained on the official Matplotlib website,
- Figure is the final image, which can contain one or more Axes.
- Axes represents an individual plot of data within a Figure.
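To make the two terms concrete, here is a small sketch that creates one Figure holding two Axes. The data values and titles are made up purely for illustration.

```python
import matplotlib.pyplot as plt

# One Figure (the whole image) containing two Axes (individual plots)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot([1, 2, 3], [1, 4, 9])
axes[0].set_title("First Axes")

axes[1].plot([1, 2, 3], [9, 4, 1])
axes[1].set_title("Second Axes")

# The Figure keeps a list of every Axes drawn on it
print(len(fig.axes))
```

Outside a notebook, add plt.show() at the end to display the figure.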
Types of Plotting Interfaces with Matplotlib:
- State-based (aka MATLAB-based) Plotting Interface – It is very closely related to the MATLAB approach and syntax. The point to remember is that this interface implicitly works on the current Figure and Axes rather than on objects you name explicitly, which can lead to complications in the long run, for example when a script manages several plots.
- Object-oriented (OO) Plotting Interface – In this case, we use an instance of axes.Axes in order to render visualizations on an instance of figure.Figure. In other words, we create as many Axes objects as we need and work with them directly, instead of relying on whichever Figure happens to be current. We also call the plotting methods directly on the Axes, which gives us much more flexibility and power in customizing our plot.
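Confused about how to tell them apart? The difference shows up in the basic syntax: the state-based interface calls functions on plt, such as plt.plot(x, y), while the OO interface calls methods on an Axes object, such as ax.plot(x, y).
Note: The OO-based approach is preferred over the state-based one, and experienced practitioners in this domain also emphasize using it.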
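The basic syntax of the two interfaces can be sketched side by side as follows. The variable names (x, y, state_ax, fig, ax) and the sample data are assumptions chosen for this example.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# State-based interface: pyplot functions act on the "current" Figure and Axes
plt.figure()
plt.plot(x, y)
plt.title("State-based")
state_ax = plt.gca()  # grab the current Axes so we can inspect it later

# Object-oriented interface: create the objects, then call methods on them
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented")
```

Both halves draw the same line. The OO version simply makes it explicit which Axes receives each call, which helps once a script has more than one plot.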
Types of Plots:
- Line Plot – The line plot is used for data in which one variable changes along with another, for example a value tracked over time. It can be plotted using the axes.plot(x, y) or plt.plot(x, y) method, depending upon whether you use the OO or the state-based approach (assuming you've imported matplotlib.pyplot already). We can also specify the marker style, marker color, labels, line style, line color, etc. for the data points.
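A minimal line plot using the OO approach might look like the sketch below. The data and the particular marker, line style and color are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()

# marker, linestyle, color and label are optional styling arguments
ax.plot(x, y, marker="o", linestyle="--", color="green", label="y = 2x")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()  # shows the label given above
```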
- Hist Plot – Histograms are plotted using the hist() method. They give the user a clear picture of how the data points are distributed, including where most values lie and the range they cover. Histograms can be simple or stacked on top of each other.
Also, it is as easy as pie. [ The value of alpha passed to hist() can vary from 0 (transparent) to 1 (opaque). ]
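Here is a sketch of both variants: two overlaid histograms that use alpha for transparency, and a stacked histogram. The samples are randomly generated stand-ins, and the bin count of 20 is an arbitrary choice.

```python
import random
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the made-up data is reproducible
a = [random.gauss(0, 1) for _ in range(500)]
b = [random.gauss(2, 1) for _ in range(500)]

fig, (ax_overlay, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))

# Simple histograms drawn over each other; alpha keeps both visible
counts_a, bin_edges, _ = ax_overlay.hist(a, bins=20, alpha=0.5, label="a")
ax_overlay.hist(b, bins=20, alpha=0.5, label="b")
ax_overlay.legend()

# Stacked histogram: pass both datasets together with stacked=True
ax_stacked.hist([a, b], bins=20, stacked=True, label=["a", "b"])
ax_stacked.legend()
```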
- Scatter Plot – It's one of the most widely used plots and shows the relationship between 2 or more variables, so it is used for multivariate data (data having 2 or more variables). The x and y axes carry 2 of the variables, and up to 2 further variables can be mapped to the size and color of the marker points in the scatter plot.
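A sketch of a four-variable scatter plot follows. All four lists are invented for illustration, and the viridis colormap is just one possible choice.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]                  # first variable (x axis)
y = [5, 3, 4, 1, 2]                  # second variable (y axis)
sizes = [20, 60, 100, 140, 180]      # third variable, mapped to marker size
colors = [0.1, 0.3, 0.5, 0.7, 0.9]   # fourth variable, mapped to marker color

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=colors, cmap="viridis", alpha=0.8)

# The colorbar explains how the color values map to the fourth variable
fig.colorbar(points, ax=ax, label="fourth variable")
```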
- Bar Plot – It's used to visualize categorical data that has few (probably < 10) categories. A bar plot can be drawn in matplotlib using axes.bar() or plt.bar(), depending upon the interface used for plotting.
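For instance, a bar plot over a handful of categories could be sketched like this. The category names and counts are made up.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [5, 8, 3, 6]

fig, ax = plt.subplots()

# One bar per category; the height of each bar is the matching value
bars = ax.bar(categories, values, color="steelblue")
ax.set_ylabel("Count")
```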
- Pie Plot/Chart – Pie charts are used in almost every domain, whether it's a weather plot in geography, a reign plot in history or a political party plot in a democracy. With matplotlib, you can plot pie charts as well, and the library also provides features to style them.
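A styled pie chart might be sketched as below. The weather labels and shares are hypothetical numbers, and explode, autopct and startangle are optional styling arguments.

```python
import matplotlib.pyplot as plt

labels = ["Sunny", "Cloudy", "Rainy"]
shares = [50, 30, 20]

fig, ax = plt.subplots()
ax.pie(
    shares,
    labels=labels,
    autopct="%1.0f%%",    # print the percentage inside each wedge
    explode=(0.1, 0, 0),  # pull the first wedge slightly out
    startangle=90,        # rotate so the first wedge starts at the top
)
ax.axis("equal")  # equal aspect ratio keeps the pie circular
```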
- Box Plot – It's used for univariate data, to see the range, quartiles and median of the data visually.
- Violin Plot – Like the box plot, it is used to get the range, median, etc. of univariate data, and it additionally shows the shape of the distribution.
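Since the box plot and the violin plot summarize the same kind of data, one sketch can draw both from a single sample. The sample here is randomly generated and purely illustrative.

```python
import random
import matplotlib.pyplot as plt

random.seed(1)  # fixed seed so the made-up data is reproducible
data = [random.gauss(50, 10) for _ in range(200)]

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 4))

# Box plot: the box spans the quartiles, the line inside marks the median
box = ax_box.boxplot(data)
ax_box.set_title("Box plot")

# Violin plot: the outline shows the distribution; showmedians adds the median
violin = ax_violin.violinplot(data, showmedians=True)
ax_violin.set_title("Violin plot")
```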
That was more than enough for a beginner to take in. So chill, it's over.
Summing it all up, all the necessary plots are available in matplotlib (as shown above), which makes it a useful library for data scientists. However, data visualization is more than just Matplotlib. (See the official Matplotlib site for more plots and styles.)
Keep following us to learn more about styling plots and figures with Seaborn and pandas' built-in data visualization features. Also, if you have any doubts, email us or post them in the comments. Until then, have a good day. Keep learning and don't forget to like and share this post with your fellow learners!