Drawing Scatter Plots Using Python Matplotlib

Today I will teach you how to draw scatter plots in Python using Matplotlib and seaborn.

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [3]:

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_96f916b7fc744f3ab5333e0e2cad9f12 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',  # credential removed before sharing; supply your own key
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')

body = client_96f916b7fc744f3ab5333e0e2cad9f12.get_object(Bucket='drawinglinegraphs-donotdelete-pr-b4fmduabcw5chs',Key='insurance.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.head()

Out[3]:

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
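
Readers without access to the IBM Cloud Object Storage bucket can still follow along. The sketch below is not part of the original notebook: it rebuilds a ten-row sample from the rows printed in this post and reads it with pd.read_csv. The helper name load_insurance is a hypothetical choice, and passing a local path such as "insurance.csv" assumes you have saved a copy of the file yourself.

```python
import io

import pandas as pd

# The ten rows printed in this post (first five and last five of the dataset).
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
50,male,30.970,3,no,northwest,10600.54830
18,female,31.920,0,no,northeast,2205.98080
18,female,36.850,0,no,southeast,1629.83350
21,female,25.800,0,no,southwest,2007.94500
61,female,29.070,0,yes,northwest,29141.36030
"""


def load_insurance(source):
    """Read the insurance data from a file path or any file-like object."""
    return pd.read_csv(source)


# Hypothetical usage with a local copy: load_insurance("insurance.csv")
sample = load_insurance(io.StringIO(SAMPLE_CSV))
print(sample.shape)
print(sample.dtypes)
```

The sample is far too small for analysis, but it has the same seven columns as the full dataset, so the plotting commands in this post run against it unchanged.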

In [4]:

data = df_data_1
print(data)

      age     sex     bmi  children smoker     region      charges
0      19  female  27.900         0    yes  southwest  16884.92400
1      18    male  33.770         1     no  southeast   1725.55230
2      28    male  33.000         3     no  southeast   4449.46200
3      33    male  22.705         0     no  northwest  21984.47061
4      32    male  28.880         0     no  northwest   3866.85520
...   ...     ...     ...       ...    ...        ...          ...
1333   50    male  30.970         3     no  northwest  10600.54830
1334   18  female  31.920         0     no  northeast   2205.98080
1335   18  female  36.850         0     no  southeast   1629.83350
1336   21  female  25.800         0     no  southwest   2007.94500
1337   61  female  29.070         0    yes  northwest  29141.36030

[1338 rows x 7 columns]


Scatter plots

To create a simple scatter plot, we use the sns.scatterplot command and specify the values for the horizontal x-axis (x=data['bmi']) and the vertical y-axis (y=data['charges']).

In [7]:

## Scatterplot of BMI and Charges

sns.scatterplot(x=data['bmi'], y=data['charges'])

Out[7]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f26d8787d10>

In [10]:

## The scatterplot above suggests that body mass index (BMI) and insurance charges are positively correlated:
# customers with higher BMI tend to pay more in insurance costs.
# (This pattern makes sense, since high BMI is typically associated with a higher risk of chronic disease.)

## To double-check the strength of this relationship, you might like to add a regression line, 
# or the line that best fits the data. We do this by changing the command to sns.regplot.

In [11]:

sns.regplot(x=data['bmi'], y=data['charges'])

Out[11]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f26d871aa10>
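
The regression line gives a visual impression of the relationship; a single number can complement it. The sketch below is an addition, not part of the original notebook: it computes the Pearson correlation coefficient with pandas. The helper name pearson_r is hypothetical, and the ten-row sample (taken from the rows printed earlier in this post) only stands in for the full DataFrame, so its result says nothing about the full 1338 rows.

```python
import io

import pandas as pd

# Ten-row sample copied from the rows printed earlier; illustrative only.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
50,male,30.970,3,no,northwest,10600.54830
18,female,31.920,0,no,northeast,2205.98080
18,female,36.850,0,no,southeast,1629.83350
21,female,25.800,0,no,southwest,2007.94500
61,female,29.070,0,yes,northwest,29141.36030
"""


def pearson_r(df, x, y):
    """Return the Pearson correlation between two numeric columns."""
    return df[x].corr(df[y])  # Series.corr uses method='pearson' by default


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
r = pearson_r(sample, "bmi", "charges")
print(f"Pearson r between bmi and charges (sample only): {r:.3f}")
```

In the notebook you would call pearson_r(data, 'bmi', 'charges') on the full dataset. A value close to +1 indicates a strong positive linear relationship, while a value near 0 indicates a weak one.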

Color-coded scatter plots

We can use scatter plots to display the relationships between (not two, but…) three variables! One way of doing this is by color-coding the points.

For instance, to understand how smoking affects the relationship between BMI and insurance costs, we can color-code the points by 'smoker', and plot the other two columns ('bmi', 'charges') on the axes. To do this, we pass the 'smoker' column to the hue parameter.

In [16]:

sns.scatterplot(x=data['bmi'], y=data['charges'], hue=data['smoker'])
plt.title('How Smoking Influences the Relationship between BMI and Charges')

Out[16]:

Text(0.5, 1.0, 'How Smoking Influences the Relationship between BMI and Charges')

This scatter plot shows that while nonsmokers tend to pay slightly more with increasing BMI, smokers pay MUCH more.

To further emphasize this fact, we can use the sns.lmplot command to add two regression lines, corresponding to smokers and nonsmokers. (You'll notice that the regression line for smokers has a much steeper slope, relative to the line for nonsmokers!)

In [19]:

sns.lmplot(x='bmi', y='charges', hue='smoker', data=data)
plt.title('How Smoking Influences the Relationship between BMI and Charges')

Out[19]:

Text(0.5, 1.0, 'How Smoking Influences the Relationship between BMI and Charges')

The sns.lmplot command above works slightly differently than the commands you have learned about so far:

Instead of setting x=data['bmi'] to select the 'bmi' column in data, we set x='bmi' to specify the name of the column only. Similarly, y='charges' and hue='smoker' also contain the names of columns. We specify the dataset with data=data.

In [20]:

## How Sex Influences the Relationship between BMI and Charges

sns.lmplot(x='bmi', y='charges', hue='sex', data=data)
plt.title('How Sex Influences the Relationship between BMI and Charges')

Out[20]:

Text(0.5, 1.0, 'How Sex Influences the Relationship between BMI and Charges')

This scatter plot shows that while female customers tend to pay slightly more with increasing BMI, the increase is larger for male customers.
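
The comparisons of regression slopes above are read off the plots by eye. As a rough numerical check, the sketch below (an addition, not part of the original notebook) fits a separate least-squares line per group with np.polyfit and reports each slope. The helper name slope_by_group is hypothetical, and the ten-row sample only demonstrates the mechanics: with so few points, the slopes it prints are not representative of the full dataset.

```python
import io

import numpy as np
import pandas as pd

# Ten-row sample copied from the rows printed earlier; illustrative only.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
50,male,30.970,3,no,northwest,10600.54830
18,female,31.920,0,no,northeast,2205.98080
18,female,36.850,0,no,southeast,1629.83350
21,female,25.800,0,no,southwest,2007.94500
61,female,29.070,0,yes,northwest,29141.36030
"""


def slope_by_group(df, x, y, group):
    """Fit y = slope * x + intercept within each group; return the slopes."""
    slopes = {}
    for name, part in df.groupby(group):
        # A degree-1 polyfit is an ordinary least-squares straight line.
        slope, _intercept = np.polyfit(part[x], part[y], deg=1)
        slopes[name] = float(slope)
    return slopes


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
smoker_slopes = slope_by_group(sample, "bmi", "charges", "smoker")
sex_slopes = slope_by_group(sample, "bmi", "charges", "sex")
print("Slope of charges vs. bmi by smoker:", smoker_slopes)
print("Slope of charges vs. bmi by sex:", sex_slopes)
```

In the notebook, calling slope_by_group(data, 'bmi', 'charges', 'smoker') on the full DataFrame gives the change in charges per unit of BMI for each group, which can be compared directly with the lines drawn by sns.lmplot.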

Finally, there's one more plot that you'll learn about, that might look slightly different from how you're used to seeing scatter plots. Usually, we use scatter plots to highlight the relationship between two continuous variables (like "bmi" and "charges"). However, we can adapt the design of the scatter plot to feature a categorical variable (like "smoker") on one of the main axes. We'll refer to this plot type as a categorical scatter plot, and we build it with the sns.swarmplot command.

In [29]:

## Categorical Scatter Plot

sns.swarmplot(x = data['smoker'], 
             y = data['charges'])
plt.title("Categorical Scatter Plot - Charges by Smoker")

Out[29]:

Text(0.5, 1.0, 'Categorical Scatter Plot - Charges by Smoker')

Among other things, this plot shows us that:

on average, non-smokers are charged less than smokers, and the customers who pay the most are smokers, whereas the customers who pay the least are non-smokers.

In [30]:

sns.swarmplot(x = data['sex'], 
             y = data['charges'])
plt.title("Categorical Scatter Plot - Charges by Sex")

Out[30]:

Text(0.5, 1.0, 'Categorical Scatter Plot - Charges by Sex')
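
The swarm plots support statements such as "non-smokers are charged less on average", and those statements can also be checked with a grouped summary. The sketch below is an addition to the original notebook: it uses pandas groupby to compute the mean, minimum, and maximum charge per category. The helper name charges_summary is hypothetical, and the ten-row sample (taken from the rows printed earlier) is for illustration only.

```python
import io

import pandas as pd

# Ten-row sample copied from the rows printed earlier; illustrative only.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
50,male,30.970,3,no,northwest,10600.54830
18,female,31.920,0,no,northeast,2205.98080
18,female,36.850,0,no,southeast,1629.83350
21,female,25.800,0,no,southwest,2007.94500
61,female,29.070,0,yes,northwest,29141.36030
"""


def charges_summary(df, group):
    """Return mean, min and max of 'charges' for each value of `group`."""
    return df.groupby(group)["charges"].agg(["mean", "min", "max"])


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
by_smoker = charges_summary(sample, "smoker")
by_sex = charges_summary(sample, "sex")
print(by_smoker)
print(by_sex)
```

Running charges_summary(data, 'smoker') and charges_summary(data, 'sex') on the full DataFrame gives the numbers behind the two swarm plots.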
