stats project on phone pricing
Mobile Price Classification
Presentation Overview
01 Problem and Data Overview
02 4 Classification Methods
03 Variable Importance
04 Analysis of Results
The latest iPhone is 4x more expensive than the oldest model
What makes a newer iPhone more expensive?
Research Questions
01 Predictions: How do we price mobile phones?
02 Factor Importance: What characteristics influence mobile phone pricing?
Overview of Data and Methods
Dataset
Our dataset includes information and pricing about
2000 phone models. Each entry has 20 predictors
and 1 response variable with 4 levels.
Method
Use classification techniques to determine optimal
pricing of mobile phones given their features
Techniques
1. Classification
a. Multinomial Logistic Regression (MLR)
b. K-nearest neighbors (KNN)
c. Linear Discriminant Analysis (LDA)
d. Quadratic Discriminant Analysis (QDA)
2. Random Forest and Variable Importance
Response Variable: Price Range
Level 1: Low Cost (0)
Level 2: Medium Cost (1)
Level 3: High Cost (2)
Level 4: Very High Cost (3)
Variable Schematic
Numerical: Battery Power, Clock Speed, Front Camera Megapixels, Internal Memory (GB), Number of Core Processors, Primary Camera Megapixels, RAM
Categorical: 3G Capability, 4G Capability, Bluetooth, Dual Sim Support, Touch Screen, Wifi Capability
Response: Price Range (low, medium, high, and very high cost)
EDA
4 Classification Methods
Purpose of Classification: separate the phones into their respective price-range classes
Multinomial Logistic Regression (MLR)
K-nearest neighbors (KNN)
Linear Discriminant Analysis (LDA)
Quadratic Discriminant Analysis (QDA)
3 Performance Metrics and Confusion Matrices
Accuracy
The proportion of predicted values that match the actual values of the target field
Matthews Correlation Coefficient
Produces a high score only if the predictions do well in all of the confusion matrix categories (both true and false positives and negatives)
F1 Scores
Combine the precision and recall scores of a classifier into a single metric
Confusion Matrix
Benefits of QDA and Assumptions
Random Forest and Variable Importance
RAM
Battery Power
Recommendations
QDA Optimal for Pricing
Mobile phone manufacturers can use this dataset and a QDA model to predict which price
range a product should fall within based on its features, with 83% overall accuracy.
More RAM and Battery Power Drive Price
Based on the variable importance plots, RAM and battery power are the most important
variables for classification. This means that manufacturers looking to optimize revenue
can create products that maximize RAM and battery power in order to price them higher.
Shortcomings & Further Research Questions
Thanks for listening!
Questions?
Mobile Price Classification
Abstract
With the latest iPhone model costing $1600, we wanted to understand what makes one mobile
phone more expensive than another. The purpose of this project was to use classification
techniques to determine the optimal pricing of mobile phones given their features. We also used
a random forest to answer our second research question: “What characteristics influence mobile
phone pricing?”
The dataset includes specifications and pricing for 2000 phone models. Each entry has 20
predictors and one response variable with 4 levels. Our team used Multinomial Logistic
Regression (MLR), K-nearest neighbors (KNN), Linear Discriminant Analysis (LDA), and
Quadratic Discriminant Analysis (QDA) to classify the phones into their respective price
categories.
Across all 3 metrics that we analyzed (overall prediction accuracy, the Matthews correlation
coefficient, and F1 score), we found that QDA was the best model. Phone manufacturers
interested in understanding how to price a phone relative to others can use a QDA
classification model to do so. Lastly, we looked at variable importance plots and concluded
that RAM and battery power are the most important predictors of price range, with phones in
higher price ranges tending to have more of both.
From these conclusions, we recommend that mobile phone manufacturers use QDA
models to help them price their future products. If they are looking to increase revenue, they can
create a phone that has more RAM and battery power, which are the most important factors when
it comes to pricing.
Because the original testing dataset did not include true values for the price ranges, we were
unable to use it for this project, which is a potential shortcoming. In the
future, we would want to replicate the process with more data points.
Research Questions
Our primary research questions are:
● How do we price mobile phones?
● What characteristics influence mobile phone pricing?
Variables
Our dataset had already been split into training and testing files by its creator, but the testing
file did not include the true price range for each entry, which we needed in order to evaluate
our models, so for this project we used only the training file. We further split its 2000
entries, using 80% as a training dataset and 20% for testing.
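The 80/20 split described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library; the function name `train_test_split_rows`, the seed, and the stand-in rows are assumptions made for the example, and the report does not state which tool or random seed was actually used.

```python
import random

def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for the 2000 labelled entries: (row_id, price_range) pairs.
rows = [(i, i % 4) for i in range(2000)]
train, test = train_test_split_rows(rows)
print(len(train), len(test))  # 1600 400
```

Fixing the seed keeps the held-out 20% identical across model fits, so the four classifiers are compared on the same phones.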
Table 1: Type of Predictors
Variable        Type          Description
battery_power   Numerical     Total energy the battery can store at one time, in mAh
blue            Categorical   Has Bluetooth or not
clock_speed     Numerical     Speed at which the microprocessor executes instructions
dual_sim        Categorical   Has dual sim support or not
fc              Numerical     Front camera megapixels
four_g          Categorical   Has 4G or not
int_memory      Numerical     Internal memory in Gigabytes
m_dep           Numerical     Mobile depth in cm
mobile_wt       Numerical     Weight of mobile phone
n_cores         Numerical     Number of cores of processor
pc              Numerical     Primary camera megapixels
px_height       Numerical     Pixel resolution height
px_width        Numerical     Pixel resolution width
ram             Numerical     Random access memory in Megabytes
sc_h            Numerical     Screen height of mobile in cm
sc_w            Numerical     Screen width of mobile in cm
talk_time       Numerical     Longest time a single battery charge can last
three_g         Categorical   Has 3G or not
touch_screen    Categorical   Has touch screen or not
wifi            Categorical   Has wifi or not
The response variable, or variable for classification, is price_range. It is an ordinal label
for the phone's price range that takes values from 0 to 3, where 0 is low cost, 1 is medium
cost, 2 is high cost, and 3 is very high cost.
After careful consideration, our team removed seven variables that seemed repetitive or
surface-level (m_dep, mobile_wt, px_height, px_width, sc_h, sc_w, and talk_time), leaving 13
predictors. We didn’t make any other modifications to the dataset.
Table 2: Predictors of Interest
Variable        Type          Levels   Description
battery_power   Numerical              Total energy the battery can store at one time, in mAh
blue            Categorical   2        Has Bluetooth or not
clock_speed     Numerical              Speed at which the microprocessor executes instructions
dual_sim        Categorical   2        Has dual sim support or not
fc              Numerical              Front camera megapixels
four_g          Categorical   2        Has 4G or not
int_memory      Numerical              Internal memory in Gigabytes
n_cores         Numerical              Number of cores of processor
pc              Numerical              Primary camera megapixels
ram             Numerical              Random access memory in Megabytes
three_g         Categorical   2        Has 3G or not
touch_screen    Categorical   2        Has touch screen or not
wifi            Categorical   2        Has wifi or not
Graphic 1: Variable Schematic
Exploratory Data Analysis (EDA)
Categorical Data (Contingency Tables & Bar Charts)
[Contingency tables and bar charts: Dual SIM, Bluetooth, 3G, 4G, WiFi Capability, Touch Screen]
As seen from the charts derived from the categorical data,
Numerical Data (Box Plots)
[Box plots: Battery Power, Clock Speed, RAM, Front Camera Megapixels, Internal Memory (GB), Number of Core Processors, Primary Camera Megapixels]
From the box plots of the numerical data, we can see that the medians differ only slightly
across price groups for the majority of the numerical variables, but there’s a clear difference
between price groups when it comes to RAM and battery power.
Methodology
We explored 4 classification models to help us determine the best method for separating the
phones into their respective class labels.
We evaluated various performance metrics and confusion matrices to understand which
classification method was most accurate according to our training data.
The 3 main metrics we looked at were overall prediction accuracy, the Matthews correlation
coefficient, and F1 score.
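For reference, the three metrics can be written in terms of a K-class confusion matrix C, where C_{ij} counts the phones predicted as class i whose true class is j. The notation below is introduced only for this illustration, and it assumes the standard multiclass accuracy, multiclass MCC, and macro-averaged F1 definitions, which is what the "multiclass" and "macro" estimator labels in the results tables suggest.

```latex
\begin{aligned}
c &= \sum_{k} C_{kk}, \qquad
s = \sum_{i}\sum_{j} C_{ij}, \qquad
p_k = \sum_{j} C_{kj}, \qquad
t_k = \sum_{i} C_{ik} \\
\text{accuracy} &= \frac{c}{s} \\
\text{MCC} &= \frac{c\,s - \sum_{k} p_k t_k}
  {\sqrt{\left(s^2 - \sum_{k} p_k^2\right)\left(s^2 - \sum_{k} t_k^2\right)}} \\
\text{macro } F_1 &= \frac{1}{K} \sum_{k=1}^{K} \frac{2\,C_{kk}}{p_k + t_k}
\end{aligned}
```

Here c is the number of correct predictions, s is the total number of phones, p_k is how often class k was predicted, and t_k is how often class k actually occurred.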
Multinomial Logistic Regression
Table 3: Multinomial Logistic Regression
Metric     Estimator    Estimate
accuracy   multiclass   0.8249694
mcc        multiclass   0.7666196
f_meas     macro        0.8233849
K-Nearest Neighbors (KNN)
Table 4: KNN vs. MLR
Metric     MLR         KNN
accuracy   0.8249694   0.4473684
mcc        0.7666196   0.2633334
f_meas     0.8233849   0.4459020
Linear Discriminant Analysis (LDA)
Table 5: LDA vs. KNN vs. MLR
Metric     MLR         KNN         LDA
accuracy   0.8249694   0.4473684   0.2513661
mcc        0.7666196   0.2633334   0.0228240
f_meas     0.8233849   0.4459020   0.2479963
Quadratic Discriminant Analysis (QDA)
Table 6: QDA vs. LDA vs. KNN vs. MLR
Metric     MLR         KNN         LDA         QDA
accuracy   0.8249694   0.4473684   0.2513661   0.8333333
mcc        0.7666196   0.2633334   0.0228240   0.7773442
f_meas     0.8233849   0.4459020   0.2479963   0.8370942
Table 7: Confusion Matrix for QDA
             Truth
Prediction   0    1    2    3
0            84   7    0    0
1            3    69   15   0
2            0    12   81   15
3            0    0    9    71
As seen from the confusion matrix, the model does a decent job of classifying the phones into
their correct price category. The diagonal shows that 305 of the 366 phones were classified
correctly, and 61 were misclassified. Every misclassified phone was placed in a price range
adjacent to its true one, so the model's errors are small in magnitude. Whether the model
performs as well on a different dataset would still need to be confirmed with new data.
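As a check on the reported figures, the three QDA metrics can be recomputed directly from the counts in Table 7. The sketch below uses only the Python standard library. It assumes that rows are predictions and columns are true classes, and that the metrics follow the standard multiclass accuracy, multiclass MCC, and macro-averaged F1 definitions. The function name `multiclass_metrics` is hypothetical.

```python
import math

# Counts from Table 7, read as rows = predicted class, columns = true class.
# That orientation is an assumption; all three metrics below are unchanged
# if the matrix is transposed.
confusion = [
    [84,  7,  0,  0],
    [ 3, 69, 15,  0],
    [ 0, 12, 81, 15],
    [ 0,  0,  9, 71],
]

def multiclass_metrics(cm):
    """Return (accuracy, MCC, macro F1) for a square confusion matrix."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    predicted = [sum(cm[i]) for i in range(k)]                    # row sums
    actual = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums

    accuracy = correct / total

    # Multiclass Matthews correlation coefficient.
    numerator = correct * total - sum(p * t for p, t in zip(predicted, actual))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in predicted))
        * (total ** 2 - sum(t * t for t in actual))
    )
    mcc = numerator / denominator if denominator else 0.0

    # Per-class F1 = 2 * TP / (predicted count + actual count), then averaged.
    f1_scores = [
        2 * cm[i][i] / (predicted[i] + actual[i]) if predicted[i] + actual[i] else 0.0
        for i in range(k)
    ]
    macro_f1 = sum(f1_scores) / k
    return accuracy, mcc, macro_f1

accuracy, mcc, macro_f1 = multiclass_metrics(confusion)
print(round(accuracy, 4), round(mcc, 4), round(macro_f1, 4))  # 0.8333 0.7773 0.8371
```

These values agree with the QDA column of Table 6 (0.8333333, 0.7773442, and 0.8370942).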
Random Forest
Since we used 13 predictors, we also wanted to provide more insight into which variables have
the most impact on the pricing of a mobile phone. We first tuned the number of candidate
predictors considered at each split of our random forest using the out-of-bag error, which
helped us conclude that 11 candidate predictors would produce the most accurate model. Then
we fit a random forest model and produced the variable importance plots based on two
importance statistics: permutation importance (mean decrease in accuracy, MDA) and Gini
importance (mean decrease in impurity, MDI). Both plots below show that RAM and battery power
are the two most important factors.
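The idea behind the permutation statistic can be sketched as follows: shuffle one predictor at a time and record how much accuracy drops. This is a simplified, model-agnostic illustration using only the Python standard library, not the random forest implementation used in the project, where the statistic is typically computed per tree on out-of-bag samples. The toy data, the stand-in `toy_model`, and all function names are hypothetical.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data (hypothetical): the label depends only on feature 0 ("ram");
# feature 1 ("wifi") is pure noise.
data_rng = random.Random(1)
X = [[data_rng.randint(256, 4000), data_rng.randint(0, 1)] for _ in range(200)]
y = [int(row[0] > 2000) for row in X]

def toy_model(row):
    """Stand-in classifier that uses only the first feature."""
    return int(row[0] > 2000)

ram_importance, wifi_importance = permutation_importance(toy_model, X, y)
print(ram_importance > wifi_importance)  # True
```

A predictor the model relies on loses accuracy when shuffled, while an unused predictor scores zero, which is the pattern the variable importance plots summarize for RAM and battery power relative to the other predictors.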
Graphic 3: Decision Tree
Graphic 4: Variable Importance – MDA vs. MDI
Results and Analysis
Comparing across the 4 methods, we see that QDA performs the best across all 3
performance metrics. QDA assumes that the predictors within each price group follow a
multivariate normal distribution with a group-specific mean vector and a group-specific
covariance matrix. We treated the normality assumption as reasonable given our large sample
size, although a large sample alone does not guarantee normality. Our QDA model
was able to predict mobile phone price range with 83% overall accuracy, which means the
model could be useful for phone manufacturers looking to understand how to price their
new products. If they were to enter a phone's specifications as inputs, the model would classify
which price range the phone should fall within, and manufacturers could price the phone
accordingly based on peers within the same class.
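To make the classification step concrete, the standard textbook form of the QDA decision rule is shown below. The symbols are not from our analysis output: x is a phone's vector of predictor values, and for price range k, \mu_k is the mean vector, \Sigma_k the covariance matrix, and \pi_k the prior probability (typically the share of training phones in that range).

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
  - \tfrac{1}{2}\,(x - \mu_k)^{\top}\Sigma_k^{-1}(x - \mu_k)
  + \log \pi_k,
\qquad
\hat{y}(x) = \arg\max_{k \in \{0,1,2,3\}} \delta_k(x)
```

Because each price range has its own \Sigma_k, the boundaries between ranges are quadratic in x, whereas LDA shares a single covariance matrix across all ranges and produces linear boundaries.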
From our random forest variable importance plots, we found that RAM and battery power are the
most important determinants of pricing. From our initial EDA, we saw that the higher price
ranges had higher median RAM and battery power. This signals to industry players that phones
with more RAM and battery power are more valued and more expensive. From the results of our
analysis, an interesting question to explore further is how trading off RAM against
battery power would affect pricing and consumer demand.