Machine Learning Programming Code
Machine Learning (online) Coursework
2. Coursework resources
2.3. Part 3
As a reminder:
Finally, in the third part you will be given a classification problem. The analysis
will contain similar steps to the 2nd part, but you should be able to interpret
the output from different models and compare their predictive performance,
taking into account that the response variable will be binary. In addition to
appropriate regression or discriminant analysis, tree-based methods, nonlinear
models or other suitable techniques can be used if you think they will
perform better.
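Comparing classifiers on a binary response usually comes down to tabulating each model's predictions against the true labels on held-out data. The following is a minimal, dependency-free sketch of that step. The function name `binary_metrics` and the example labels are illustrative assumptions, not part of the coursework materials.

```python
def binary_metrics(y_true, y_pred, positive="yes"):
    """Confusion-matrix counts and summary rates for a binary response."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a model never predicts the positive class
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test-set labels and predictions from one candidate model
y_true = ["yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "yes", "no", "no"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can flatter a model when one class dominates, so precision and recall for the positive class are worth reporting alongside it when models are compared.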
Description for Bank Marketing Dataset
Task:
Build a classification model to predict whether the client will subscribe (yes/no) to a
term deposit (variable y). Interpret the model and assess its predictive
performance.
Source:
[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to
Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier,
62:22-31, June 2014
Data Set Information:
The data relate to direct marketing campaigns of a Portuguese banking
institution. The marketing campaigns were based on phone calls. Often, more
than one contact with the same client was required in order to assess whether the
product (bank term deposit) would be subscribed (‘yes’) or not (‘no’).
Attribute Information:
Input variables:
# bank client data:
1 – age (numeric)
2 – job: type of job (categorical: ‘admin.’, ‘blue-collar’, ‘entrepreneur’, ‘housemaid’, ‘management’, ‘retired’, ‘self-employed’, ‘services’, ‘student’, ‘technician’, ‘unemployed’, ‘unknown’)
3 – marital: marital status (categorical: ‘divorced’, ‘married’, ‘single’, ‘unknown’;
note: ‘divorced’ means divorced or widowed)
4 – education (categorical:
‘basic.4y’, ‘basic.6y’, ‘basic.9y’, ‘high.school’, ‘illiterate’, ‘professional.course’, ‘university.d
5 – default: has credit in default? (categorical: ‘no’, ‘yes’, ‘unknown’)
6 – housing: has housing loan? (categorical: ‘no’, ‘yes’, ‘unknown’)
7 – loan: has personal loan? (categorical: ‘no’, ‘yes’, ‘unknown’)
# related with the last contact of the current campaign:
8 – contact: contact communication type (categorical: ‘cellular’, ‘telephone’)
9 – month: last contact month of year (categorical: ‘jan’, ‘feb’, ‘mar’, …, ‘nov’, ‘dec’)
10 – day_of_week: last contact day of the week (categorical: ‘mon’, ‘tue’, ‘wed’, ‘thu’, ‘fri’)
11 – duration: last contact duration, in seconds (numeric). Important note: this
attribute highly affects the output target (e.g., if duration=0 then y=‘no’). Yet,
the duration is not known before a call is performed. Also, after the end of the
call y is obviously known. Thus, this input should only be included for
benchmark purposes and should be discarded if the intention is to have a
realistic predictive model.
# other attributes:
12 – campaign: number of contacts performed during this campaign and for this
client (numeric, includes last contact)
13 – pdays: number of days that passed after the client was last contacted
in a previous campaign (numeric; 999 means the client was not previously
contacted)
14 – previous: number of contacts performed before this campaign and for this
client (numeric)
15 – poutcome: outcome of the previous marketing campaign (categorical:
‘failure’, ‘nonexistent’, ‘success’)
# social and economic context attributes
16 – emp.var.rate: employment variation rate – quarterly indicator (numeric)
17 – cons.price.idx: consumer price index – monthly indicator (numeric)
18 – cons.conf.idx: consumer confidence index – monthly indicator (numeric)
19 – euribor3m: euribor 3 month rate – daily indicator (numeric)
20 – nr.employed: number of employees – quarterly indicator (numeric)
Output variable (desired target):
21 – y: has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)
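Two notes in the attribute list translate directly into preprocessing: `duration` should be dropped for a realistic model, and the value 999 in `pdays` is a sentinel rather than a real number of days. The sketch below shows one way to handle both, together with coding `y` as 0/1. The helper names `prepare_row` and `load_rows`, the added `previously_contacted` flag, and the semicolon delimiter are assumptions made for illustration; check them against the files in bank.zip.

```python
import csv

PDAYS_NOT_CONTACTED = "999"  # sentinel described under attribute 13


def prepare_row(row):
    """Return (features, label) for one raw record given as a dict of strings."""
    features = dict(row)
    # Attribute 11: duration is only known after the call, so exclude it
    features.pop("duration", None)
    # Attribute 21: code the response as 1 for 'yes' and 0 for 'no'
    label = 1 if features.pop("y") == "yes" else 0
    # Attribute 13: replace the 999 sentinel with an explicit flag and a missing value
    never_contacted = features["pdays"] == PDAYS_NOT_CONTACTED
    features["previously_contacted"] = 0 if never_contacted else 1
    features["pdays"] = None if never_contacted else int(features["pdays"])
    return features, label


def load_rows(path, delimiter=";"):
    """Read a CSV file and prepare every record (the delimiter is an assumption)."""
    with open(path, newline="") as handle:
        return [prepare_row(r) for r in csv.DictReader(handle, delimiter=delimiter)]


# Hypothetical record with only a few of the 21 columns, for demonstration
example = {"age": "41", "duration": "120", "pdays": "999", "y": "yes"}
features, label = prepare_row(example)
print(features, label)
```

Keeping the sentinel as a separate flag avoids treating 999 as a very long gap between contacts, which would distort any model that uses `pdays` as a numeric predictor.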
Relevant Papers:
S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success
of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June
2014
S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct
Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al.
(Eds.), Proceedings of the European Simulation and Modelling Conference ESM’2011, pp. 117-121, Guimaraes, Portugal, October, 2011. EUROSIS.
[bank.zip]
2.2. Part 2
As a reminder:
In the second part you will be presented with a regression problem. The aim
is to compare various models, and techniques for estimating them, so as to
allow meaningful interpretation and competitive predictive performance. The
latter should be assessed by appropriate experiments based on training and test
datasets. In addition to linear regression, tree-based methods, nonlinear
models or other suitable techniques can be used if you think they can provide
an improvement.
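An experiment of this kind fits each candidate on the training set and scores it on the test set with a common error measure. The sketch below is a dependency-free illustration under strong simplifying assumptions: a single numeric predictor, synthetic data, and two stand-in models (a mean-only baseline and a least-squares line) in place of the richer models the coursework has in mind. All function names are hypothetical.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(train):
    """Baseline: always predict the training mean of the response."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Simple linear regression with one predictor, fitted by least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def compare(rows, fitters, seed=0):
    """Fit every model on the same training set and report its test RMSE."""
    train, test = train_test_split(rows, seed=seed)
    y_test = [y for _, y in test]
    scores = {}
    for name, fit in fitters.items():
        predict = fit(train)
        scores[name] = rmse(y_test, [predict(x) for x, _ in test])
    return scores


# Synthetic, noise-free data used only to exercise the code
rows = [(x, 2 * x + 1) for x in range(20)]
scores = compare(rows, {"mean": fit_mean, "line": fit_line})
print(scores)
```

Using the same split for every model keeps the comparison fair; repeating the experiment over several seeds gives a sense of how much the ranking depends on the particular split.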
Description for Student Performance Dataset
Task:
Build a regression model for the variable G3 (final grade) without using the
variables G1 and G2. Interpret the model and assess its predictive performance.
Source:
Paulo Cortez, University of Minho, Guimaraes, Portugal,
http://www3.dsi.uminho.pt/pcortez
Data Set Information:
These data concern student achievement in secondary education at two
Portuguese schools. The data attributes include student grades and demographic,
social and school-related features, and the data were collected using school reports
and questionnaires.
Two datasets are provided regarding the performance in two distinct subjects:
Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008],
the two datasets were modeled under binary/five-level classification and
regression tasks.
Important note: the target attribute G3 has a strong correlation with attributes
G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd
period), while G1 and G2 correspond to the 1st and 2nd period grades. It is
more difficult to predict G3 without G2 and G1, but such prediction is much
more useful (see paper source for more details).
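In practice this means removing G1 and G2 from the predictors before any model is fitted, so that they cannot leak into the prediction of G3. A minimal sketch follows, assuming each record is a dict of strings such as `csv.DictReader` would produce. The helper name `split_features_target` and the example record are hypothetical.

```python
def split_features_target(row, target="G3", excluded=("G1", "G2")):
    """Separate the response from the predictors, leaving out the period grades."""
    features = {
        name: value
        for name, value in row.items()
        if name != target and name not in excluded
    }
    return features, int(row[target])


# Hypothetical record with only a few columns, for demonstration
record = {"school": "GP", "age": "17", "G1": "10", "G2": "11", "G3": "12"}
features, g3 = split_features_target(record)
print(features, g3)
```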
Attribute Information:
# Attributes for both student-mat.csv (Math course) and student-por.csv
(Portuguese language course) datasets:
1 school – student’s school (binary: ‘GP’ – Gabriel Pereira or ‘MS’ – Mousinho da
Silveira)
2 sex – student’s sex (binary: ‘F’ – female or ‘M’ – male)
3 age – student’s age (numeric: from 15 to 22)
4 address – student’s home address type (binary: ‘U’ – urban or ‘R’ – rural)
5 famsize – family size (binary: ‘LE3’ – less than or equal to 3 or ‘GT3’ – greater than
3)
6 Pstatus – parents’ cohabitation status (binary: ‘T’ – living together or ‘A’ – apart)
7 Medu – mother’s education (numeric: 0 – none, 1 – primary education (4th
grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
8 Fedu – father’s education (numeric: 0 – none, 1 – primary education (4th
grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
9 Mjob – mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’
(e.g. administrative or police), ‘at_home’ or ‘other’)
10 Fjob – father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’
(e.g. administrative or police), ‘at_home’ or ‘other’)
11 reason – reason to choose this school (nominal: close to ‘home’, school
‘reputation’, ‘course’ preference or ‘other’)
12 guardian – student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’)
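13 traveltime – home to school travel time (numeric: 1 – <15 min., 2 – 15 to 30 min., 3 – 30 min. to 1 hour, or 4 – >1 hour)
14 studytime – weekly study time (numeric: 1 – <2 hours, 2 – 2 to 5 hours, 3 – 5 to 10 hours, or 4 – >10 hours)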
15 failures – number of past class failures (numeric: n if 1