Need help with my statistics questions

CIS-STA 3920 Project Assignment, Draft #1
Turn in 2 documents, one to each of the two portals.
Doc#1: Submit this project as a Microsoft Word document only, to the designated Project
Portal. There is an automatic deduction of 20 points if late by up to 24 hours; after 24 hours, no credit.
Doc#2: Submit the Excel document in which you computed lagged stock return and lagged risk
data sets, from 2006 on, according to instructions I presented in class and in LN5.A. Submit to
the Data Portal.
For the Word document filename, put your last name first, then first name, then the word
Project. For the Excel filename, put your last name first, then first name, then the word Data.
Overall Guidance
Showcase your ability to produce a well-crafted document, your capacity to learn how to do
something challenging on your own, and your ability to engage with the reader and to explain
what is happening. Demonstrate your ability to follow instructions and to work with integrity.
✓ On the cover page, include your name, the course number and date.
✓ On the cover page, show an interesting title and graphic, with the goal of encouraging the reader
to turn to the second page, rather than giving up at a glance. Use a graphic from your
paper, and include your name in the title of the graph.
✓ On the second page, place a well-formatted Table of Contents.
✓ Number your pages.
✓ Number the questions in the same way as shown: 1, 2,…
✓ Answer each question at its numbered position. I will not look elsewhere for the answers.
✓ Insert your name in the main title of every R-graphic you show.
✓ Do not show the text of my questions. Instead, fold the question into the answer. For
example, instead of repeating my first question, you could say, “1. I am going to begin with
an introduction to the corporation from which…,” that is, don’t show the question.
✓ When using a quotation, use quotation marks and then a footnote to identify the source.
✓ Source all material taken from books, articles, my lecture notes, or the Web.
✓ For recommended layout style, review my lecture notes.
✓ Write in the first person.
✓ Share some of your (a) thought processes, (b) miscues, (c) workarounds, and (d) insights.
✓ Demonstrate engagement with the concepts and approaches taken in the Lecture Notes.
✓ Do not share your project work with anyone else; that is cheating.
✓ Do not plagiarize; do not plagiarize me.
TO START: First, select a classification methodology to learn in your project. Select either
Classification Trees or Support Vector Machines. Those topics are covered in the following
chapters in the ISLR text:
(a) Chapter 8, pages 303-331, covers Tree-Based Methods (or CART, short for
classification and regression trees). Only show results for classification trees, not
regression trees.
(b) Chapter 9, pages 337-368, covers SVM (“support vector machines”).
In terms of coverage, if you pick Chapter 8, Tree-Based Methods, get through boosting.
In Chapter 9, SVM, get through the section on support vector machines.
Appendix. After making your choice from above, get introduced to it by walking through the lab
example provided in the ISLR text. Show that work in an Appendix, including any graphs.
On those graphs, insert your name into the titles. To actually get the data used in the ISLR
examples, you will likely need to download an R package called ISLR; it contains the data
sets used in the text.
1. [20 points] Begin by introducing your reader to the corporation from which your stock data
comes. Tell the reader something you learned about that corporation that you found
interesting, something which would demonstrate to a recruiter that you possess curiosity and
the ability to employ it.
Explain to the reader why we took two different transformations of the price data,
return and risk. Illustrate your discussion with before-and-after graphics. Review LN2.A
and your LN2 homework as a refresher.
Then, using trimmed screenshots where needed from Excel, sketch out for the reader
how you converted your Yahoo-sourced stock data into a lagged stock risk data set since
2. [20] First, draw a random sample of size n=300 without replacement from your stock return
data set. Recall that your stock return data contains a HiLo return column and standardized
log lag1 and log lag2 return columns. I will call this your n=300 stock return data set.
Show and explain how this is done.
Second, draw another random sample of size 300 without replacement, this time from your stock
risk data set. Recall that your stock risk data contains a HiLo risk column and
standardized log lag1 and log lag2 risk columns. I will call this your n=300 stock risk data set.
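The sampling step can be sketched as follows. This is a minimal illustration in Python rather than R, and the row layout (`day`, `HiLo`, `lag1`, `lag2`) and the helper name `draw_sample` are hypothetical stand-ins, not the layout from the lecture notes. The point it shows is that sampling without replacement means no row can be picked twice.

```python
import random

def draw_sample(rows, n=300, seed=None):
    """Return n rows chosen at random WITHOUT replacement."""
    if n > len(rows):
        raise ValueError("cannot draw more rows than the data set holds")
    rng = random.Random(seed)      # a fixed seed makes the draw reproducible
    return rng.sample(rows, n)     # random.sample never repeats an element

# Hypothetical stand-in for a full data set: one dict per trading day.
rows = [
    {"day": i, "HiLo": "Hi" if i % 2 else "Lo", "lag1": 0.0, "lag2": 0.0}
    for i in range(1000)
]

return_sample = draw_sample(rows, n=300, seed=3920)
print(len(return_sample), "rows drawn")
```

Recording the seed alongside the sample is one way to let a reader reproduce exactly the same n=300 data set.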
Next, using your n=300 stock return data set, walk through the steps covered by the
ISLR text for your chosen method, SVM or CART, explaining in your own words what you
are doing. Put your name into the title of any graphs you show. Where you are unclear as to
what is happening or why it is being done, say so, and document your efforts to work
towards understanding. Do not waste time trying to fake comprehension via plagiarism.
You are welcome to read up on your chosen method from other sources,
including textbooks, academic articles, and the web. However, carefully credit the
sources from which you gain insight, and do not plagiarize them! It is natural that you will
not understand much of what you read, but you can start wrestling with it and you can
document that wrestling.
Finally, run the program on your n=300 stock risk data set and compare the
performance to that of your n=300 stock return data set. Include use of the chi-square test.
Discuss the differences, the reasons that these would happen, and the lessons learned about
the nature of the stock market.
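One plausible way to bring in the chi-square test, stated here as an assumption rather than the required procedure, is to test whether forecast and actual HiLo classes are independent in a 2x2 table of test-set counts. The sketch below is in Python for illustration, uses made-up counts, and computes the Pearson statistic without a continuity correction.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Assumes every expected count is positive. No continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up counts: rows = actual Hi/Lo, columns = forecast Hi/Lo.
confusion = [[45, 30], [28, 47]]
stat, p_value = chi_square_2x2(confusion)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the forecasts carry information about the actual class; running the same test on the return and risk tables gives one basis for comparing the two.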
3. [20] Select one of the tuning parameters or decision criteria that lie beneath the surface of
your chosen methodology, SVM or CART. Engage with it by researching beyond the
ISLR text. Then experiment with it. Experiment with your data and with other data sets.
Try decreasing or increasing n. Look at other sources for help, documenting the sources.
When borrowing text, use quotation marks and footnote the source. Do not plagiarize text
or graphics.
Here are some examples of possibilities from Chapter 8 on CART. On page 312, the
Gini index is defined, but what is it? Can you compute it yourself? How is the G statistic
used after it is computed? What is it compared against? Is that a parameter that you can
tune? Also, at the top of that page, the text says that the “…classification error rate is not
sufficiently sensitive.” See if you can demonstrate a lack of sensitivity! See if you can
figure out what is meant in this context by “sensitive.” To what is it not sensitive? The text
also says that entropy is an alternative to Gini and gives similar results. Do you find that to
be true?
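As a starting point for that kind of experiment, here is a minimal sketch in Python (for illustration only) of the three node-impurity measures: Gini index, entropy, and classification error rate. The class counts are a made-up two-class example, not stock data, and the helper names are hypothetical. It compares two candidate splits of 800 observations (400 per class).

```python
import math

def proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]

def gini(counts):
    """Gini index: sum over classes of p * (1 - p)."""
    return sum(p * (1 - p) for p in proportions(counts))

def entropy(counts):
    """Entropy: -sum of p * log(p), skipping empty classes."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)

def error_rate(counts):
    """Classification error: 1 - proportion of the majority class."""
    return 1 - max(proportions(counts))

def weighted(measure, nodes):
    """Average a node measure over child nodes, weighted by node size."""
    n = sum(sum(node) for node in nodes)
    return sum(sum(node) / n * measure(node) for node in nodes)

# Each tuple is (class 1 count, class 2 count) in one child node.
split_a = [(300, 100), (100, 300)]
split_b = [(200, 400), (200, 0)]   # second node is pure

for name, split in (("A", split_a), ("B", split_b)):
    print(name,
          "error =", round(weighted(error_rate, split), 4),
          "gini =", round(weighted(gini, split), 4),
          "entropy =", round(weighted(entropy, split), 4))
```

Both splits have a weighted error rate of 0.25, yet Gini (0.375 versus about 0.333) and entropy both prefer split B, which produces a pure node. That is one concrete sense in which the error rate can fail to distinguish between splits.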
Here is an example from Chapter 9 on SVM. On page 346, a parameter denoted C is
introduced, but what does it do? Demonstrate that you worked diligently on this problem:
the goal is to engage with the issue, not to produce miracles of comprehension or a
plagiarism dump! Take a hands-on, practical approach: that requires experimentation.
4. [20] Create classification space plots for both of your n=300 data sets, using your chosen
methodology, SVM or CART. Be sure to explain how you went about this. Create the plot
using the same techniques that we did in our plots for other methods. Work it out yourself,
step by step.
If you are doing SVM, you will find that the SVM software automatically outputs
classification space plots. Do not show me those plots, as they will count ZERO on this
assignment. I am requiring that you create a classification space yourself, step by step. As
we know from trying to do that ourselves, it is not easy. Evidence of thoughtful and diligent
work is more important than getting your plots to work perfectly.
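The general idea behind a hand-built classification space can be sketched as: lay a grid over the (lag1, lag2) plane, ask the fitted classifier for a label at every grid point, then colour the points by label. The Python sketch below is an illustration under assumptions, not the technique from class: the nearest-neighbour rule is only a stand-in for a fitted SVM or tree, the training points are made up, and the grid range of -3 to 3 assumes standardized lags.

```python
def frange(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (steps - 1)
    return [lo + i * step for i in range(steps)]

def nearest_label(point, train):
    """Stand-in classifier: label of the closest training point (1-NN)."""
    px, py = point
    closest = min(train, key=lambda r: (r[0] - px) ** 2 + (r[1] - py) ** 2)
    return closest[2]

def classification_space(predict, lo=-3.0, hi=3.0, steps=13):
    """Label every grid point; rows run over lag2, columns over lag1."""
    xs = frange(lo, hi, steps)
    ys = frange(lo, hi, steps)
    return [(x, y, predict((x, y))) for y in ys for x in xs]

# Made-up training rows: (lag1, lag2, HiLo label).
train = [(-1.0, -1.0, "Lo"), (-0.5, -1.5, "Lo"),
         (1.0, 1.0, "Hi"), (1.5, 0.5, "Hi")]

steps = 13
grid = classification_space(lambda pt: nearest_label(pt, train), steps=steps)

# Crude text rendering; the top line is the largest lag2 value.
for r in reversed(range(steps)):
    row = grid[r * steps:(r + 1) * steps]
    print("".join("H" if label == "Hi" else "." for _, _, label in row))
```

In a real plot, each grid point would be drawn in a colour chosen by its label, with the n=300 observations overlaid on top.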
5. [20] Prepare a comparative study of knn, naive Bayes, logistic regression, and your selected
method. Make this comparison on your two n=300 data sets, splitting the data randomly in
half to get the training and testing sets.
Explain what you are doing as you go along, explain what you understand about what
distinguishes the methods, discuss reasons why the results vary, and why there might be
systematic differences in performance between return data and risk data. Show
classification space plots for knn, naïve Bayes, and logistic regression. At the end, show a
single table in which you summarize the overall correct forecast rate for the stock returns for
the four methods; then another table summarizing the performance on the stock risk data.
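The split-and-score bookkeeping behind those summary tables can be sketched as below. This is a Python illustration under stated assumptions: the data are simulated, and the two "methods" are trivial hypothetical stand-ins for fitted knn, naive Bayes, logistic regression, and SVM or CART models.

```python
import random

def split_in_half(rows, seed=None):
    """Shuffle a copy of the rows and cut it into training and testing halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def correct_rate(predict, test_rows):
    """Share of test rows whose forecast label matches the actual label."""
    hits = sum(1 for features, label in test_rows if predict(features) == label)
    return hits / len(test_rows)

# Simulated stand-in for an n=300 data set: ((lag1, lag2), HiLo label).
rng = random.Random(1)
rows = []
for _ in range(300):
    lag1, lag2 = rng.gauss(0, 1), rng.gauss(0, 1)
    label = "Hi" if lag1 + rng.gauss(0, 1) > 0 else "Lo"
    rows.append(((lag1, lag2), label))

train, test = split_in_half(rows, seed=3920)

# Hypothetical placeholders; real models would be fitted on `train` only.
methods = {
    "always Hi": lambda features: "Hi",
    "sign of lag1": lambda features: "Hi" if features[0] > 0 else "Lo",
}

summary = {name: correct_rate(model, test) for name, model in methods.items()}
print(f"{'Method':<15}{'Correct rate':>12}")
for name, rate in summary.items():
    print(f"{name:<15}{rate:>12.3f}")
```

Using the same train and test halves for every method keeps the comparison fair, since differences in the correct forecast rate then come from the methods rather than from the split.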
