Humber College Data Mining Programming Worksheet

  1. Using the 20 newsgroup data, do the following:

    • Do the pre-processing. This step is application dependent, so read to the end of the task description before deciding which pre-processing steps to apply.
    • Create plots, using matplotlib, to show the following (for each topic in the data separately, and save the plots to file):
        • Most frequent words, bigrams and trigrams
        • Word cloud plots
        • Histogram of word and sentence length
    • Use both Matrix Factorization (LSA) and the LDA algorithms to do topic modelling. The output is a sequence of 10 words for each topic.
    • Compare your topics between LSA and LDA, and prepare yourself for questions about it (and other subjects) during your presentation.
    • Use the labels provided in the dataset to measure the performance of both algorithms based on both accuracy and the F1 score.
    • LSA and LDA are unsupervised algorithms. In this part, apply logistic regression to the problem to see if you can predict the topic in a supervised fashion. Note that this is no longer a binary classification problem. You have to find a way to convert it to binary classification.
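The counts behind the "most frequent words, bigrams and trigrams" plots and the word-length histogram can be sketched with the standard library alone. This is a minimal illustration, not a prescribed solution: the tokenizer is a deliberately simple assumption (lowercase, alphabetic tokens only), `tokenize`, `ngrams` and `top_ngrams` are hypothetical helper names, and `toy_docs` is a made-up stand-in for the posts of one topic.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (an assumed, simple choice)."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

def top_ngrams(docs, n, k=10):
    """Count n-grams document by document, so no n-gram spans two documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return counts.most_common(k)

# Made-up stand-in for the posts of a single topic (not real dataset text).
toy_docs = [
    "The space shuttle launch was delayed.",
    "The space shuttle landed safely after the launch.",
]

top_words = top_ngrams(toy_docs, 1, k=3)
top_bigrams = top_ngrams(toy_docs, 2, k=3)
top_trigrams = top_ngrams(toy_docs, 3, k=3)

# Token lengths, i.e. the raw values a word-length histogram would bin.
word_lengths = [len(t) for d in toy_docs for t in tokenize(d)]
```

The resulting `(ngram, count)` pairs and the `word_lengths` list are the kind of input a matplotlib bar chart or histogram would take. Whether to also remove stop words or lemmatize is one of the pre-processing decisions the task leaves open.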

NOTE 1: The 20 newsgroup dataset (KAGGLE) has 2 parts when you download it: a train file and a test file. All the items in this project should be done on the train dataset. The test dataset should only be used to measure or illustrate the performance of your model. Do not report performance measured on the train dataset.
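As a rough illustration of how accuracy and the F1 score could be computed on the test split, the sketch below scores a set of predictions against true labels. It also shows one common way to turn a multi-class problem into binary ones, one-vs-rest relabelling, which is an assumption here and not necessarily the conversion the task intends. The function names are hypothetical, and the labels and predictions are made-up toy values, not results from any model.

```python
def one_vs_rest(labels, positive):
    """Relabel a multi-class problem as binary: 1 for `positive`, 0 for every other class."""
    return [1 if y == positive else 0 for y in labels]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for 0/1 labels; 0.0 when the denominator is zero."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Average the one-vs-rest F1 over every class present in the true labels."""
    classes = sorted(set(y_true))
    scores = [
        binary_f1(one_vs_rest(y_true, c), one_vs_rest(y_pred, c))
        for c in classes
    ]
    return sum(scores) / len(scores)

# Made-up TEST-split labels and predictions, for illustration only.
# In the project, predictions would come from a model fitted on the train split.
test_true = ["sci.space", "sci.space", "rec.autos",
             "rec.autos", "talk.politics.misc", "talk.politics.misc"]
test_pred = ["sci.space", "rec.autos", "rec.autos",
             "rec.autos", "talk.politics.misc", "sci.space"]

test_accuracy = accuracy(test_true, test_pred)
test_macro_f1 = macro_f1(test_true, test_pred)
```

Macro averaging weights every topic equally. Other averaging schemes (micro, weighted) are equally defensible, so state which one is reported.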

NOTE 2: You will be required to run your project during the presentation.
