This week I focused on choosing the best method for my final year project (FYP). My project is about predicting altered states of consciousness. The data I was given has 31 attributes as the source (input) and 2 attributes as the target (output).
This project was actually done before by my senior, who used a neural network to predict both targets. So now I have to use another method to predict them. I have done some research on predictive modelling:
https://en.wikipedia.org/wiki/Predictive_modelling
And also other sources that may be related:
https://www.quora.com/What-are-some-Machine-Learning-algorithms-that-you-should-always-have-a-strong-understanding-of-and-why
http://www.tutorialspoint.com/data_mining/dm_classification_prediction.htm
http://rayli.net/blog/data/top-10-data-mining-algorithms-in-plain-english/
Based on the research I've done, for example in the Quora link above,
Sean Owen recommends Random Forest for classification/regression. Another method that caught my attention is Naive Bayes.
Using the data given to me by my supervisor (Shamimi A. Halim), I started to play with it in Weka.
The four methods I used in this research:
1) Multilayer Perceptron (Backpropagation)
2) Naive Bayes
3) Random Forest
4) Logistic Regression
Multilayer Perceptron
Want to learn more :
https://en.wikipedia.org/wiki/Multilayer_perceptron
For this trial-and-error research, the data has been preprocessed and I focus on just one output, which is status (Alive or Dead). The data set has 204 instances in total: 90% for training and the other 10% for testing.
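To make the 90/10 split concrete, here is a minimal sketch in plain Python. It is not Weka's own code: the function name `percentage_split`, the shuffling step, and the rounding rule are my assumptions for illustration, and Weka may round the cut point differently.

```python
import random

def percentage_split(rows, train_fraction=0.9, seed=1):
    """Shuffle the rows reproducibly, then cut them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Assumption: round to the nearest whole instance at the cut point.
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(204))  # stand-ins for the 204 instances
train, test = percentage_split(rows)
print(len(train), len(test))  # prints "184 20"
```

Under these assumptions the test set has only about 20 instances, which is worth keeping in mind when reading the accuracy numbers.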
Parameter :
Result :
From the above result, I only got 70% accuracy.
So after trying other parameters, I got the best (maybe?) setting, which has 3 hidden layers.
Parameter :
Result :
From the result I got 85% accuracy.
Naive Bayes
Want to learn more :
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
No parameters.
Result :
From the result I got 75% accuracy.
Random Forest
Want to learn more :
http://www.listendata.com/2014/11/random-forest-with-r.html
https://en.wikipedia.org/wiki/Random_forest
Parameter :
Result :
From the above result, I only got 70% accuracy.
So after trying other parameters, I got the best (maybe?) setting, with numFeatures (number of features) set to 6 and seed set to 5. (I don't know exactly what seed does, but I think it is related to randomness.)
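On the seed question, here is a small sketch of the general idea in plain Python. It is not how Weka implements Random Forest internally; `pick_features` is a made-up helper. The assumption is that numFeatures is the number of randomly chosen attributes a tree looks at, and that the seed initialises the random number generator so the "random" choices can be repeated.

```python
import random

NUM_ATTRIBUTES = 31  # input attributes in the data set
NUM_FEATURES = 6     # the numFeatures value used above

def pick_features(seed):
    """Pick NUM_FEATURES attribute indices at random, driven by the seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(NUM_ATTRIBUTES), NUM_FEATURES))

first = pick_features(5)
second = pick_features(5)
print(first == second)  # prints "True": same seed, same random choices
```

So changing the seed changes which random choices the forest makes, which can change the accuracy a little, while keeping the seed fixed makes a run repeatable.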
Parameter :
Result :
From the image above, I got 85% accuracy, which is the same as the Multilayer Perceptron.
Logistic Regression
So why did I choose Logistic Regression for trial and error? Based on the definition on
Wikipedia:
In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical.
My output, which is either Alive or Dead, is categorical. That's why I tried this method too.
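Here is a minimal sketch of how a logistic regression model turns numeric inputs into a categorical answer. The weights, bias, and the three input values are invented for illustration (the real model would have one weight per attribute, learned from the training data), and treating "Alive" as the positive class is also my assumption.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_status(features, weights, bias, threshold=0.5):
    """Weighted sum of the inputs -> probability -> Alive/Dead label."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = "Alive" if probability >= threshold else "Dead"
    return label, probability

# Hypothetical weights and inputs, not values from the real data set.
weights = [0.8, -1.2, 0.5]
bias = -0.1
label, probability = predict_status([1.0, 0.2, 0.4], weights, bias)
print(label, round(probability, 3))
```

The model itself outputs a probability; the category only appears when that probability is compared with a threshold.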
Parameter :
Result :
Yeah! 85% accuracy. Same as the Multilayer Perceptron and Random Forest results.
So what now? I don't know. Lol. Maybe I have to study the algorithms before deciding which method is suitable and efficient for the data.
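One thing that may explain the three-way tie is a quick piece of arithmetic on the reported accuracies. This sketch assumes the test set has about 20 instances (10% of 204); the exact count Weka used is not shown here, so treat the numbers as a rough estimate.

```python
TEST_SIZE = 20  # assumption: roughly 10% of the 204 instances

reported = {
    "Multilayer Perceptron (3 hidden layers)": 0.85,
    "Naive Bayes": 0.75,
    "Random Forest (numFeatures=6, seed=5)": 0.85,
    "Logistic Regression": 0.85,
}

# Convert each accuracy into a number of correctly classified test instances.
correct_counts = {name: round(acc * TEST_SIZE) for name, acc in reported.items()}
points_per_instance = 100 / TEST_SIZE

for name, correct in correct_counts.items():
    print(f"{name}: about {correct} of {TEST_SIZE} correct")
print(f"One test instance is worth {points_per_instance:.0f} percentage points")
```

If that assumption holds, 85% versus 75% is a difference of only two test instances, so a tie between three methods on such a small test set does not say much about which one is really better.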