What is more, there is a need for feature scaling. If features come in different ranges, there will be problems with model training: features with larger values will be treated as more important during training, which is not desired. What is more, gradient values can explode and neurons can saturate, which will make it impossible to train the NN.

To conclude, for NN training you need to do the following preprocessing:

- Convert categorical data into numerical.
- Scale features into the same (or at least similar) range.

Keep in mind that all preprocessing used for preparing the training data should also be used in production.

For RF, you set the number of trees in the ensemble (which is quite easy, because more trees in RF is better) and you can use default hyperparameters, and it should work. For NN you have more preprocessing steps, so more steps to implement in the production system as well.

You need some magic skills to train a NN well. How many layers to use? Usually 2 or 3 layers should be enough. How many neurons to use in each layer? What activation functions to use? What weight initialization to use? Architecture ready? Then you need to choose a training algorithm. You can start with simple Stochastic Gradient Descent (SGD), but there are many others. Let's go with simple SGD: you need to set the learning rate, momentum, and decay.
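The two preprocessing steps above can be sketched with scikit-learn (assuming it is available); the toy data and column layout here are illustrative, not from the original post. Note how the *fitted* encoder and scaler are reused on new data, which is exactly the "same preprocessing in production" rule:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data (hypothetical): one categorical column, one numeric column.
X_cat_train = np.array([["red"], ["blue"], ["red"]])
X_num_train = np.array([[10.0], [200.0], [30.0]])

# 1) Convert categorical data into numerical (one-hot encoding).
encoder = OneHotEncoder(handle_unknown="ignore")
cat_train = encoder.fit_transform(X_cat_train).toarray()

# 2) Scale features into the same (or at least similar) range.
scaler = StandardScaler()
num_train = scaler.fit_transform(X_num_train)

X_train = np.hstack([cat_train, num_train])  # ready for NN training

# In production, reuse the SAME fitted encoder and scaler -- never refit
# them on incoming data.
X_cat_new = np.array([["blue"]])
X_num_new = np.array([[25.0]])
X_new = np.hstack([encoder.transform(X_cat_new).toarray(),
                   scaler.transform(X_num_new)])
```

For RF, none of this is required: trees split on raw feature values, so scaling does not change the result.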
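To make the three SGD knobs concrete, here is a minimal sketch of SGD with momentum and learning-rate decay on a 1-D quadratic loss; the hyperparameter values and the loss function are illustrative assumptions, not recommendations:

```python
# Minimal SGD with momentum and decay on L(w) = (w - 3)^2.
learning_rate = 0.1   # step size
momentum = 0.9        # fraction of the previous update carried over
decay = 0.01          # shrinks the learning rate each step

w = 0.0
velocity = 0.0
for step in range(200):
    grad = 2.0 * (w - 3.0)                    # dL/dw
    lr = learning_rate / (1.0 + decay * step)  # decayed learning rate
    velocity = momentum * velocity - lr * grad
    w += velocity

print(w)  # approaches the minimum at w = 3
```

Getting all three values right for a real network usually takes experimentation — too large a learning rate diverges, too small trains forever — which is part of the "magic skills" mentioned above.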