I recently took a class through the University of Michigan, “Computational Machine Learning for Scientists and Engineers,” offered by the Continuum program of the Electrical and Computer Engineering department. The class typically had one assignment a week in the form of a codex, an interactive notebook similar to Jupyter. Each week an ML approach was presented, and then we were walked through how to program it in the Julia language. Julia's syntax is similar to Python's, but it is compiled for performance, which makes it a good fit for ML, where many matrix operations are performed.
It was very cool to see how doing math on a bunch of numbers in some matrices yields seemingly intelligent conclusions. However, the course did a good job of showing that, when it comes down to it, the algorithm is still crunching numbers and doesn’t have an understanding of the situation in the way a human does. If the input is in a different format, or even slightly different from what the model saw in training, it can misfire significantly.
A focus of the course was transforming data into a form that makes sense to an ML algorithm. For instance, we needed to flatten a 2D image matrix into a vector so the model could learn a weight for each pixel. We also needed to normalize the pixel values and ensure the images were grayscale.
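A minimal sketch of that preprocessing in Python with NumPy (the image here is random, and the 28x28 size is my own assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 28x28 grayscale image with 8-bit pixel values
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Normalize pixel values from [0, 255] to [0, 1]
normalized = image.astype(np.float32) / 255.0

# Flatten the 2D matrix into a 784-element vector for the model
x = normalized.reshape(-1)
print(x.shape)  # (784,)

# If the source image is RGB, reduce it to grayscale first;
# a simple channel average works, though standard conversions
# weight the channels differently.
rgb = rng.integers(0, 256, size=(28, 28, 3), dtype=np.uint8)
gray = rgb.mean(axis=2)
```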
I learned that reproducing another person’s work is not easy. Different versions of libraries can give different results, and care must be taken to have the same computing environment. It’s important to detail the processing steps that get the data into the form the ML algorithm works on: the division between training and test data, the encoding of categorical data, etc. I also learned to describe the model itself: the layers, number of neurons, activation functions, learning rate, etc. These hyperparameters are often optimized through trial and error, as it can be very hard to predict how a model will behave without actually training and running it.
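Those bookkeeping steps can be sketched in NumPy; this is my own illustration with a made-up dataset, showing a seeded (reproducible) train/test split and one-hot encoding of categorical labels:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible

# Hypothetical dataset: 100 samples, 4 features, integer class labels 0-2
X = rng.normal(size=(100, 4))
y = rng.integers(0, 3, size=100)

# Shuffle indices, then split 80/20 into training and test sets
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# One-hot encode the categorical labels: class k -> row k of the identity
num_classes = 3
Y_train = np.eye(num_classes)[y_train]
```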
Two classification tasks were used throughout the weeks. The MNIST handwriting dataset contained single handwritten digits, and the goal was to classify them as 0-9. The second task was classifying images of hands making the rock, paper, and scissors symbols. We started by coding, in detail, a single-layer neural network trained with stochastic gradient descent. Then we moved to the Flux library in Julia for multi-layer (deep) neural nets. We also worked with TensorFlow in Python so we could work in multiple environments. For each one, we learned how to evaluate performance with training loss, classification accuracy, the confusion matrix, and the ROC curve.
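A single-layer network trained with stochastic gradient descent fits in a few lines of plain NumPy. This is my own toy version on synthetic two-class data, not the course's Julia code: a softmax output layer with cross-entropy loss, updated one sample at a time.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: two Gaussian blobs in 2D, labeled 0 and 1
n = 100
X = np.vstack([rng.normal(-2, 1, size=(n, 2)), rng.normal(2, 1, size=(n, 2))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
Y = np.eye(2)[y]  # one-hot targets

W = rng.normal(scale=0.01, size=(2, 2))  # weights: 2 inputs -> 2 classes
b = np.zeros(2)                          # biases

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

lr = 0.1
for epoch in range(20):
    for i in rng.permutation(len(X)):  # "stochastic": one sample per update
        p = softmax(X[i] @ W + b)      # forward pass: class probabilities
        g = p - Y[i]                   # cross-entropy gradient w.r.t. logits
        W -= lr * np.outer(X[i], g)    # gradient descent step
        b -= lr * g

acc = (softmax(X @ W + b).argmax(axis=1) == y).mean()
```

On well-separated blobs like these, the accuracy lands near 1.0; the same loop structure underlies the deeper Flux and TensorFlow models, just with more layers between input and output.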
At the end of the class we explored deep generative nets, which are interesting because they can produce a high-dimensional output like an image instead of just a classification vector. Instead of taking a 62,500-element image vector and generating a 10-element classification vector, the model takes the classification vector and makes up an image to fit the description. Some elements of the input vector can correspond to a property of the object, like its rotation or brightness. Somehow that one input causes changes in the numbers throughout the network that end up setting the right values in the right pixels to make those broad changes. Other applications of these models are increasing the resolution of an image by predicting details based on what is nearby, and even editing a photo to match a certain style.
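To make those dimensions concrete, here is a purely illustrative, untrained sketch of a linear "decoder" that maps a 10-element label vector to a 250 x 250 (62,500-pixel) image; a real generative model learns these weights and stacks many nonlinear layers, so this only shows the shape of the mapping:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 10-element input: hypothetical one-hot vector for class "3"
z = np.eye(10)[3]

# Untrained weight matrix mapping 10 inputs to 62,500 pixels.
# In a trained generative net, each input element would steer
# coordinated changes across many pixels (rotation, brightness, etc.).
W = rng.normal(scale=0.01, size=(10, 62_500))
image = (z @ W).reshape(250, 250)
print(image.shape)  # (250, 250)
```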
Overall, the class provided a good understanding of how neural nets work, the work required to train them, and what needs to be in place for them to be successful. I now have a sense of how difficult it can be to do high-dimensional tasks with large variation in situations like self-driving. The work of preparing data for ML also has me noticing which tasks in my own work could be done with ML.