
Thoughts on web3, tech, life, and everything else in between

Regularization

Regularization is the process of ‘smoothing out’ your loss function so that it is less responsive to the peculiarities of a specific training dataset and therefore generalizes better to unseen data. It attempts to resolve the problem of fitting well on training data but performing poorly on test data, typically by adding an extra penalty term to the loss function. In machine learning, the term regularization is also used more broadly to refer to any strategy that helps improve generalization. ...
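
A minimal sketch of the explicit-penalty form (my own illustration, not code from the post; the linear model, toy data, and weight `lam` are made up), adding an L2 term to a mean-squared-error loss:

```python
import numpy as np

def mse_loss(w, X, y):
    """Plain mean-squared-error loss for a linear model y ≈ X @ w."""
    return np.mean((X @ w - y) ** 2)

def regularized_loss(w, X, y, lam=0.1):
    """MSE plus an L2 penalty: the added term penalizes large weights,
    pulling the fit towards simpler, smoother functions."""
    return mse_loss(w, X, y) + lam * np.sum(w ** 2)

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
w = rng.normal(size=5)
print(mse_loss(w, X, y), regularized_loss(w, X, y))
```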

February 9, 2026 · 5 min · Lei

The Constituents of Errors

When we’re training our models, we’re minimizing loss. Yet not all loss is created equal. Broadly speaking, there are three sources of error. Noise: noise is the inherent randomness of the test data. Even with a model that perfectly fits the true underlying function, the test data you draw will still be scattered around it by the standard deviation of the noise. This is error that cannot be eliminated. ...
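
For reference, the standard decomposition behind these three sources (my own addition, written for a fixed test input $x$, noise variance $\sigma^2$, true function $f$, and a model $\hat{f}_{\mathcal{D}}$ trained on dataset $\mathcal{D}$):

$$
\mathbb{E}_{y,\mathcal{D}}\!\left[\big(y - \hat{f}_{\mathcal{D}}(x)\big)^2\right]
= \underbrace{\sigma^2}_{\text{noise}}
+ \underbrace{\big(f(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\big(\hat{f}_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\big)^2\right]}_{\text{variance}}
$$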

February 8, 2026 · 8 min · Lei

Parameter Initialization

Typically when initializing parameters ($\beta$, $\Omega$) for a network, we draw values from a normal distribution with mean 0 and variance $\sigma^2$. The most important factor in preserving the stability of the network as we move through successive layers (or functional transformations) is the variance of the initialization. It affects the magnitude of your preactivations $f$ and activations $h$ in the forward pass, as well as your gradients in the backward pass, and almost solely determines whether you’ll suffer from vanishing or exploding gradients. ...
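
As an illustrative sketch (mine, not from the post), fan-in-scaled initialization of the kind usually recommended for ReLU networks, with made-up layer sizes:

```python
import numpy as np

def init_layer(n_in, n_out, rng, scheme="he"):
    """Initialize weights Omega and biases beta for one fully connected layer.
    Scaling the weight variance by the fan-in keeps the variance of the
    preactivations f and activations h roughly constant across layers,
    which is what protects against vanishing/exploding gradients."""
    if scheme == "he":
        sigma2 = 2.0 / n_in   # He initialization, suited to ReLU activations
    else:
        sigma2 = 1.0          # naive: unit variance regardless of layer width
    Omega = rng.normal(0.0, np.sqrt(sigma2), size=(n_out, n_in))
    beta = np.zeros(n_out)
    return Omega, beta

rng = np.random.default_rng(0)
Omega1, beta1 = init_layer(784, 256, rng)   # hypothetical layer sizes
print(Omega1.std())                          # ≈ sqrt(2 / 784) ≈ 0.05
```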

February 8, 2026 · 4 min · Lei

On the Math of Back Propagation

It’s incredible how entire models can be represented as a series of equations, and how the entire process of back propagation (the beating heart of deep learning) can also be represented as a series of equations. I’ll type this out fully when I have the time, but here’s what most neural networks are, abstracted. [Figure: The math of back propagation] On the top left (1), you see a typical 3-layer neural network represented by a series of preactivation functions $f$ and hidden units $h$. That’s all there is to it! It’s a very compact representation of a complex series of functional compositions. ...
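
As a sketch of that compact form (my own rendering, in the $\beta$, $\Omega$, preactivation $f$, hidden-unit $h$ notation used elsewhere on this blog), a 3-layer network is just:

$$
\begin{aligned}
\mathbf{f}_0 &= \boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\,\mathbf{x}, & \mathbf{h}_1 &= a[\mathbf{f}_0],\\
\mathbf{f}_1 &= \boldsymbol{\beta}_1 + \boldsymbol{\Omega}_1\,\mathbf{h}_1, & \mathbf{h}_2 &= a[\mathbf{f}_1],\\
\mathbf{f}_2 &= \boldsymbol{\beta}_2 + \boldsymbol{\Omega}_2\,\mathbf{h}_2, & \mathbf{h}_3 &= a[\mathbf{f}_2],\\
\mathbf{f}_3 &= \boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3\,\mathbf{h}_3
\end{aligned}
$$

where $a[\cdot]$ is the activation function; back propagation is then the chain rule applied from $\mathbf{f}_3$ backwards, reusing the cached $\mathbf{f}_k$ and $\mathbf{h}_k$ from the forward pass.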

February 8, 2026 · 4 min · Lei

On Stochastic Gradient Descent, Momentum, Adam

This is an overdue recap of the lecture I had two weeks ago, where we went through stochastic gradient descent, momentum, and Adam. I’ve learned the benefits of SGD before, so I won’t go that deeply into it, but what’s good about the lecture is how it anchors everything in the mathematics. Stochastic gradient descent: to avoid being trapped in a local minimum, at each iteration the SGD algorithm chooses a random subset of the training data, known as a minibatch, and computes the gradients from those examples alone. The parameter update then considers only that batch. ...
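
A minimal sketch of that minibatch update (my own illustration; `grad_fn`, the learning rate, and the batch size are placeholders):

```python
import numpy as np

def sgd_epoch(params, X, y, grad_fn, lr=0.01, batch_size=32, rng=None):
    """One epoch of minibatch SGD: shuffle the data, then for each random
    minibatch compute gradients on that batch alone and take a step."""
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        grads = grad_fn(params, X[batch], y[batch])  # gradient from this minibatch only
        params = params - lr * grads                 # plain SGD step, no momentum
    return params
```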

February 6, 2026 · 7 min · Lei

Loss Functions

The second class of FT5011 took us through loss functions. To be honest, my knowledge of loss functions is a little wonky. Like, I know the general idea, but beyond the simplistic model of “loss as the sum of squared errors”, I really don’t know that much. After being completely lost in the class, and subsequently reviewing the chapter on loss functions, I am happy to say that my mental model of loss functions has been successfully updated. ...

January 25, 2026 · 14 min · Lei

Technical Trading

I’ve always been a little skeptical about technical trading. One of the things I find hard to buy is the notion that ‘risk’ can be measured purely by the standard deviation of a stock’s price. Surely that only measures how ‘volatile’ the stock is, which is different from ‘risk’? A stock is risky if it moves against the position you took and you lose capital, not because its price action is a bit more neurotic. ...

January 21, 2026 · 3 min · Lei

Backtesting

One of the SMA (simple moving average) strategies proposed in the Hilpisch book goes like this: we construct two SMA indicators, one short (42 days) and one long (252 days). Whenever the short indicator is above the long indicator (indicating that the recent trend is bullish), we take a long position; whenever the long indicator is above the short indicator (the reverse), we take a short position. In the book’s example, they looked at ten years of data for the EUR/USD pair. Whenever the green line is above the red line, we maintain a long position; whenever the green line dips below the red line, we maintain a short position: ...
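
A rough pandas sketch of that crossover logic (my own reconstruction, not Hilpisch’s code; `prices` is assumed to be a Series of daily EUR/USD closes):

```python
import numpy as np
import pandas as pd

def sma_crossover(prices: pd.Series, short=42, long=252) -> pd.DataFrame:
    """Vectorized SMA crossover: long (+1) while the 42-day SMA is above the
    252-day SMA, short (-1) otherwise. The position is lagged one day so
    today's signal is only traded on tomorrow's return."""
    df = pd.DataFrame({"price": prices})
    df["sma_short"] = df["price"].rolling(short).mean()
    df["sma_long"] = df["price"].rolling(long).mean()
    df["position"] = np.where(df["sma_short"] > df["sma_long"], 1, -1)
    df["returns"] = np.log(df["price"] / df["price"].shift(1))
    df["strategy"] = df["position"].shift(1) * df["returns"]
    return df.dropna()
```

Cumulative gross-of-costs performance would then be something like `df['strategy'].cumsum().apply(np.exp)`, compared against `df['returns'].cumsum().apply(np.exp)` for the passive benchmark.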

January 20, 2026 · 3 min · Lei

Core Takeaways From the First FT5011 Lecture

I’ve just begun my second deep learning course at NUS, FT5011, taught by Prof Stanley Kok. Based on the first lecture alone, I have a feeling that this is going to be a great course. Stanley seems to be a very good explainer of concepts, and despite my having learnt deep learning and neural networks before, there were still nuances from the first lecture that I want to write about. ...

January 18, 2026 · 6 min · Lei

I Asked ChatGPT How to Master Deep Learning and Its Answers Are Ridiculously Good

And so, I am reproducing it here for reference. My prompt: What are some best methods of gaining mastery in doing deep learning, for instance, dealing with quirks of different environments, shapes, data types, etc? Its response: Great question—and honestly, the fact that you’re thinking about quirks (envs, shapes, dtypes) is already a sign you’re moving from “tutorial deep learning” toward real mastery. I’ll break this into mindsets, habits, and deliberate practice loops—because deep learning skill compounds less from reading papers and more from debugging scars. ...

January 8, 2026 · 4 min · Lei