
Thoughts on web3, tech, life, and everything else in between

Regularization

Regularization is the process of ‘smoothing out’ your loss function so that it is less responsive to the peculiarities of a specific training dataset and therefore generalizes better to unseen data. It attempts to resolve the problem of fitting well on training data but performing poorly on test data, typically by adding an extra penalty term to the loss function. In machine learning, the term regularization is also used more broadly to refer to any strategy that helps improve generalization. ...
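
A minimal sketch of the explicit-penalty form (my own illustration, not code from the post; the linear model, toy data, and weight `lam` are made up), adding an L2 term to a mean-squared-error loss:

```python
import numpy as np

def mse_loss(w, X, y):
    """Plain mean-squared-error loss for a linear model y ≈ X @ w."""
    return np.mean((X @ w - y) ** 2)

def regularized_loss(w, X, y, lam=0.1):
    """MSE plus an L2 penalty: the added term penalizes large weights,
    pulling the fit towards simpler, smoother functions."""
    return mse_loss(w, X, y) + lam * np.sum(w ** 2)

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
w = rng.normal(size=5)
print(mse_loss(w, X, y), regularized_loss(w, X, y))
```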

February 9, 2026 · 5 min · Lei

The Constituents of Errors

When we’re training our models, we’re minimizing loss. Yet not all loss is created equal. Broadly speaking, there are three sources of error. Noise: noise is the inherent randomness of the test data. Even with a model that perfectly fits the true underlying function, the test data you draw will still be scattered around it by the standard deviation of the noise. This is error that cannot be eliminated. ...
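
For reference, the standard decomposition behind these three sources (my own addition, written for a fixed test input $x$, noise variance $\sigma^2$, true function $f$, and a model $\hat{f}_{\mathcal{D}}$ trained on dataset $\mathcal{D}$):

$$
\mathbb{E}_{y,\mathcal{D}}\!\left[\big(y - \hat{f}_{\mathcal{D}}(x)\big)^2\right]
= \underbrace{\sigma^2}_{\text{noise}}
+ \underbrace{\big(f(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\big(\hat{f}_{\mathcal{D}}(x) - \mathbb{E}_{\mathcal{D}}[\hat{f}_{\mathcal{D}}(x)]\big)^2\right]}_{\text{variance}}
$$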

February 8, 2026 · 8 min · Lei

Parameter Initialization

Typically when initializing parameters ($\beta$, $\Omega$) for a network, we draw values from a normal distribution with mean 0 and variance $\sigma^2$. The most important factor in preserving the stability of the network as we move through successive layers (or functional transformations) is the variance of the initialization. It affects the magnitude of your preactivations $f$ and activations $h$ in the forward pass, as well as your gradients in the backward pass, and almost solely determines whether you’ll suffer from vanishing or exploding gradients. ...
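
As an illustrative sketch (mine, not from the post), fan-in-scaled initialization of the kind usually recommended for ReLU networks, with made-up layer sizes:

```python
import numpy as np

def init_layer(n_in, n_out, rng, scheme="he"):
    """Initialize weights Omega and biases beta for one fully connected layer.
    Scaling the weight variance by the fan-in keeps the variance of the
    preactivations f and activations h roughly constant across layers,
    which is what protects against vanishing/exploding gradients."""
    if scheme == "he":
        sigma2 = 2.0 / n_in   # He initialization, suited to ReLU activations
    else:
        sigma2 = 1.0          # naive: unit variance regardless of layer width
    Omega = rng.normal(0.0, np.sqrt(sigma2), size=(n_out, n_in))
    beta = np.zeros(n_out)
    return Omega, beta

rng = np.random.default_rng(0)
Omega1, beta1 = init_layer(784, 256, rng)   # hypothetical layer sizes
print(Omega1.std())                          # ≈ sqrt(2 / 784) ≈ 0.05
```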

February 8, 2026 · 4 min · Lei

On the Math of Back Propagation

It’s incredible how entire models can be represented as a series of equations, and how the entire process of back propagation (the beating heart of deep learning) can also be represented as a series of equations. I’ll type this out fully when I have the time, but here’s what most neural networks are, abstracted. [Figure: The math of back propagation] On the top left (1), you see a typical 3-layer neural network represented by a series of preactivation functions $f$ and hidden units $h$. That’s all there is to it! It’s a very compact representation of a complex series of functional compositions. ...
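
As a sketch of that compact form (my own rendering, in the $\beta$, $\Omega$, preactivation $f$, hidden-unit $h$ notation used elsewhere on this blog), a 3-layer network is just:

$$
\begin{aligned}
\mathbf{f}_0 &= \boldsymbol{\beta}_0 + \boldsymbol{\Omega}_0\,\mathbf{x}, & \mathbf{h}_1 &= a[\mathbf{f}_0],\\
\mathbf{f}_1 &= \boldsymbol{\beta}_1 + \boldsymbol{\Omega}_1\,\mathbf{h}_1, & \mathbf{h}_2 &= a[\mathbf{f}_1],\\
\mathbf{f}_2 &= \boldsymbol{\beta}_2 + \boldsymbol{\Omega}_2\,\mathbf{h}_2, & \mathbf{h}_3 &= a[\mathbf{f}_2],\\
\mathbf{f}_3 &= \boldsymbol{\beta}_3 + \boldsymbol{\Omega}_3\,\mathbf{h}_3
\end{aligned}
$$

where $a[\cdot]$ is the activation function; back propagation is then the chain rule applied from $\mathbf{f}_3$ backwards, reusing the cached $\mathbf{f}_k$ and $\mathbf{h}_k$ from the forward pass.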

February 8, 2026 · 4 min · Lei

On Stochastic Gradient Descent, Momentum, Adam

This is an overdue recap of the lecture I had two weeks ago, where we went through stochastic gradient descent, momentum, and Adam. I’ve learned the benefits of SGD before, so I won’t go that deeply into it, but what’s good about the lecture is how it anchors everything in the mathematics. Stochastic gradient descent: to avoid being trapped in a local minimum, at each iteration the SGD algorithm chooses a random subset of the training data, known as a minibatch, and computes the gradients from those examples alone. The parameter update then considers only that batch. ...
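
A minimal sketch of that minibatch update (my own illustration; `grad_fn`, the learning rate, and the batch size are placeholders):

```python
import numpy as np

def sgd_epoch(params, X, y, grad_fn, lr=0.01, batch_size=32, rng=None):
    """One epoch of minibatch SGD: shuffle the data, then for each random
    minibatch compute gradients on that batch alone and take a step."""
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        grads = grad_fn(params, X[batch], y[batch])  # gradient from this minibatch only
        params = params - lr * grads                 # plain SGD step, no momentum
    return params
```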

February 6, 2026 · 7 min · Lei

Loss Functions

The second class of FT5011 took us through loss functions. To be honest, my knowledge of loss functions is a little wonky. Like, I know the general idea, but beyond the simplistic model of “loss as the sum of squared errors”, I really don’t know that much. After being completely lost in the class, and subsequently reviewing the chapter on loss functions, I am happy to say that my mental model of loss functions has been successfully updated. ...

January 25, 2026 · 14 min · Lei

Technical Trading

I’ve always been a little skeptical about technical trading. One of the things I find hard to buy is the notion that ‘risk’ can be measured purely by the standard deviation of a stock’s price. Surely that only measures how ‘volatile’ the stock is, which is different from ‘risk’? A stock is risky if it moves against the position you took and you lose capital, not because its price action is a bit more neurotic. ...

January 21, 2026 · 3 min · Lei

Backtesting

One of the SMA (simple moving average) strategies proposed in the Hilpisch book goes like this: we construct two SMA indicators, one short (42 days) and one long (252 days). Whenever the short indicator is above the long indicator (indicating that the recent trend is bullish), we take a long position; whenever the long indicator is above the short indicator (the reverse), we take a short position. In the book’s example, they looked at ten years of data for the EUR/USD pair. Whenever the green line is above the red line, we maintain a long position; whenever the green line dips below the red line, we maintain a short position: ...
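
A rough pandas sketch of that crossover logic (my own reconstruction, not Hilpisch’s code; `prices` is assumed to be a Series of daily EUR/USD closes):

```python
import numpy as np
import pandas as pd

def sma_crossover(prices: pd.Series, short=42, long=252) -> pd.DataFrame:
    """Vectorized SMA crossover: long (+1) while the 42-day SMA is above the
    252-day SMA, short (-1) otherwise. The position is lagged one day so
    today's signal is only traded on tomorrow's return."""
    df = pd.DataFrame({"price": prices})
    df["sma_short"] = df["price"].rolling(short).mean()
    df["sma_long"] = df["price"].rolling(long).mean()
    df["position"] = np.where(df["sma_short"] > df["sma_long"], 1, -1)
    df["returns"] = np.log(df["price"] / df["price"].shift(1))
    df["strategy"] = df["position"].shift(1) * df["returns"]
    return df.dropna()
```

Cumulative gross-of-costs performance would then be something like `df['strategy'].cumsum().apply(np.exp)`, compared against `df['returns'].cumsum().apply(np.exp)` for the passive benchmark.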

January 20, 2026 · 3 min · Lei

Core Takeaways From the First FT5011 Lecture

I’ve just begun my second deep learning course at NUS, FT5011, taught by Prof Stanley Kok. Based on the first lecture alone, I have a feeling that this is going to be a great course. Stanley seems to be a very good explainer of concepts, and despite my having learnt deep learning and neural networks before, there were still nuances from the first lecture that I want to write about. ...

January 18, 2026 · 6 min · Lei

I Asked ChatGPT How to Master Deep Learning and Its Answers Are Ridiculously Good

And so, I am reproducing it here for reference. My prompt: What are some best methods of gaining mastery in doing deep learning, for instance, dealing with quirks of different environments, shapes, data types, etc? Its response: Great question—and honestly, the fact that you’re thinking about quirks (envs, shapes, dtypes) is already a sign you’re moving from “tutorial deep learning” toward real mastery. I’ll break this into mindsets, habits, and deliberate practice loops—because deep learning skill compounds less from reading papers and more from debugging scars. ...

January 8, 2026 · 4 min · Lei