Three interesting things - 20231213
Interactive visualization of LLM architectures
https://bbycroft.net/llm by Brendan Bycroft
This is a really cool visualization! It not only lets you explore the architecture with zoom and rotation, but also gives a detailed explanation of how information flows through the network, with a high-quality animation. It still takes some time to understand what's going on in the visualization, but it is extremely well done.
You can also check out the source code: https://github.com/bbycroft/llm-viz
Pre-registration for predictive modeling
An interesting proposal for pre-registering predictive modeling (research) by @jakehofman et al.
https://arxiv.org/abs/2311.18807
Why do you need this? Although predictive (vs. explanatory) modeling is less prone to p-hacking because it has to be tested on out-of-sample data, many factors and choices can still make it vulnerable. Many predictive modeling tasks not only involve a lot of potential predictors but also require exploratory data analysis (EDA) and iterative model updates to arrive at good models. This can lead to overfitting and inappropriate re-use of data. Furthermore, there are still many researcher degrees of freedom, such as the choice of evaluation metrics and hyperparameters.
The key idea of this paper is a two-step pre-registration: the first when declaring the problem, and the second after the model has been trained (with the model details and evaluation criteria). The first step helps clarify and pin down the problem concretely, and the second helps ensure that the reported performance reflects real generalization, because the model and the evaluation criteria are fixed before the test data is touched.
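To make the two steps concrete, here is a minimal sketch of how I picture the workflow, not the paper's actual protocol. Everything in it is my own assumption for illustration: the synthetic data, the `fingerprint` helper, the record fields, and the idea of using a hash as a stand-in for submitting a registration. The point is only the ordering: set the test data aside, register the problem, train, register the frozen model and metric, and only then score the test set once.

```python
import hashlib
import json
import random


def fingerprint(record: dict) -> str:
    """Hypothetical stand-in for 'registering': a SHA-256 digest of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, xs, ys):
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for this sketch): y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Set the test rows aside before any exploration happens.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Step 1: register the problem before looking at the data in depth.
problem = {"target": "y", "predictors": ["x"], "test_rows": [150, 200]}
problem_id = fingerprint(problem)

# EDA and model iteration happen here, on the training rows only.
model = fit_line(train_x, train_y)

# Step 2: register the frozen model and the evaluation criteria.
frozen = {
    "problem_id": problem_id,
    "model": {"slope": model[0], "intercept": model[1]},
    "metric": "mean_absolute_error",
}
model_id = fingerprint(frozen)

# Only now is the held-out data scored, and only once.
test_mae = mean_absolute_error(model, test_x, test_y)
print(f"problem {problem_id[:12]}  model {model_id[:12]}  test MAE {test_mae:.3f}")
```

Because the second record includes the fitted parameters and the metric name, changing either after seeing the test score would produce a different fingerprint, which is the kind of after-the-fact tweak the second registration is meant to rule out.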
llamafile
llamafile is a project by Mozilla Ocho (“Innovation and Experiments @ Mozilla”) that aims to distribute LLMs (and beyond?) as a highly optimized single executable. I first heard about it from Simon Willison, but Kevin’s post finally nudged me to try it out.
I tried it on my MacBook Pro, and the ease and speed are impressive! You can run 7B-parameter models with no problem.