Index

osmarks' website

Now with handmade artisanal 1 bits!

Blog

Read my opinions via the internet.

2025-02-10 / 1.55k words
My new main router.
2024-02-25 / 3.44k words
How to run local AI slightly more cheaply than with a prebuilt system. Somewhat opinionated.
2025-01-26 / 1.84k words
Predicting the post-social world.
2025-01-24 / 4.17k words
Downloading and indexing everything* on Reddit on one computer.
2025-01-09 / 1.35k words
Computer algebra systems leave lots to the user and require task-specific manual design. Can we do better?
2024-11-01 / 2.65k words
Has Minecraft become easier?
2024-10-16 / 665 words
A slightly odd pattern I've observed.
2024-10-06 / 2.99k words
Or: why most AI hardware startups are lying.
2024-10-06 / 1.08k words
As ever, AI safety becomes AI capabilities.
2020-06-11 / 4.77k words
A nonexhaustive list of media which I like and which you may also be interested in.
2024-07-06 / 1.58k words
I got annoyed and rewrote everything.
2023-08-28 / 2.59k words
Powerful search tools as externalized cognition, and how mine work.
2024-05-12 / 1.29k words
What exactly is "magic" anyway?
2024-04-27 / 848 words
Please stop making chatbots.
2024-04-22 / 1.54k words
Absurd technical solutions for problems which did not particularly need solving are one of life's greatest joys.
2024-03-27 / 1.87k words
RSAPI and the rest of my infrastructure.
2023-09-24 / 1.64k words
This is, of course, all part of my evil plan to drive site activity through systematically generating (meta)political outrage.
2023-06-06 / 2.50k words
The history of the feared note-taking application.
2023-07-02 / 1.61k words
Why programming education isn't very good, and my thoughts on AI code generation.
2022-02-24 / 949 words
Learn about how osmarks.net works internally! Spoiler warning if you wanted to reverse-engineer it yourself.
2023-01-28 / 407 words
A common criticism of school is that it focuses overmuch on rote memorization. While I don't endorse school, I think this argument is wrong.
2022-05-14 / 463 words
RSS/Atom are protocols for Internet-based newsletter/feed services. They're surprisingly well-supported and you should consider using them.
2021-07-08 / 1.07k words
In which I get annoyed at yet more misguided UK government behaviour.
2020-05-20 / 582 words
Is solving Sudoku and similar puzzles by hand really useful in building computer science ability? We don't think so.
2017-08-16 / 940 words
We are not responsible if these tips cause your ship to implode/explode. Contains spoilers in vast quantities.
2018-08-14 / 688 words
Why I think that government programs telling everyone to "code" are pointless.
2020-01-25 / 145 words
It's slightly different now!
2018-06-01 / 737 words
My (probably unpopular in general but... actually likely fairly popular amongst this site's intended audience) opinions on smartphones today.

Microblog

Short-form observations.

Why do all three of the reasonably okay AI music tools (Udio, Suno, Riffusion) have fairly similar artifacts? Except for, I think, older versions of Udio, they all sound consistently off in some way I don't know enough music theory to explain, particularly in metal vocals and/or complex instrumentals. Do they all use the same autoencoders or something?

Street-Fighting Mathematics is not actually related to street fighting, but you should read it if you like estimating things. There is much power in being approximately right very fast, and it contains many clever tricks which are not immediately obvious but are very powerful. My favourite part so far is this exercise - you can uniquely (up to a dimensionless constant) identify this formula just from some ideas about what it should contain and a small linear algebra problem!
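
The exercise and formula are links in the original post, so here is a textbook stand-in rather than the real thing: dimensional analysis as a small linear algebra problem, which determines a formula up to a dimensionless constant. Suppose the period T of a pendulum depends only on its length l, gravity g and mass m:

```latex
% Guess T = C l^a g^b m^c and require dimensions to match.
% Matrix columns: exponents of (metres, seconds, kilograms) for l, g, m.
\[
  T = C\, l^{a} g^{b} m^{c}
  \quad\Longrightarrow\quad
  \begin{pmatrix} 1 & 1 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} a \\ b \\ c \end{pmatrix}
  =
  \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
\]
```

Solving gives a = 1/2, b = -1/2, c = 0, so T = C·√(l/g); the method cannot determine C, which happens to be 2π for small oscillations.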

People are claiming (I don't know much RL) that DeepSeek-R1's training process is very simple (based on the paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) - a boring standardish (for LLMs) RL algorithm optimizing for reward on some ground-truth-verifiable tasks (they don't say which). So why did o1 not happen until late 2024 (public release) or late 2023 (rumours of Q*)? "Do RL on useful tasks" is a very obvious idea. I think the relevant algorithms are older than that.

The paper says that they tried applying it to smaller models and it didn't work nearly as well, so "base models were bad then" is a plausible explanation, but it doesn't hold up - GPT-4-base is probably a generally better (if costlier) model than 4o, which o1 is based on (though o1 could be distilled from a secret bigger model); and LLaMA-3.1-405B used a somewhat similar posttraining process and is about as good a base model, but is not competitive with o1 or R1. So I don't think it's that.

What's going on here? The process is simple-sounding but filled with pitfalls DeepSeek don't mention? What has changed between 2022/23 and now which means we have at least three decent long-CoT reasoning models around?
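
For concreteness, here is a minimal sketch of what "RL on ground-truth-verifiable tasks" can look like, loosely in the shape of the group-relative scheme (GRPO) the R1 paper describes. The model interface (sample_fn, logprob_fn) is a hypothetical stand-in, and this is plain REINFORCE with a group-mean baseline rather than the paper's full clipped objective:

```python
import statistics
from typing import Callable

def grpo_like_loss(
    sample_fn: Callable[[str], str],          # hypothetical: draw one completion for a prompt
    logprob_fn: Callable[[str, str], float],  # hypothetical: log p(completion | prompt), differentiable in practice
    check_answer: Callable[[str], bool],      # the ground-truth verifier, e.g. exact-match on a final answer
    prompt: str,
    group_size: int = 8,
) -> float:
    # Sample a group of completions and score each one against ground truth.
    completions = [sample_fn(prompt) for _ in range(group_size)]
    rewards = [1.0 if check_answer(c) else 0.0 for c in completions]
    # Group-relative advantages: the group mean replaces a learned value network.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid dividing by zero when all rewards agree
    loss = 0.0
    for completion, reward in zip(completions, rewards):
        advantage = (reward - mean) / std
        loss -= advantage * logprob_fn(prompt, completion)
    return loss / group_size  # minimize with any gradient-based optimizer
```

The only task-specific machinery is the verifier, which is part of what makes the simplicity so confusing.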

Religion has progressed, historically, from:

  • there is a very large number of widely dispersed gods and you don't know about the vast majority of them
  • there are quite a few gods, but a bounded number
  • there is exactly one god
  • there are exactly zero gods

By extrapolation, we can conclude that the next step is that humanity has negative one god, i.e. is in theological debt and must build a god to continue. This is where the EY-style "aligned singleton" came from. But people are now moving toward "we need everyone to have pocket gods" because they are insane, in line with the pattern. The next step is of course "we need to build gods and put them in everything".

It annoys me that my bank makes it so onerous to ever send payments. Five confirmation screens and an 8-character base36 OTP I can't fit in working memory. I get why (they are required, in some circumstances, to reimburse you if you're defrauded via their push payments), but this is a very silly consequence.
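
For scale (my arithmetic, not anything the bank publishes): eight random base36 characters are about 41 bits, i.e. eight unrelated chunks against the classic ~7-item working memory span.

```python
import math
# Entropy of an 8-character base36 one-time password (assumed uniformly random).
print(8 * math.log2(36))  # ≈ 41.4 bits
```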

I finally got round to watching the political documentary "Yes, Minister". It would be very funny if it were fictional, which I am told it is not.

DeepSeek V3 was unexpectedly released recently. It's a decently big (685 billion parameters) model and apparently outperforms Claude 3.5 Sonnet and GPT-4o on a lot of benchmarks. And they release the base model! Very cool. Some notes:

  • They don't make this comparison, but the GPT-4 technical report has some benchmarks of the original GPT-4-0314 where it seems to significantly outperform DSv3 (notably, WinoGrande, HumanEval and HellaSwag). I can't easily find evaluations of current-generation cost-optimized models like 4o and Sonnet on this. Is this just because GPT-4 benefits a lot from posttraining whereas DeepSeek evaluated their base model, or is the model still worse in some hard-to-test way? GPT-4 is reportedly ~1.8T parameters, trained on about as much data.
  • It's conceivable that GPT-4 (the original model) is still the largest (by total parameter count) model (trained for a useful amount of time). The big labs seem to have mostly focused on optimizing inference costs, and this shows that their SOTA models can mostly be matched with ~600B. We cannot rule out larger, better models not publicly released or announced, of course.
  • DeepSeek has absurd engineers. They have 2048 H800s (slightly crippled H100s for China). LLaMA 3.1 405B is roughly competitive in benchmarks and apparently used 16384 H100s for a similar amount of time. This is due to some standard optimizations like Mixture of Experts (though their implementation is finer-grained than usual) and some newer ones like Multi-Token Prediction - but mostly because they fixed everything making their runs slow. They avoided tensor parallelism (interconnect-heavy) by carefully compacting everything to fit on fewer GPUs, designed their own optimized pipeline parallelism, wrote their own PTX (roughly, Nvidia GPU assembly) for low-overhead communication so they can overlap it better, fixed some precision issues with FP8 in software, casually implemented a new FP12 format to store activations more compactly, and have a section suggesting hardware design changes they'd like made.
  • It should in principle be significantly cheaper to host than LLaMA-3.1-405B, which already goes for about $0.80/million tokens (see the back-of-envelope sketch below).
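
A back-of-envelope sketch of that last point: per-token inference compute scales roughly with active parameters, and the V3 report claims only ~37B of the total are active per token, whereas dense LLaMA-3.1-405B activates everything. This ignores memory capacity, batching and kernel efficiency, so treat it as an upper bound on optimism:

```python
# Naive per-token inference cost ratio: dense LLaMA-3.1-405B vs MoE DeepSeek V3.
# Assumes cost ~ active parameters; real hosting prices depend on much more.
ACTIVE_DSV3 = 37e9    # active parameters per token, per the V3 technical report
ACTIVE_LLAMA = 405e9  # dense model: all parameters are active
print(f"naive cost ratio: {ACTIVE_LLAMA / ACTIVE_DSV3:.1f}x")  # ~10.9x
```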

Mass-market robot dogs now beat biological dogs in TCO.

Experiments

Various web projects I have put together over many years. Made with at least four different JS frameworks. Some of them are bad.

A game about... apioforms... by Heavpoot.
Collect Arbitrary Points and achievements by doing things on this website! See how many you have! Do nothing with them because you can't! This is the final form of gamification.
Automatic score keeper, designed for handling Monopoly money.
Colorizes the Alphabet, using highly advanced colorizational algorithms.
The Limitless Grid screensaver (kind of) implemented in a somewhat laggy pixel shader.
An unfinished attempt to replicate an Apple screensaver.
Survive as long as possible against emus and other wildlife. Contributed by Aidan.
Fly an ominous flying square around above some ground! Includes special relativity!
A somewhat unperformant generator for pleasant watercolor-y "fractalart" images. Ported from a Haskell implementation by "TomSmeets".
My fork of GUIHacker. Possibly the only version actually on the web right now since the original website is down.
Obligatory (John Conway's) Game of Life implementation.
It is pitch black (if you ignore all of the lighting). You are likely to be eaten by Heavpoot's terrible writing skills, and/or lacerated/shot/[REDACTED]. Vaguely inspired by the SCP Foundation.
Generates ideas. Terribly. Don't do them. These are not good ideas.
The exciting multiplayer game of incrementing and decrementing! No cheating.
Outdoing all other websites with INFINITE PAGES!
Tells you how late Joe's homework is.
Lorem Ipsum (Latin-like placeholder text), eternally. Somehow people have left comments at the bottom anyway.
Instead of wasting time thinking of the best political opinion to hold, simply pick them pseudorandomly per day with this tool.
A Reverse Polish Notation (see Wikipedia) calculator, version 2. Buggy and kind of unreliable. This updated version implements advanced features such as subtraction.
Reverse Polish Notation calculator, version 3 - with inbuilt docs, arbitrary-size rational numbers, utterly broken float/rational conversion and quite possibly Turing-completeness.
Reverse Polish Notation calculator, version 4 - increasingly esoteric and incomprehensible. Contributed by Aidan.
Apply custom CSS to most pages on here.
Your favourite* tic-tac-toe game in 3 dimensions, transplanted onto the main website via a slightly horrifically manual process! Technically this game is solved and always leads to player 1 winning with optimal play, but the AI is not good enough to do that without more compute!
More dimensions. More confusion. Somewhat worse performance. 4D Tic-Tac-Toe.
A basic implementation of the WFC procedural generation algorithm.
Type websocket URLs in the top bar and hit enter; type messages in the bottom bar, and also hit enter. Probably useful for some weirdly designed websocket services.
Dice-rolling webapp. Not very useful pending me writing a good parser.
Unholy horrors moved from the depths of my projects directory to your browser. Theoretically, this is a calculator. Good luck using it.

Get updates to the blog (not experiments) in your favourite RSS reader using the RSS feed.

View some of my projects at my git hosting.

Other blogs

2025-02-10 / Money Stuff
CBS procedurals, Josh Allen MVP coin, SEC Crypto Task Force, RIP CFPB, volatility and some Treasuries don’t count.
Scott Alexander famously warned us to Beware Trivial Inconveniences.
2025-02-10 / rtl-sdr.com
Over on the Airframes Community forum, user 'thebaldgeek' has posted a review of our Discovery Dish product. If you weren't already aware, the Discovery Dish is an easy-to-set-up and use backyard dish system for weather satellites, Inmarsat, and...
2025-02-09 / ServeTheHome
We take a quick look at the Samsung Pro Plus 1TB, now with claimed 180MB/s speeds, and see how it performs compared to its competition.
2025-02-08 / KGOnTech
It seems like all of a sudden, several companies including Meta are talking about the issue of disparity. Of particular issue with Meta, Magic Leap, and Avegant is the issue of visual distortion due to the frame twist of AR glasses frames.
2025-02-08 / Overcoming Bias
Something must be done.
Ship-mounted lasers, Santorini earthquakes, carbon sequestration via nuclear explosion, the fall of concentrating solar, and more, and more.
