there is a very large quantity of widely dispersed gods and you don't know about the vast majority of them
there are quite a few gods, but a bounded number
there is exactly one god
there are exactly zero gods
By extrapolation, we can conclude that the next step is that humanity has negative one god, i.e. is in theological debt and must build a god to continue. This is where the EY-style "aligned singleton" came from. But people are now moving toward "we need everyone to have pocket gods" because they are insane, in line with the pattern. The next step is of course "we need to build gods and put them in everything".
It annoys me that my bank makes it so onerous to ever send payments. Five confirm screens and an 8-character base36 OTP I can't fit in working memory. I get why (in some circumstances they are required to reimburse you if you get defrauded while using the bank's push payments), but this is a very silly consequence.
DeepSeek V3 was unexpectedly released recently. It's a decently big (685 billion parameters) model and apparently outperforms Claude 3.5 Sonnet and GPT-4o on a lot of benchmarks. And they released the base model! Very cool. Some notes:
They don't make this comparison, but the GPT-4 technical report has some benchmarks of the original GPT-4-0314 on which it seems to significantly outperform DSv3 (notably WinoGrande, HumanEval and HellaSwag). I can't easily find evaluations of current-generation cost-optimized models like 4o and Sonnet on these. Is this just because GPT-4 benefits a lot from posttraining whereas DeepSeek evaluated their base model, or is the model still worse in some hard-to-test way? GPT-4 is reportedly 1.8T parameters, trained on about as much data.
It's conceivable that GPT-4 (the original model) is still the largest model by total parameter count that has been trained for a useful amount of time. The big labs seem to have mostly focused on optimizing inference costs, and this shows that their SOTA models can mostly be matched at ~600B parameters. We cannot rule out larger, better models that haven't been publicly released or announced, of course.
DeepSeek has absurd engineers. They have 2048 H800s (slightly crippled H100s for China). LLaMA 3.1 405B is roughly competitive on benchmarks and apparently used 16384 H100s for a similar amount of time. The gap comes partly from standard optimizations like Mixture of Experts (though their implementation is finer-grained than usual) and some newer ones like Multi-Token Prediction - but mostly from fixing everything that made their runs slow. They avoid tensor parallelism (interconnect-heavy) by carefully compacting everything so it fits on fewer GPUs, design their own optimized pipeline parallelism, write their own PTX (roughly, Nvidia GPU assembly) for low-overhead communication so they can overlap it better, fix some precision issues with FP8 in software, casually implement a new FP12 format to store activations more compactly, and have a section suggesting hardware design changes they'd like made.
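For reference, here is a minimal sketch of generic top-k mixture-of-experts routing in plain numpy. This shows the basic idea only, not DeepSeek's actual fine-grained gating; the expert count, dimensions and softmax gating here are all made up for illustration.

```python
# Generic top-k MoE routing sketch (illustrative only; not DeepSeek's
# fine-grained implementation). Shapes and expert count are arbitrary.
import numpy as np

def moe_layer(x, expert_weights, gate_weights, k=2):
    """x: (d,) token activation; expert_weights: (n_experts, d, d);
    gate_weights: (n_experts, d). Returns the weighted sum of the
    top-k experts' outputs for this token."""
    scores = gate_weights @ x                    # (n_experts,) routing logits
    top = np.argsort(scores)[-k:]                # indices of the k highest-scoring experts
    probs = np.exp(scores[top] - scores[top].max())
    probs /= probs.sum()                         # softmax over the selected experts only
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, top))

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
out = moe_layer(rng.normal(size=d),
                rng.normal(size=(n_experts, d, d)),
                rng.normal(size=(n_experts, d)))
print(out.shape)  # (8,)
```

The point of the structure is that only k of the n_experts weight matrices are touched per token, which is where the compute savings over a dense model of the same total parameter count come from.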
It should in principle be significantly cheaper to host than LLaMA-3.1-405B, which is already $0.8/million tokens, since the MoE design means only ~37B of the parameters are active per token.
When analyzing algorithms, O(log n) is actually the same as O(1), because log n ≤ 64. Don't believe me? Try materializing 2^64 things on your computer. I dare you.
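A trivial check of the arithmetic, assuming a 64-bit address space (so anything you can actually materialize has n ≤ 2^64):

```python
# log2(n) never exceeds 64 for any collection that fits in a 64-bit
# address space, so the "log n" factor is bounded by a small constant.
import math

for n in [10, 10**6, 2**32, 2**64]:
    print(n, math.log2(n))  # the second column never exceeds 64
```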
Apparently "hyperbolic discounting" - the phenomenon where humans incorrectly weight future rewards ("incorrectly" in that if you use any curve which isn't exponential you will regret it at some point) - isn't necessarily some kind of issue of "self-control", or due to uncertain future gains. It results from humans being really bad at calculating exponentials.
Experiments
Various web projects I have put together over many years. Made with at least four different JS frameworks. Some of them are bad.
Collect Arbitrary Points and achievements by doing things on this website! See how many you have! Do nothing with them because you can't! This is the final form of gamification.
It is pitch black (if you ignore all of the lighting). You are likely to be eaten by Heavpoot's terrible writing skills, and/or lacerated/shot/[REDACTED]. Vaguely inspired by the SCP Foundation.
A Reverse Polish Notation (check wikipedia) calculator, version 2. Buggy and kind of unreliable. This updated version implements advanced features such as subtraction.
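The core of any RPN evaluator is just a stack; below is an illustrative Python sketch of the idea, which is not the JS code actually running on the site.

```python
# Minimal stack-based RPN evaluator (illustrative only; the real calculator
# is a separate JS project and presumably handles more than this).
def eval_rpn(expression: str) -> float:
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for token in expression.split():
        if token in ops:
            b, a = stack.pop(), stack.pop()   # note the operand order
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()

print(eval_rpn("3 4 2 * +"))  # 3 + 4*2 = 11.0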
Your favourite* tic-tac-toe game in 3 dimensions, transplanted onto the main website via a slightly horrifically manual process! Technically this game is solved and always leads to player 1 winning with optimal play, but the AI is not good enough to do that without more compute!
Type websocket URLs in the top bar and hit enter; type messages in the bottom bar, and also hit enter. Probably useful for some weirdly designed websocket services.
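If you'd rather do the same thing from a script, a rough Python equivalent using the third-party websockets package looks like this (the echo server URL is just an example and may or may not still be up):

```python
# Connect to a websocket endpoint, send one message, print the reply.
# Requires `pip install websockets`; the URL below is a placeholder.
import asyncio
import websockets

async def main():
    async with websockets.connect("wss://echo.websocket.org") as ws:
        await ws.send("hello")
        print(await ws.recv())

asyncio.run(main())
```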