Why do all three of the reasonably okay AI music tools (Udio, Suno, Riffusion) have fairly similar artifacts? Except for, I think, older versions of Udio, they all sound consistently off in some way I don't know enough music theory to explain, particularly in metal vocals and/or complex instrumentals. Do they all use the same autoencoders or something?
Street-Fighting Mathematics is not actually related to street fighting, but you should read it if you like estimating things. There is much power in being approximately right very fast, and it contains many clever tricks which are not immediately obvious but are very effective. My favourite part so far is this exercise - you can uniquely (up to a dimensionless constant) identify this formula just from some ideas about what it should contain and a small linear algebra problem!
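(I won't spoil the book's actual exercise, but the flavour of the "small linear algebra problem" is: pick the quantities the answer should depend on, write their dimensions as a system of equations over unknown exponents, and solve. A minimal sketch using the classic pendulum-period example, with numpy standing in for doing it by hand - this is my illustration, not necessarily the book's formula:)

```python
# Dimensional analysis as a small linear system: find exponents (a, b, c) such that
# l^a * g^b * m^c has the dimensions of time, for pendulum length l, gravity g, mass m.
import numpy as np

# Rows: dimensions (length, time, mass); columns: quantities (l, g, m).
# l = L^1, g = L^1 T^-2, m = M^1.
A = np.array([
    [1,  1, 0],   # length exponents
    [0, -2, 0],   # time exponents
    [0,  0, 1],   # mass exponents
], dtype=float)
target = np.array([0, 1, 0], dtype=float)  # we want plain time: L^0 T^1 M^0

a, b, c = np.linalg.solve(A, target)
print(f"period ∝ l^{a:g} g^{b:g} m^{c:g}")  # period ∝ l^0.5 g^-0.5 m^0
```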
People are claiming (I don't know much RL) that DeepSeek-R1's training process is very simple (based on the paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) - a boring standardish (for LLMs) RL algorithm optimizing for reward on some ground-truth-verifiable tasks (they don't say which). So why did o1 not happen until late 2024 (public release) or late 2023 (rumours of Q*)? "Do RL on useful tasks" is a very obvious idea. I think the relevant algorithms are older than that.
The paper says that they tried applying it to smaller models and it didn't work nearly as well, so "base models were bad then" is a plausible explanation, but it's clearly not true - GPT-4-base is probably a generally better (if costlier) model than 4o, which o1 is based on (though it could be a distillation from a secret bigger one); and LLaMA-3.1-405B used a somewhat similar posttraining process and is about as good a base model, but is not competitive with o1 or R1. So I don't think it's that.
What's going on here? The process is simple-sounding but filled with pitfalls DeepSeek don't mention? What has changed between 2022/23 and now which means we have at least three decent long-CoT reasoning models around?
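(For concreteness, here's a toy sketch of the recipe in question - the paper's GRPO with rule-based rewards, heavily simplified. The policy here is a stub and all the names are mine, not DeepSeek's code; in the real setup the "policy" is the LLM and the advantages weight a policy-gradient update with clipping and a KL penalty.)

```python
# Toy sketch of RL on ground-truth-verifiable tasks in the GRPO style (not DeepSeek's code).
import numpy as np

rng = np.random.default_rng(0)

def verify(answer: int, ground_truth: int) -> float:
    """Ground-truth-verifiable reward: 1 if the final answer checks out, else 0."""
    return float(answer == ground_truth)

def sample_completions(prompt, group_size):
    """Stub for sampling G completions from the current policy."""
    # Pretend the model answers correctly about 30% of the time.
    return rng.choice([4, 5, 3], size=group_size, p=[0.3, 0.4, 0.3])

prompt, ground_truth = "What is 2+2?", 4
group = sample_completions(prompt, group_size=8)
rewards = np.array([verify(a, ground_truth) for a in group])

# Group-relative advantages: no learned value function, just normalize within the group.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
print(list(zip(group.tolist(), advantages.round(2).tolist())))
# Each completion's log-probabilities would then be reweighted by its advantage.
```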
Beliefs about how many gods there are have progressed roughly like this:
- there is a very large number of widely dispersed gods and you don't know about the vast majority of them
- there are quite a few gods, but a bounded number
- there is exactly one god
- there are exactly zero gods
By extrapolation, we can conclude that the next step is that humanity has negative one god, i.e. is in theological debt and must build a god to continue. This is where the EY-style "aligned singleton" came from. But people are now moving toward "we need everyone to have pocket gods" because they are insane, in line with the pattern. The next step is of course "we need to build gods and put them in everything".
It annoys me that my bank makes sending payments so onerous: five confirmation screens and an 8-character base36 OTP I can't fit in working memory. I get why (in some circumstances they're required to reimburse you if you're defrauded via their push payments), but this is a very silly consequence.
DeepSeek V3 was unexpectedly released recently. It's a decently big (685 billion parameters) model and apparently outperforms Claude 3.5 Sonnet and GPT-4o on a lot of benchmarks. And they release the base model! Very cool. Some notes:
They don't make this comparison, but the GPT-4 technical report has some benchmarks of the original GPT-4-0314 where it seems to significantly outperform DSv3 (notably, WinoGrande, HumanEval and HellaSwag). I can't easily find evaluations of current-generation cost-optimized models like 4o and Sonnet on these. Is this just because GPT-4 benefits a lot from posttraining whereas DeepSeek evaluated their base model, or is the model still worse in some hard-to-test way? GPT-4 is reportedly ~1.8T parameters, trained on about as much data.
It's conceivable that GPT-4 (the original model) is still the largest (by total parameter count) model (trained for a useful amount of time). The big labs seem to have mostly focused on optimizing inference costs, and this shows that their SOTA models can mostly be matched with ~600B. We cannot rule out larger, better models not publicly released or announced, of course.
DeepSeek has absurdly good engineers. They have 2048 H800s (slightly crippled H100s for China). LLaMA 3.1 405B is roughly competitive in benchmarks and apparently used 16384 H100s for a similar amount of time. Some of the gap comes from standard optimizations like Mixture of Experts (though their implementation is finer-grained than usual) and some newer ones like Multi-Token Prediction - but mostly from fixing everything that made their runs slow. They avoid tensor parallelism (interconnect-heavy) by carefully compacting everything so it fits on fewer GPUs, design their own optimized pipeline parallelism, write their own PTX (roughly, Nvidia GPU assembly) for low-overhead communication so they can overlap it better, fix some precision issues with FP8 in software, casually implement a new FP12 format to store activations more compactly, and have a section suggesting hardware design changes they'd like made.
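(I haven't read their kernels, so treat this as a generic illustration rather than their implementation: the core trick of storing activations in narrow float formats is to round each tile onto a coarse exponent/mantissa grid and keep a per-tile scale factor on the side. The tile size, formats and rounding details below are assumptions, and subnormals/NaN handling is ignored.)

```python
# Illustrative numpy sketch of low-precision activation storage (not DeepSeek's kernels).
import numpy as np

def round_to_float(x, exp_bits, man_bits):
    """Round to a simplified float grid with the given exponent/mantissa widths."""
    bias = 2 ** (exp_bits - 1) - 1
    out = np.zeros_like(x)
    nz = x != 0
    e = np.clip(np.floor(np.log2(np.abs(x[nz]))), -bias + 1, bias)
    step = 2.0 ** (e - man_bits)          # spacing of representable values at that exponent
    out[nz] = np.round(x[nz] / step) * step
    return out

def quantize_tile(tile, exp_bits, man_bits):
    """Scale the tile so its max magnitude fits the format, then round; store scale separately."""
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** (2 ** (exp_bits - 1) - 1)
    scale = np.abs(tile).max() / max_val
    return round_to_float(tile / scale, exp_bits, man_bits), scale

acts = np.random.default_rng(0).normal(size=(128, 128)).astype(np.float32)
q_e4m3, s1 = quantize_tile(acts, exp_bits=4, man_bits=3)   # FP8-ish
q_e5m6, s2 = quantize_tile(acts, exp_bits=5, man_bits=6)   # 12-bit "FP12"-ish
print(np.abs(acts - q_e4m3 * s1).mean(), np.abs(acts - q_e5m6 * s2).mean())
```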
It should in principle be significantly cheaper to host than LLaMA-3.1-405B, which is already $0.8/million tokens.
Collect Arbitrary Points and achievements by doing things on this website! See how many you have! Do nothing with them because you can't! This is the final form of gamification.
It is pitch black (if you ignore all of the lighting). You are likely to be eaten by Heavpoot's terrible writing skills, and/or lacerated/shot/[REDACTED]. Vaguely inspired by the SCP Foundation.
A Reverse Polish Notation (check Wikipedia) calculator, version 2. Buggy and kind of unreliable. This updated version implements advanced features such as subtraction.
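(If you don't want to check Wikipedia: RPN is postfix notation evaluated with a stack. A generic sketch of the idea, not this calculator's actual code:)

```python
# Generic RPN evaluation sketch: operands push onto a stack, operators pop their
# arguments and push the result.
def eval_rpn(tokens: str) -> float:
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for token in tokens.split():
        if token in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

print(eval_rpn("3 4 + 2 *"))  # (3 + 4) * 2 = 14.0
```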
Your favourite* tic-tac-toe game in 3 dimensions, transplanted onto the main website via a slightly horrifically manual process! Technically this game is solved and always leads to player 1 winning with optimal play, but the AI is not good enough to do that without more compute!
Type websocket URLs in the top bar and hit enter; type messages in the bottom bar, and also hit enter. Probably useful for some weirdly designed websocket services.