Summarized by Dodly:
AI's Wild Week: Games, DNA, Robots, and More!
Summary
This week in AI, ByteDance released Lance, a versatile open-source model that generates and edits images and videos with fine-grained control and can even solve mazes. Apple introduced LTO, a 3D model generator that captures view-dependent object appearance. Flash GRPO was released to improve video quality by aligning generated output more closely with human preferences. Reactive GWM enables interactive game worlds with steerable NPCs. L2P, a new pixel-space image generator, produces high-resolution outputs and surpasses some latent-space models. Alibaba's Qwen 3.7 model advanced agentic capabilities and vision integration, and the company's Live Translate model now uses visual context for more accurate real-time translation. In robotics, a magnetic wall-climbing industrial robot and Hugging Face's open-source humanoid robot platform were showcased, alongside voice-controlled humanoid robots such as the Unitree G1. Meta released WaveFlow, which adds sound to silent videos, though checkpoint access is limited. Stability AI launched Stable Audio 3 for music generation, and Pano World offers consistent 3D panorama tours of virtual homes. Alibaba's Fashion Chameleon enables real-time virtual try-on in video, and Mega ASR provides robust transcription of noisy audio.