Curated List of Recent Papers

Glyph Generation:

[2025-05, vector graphics generation, large model] OmniSVG: A Unified Scalable Vector Graphics Generation Model, paper

[2024-10] Crafting layered designs from pixels, ICLR 2025

[2024-07] JoyType: A Robust Design for Multilingual Visual Text Creation, paper, code, demo

[2024-03] Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering, paper, code

[2024-06] Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering, paper, code

[2024-11, NIPS, good results] TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control, paper, code

[2024-07-26] DP-Font: Chinese Calligraphy Font Generation Using Diffusion Model and Physical Information Neural Network, paper

[2024-07] GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models, paper, code

[2024 CVPR] Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models, paper

[2024-04-18] Dynamic Typography: Bringing Words to Life, paper, website

[2023-11-17] WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models, paper

[2024 arXiv] FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation, paper

[2023 SIGA] Anything to glyph: Artistic font synthesis via text-to-image diffusion model, paper, website

[2024 CVPR] Ds-fusion: Artistic typography via discriminated and stylized diffusion, paper, website

[2024 CHI] TypeDance: Creating semantic typographic logos from image through personalized generation, video, paper

[2024-04-12] DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation, paper

[2023-12-19] Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model, paper, code

[2023-12-19] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning, paper, code, demo [one-shot font generation; somewhat slow, and the style match is weak]

[2023-12-16] VecFusion: Vector Font Generation with Diffusion, paper, code [two-stage vector glyph generation]

[!2023-11-07] AnyText: Multilingual Visual Text Generation And Editing, paper, code

[2023-12-08] UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models, paper, code, website, demo

[2023 SIGGRAPH, interesting semantic typography] Word-as-image for semantic typography, website, paper

3D Reconstruction and Generation:

[2025-05, LEGO building] Generating Physically Stable and Buildable LEGO Designs from Text, arXiv 2025, paper

[2025-01] VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment, paper, website

[2024-12] FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent, paper, code

[2024-10-17] DepthSplat: Connecting Gaussian Splatting and Depth, paper, code, website [depth prediction and 3DGS rendering reinforce each other]

[2024-10] Disco4D: Disentangled 4D Human Generation and Animation from a Single Image, paper

[2024-07] Mip-Splatting: Alias-free 3D Gaussian Splatting, paper, code

[2024-03] CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field, code

[2024-05-17] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields, code, paper, website

[2024-05-07] A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose, paper, code

[2024-05-03, InstantMesh] InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, paper, code, demo

[2024-ICLR, good results, tried it, facial motion transfer] GPAvatar: Generalizable and Precise Head Avatar from Image(s), paper, code

[2024, CVPR] HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation, paper, code

[2024-03] SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting, paper, code, website (combines mesh and Gaussian splatting)

[2023-12, CVPR, good results] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians, paper, website, code

[2023-11] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics, paper, website

[2023-07] Dynibar: Neural dynamic image-based rendering, paper

[2023-12] PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting, paper

[2023-12] Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing, paper

[2023-12, good] DUSt3R: Geometric 3D Vision Made Easy, paper

[2023-12] HumanTOMATO: Text-aligned whole-body motion generation, paper, code

[2023-NIPS] MotionGPT: Human motion as a foreign language, paper, code

[2023-NIPS] Motion-X: A large-scale 3d expressive whole-body human motion dataset, paper, code

[2023-12-04] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes, paper, code

[2023-12-04] GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis, paper

[2023-12-03] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians, paper

[2023-11-27] Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling, paper

[2024-CVPR] LiSA: LiDAR Localization with Semantic Awareness, paper [not yet released]

[2024-CVPR] LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes, paper [not yet released]

[2023-11-16] Adaptive Shells for Efficient Neural Radiance Field Rendering, paper, website, code

[2023-12-05] ReconFusion: 3D Reconstruction with Diffusion Priors, paper, website, code

[2023-08] A Survey on Deep Generative 3D-aware Image Synthesis, paper, website

[2023-12] Multimodal Image Synthesis and Editing: The Generative AI Era, paper

[2023-12-11] Relightable gaussian codec avatars, paper, code, website

[2023-12-11] CAD: Photorealistic 3D Generation via Adversarial Distillation, paper, code, website

[2023-12-08] Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing, paper, code, website

[2023-11-29] MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers, paper, code

[2023-11-22] LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes, paper, code

[2023] One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion, paper, website

[2023] One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization, paper, website

[2023-ICCV] Zero-1-to-3: Zero-shot one image to 3d object, paper, code

[2023] ProlificDreamer: High-fidelity and diverse text-to-3d generation with variational score distillation, paper, code, website

[2023-08] Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis, paper, code, website

[2023-10-12, 4D Gaussian Splatting] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering, paper, website, code

[!2023-07 TOG, 3D Gaussian Splatting] 3D Gaussian Splatting for Real-Time Radiance Field Rendering, paper, code, video, video2, video3, video4, video5, tutorial video, coding example1, blog, tutorial (by the original authors, good!)

[2019, TOG, theoretical basis of 3DGS] Differentiable Surface Splatting for Point-based Geometry Processing, paper

[2023-07 TOG, NeRO, good results] NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images, paper, website, code

[2023-ICCV, Zip-NeRF] Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields, paper

[2022-10] EVA3D: Compositional 3D Human Generation from 2D Image Collections, paper

[2022-09-29] Dreamfusion: Text-to-3d using 2d diffusion, paper, code, website

[2022-04-27, EG3D] Efficient Geometry-aware 3D Generative Adversarial Networks, paper

[2022-03-11] StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation, paper

[2020-02-23] PolyGen: An Autoregressive Generative Model of 3D Meshes, paper, code

[2020-04-16] 3D Morphable Face Models – Past, Present and Future, paper [survey on 3DMM]

[1999 SIGGRAPH] A Morphable Model For The Synthesis Of 3D Faces, paper [seminal 3DMM work]

[2021 SIGGRAPH, worth trying] ROSEFusion: Random Optimization for Online Dense Reconstruction under Fast Camera Motion, paper, code

[2016-06] Structure-from-Motion Revisited, paper, code

3D SLAM

[2025-arXiv, real-time SLAM] GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping, paper

[2024-NIPS, strong] Deep Patch Visual SLAM, paper, code

[2024] GlORIE-SLAM, code

[2024] Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians, paper, code

[2024-ECCV] Deep Patch Visual SLAM, paper, code

[2024-06 CVPR] Gaussian Splatting SLAM, paper

[2024-02-20, 3D SLAM survey] How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey, paper

[2023-12] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM, paper, website

[2023] LONER: LiDAR Only Neural Representations for Real-Time SLAM, paper, code

Image/Video Generation:

[2023-07, controllable multi-resolution image generation] MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation, website

[2024-12, extreme compression] 1.58-bit FLUX, paper

[2024-12] PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing, paper

[2024-12, SA] Fashion-VDM: Video Diffusion Model for Virtual Try-On, paper, code

[2024-11, possibly suitable for vector glyph generation?] Randomized Autoregressive Visual Generation, paper, code

[2024-02, Flow Matching] Flow matching for generative modeling, paper

[2024-12] UnZipLoRA: Separating Content and Style from a Single Image, paper, website

[2024-12, ICLR, relighting, strong!] IC-Light, paper, website, code

[2024-10] The Dawn of Video Generation: Preliminary Explorations with SORA-like Models, paper, website

[2024-08] Diffusion Models Are Real-Time Game Engines, paper, website

[2024-09] Make pixels dance: High-dynamic video generation, Doubao video generation (PixelDance), Doubao video generation (Seaweed), paper, website

[2024-08, Tenglong Ao (敖腾隆)] Body of Her: A Preliminary Study on End-to-End Humanoid Agent, paper, website

[2024-10, Yifang Men (门怡芳)] MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling, paper, website

[2024-07, SenseTime] Vimi: Large Model for Controllable Character Video Generation, website

[2024-06] Open-sora: Democratizing efficient video production for all, code

[2024-06] Videocrafter2: Overcoming data limitations for high-quality video diffusion models, paper, code

[2024-02] Animatediff: Animate your personalized text-to-image diffusion models without specific tuning, paper, code

[2024-09] T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback, paper, code

[2024-09] Flux2, code

[2024-03, SD3] Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, paper

[2024-09-12, combines 3D and video generation] ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis, paper, code

[2024-09, CogVideoX, strong] CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer, paper, code, CogView3

[2024-05-13, Junyan, good] Distilling Diffusion Models into Conditional GANs, paper

[2024-06, possibly a game changer?] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, paper, code

[2024-04-14, VAR, excellent] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, paper, code

[2024-02-17, Sora, stunning!] Video generation models as world simulators (openai.com)

[2023-ICCV, Diffusion Transformer] Scalable Diffusion Models with Transformers, paper

[2024-02-09] InstanceDiffusion: Instance-level Control for Image Generation, paper

[2023-12-07] PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding, paper, website, code

[2023-12-20, VideoPoet] A large language model for zero-shot video generation, paper, website

[2023-12-11] Photorealistic Video Generation with Diffusion Models, paper, website, code

[2023-12-07] PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns, paper, code [virtual try-on]

[2023-12-04] Style Aligned Image Generation via Shared Attention, paper, code [training-free, efficient style-consistent inference]

[2023-12-03] One-step Diffusion with Distribution Matching Distillation, paper, code

[2023-11-24] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, paper, code

[2023-11-22] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, paper, code, website

[!2023-11-28] Adversarial Diffusion Distillation, paper, code, website

[2023-09-21] InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation, paper, website

[2022-10-05] Imagen video: High-definition video generation with diffusion models, paper, website

[2022-05-23, Imagen] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, paper, website

[2019-CVPR, SPADE] Semantic image synthesis with spatially-adaptive normalization, paper, code

[2017-ICCV, AdaIN] Arbitrary style transfer in real-time with adaptive instance normalization, paper, code

Fundamental Vision Tasks:

[2024] DINOv2: Learning Robust Visual Features without Supervision, code

[2024-TPAMI] Towards Open Vocabulary Learning: A Survey, paper

[2025-01-18] EdgeTAM: On-Device Track Anything Model, paper

[2024-11-25, track anything] SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory, paper, website, code

[2024-06-13] Depth Anything V2, paper, website, code

[2024-03-21, zero-shot detection] T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, paper, code

[2024-02-15] Introducing Gemini 1.5, Google's next-generation AI model (blog.google)

[2023-12-20, PixelLLM] Pixel Aligned Language Models, paper, website

[2023-11-29] Simplifying transformer blocks, paper, code

[2023-06-21] Fast Segment Anything, paper, code

[!2023-04-05, SAM] Segment Anything, paper, code

[2021-ICML, CLIP] Learning Transferable Visual Models From Natural Language Supervision, paper, code

Large Models and Network Architectures:

[2025-01-21, DeepSeek] paper, code

[2024-07-23, Llama 3.1] https://llama.meta.com/, https://huggingface.co/meta-llama, code

[2024-07] VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks, paper, code

[2024 CVPR] Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks, paper, code

[2025-02] Beyond Token Compression: A Training-Free Reduction Framework for Efficient Visual Processing in MLLMs, paper, code

[2024-02, multimodal large model, strong] MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone, code

[2024-02-02, multimodal large model for remote sensing] VGI-Enhanced multimodal large language model for remote sensing images, code

[2024-05-03, KAN] KAN: Kolmogorov-Arnold Networks, paper, code, people, document

[2024-04, CVPR Oral, InternVL is very strong] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V, paper, code, demo

[2024-03-29] Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model, code

[2023-12, Mamba] Mamba: Linear-Time Sequence Modeling with Selective State Spaces, paper, code

[2023-11-03, Prompt Engineering] Prompt Engineering Through the Lens of Optimal Control, paper

[2024-01-09] GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation, paper, code

[2023-12-31] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices, paper, code

[2023-11-16] Automatic Engineering of Long Prompts, paper, code

[2023-07, fast LLM inference] vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, paper, code

[2017, MoE] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, paper

Distillation and Acceleration:

[2024-05, DMD2] Improved Distribution Matching Distillation for Fast Image Synthesis, paper, code, website, DMD1

Value Alignment and Post-Training:

[2023-NeurIPS, DPO] Direct preference optimization: Your language model is secretly a reward model, paper

[2024-10] VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks, paper, code

[2024-06, SPO] Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step, paper, code

[2024-07, analyzes classic methods such as LoRA and proposes a new architecture] See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition, paper, code

[2022 NIPS, seminal RLHF work] Training language models to follow instructions with human feedback, paper, code

Texture and Rendering:

[2023-11-29, GaussianShader] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces, paper, code, website

[2023-08 SIGGRAPH] Relighting Neural Radiance Fields with Shadow and Highlight Hints, paper, code

[2023-11-27] Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing, paper, code

[Huamin Wang, leading expert on cloth simulation] website

Differentiable Rendering:

[2020-11, differentiable rendering] Modular Primitives for High-Performance Differentiable Rendering, paper

Robotics:

[2024-10-18, Tsinghua diffusion foundation model for robotics] RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation, paper, code, website

[2024-01-04, Stanford open-source robot costing about 200,000 CNY] Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, paper, website, hardware_code, software_code

AI4Science:

[2023-11-29, Nature] Scaling deep learning for materials discovery, paper, code

[2023-12-14, Nature] Mathematical discoveries from program search with large language models, paper, code

Ultrasound Image Reconstruction:

[2023, PR] BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs, paper

[2022, MICCAI 2022] Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images, paper, code

[2022] Mednerf: Medical neural radiance fields for reconstructing 3d-aware ct-projections from a single x-ray, paper, code

[2021] 3D Fetal Face Reconstruction from Ultrasound Imaging, paper

[2021-05] Learning to Map 2D Ultrasound Images into 3D Space with Minimal Human Annotation, paper, code

[2021-09] ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation, paper, code

[2021-04, ACM Computing Surveys] Ultrasound Medical Imaging Techniques: A Survey, paper

[2007] Freehand 3D Ultrasound Reconstruction Algorithms: A Review, paper

[1998, The Lancet] In-vivo three-dimensional ultrasound reconstructions of embryos and early fetuses, paper

Polarization Image Processing and 3D Modeling:

Deep shape from polarization

Polarized 3D: High-quality depth sensing with polarization cues

Fusion-based high-quality polarization 3D reconstruction

Depth sensing using geometrically constrained polarization normals

Shape from polarization for complex scenes in the wild

High quality 3D reconstruction based on fusion of polarization imaging and binocular stereo vision

UAVs

[2025-01, comprehensive survey] UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility, paper, website

[2024-08 arXiv] AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models, paper, code

[2024-11 arXiv] NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation, paper

[Simulator 1] AirSim

[Simulator 2] XTDrone

[Controller 1] PX4-Autopilot

Autonomous Driving

[2024 CVPR] GenAD: Generalized Predictive Model for Autonomous Driving, paper, code

[2024 CVPR] PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving, paper, website

[2023 CVPR best paper] Planning-oriented Autonomous Driving, paper, code

OCR

[2024-09-03] General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model, paper, website

[2024-09] TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models, paper

Spatial Intelligence

[2024-11] EgoLM: Multi-Modal Language Model of Egocentric Motions, paper, website

Agent Simulation

[2024-10-30] Project Sid: Many-agent simulations toward AI civilization, paper

[2024-10-28] ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents, paper, website

[2024-06] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models, paper, code

[2024-06] AgentGym: Evolving Large Language Model-based Agents across Diverse Environments, paper, website

[2023-09] The rise and potential of large language model-based agents: A survey, paper, website

Evolutionary Computation

[2024-10] Evolutionary Retrofitting, paper

[2023-05, good] EvoTorch: Scalable Evolutionary Computation in Python, paper, code

[2024-10] EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation, paper, code

[2024-10] PhaseEvo: Towards Unified In-Context Prompt Optimization for Large Language Models, paper

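Other Code and Interesting Tools:

Guide to publishing research project websites and demos (written by Zhao Yiming (赵一鸣)): Zhihu

Generating websites from screenshots: Screen to code, code, video

IDS data for Chinese characters: GitHub – cjkvi/cjkvi-ids: IDS data for CJK Unified Ideographs

Font resources: 1001 Free Fonts | Download Fonts

Docker installation: video

Google Studio API: website

How to deploy with Gemini (GeminiProChat, not very good): code, MyCode, MyDemo

Using various large models (openplayground, including GPT-4; works well in practice): code

Applications built on multimodal large models and more (GPTDiscord): code

Collection of GPT-4 applications (useful): code

How to fine-tune an LLM: video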

3D modeling from phone video: Polycam 3D scanning app, usage and creation (Bilibili video)

Gaussian Splatting viewer: SuperSplat (playcanvas.com), code

Gaussian Splatting resources (comprehensive): link

Free online character animation website: mixamo

LLM usage guide: https://github.com/datawhalechina/self-llm