Intelligent Graphics Computing Lab

Glyph generation:

[2024-03] Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering, paper, code

[2024-06] Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering, paper, code

[2024-11, NIPS, good results] TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control, paper, code

[2024-07-26] DP-Font: Chinese Calligraphy Font Generation Using Diffusion Model and Physical Information Neural Network, paper

[2024-07] GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models, paper, code

[2024 CVPR] Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models, paper

[2024-04-18] Dynamic Typography: Bringing Words to Life, paper, website 

[2023-11-17] WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models, paper

[2024 Arxiv] FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation, paper

[2023 SIGA] Anything to glyph: Artistic font synthesis via text-to-image diffusion model, paper, website

[2024 CVPR] Ds-fusion: Artistic typography via discriminated and stylized diffusion, paper, website

[2024 CHI] TypeDance: Creating semantic typographic logos from image through personalized generation, video, paper

[2024-04-12] DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation, paper

[2023-12-19] Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model, paper, code

[2023-12-19] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning, paper, code, demo [one-shot font generation; somewhat slow; the generated style is not a close match]

[2023-12-16] VecFusion: Vector Font Generation with Diffusion, paper, code [two-stage vector glyph generation]

[!2023-11-07] AnyText: Multilingual Visual Text Generation And Editing, paper, code

[2023-12-08] UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models, paper, code, website, demo

[2023 SIGGRAPH, fun semantic typography] Word-as-image for semantic typography, website, paper


3D reconstruction and generation:

[2024-10] Disco4D: Disentangled 4D Human Generation and Animation from a Single Image, paper

[2024-03] CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field, code

[2024-05-17] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields, code, paper, website

[2024-05-07] A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose, paper, code

[2024-05-03, InstantMesh] InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models, paper, code, demo

[2024-ICLR, good results, tested, facial motion transfer] GPAvatar: Generalizable and Precise Head Avatar from Image(s), paper, code

[2024, CVPR] HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation, paper, code

[2024-03] SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting, paper, code, website [combines mesh with Gaussian splatting]

[2023-12, CVPR, good results] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians, paper, website, code

[2023-11] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics, paper, website

[2023-12] PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting, paper

[2023-12] DUSt3R: Geometric 3D Vision Made Easy, paper

[2023-12] HumanTOMATO: Text-aligned whole-body motion generation, paper, code

[2023-NIPS] MotionGPT: Human motion as a foreign language, paper, code

[2023-NIPS] Motion-X: A large-scale 3d expressive whole-body human motion dataset, paper, code

[2023-12-04] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes, paper, code

[2023-12-04] GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis, paper

[2023-12-03] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians, paper

[2023-11-27] Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling, paper

[2024-CVPR] LiSA: LiDAR Localization with Semantic Awareness, paper [not yet released]

[2024-CVPR] LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes, paper [not yet released]

[2023-11-16] Adaptive Shells for Efficient Neural Radiance Field Rendering, paper, website, code

[2023-12-05] ReconFusion: 3D Reconstruction with Diffusion Priors, paper, website, code

[2023-08] A Survey on Deep Generative 3D-aware Image Synthesis, paper, website

[2023-12] Multimodal Image Synthesis and Editing: The Generative AI Era, paper

[2023-12-11] Relightable gaussian codec avatars, paper, code, website

[2023-12-11] CAD: Photorealistic 3D Generation via Adversarial Distillation, paper, code, website

[2023-12-08] Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing, paper, code, website

[2023-11-29] MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers, paper, code

[2023-11-22] LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes, paper, code

[2023] One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion, paper, website

[2023] One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization, paper, website

[2023-ICCV] Zero-1-to-3: Zero-shot one image to 3d object, paper, code

[2023] ProlificDreamer: High-fidelity and diverse text-to-3d generation with variational score distillation, paper, code, website

[2023-08] Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis, paper, code, website

[2023-10-12, 4D Gaussian Splatting] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering, paper, website, code

[!2023-07 TOG, 3D Gaussian Splatting] 3D Gaussian Splatting for Real-Time Radiance Field Rendering, paper, code, video, video2, video3, video4, video5, tutorial video, coding example1, blog

[2023-07 TOG, NeRO, good results] NeRO: Neural Geometry and BRDF Reconstruction of Reflective Objects from Multiview Images, paper, website, code

[2023-ICCV, Zip-NeRF] Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields, paper

[2022-10] EVA3D: Compositional 3D Human Generation from 2D Image Collections, paper

[2022-09-29] Dreamfusion: Text-to-3d using 2d diffusion, paper, code, website

[2022-04-27, EG3D] Efficient Geometry-aware 3D Generative Adversarial Networks, paper 

[2022-03-11] StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation, paper

[2020-02-23] PolyGen: An Autoregressive Generative Model of 3D Meshes, paper, code

[2020-04-16] 3D Morphable Face Models – Past, Present and Future, paper [survey of 3DMM research]

[1999 SIGGRAPH] A Morphable Model For The Synthesis Of 3D Faces, paper [the seminal 3DMM paper]

[2021 SIGGRAPH, worth trying] ROSEFusion: Random Optimization for Online Dense Reconstruction under Fast Camera Motion, paper, code


3D SLAM

[2024-06 CVPR] Gaussian Splatting SLAM, paper

[2024-02-20, 3D SLAM survey] How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey, paper

[2023-12] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM, paper, website

[2023] LONER: LiDAR Only Neural Representations for Real-Time SLAM, paper, code


Image/video generation:

[2024-10] The Dawn of Video Generation: Preliminary Explorations with SORA-like Models, paper, website

[2024-08] Diffusion Models Are Real-Time Game Engines, paper, website

[2024-09] Make pixels dance: High-dynamic video generation, Doubao video generation (PixelDance), Doubao video generation (Seaweed), paper, website

[2024-08, Tenglong Ao] Body of Her: A Preliminary Study on End-to-End Humanoid Agent, paper, website

[2024-10, Yifang Men] MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling, paper, website

[2024-07, SenseTime] Vimi: Large Model for Controllable Character Video Generation, website

[2024-06] Open-sora: Democratizing efficient video production for all, code

[2024-06] Videocrafter2: Overcoming data limitations for high-quality video diffusion models, paper, code

[2024-02] Animatediff: Animate your personalized text-to-image diffusion models without specific tuning, paper, code

[2024-09] T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback, paper, code

[2024-09] Flux2, code

[2024-09-12, combines 3D and video generation] ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis, paper, code

[2024-09, CogVideoX, strong] CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer, paper, code

[2024-05-13, Junyan, good] Distilling Diffusion Models into Conditional GANs, paper

[2024-06, a potential game changer?] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, paper, code

[2024-04-14, VAR, excellent] Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, paper, code

[2024-02-17, Sora, astonishing!] Video generation models as world simulators (openai.com)

[2023-ICCV, Diffusion Transformer] Scalable Diffusion Models with Transformers, paper

[2024-02-09] InstanceDiffusion: Instance-level Control for Image Generation, paper

[2023-12-07] PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding, paper, website, code

[2023-12-20, VideoPoet] A large language model for zero-shot video generation, paper, website

[2023-12-11] Photorealistic Video Generation with Diffusion Models, paper, website, code

[2023-12-07] PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns, paper, code [virtual try-on]

[2023-12-04] Style Aligned Image Generation via Shared Attention, paper, code [training-free, efficient style-consistent inference]

[2023-12-03] One-step Diffusion with Distribution Matching Distillation, paper, code

[2023-11-24] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets, paper, code

[2023-11-22] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation, paper, code, website

[!2023-11-28] Adversarial Diffusion Distillation, paper, code, website

[2023-09-21] InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation, paper, website

[2022-10-05] Imagen video: High-definition video generation with diffusion models, paper, website

[2022-05-23, Imagen] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, paper, website


[2019-CVPR, SPADE] Semantic image synthesis with spatially-adaptive normalization, paper, code

[2017-ICCV, AdaIN] Arbitrary style transfer in real-time with adaptive instance normalization, paper, code

Fundamental vision tasks:

[2024-06-13] Depth Anything V2, paper, website, code

[2024-03-21, zero-shot detection] T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy, paper, code

[2024-02-15] Introducing Gemini 1.5, Google's next-generation AI model (blog.google)

[2023-12-20, PixelLLM] Pixel Aligned Language Models, paper, website 

[2023-11-29] Simplifying transformer blocks, paper, code

[2023-06-21] Fast Segment Anything, paper, code

[!2023-04-05, SAM] Segment Anything, paper, code

[2021-ICML, CLIP] Learning Transferable Visual Models From Natural Language Supervision, paper, code


Large models and network architectures:

[2024-07-23, Llama 3.1] https://llama.meta.com/, https://huggingface.co/meta-llama, code

[2024-07] VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks, paper, code

[2024 CVPR] Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks, paper, code 

[2024-02, multimodal LLM, strong] MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone, code

[2024-02-02, multimodal LLM for remote sensing] VGI-Enhanced multimodal large language model for remote sensing images, code

[2024-05-03, KAN] KAN: Kolmogorov-Arnold Networks, paper, code, people, document

[2024-04, CVPR Oral, InternVL, very strong] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V, paper, code, demo

[2024-03-29] Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model, code

[2023-12, Mamba] Mamba: Linear-Time Sequence Modeling with Selective State Spaces, paper, code

[2023-11-03, Prompt Engineering] Prompt Engineering Through the Lens of Optimal Control, paper

[2024-01-09] GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation, paper, code

[2023-12-31] MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices, paper, code

[2023-11-16] Automatic Engineering of Long Prompts, paper, code

[2023-07, fast LLM inference] vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, paper, code

[2017, MoE] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, paper


Distillation and acceleration:

[2024-05, DMD2] Improved Distribution Matching Distillation for Fast Image Synthesis, paper, code, website, DMD1


Value alignment and post-training:

[2024-06, SPO] Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step, paper, code


[2024-07, analyzes classic methods such as LoRA and proposes a new architecture] See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition, paper, code


[2022 NIPS, the seminal RLHF paper] Training language models to follow instructions with human feedback, paper, code




Texture and rendering:

[2023-11-29, GaussianShader] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces, paper, code, website

[2023-08 SIGGRAPH] Relighting Neural Radiance Fields with Shadow and Highlight Hints, paper, code

[2023-11-27] Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing, paper, code

[Huamin Wang, a leading cloth simulation researcher] website


Differentiable rendering:

[2020-11, differentiable rendering] Modular Primitives for High-Performance Differentiable Rendering, paper


Robotics:

[2024-01-04, Stanford's open-source robot, about 200,000 RMB] Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, paper, website, hardware_code, software_code


AI4Science:

[2023-11-29, Nature] Scaling deep learning for materials discovery, paper, code

[2023-12-14, Nature] Mathematical discoveries from program search with large language models, paper, code


Ultrasound image reconstruction:


[2023, PR] BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs, paper

[2022, MICCAI 2022] Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images, paper, code

[2022] Mednerf: Medical neural radiance fields for reconstructing 3d-aware ct-projections from a single x-ray, paper, code

[2021] 3D Fetal Face Reconstruction from Ultrasound Imaging, paper

[2021-05] Learning to Map 2D Ultrasound Images into 3D Space with Minimal Human Annotation, paper, code

[2021-09] ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation, paper, code

[2021-04, ACM Computing Surveys] Ultrasound Medical Imaging Techniques: A Survey, paper

[2007] Freehand 3D Ultrasound Reconstruction Algorithms: A Review, paper

[1998, The Lancet] In-vivo three-dimensional ultrasound reconstructions of embryos and early fetuses, paper



Polarization image processing and 3D modeling:

Deep shape from polarization

Polarized 3d: High-quality depth sensing with polarization cues

Fusion-based high-quality polarization 3D reconstruction

Depth sensing using geometrically constrained polarization normals

Shape from polarization for complex scenes in the wild

High quality 3D reconstruction based on fusion of polarization imaging and binocular stereo vision


Autonomous driving

[2024 CVPR] GenAD: Generalized Predictive Model for Autonomous Driving, paper, code

[2024 CVPR] PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving, paper, website

[2023 CVPR best paper] Planning-oriented Autonomous Driving, paper, code


OCR

[2024-09-03] General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model, paper, website



Other code and interesting tools:

Guide to publishing research project websites and demos (written by Zhao Yiming): Zhihu

Generating websites from screenshots: Screen to code, code, video

Chinese character IDS data: GitHub – cjkvi/cjkvi-ids: IDS data for CJK Unified Ideographs

Font resources: 1001 Free Fonts | Download Fonts

Docker installation: video

Google Studio API: website

Deploying with Gemini (GeminiProChat; does not work very well): code, MyCode, MyDemo

Using various LLMs (openplayground, including GPT-4; works very well in practice): code

Applications built on multimodal LLMs and more (GPTDiscord): code

Collection of GPT-4 applications (useful): code

How to fine-tune an LLM: video

3D modeling from phone video: Using and creating with Polycam, a mobile 3D scanning app (bilibili)

Gaussian Splatting viewer: SuperSplat (playcanvas.com), code

Gaussian Splatting resources: link
