Cheat Sheet - Generative AI
Table of Contents (ToC)
Structure
Generative AI
A collection of resources, models, and tools for working with generative AI.
- Agents – AI agent frameworks and implementations.
- agentarium – A framework for building and evaluating AI agents.
- autogen – Automation tools for generative AI-based workflows.
- Evaluations – Methods and benchmarks for assessing model performance.
- Finetuning – Techniques and resources for fine-tuning generative models.
- GenAI Model Providers – Integration with different AI model providers.
- Hallucination (vs confabulation*?) – Analysis and mitigation of AI-generated hallucinations.
- *: Christopher Manning – Discussion on terminology.
- Inference – Running and optimizing model inference.
- LLM – Large language model-related tools and implementations.
- Notebooks – Jupyter notebooks for experiments and demonstrations.
- Retrieval Augmented Generation (RAG) – Enhancing generative models with retrieval techniques.
- Evaluations – Benchmarks for assessing RAG performance.
- Structure Outputs – Methods to structure and format AI-generated outputs.
- Tools – Various utilities for working with generative AI.
- Unified Interface – Abstraction layers for interacting with different AI models.
- Github - gibberlink: two conversational AI agents that switch from English to a sound-level protocol after confirming they are both AI agents
- Github - gptme: personal assistant (command line)
- Github - EasyR1 (multimodal RL training framework)
- Github - Open Sora (video production)
- Github - LLM Engineer Toolkit (AI Engineering)
LLM Glossary
Figure: Post
Parser
- Github - docling
- Github - Omniparser (microsoft)
- Paper: https://arxiv.org/abs/2408.00203
- Github: https://github.com/microsoft/OmniParser (with OmniTool)
- Model: https://huggingface.co/microsoft/OmniParser
- Blog V2: https://www.microsoft.com/en-us/research/articles/omniparser-v2-turning-any-llm-into-a-computer-use-agent/ (2025.02.12)
- Model V2: https://huggingface.co/microsoft/OmniParser-v2.0
- Demo V2: https://huggingface.co/spaces/microsoft/OmniParser-v2
- Ferret-UI by Apple
- Paper: https://arxiv.org/abs/2404.05719 (2024.04.08)
- Paper V2: https://arxiv.org/abs/2410.18967 (2024.10.24)
- Github: https://github.com/apple/ml-ferret/tree/main/ferretui
BERT
Leaderboards
TTS
Python libraries & articles
- Github - Step-Video-T2V (text-to-video)
- Github - Omniparser (microsoft)
- Github - Step Audio (intelligent speech interaction)
- Github - colpali (visual embeddings)
Robot
Learnings
- Github - AI Engineering Academy
- LinkedIn - AI Agent Courses
- Notion - Free AI Agents course
- Github - AI Agents for beginners (microsoft)
- Hugging Face cookbook/course (e.g. multimodal RAG)
- YouTube - DeepLearning Introduction
References
- Github - Whisper JAX
- GenAI Handbook
- Github - Graph data science blog
- Research Google - AMIE (cardiologist): "Articulate Medical Intelligence Explorer"