Resource Compilation

Configure the working environment
Configure your data science project's working environment according to the Explanation documentation.
Learn how to use tools
How-To Guides teach you how to produce professional code and documentation using the tools available in your working environment.
Step-by-step guide
Tutorials demonstrate how to use the tools in your working environment to complete an end-to-end toy data science project.
Reference
Reference material, such as naming conventions for Data Science items, can be found in the API Reference section.

Resource Compendium

Coding Standards

Coding Standards Document - Comprehensive guidelines and enforced practices to maintain code quality, consistency, and maintainability across all projects using the cookiecutter-collabora template.

Python Environment Management

Setting Up Pyenv and Poetry - A step-by-step guide to configuring pyenv and Poetry for efficient Python environment and dependency management across multiple projects.

Conventions

Project Scaffolding Standards - A comprehensive guide on establishing a standardized folder and file structure for AI/ML projects, enhancing organization and collaboration within the AI team.
Introduction to Naming Conventions for AI Project Assets - An essential guide that underscores the significance of systematic naming conventions in fostering clear communication, efficient project management, and reproducibility in AI and ML collaborations.
File Naming Conventions - A comprehensive guide on using the snake_case naming convention for all file types in your project to maintain consistency and readability.
Column Naming Conventions for ML/AI Projects - A comprehensive guide on using the snake_case naming convention for column names across different types of data files in your ML/AI projects to maintain consistency and readability.
Python Docstrings Conventions - A detailed guide on how to write and automate Google-style docstrings in Python projects, ensuring clear and maintainable code documentation.
Code and Comment Length Standards in Python Projects - A detailed guide on maintaining consistent line lengths for code and comments in Python projects, including configuration and usage of VS Code tools to enforce these standards.
Effective GitHub Naming Conventions - Essential guidelines for naming GitHub repositories in data science, enhancing clarity and organization.
Git Branch Naming Standards - A comprehensive guide on standardized branch naming for ML projects, enhancing repository clarity and collaboration.
Commit Message Standards in ML Projects - A guide on structuring and standardizing commit messages to improve team collaboration in ML projects.
Best Practices for Using Git and Pushing to GitHub - A guide outlining best practices for effective use of Git and GitHub, including commit frequency, message standards, branch management, and collaboration techniques.
ML Data Folder Naming Guide - A comprehensive guide to naming data folders in ML projects for enhanced organization and efficiency.
Model Persistence File Naming Conventions - Guide on naming conventions for persisting machine learning models, covering format, versioning, metadata storage, and documentation.
Automating Metadata Creation - A practical guide for automating the creation and saving of machine learning model metadata using a Python script.
Markdown Documentation for ML Models - Step-by-step tutorial on documenting machine learning models using Markdown, covering the training process, parameters, performance metrics, and metadata.
Notebook and Script Naming Conventions in ML Projects - Essential guidelines for naming Jupyter notebooks and scripts in ML projects, emphasizing clarity, consistency, and efficient file management.
Enforcing Naming Conventions with GitHub Actions - A practical guide to set up GitHub Actions for enforcing naming conventions in machine learning projects, ensuring consistency and organization.
Templates for pull requests, issues/stories, features, and README files
Branching Strategies for ML Projects - Branching strategies for AI-based projects.
Python Scripting for Data Conversion - Detailed Python functions for converting data between XLSX and CSV formats, tailored for machine learning experts working on NLP projects.
Project Data Management Practices - Best practices for organizing and managing project data, including folder structure, naming conventions, and centralized data repository usage.
Using Code Tags - A practical guide on why and how to use code tags in ML/AI projects, leveraging VS Code extensions to enhance code readability and collaboration.
Using TODO Tree with Code Tags - A guide on how to effectively use the TODO Tree VS Code extension in combination with code tags to manage tasks and comments in your codebase.
AI/ML Project Lifecycle with Git and GitHub - A comprehensive guide outlining the lifecycle of using Git and GitHub for AI/ML projects, detailing steps from opening a JIRA issue to managing successful and unsuccessful experiments.
Introduction to Doctest - A beginner-friendly guide to using doctest for unit testing and documentation in Python, tailored for data scientists in machine learning projects.
Python OOP for Machine Learning Projects - A comprehensive guide on using Object-Oriented Programming (OOP) in Python to create modular, collaborative, and reproducible code for machine learning projects.
Code Review Best Practices - A comprehensive guide on conducting effective code reviews, covering the importance, best practices, and workflows to ensure code quality and collaboration in ML/AI projects.
Best Practices for Creating JIRA Stories for ML/AI Projects - A comprehensive guide to creating well-defined JIRA stories for ML/AI projects, ensuring clarity, actionable tasks, and alignment with project goals.
Moving from Jupyter Notebooks to Production Python Code - A comprehensive guide on transitioning your ML/AI prototype code from Jupyter notebooks to production-ready Python scripts, covering class design, formatting, testing, and more.
Using Pre-Commit Hooks to Enforce Coding Standards - A detailed guide on setting up and using pre-commit hooks to maintain code quality and adherence to coding standards, including descriptions and examples of each hook.
Using Configuration Files to Avoid Hardcoding Values - Guidelines on best practices for creating and managing YAML configuration files to externalize hardcoded values in Python scripts, enhancing maintainability and flexibility.
Error Handling and Logging - A guide on implementing robust error handling and logging in Python projects to enhance maintainability and debugging.
Docstrings and Inline Commentaries - A guide on writing Google-style docstrings and adding inline comments to your Python code to enhance readability, maintainability, and usability.
Input Validation with Pydantic - A comprehensive guide on using Pydantic for runtime input validation in Python classes to ensure data integrity and maintainability.
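
Several of the guides above (File Naming Conventions, Column Naming Conventions for ML/AI Projects) standardize on snake_case. As a quick illustration of what that convention means in practice, here is a minimal sketch of a converter and a checker. The function names `to_snake_case` and `is_snake_case` are hypothetical and are not taken from the linked guides, which may define stricter or different rules (for example, around digits or file extensions).

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a label such as 'Customer ID' or 'totalAmount' to snake_case.

    Illustrative helper only; the linked guides remain the authority on
    the exact rules used in a project.
    """
    # Insert an underscore at lower-to-upper boundaries: 'totalAmount' -> 'total_Amount'
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result
    return name.strip("_").lower()


def is_snake_case(name: str) -> bool:
    """Return True if name is lowercase alphanumeric words joined by single underscores."""
    return re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)*", name) is not None


if __name__ == "__main__":
    for raw in ["Customer ID", "totalAmount", "Sales-2024 (final)"]:
        print(raw, "->", to_snake_case(raw))
```

A checker like `is_snake_case` could be applied to the stem of a file name (the part before the extension) or to each column header of a data file before it is committed.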

Python Code Quality Tools

Formatting Your Code with Black - A comprehensive guide on using Black to format Python code, ensuring consistency and readability across your codebase.
Accelerating Linting with Ruff - Learn how to integrate Ruff for fast and efficient linting to maintain high-quality Python code.
Static Type Checking with Mypy - An in-depth guide on using Mypy to enhance your Python code with static type checking, including type inference, union types, optionals, and advanced features like overloads and generics.
Pytest Introduction Guide - A beginner's guide to unit testing with Pytest, covering the basics of unit testing, its importance, and how to write and run tests using Pytest.
Pytest Configuration Guide - Detailed explanation and setup for Pytest configuration within VS Code, ensuring efficient and effective test execution.
Mypy Configuration Guide - Detailed explanation and setup for Mypy configuration within VS Code and pyproject.toml, ensuring efficient and effective type checking.

Data Version Control with DVC

Understanding DVC - Essentials of DVC for managing large datasets in ML, covering integration, versioning, and reproducibility.
Setting Up DVC - Comprehensive setup guide for DVC, from initialization to managing data in the cloud.
Local Data Updates - Step-by-step tutorial on managing and tracking local dataset updates with DVC.
Cloud Data Updates - Tutorial on syncing cloud data updates in services like AWS S3 or Azure Blob Storage using DVC.
Data Version Communication - Guidelines for effective communication of data versions in collaborative settings using DVC and GitHub.
Collaborative Data Workflow - Workflow for collaborative data updates, tracking changes, and team syncing with DVC and GitHub.
Branches for Data Versions - Understanding the use of Git branches for data version management and DVC integration with collaboration tools.
DVC in VS Code - How to use the DVC extension in Visual Studio Code for efficient data version control and management.

Data Management

Effective Data Documentation - A comprehensive guide on creating and managing metadata and documentation for data science projects, emphasizing importance, standard practices, and automation.
Metadata Integration in Data Analysis - A practical guide on integrating metadata with data analysis tools, detailing steps, strategies, and an example Python script for effective collaboration.
Creating and Managing Metadata and Documentation - An in-depth tutorial on crafting effective metadata and maintaining thorough data documentation, complete with steps, examples, and best practices.

Weights & Biases Experiment Tracking

ML Experiments Life-Cycle with Weights & Biases - A comprehensive overview of managing the life cycle of machine learning (ML) experiments with Weights & Biases (W&B), covering how to document experiments, establish coding practices, configure and track experiments with W&B, and manage outcomes efficiently.
Automating Backups for Weights & Biases - A comprehensive guide to establishing robust backup mechanisms for Weights & Biases experiments on your MacBook Pro.

VS Code Configuration

Sharing VS Code Settings for Python Projects - A detailed guide on how to maintain consistent VS Code settings across your Python projects, ensuring uniform coding standards and best practices.