Resource Compilation
- Configure the working environment - Configure your data science project's working environment according to the Explanation documentation.
- Learn how to use tools - How-To Guides teach you how to produce professional code and documentation using the tools available in your working environment.
- Step-by-step guide - Tutorials demonstrate how to use the tools in your working environment to complete an end-to-end toy data science project.
- Reference - Reference material, such as naming conventions for Data Science items, can be found in the API Reference section.
Resource Compendium
Conventions
- Project Scaffolding Standards - A comprehensive guide on establishing a standardized folder and file structure for AI/ML projects, enhancing organization and collaboration within the AI team.
- Introduction to Naming Conventions for AI Project Assets - An essential guide that underscores the significance of systematic naming conventions in fostering clear communication, efficient project management, and reproducibility in AI and ML collaborations.
- File Naming Conventions - A comprehensive guide on using the snake_case naming convention for all file types in your project to maintain consistency and readability.
- Column Naming Conventions for ML/AI Projects - A comprehensive guide on using the snake_case naming convention for column names across different types of data files in your ML/AI projects to maintain consistency and readability.
- Python Docstrings Conventions - A detailed guide on how to write and automate Google-style docstrings in Python projects, ensuring clear and maintainable code documentation.
- Code and Comment Length Standards in Python Projects - A detailed guide on maintaining consistent line lengths for code and comments in Python projects, including configuration and usage of VS Code tools to enforce these standards.
- Effective GitHub Naming Conventions - Essential guidelines for naming GitHub repositories in data science, enhancing clarity and organization.
- Git Branch Naming Standards - A comprehensive guide on standardized branch naming for ML projects, enhancing repository clarity and collaboration.
- Commit Message Standards in ML Projects - A guide on structuring and standardizing commit messages to improve team collaboration in ML projects.
- Best Practices for Using Git and Pushing to GitHub - A guide outlining best practices for effective use of Git and GitHub, including commit frequency, message standards, branch management, and collaboration techniques.
- ML Data Folder Naming Guide - A comprehensive guide to naming data folders in ML projects for enhanced organization and efficiency.
- Model Persistence File Naming Conventions - Guide on naming conventions for persisting machine learning models, covering format, versioning, metadata storage, and documentation.
- Automating Metadata Creation - A practical guide for automating the creation and saving of machine learning model metadata using a Python script.
- Markdown Documentation for ML Models - Step-by-step tutorial on documenting machine learning models using Markdown, covering the training process, parameters, performance metrics, and metadata.
- Notebook and Script Naming Conventions in ML Projects - Essential guidelines for naming Jupyter notebooks and scripts in ML projects, emphasizing clarity, consistency, and efficient file management.
- Enforcing Naming Conventions with GitHub Actions - A practical guide to set up GitHub Actions for enforcing naming conventions in machine learning projects, ensuring consistency and organization.
- Templates for pull requests, issues/stories, features, and README files
- Branching Strategies for ML Projects - Branching strategies for AI-based projects.
- Python Scripting for Data Conversion - Detailed Python functions for converting data between XLSX and CSV formats, tailored for machine learning experts working on NLP projects.
- Project Data Management Practices - Best practices for organizing and managing project data, including folder structure, naming conventions, and centralized data repository usage.
- Using Code Tags - A practical guide on why and how to use code tags in ML/AI projects, leveraging VS Code extensions to enhance code readability and collaboration.
- Using TODO Tree with Code Tags - A guide on how to effectively use the TODO Tree VS Code extension in combination with code tags to manage tasks and comments in your codebase.
- AI/ML Project Lifecycle with Git and GitHub - A comprehensive guide outlining the lifecycle of using Git and GitHub for AI/ML projects, detailing steps from opening a JIRA issue to managing successful and unsuccessful experiments.
- Introduction to Doctest - A beginner-friendly guide to using doctest for unit testing and documentation in Python, tailored for data scientists in machine learning projects.
- Python OOP for Machine Learning Projects - A comprehensive guide on using Object-Oriented Programming (OOP) in Python to create modular, collaborative, and reproducible code for machine learning projects.
- Code Review Best Practices - A comprehensive guide on conducting effective code reviews, covering the importance, best practices, and workflows to ensure code quality and collaboration in ML/AI projects.
- Best Practices for Creating JIRA Stories for ML/AI Projects - A comprehensive guide to creating well-defined JIRA stories for ML/AI projects, ensuring clarity, actionable tasks, and alignment with project goals.
- Moving from Jupyter Notebooks to Production Python Code - A comprehensive guide on transitioning your ML/AI prototype code from Jupyter notebooks to production-ready Python scripts, covering class design, formatting, testing, and more.
- Using Pre-Commit Hooks to Enforce Coding Standards - A detailed guide on setting up and using pre-commit hooks to maintain code quality and adherence to coding standards, including descriptions and examples of each hook.
- Using Configuration Files to Avoid Hardcoding Values - Guidelines on best practices for creating and managing YAML configuration files to externalize hardcoded values in Python scripts, enhancing maintainability and flexibility.
- Error Handling and Logging - A guide on implementing robust error handling and logging in Python projects to enhance maintainability and debugging.
- Docstrings and Inline Commentaries - A guide on writing Google-style docstrings and adding inline commentaries to your Python code to enhance readability, maintainability, and usability.
- Input Validation with Pydantic - A comprehensive guide on using Pydantic for runtime input validation in Python classes to ensure data integrity and maintainability.
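The file and column naming guides listed above both standardize on snake_case. As a minimal sketch of what that means in practice, the hypothetical helper below (`to_snake_case` is an illustrative name, not a function taken from any of the guides) normalizes arbitrary labels using only the standard library:

```python
import re


def to_snake_case(name: str) -> str:
    """Convert an arbitrary label to snake_case (illustrative helper)."""
    # Insert an underscore where a lowercase letter or digit meets an uppercase letter
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    # Drop leading/trailing underscores and lowercase the result
    return name.strip("_").lower()


columns = ["Customer ID", "orderDate", "Total-Amount (USD)"]
print([to_snake_case(c) for c in columns])
# ['customer_id', 'order_date', 'total_amount_usd']
```

The linked guides remain the authority on edge cases such as acronyms or leading digits, which this sketch does not try to handle.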
Python Code Quality Tools
- Formatting Your Code with Black - A comprehensive guide on using Black to format Python code, ensuring consistency and readability across your codebase.
- Accelerating Linting with Ruff - Learn how to integrate Ruff for fast and efficient linting to maintain high-quality Python code.
- Static Type Checking with Mypy - An in-depth guide on using Mypy to enhance your Python code with static type checking, including type inference, union types, optionals, and advanced features like overloads and generics.
- Pytest Introduction Guide - A beginner's guide to unit testing with Pytest, covering the basics of unit testing, its importance, and how to write and run tests using Pytest.
- Pytest Configuration Guide - Detailed explanation and setup for Pytest configuration within VS Code, ensuring efficient and effective test execution.
- Mypy Configuration Guide - Detailed explanation and setup for Mypy configuration within VS Code and pyproject.toml, ensuring efficient and effective type checking.
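To give a rough feel for what the type checking and testing guides above deal with, here is a small sketch that combines an `Optional` return type (the kind of annotation Mypy checks) with a Pytest-style test. The function `mean_or_none` is a hypothetical example and does not come from the guides themselves:

```python
from typing import Optional


def mean_or_none(values: list[float]) -> Optional[float]:
    """Return the arithmetic mean, or None when the list is empty."""
    if not values:
        return None
    return sum(values) / len(values)


def test_mean_or_none() -> None:
    # Pytest collects functions named test_*; plain assert statements are enough
    assert mean_or_none([1.0, 2.0, 3.0]) == 2.0
    assert mean_or_none([]) is None
```

With this annotation, a static type checker would be expected to flag callers that use the result as a `float` without first handling the `None` case.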
Data Version Control with DVC
- Understanding DVC - Essentials of DVC for managing large datasets in ML, covering integration, versioning, and reproducibility.
- Setting Up DVC - Comprehensive setup guide for DVC, from initialization to managing data in the cloud.
- Local Data Updates - Step-by-step tutorial on managing and tracking local dataset updates with DVC.
- Cloud Data Updates - Tutorial on syncing cloud data updates in services like AWS S3 or Azure Blob Storage using DVC.
- Data Version Communication - Guidelines for effective communication of data versions in collaborative settings using DVC and GitHub.
- Collaborative Data Workflow - Workflow for collaborative data updates, tracking changes, and team syncing with DVC and GitHub.
- Branches for Data Versions - Understanding the use of Git branches for data version management and DVC integration with collaboration tools.
- DVC in VS Code - How to use the DVC extension in Visual Studio Code for efficient data version control and management.
Data Management
- Effective Data Documentation - A comprehensive guide on creating and managing metadata and documentation for data science projects, emphasizing importance, standard practices, and automation.
- Metadata Integration in Data Analysis - A practical guide on integrating metadata with data analysis tools, detailing steps, strategies, and an example Python script for effective collaboration.
- Creating and Managing Metadata and Documentation - An in-depth tutorial on crafting effective metadata and maintaining thorough data documentation, complete with steps, examples, and best practices.
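As a minimal sketch of the kind of automation the metadata guides above describe, the snippet below writes a JSON sidecar file next to a dataset. The field names, the `_metadata.json` suffix, and the functions `build_metadata` and `write_metadata` are assumptions chosen for illustration; the guides may prescribe a different schema.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_metadata(data_path: Path, description: str) -> dict:
    """Collect basic, illustrative metadata fields for a data file."""
    return {
        "file_name": data_path.name,
        "size_bytes": data_path.stat().st_size,
        "documented_utc": datetime.now(timezone.utc).isoformat(),
        "description": description,
    }


def write_metadata(data_path: Path, description: str) -> Path:
    """Write the metadata as a JSON sidecar next to the data file."""
    meta_path = data_path.with_name(f"{data_path.stem}_metadata.json")
    meta_path.write_text(
        json.dumps(build_metadata(data_path, description), indent=2),
        encoding="utf-8",
    )
    return meta_path


if __name__ == "__main__":
    # Demonstrate on a throwaway file so no real project data is touched
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "sales_2024.csv"
        data_file.write_text("a,b\n1,2\n", encoding="utf-8")
        print(write_metadata(data_file, "Toy sales extract").name)
```

Keeping the sidecar beside the data file means it can be versioned and shared together with the dataset it describes.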
Weights & Biases Experiment Tracking
- ML Experiments Life-Cycle with Weights & Biases - A comprehensive overview of managing the life cycle of machine learning (ML) experiments with Weights & Biases (W&B), offering a structured approach for documenting experiments, establishing coding practices, configuring and tracking experiments, and efficiently managing outcomes.
- Automating Backups for Weights & Biases - A comprehensive guide to establishing robust backup mechanisms for Weights & Biases experiments on your MacBook Pro.
VS Code Configuration
- Sharing VS Code Settings for Python Projects - A detailed guide on how to maintain consistent VS Code settings across your Python projects, ensuring uniform coding standards and best practices.