Resource Compilation

  • Configure the working environment
    Configure your data science project's working environment according to the Explanation documentation.

  • Learn how to use tools
    How-To Guides teach you how to produce professional code and documentation using the tools available in your working environment.

  • Step-by-step guide
    Tutorials demonstrate how to use the tools in your working environment to complete an end-to-end toy data science project.

  • Reference
    Reference material, such as naming conventions for Data Science items, can be found in the API Reference section.

Resource Compendium

Conventions
Python Code Quality Tools
  • Formatting Your Code with Black - A comprehensive guide on using Black to format Python code, ensuring consistency and readability across your codebase.
  • Accelerating Linting with Ruff - Learn how to integrate Ruff for fast and efficient linting to maintain high-quality Python code.
  • Static Type Checking with Mypy - An in-depth guide on using Mypy to enhance your Python code with static type checking, including type inference, union types, optionals, and advanced features like overloads and generics.
  • Pytest Introduction Guide - A beginner's guide to unit testing with Pytest, covering the basics of unit testing, its importance, and how to write and run tests using Pytest.
  • Pytest Configuration Guide - Detailed explanation and setup for Pytest configuration within VS Code, ensuring efficient and effective test execution.
  • Mypy Configuration Guide - Detailed explanation and setup for Mypy configuration within VS Code and pyproject.toml, ensuring efficient and effective type checking.
Data Version Control with DVC
  • Understanding DVC - Essentials of DVC for managing large datasets in ML, covering integration, versioning, and reproducibility.🚧
  • Setting Up DVC - Comprehensive setup guide for DVC, from initialization to managing data in the cloud.🚧
  • Local Data Updates - Step-by-step tutorial on managing and tracking local dataset updates with DVC.🚧
  • Cloud Data Updates - Tutorial on syncing cloud data updates in services like AWS S3 or Azure Blob Storage using DVC.🚧
  • Data Version Communication - Guidelines for effective communication of data versions in collaborative settings using DVC and GitHub.🚧
  • Collaborative Data Workflow - Workflow for collaborative data updates, tracking changes, and team syncing with DVC and GitHub.🚧
  • Branches for Data Versions - Understanding the use of Git branches for data version management and DVC integration with collaboration tools.🚧
  • DVC in VS Code - How to use the DVC extension in Visual Studio Code for efficient data version control and management.
Data Management
  • Effective Data Documentation - A comprehensive guide on creating and managing metadata and documentation for data science projects, emphasizing importance, standard practices, and automation.🚧
  • Metadata Integration in Data Analysis - A practical guide on integrating metadata with data analysis tools, detailing steps, strategies, and an example Python script for effective collaboration.🚧
  • Creating and Managing Metadata and Documentation - An in-depth tutorial on crafting effective metadata and maintaining thorough data documentation, complete with steps, examples, and best practices.🚧
Weights & Biases Experiment Tracking
  • ML Experiments Life-Cycle with Weights & Biases - A comprehensive overview of managing the life cycle of machine learning (ML) experiments with Weights & Biases (W&B): documenting experiments, establishing coding practices, configuring and tracking experiments, and managing outcomes.
  • Automating Backups for Weights & Biases - A comprehensive guide to setting up robust backups for Weights & Biases experiments on your MacBook Pro.
VS Code Configuration