Stanford Data Users

Click here to apply!

Lab Meeting Workshops

Interested in having a workshop given directly to your lab? Contact Bryce at bdgrier@stanford.edu.

Current Offerings

100 - Bash and the Unix Shell

  • introduction to Unix shells

  • essential Bash commands

  • developing and executing a Bash script

110 - Git and Version Control

  • Introduction to version control concepts

  • Common git commands and workflows

  • Getting started with Stanford GitLab

120 - Python For Scientific Computing I

  • Basic syntax, data types, and core libraries

  • Jupyter

  • NumPy

  • Pandas
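For a rough sense of what NumPy and Pandas code looks like, here is a minimal sketch. It is illustrative only and not taken from the workshop materials. The array values and the column names `rep1`/`rep2` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: three samples (rows), each measured twice (columns)
values = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]])

# NumPy: vectorized mean across each row, with no explicit Python loop
row_means = values.mean(axis=1)

# Pandas: wrap the same numbers in a labeled table and add a derived column
df = pd.DataFrame(values, columns=["rep1", "rep2"], index=["a", "b", "c"])
df["mean"] = row_means

# Boolean indexing: keep only the rows whose mean exceeds a threshold
high = df[df["mean"] > 2.5]
print(high)
```

The same data moves between a bare NumPy array (fast, unlabeled) and a Pandas DataFrame (labeled rows and columns, convenient filtering).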

200 - Data Transfer and Storage

  • Stanford-specific and external resources for data management

  • command-line data transfer tools:

      • rsync (needed for FarmShare/Sherlock/OAK)

      • rclone (useful for cloud services)

220 - LLMs as Tools in Research

  • history, development, and current state of large language models (LLMs)

  • building a retrieval-augmented generation (RAG) system for managing and querying scientific knowledge
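The retrieval step at the heart of a retrieval-augmented setup can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions: word-count vectors stand in for the learned embeddings a real system would use, the three-document collection is hypothetical, and the final call to an LLM is left out. The names `vectorize`, `retrieve`, and `build_prompt` are illustrative, not part of any library.

```python
import math
import re
from collections import Counter

# Hypothetical mini "knowledge base" of short notes
documents = [
    "rsync copies files efficiently between machines",
    "GNU Parallel runs shell commands in parallel",
    "Pandas DataFrames store labeled tabular data",
]


def vectorize(text):
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs):
    """Prepend retrieved context to the question; a real system would send this to an LLM."""
    context = "\n".join(retrieve(query, docs, k=1))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("How do I copy files between machines?", documents))
```

The point of the design is that the language model only sees the few passages judged relevant to the question, so the answer can be grounded in your own documents rather than in whatever the model memorized during training.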

300 - Containerized Analysis Notebooks

  • promoting reproducibility through containerized analysis

  • interactive, containerized, browser-based analysis using HPC resources:

      • FarmShare/Sherlock OnDemand

      • hosting Python kernels on FarmShare/Sherlock

301 - Containerizing Analyses and Workflows

  • modifying existing containers

  • building custom containers from scratch

  • making custom containers available to your lab or to the public

310 - Parallel Processing I

  • introduction to embarrassingly parallel problems

  • methods of parallelization, from a single machine up to large scales:

      • GNU Parallel (a powerful shell-based tool for easily parallelizing tasks on a local machine)

      • Pub/Sub (a cloud-based service for building scalable, asynchronous analysis pipelines)
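GNU Parallel and Pub/Sub are the tools covered in this workshop. As a language-level illustration of the same embarrassingly parallel pattern, here is a minimal sketch using Python's standard-library `concurrent.futures` module instead (a swapped-in tool, not one listed above). The `simulate` task and the worker count of 4 are hypothetical placeholders; the only property that matters is that each call is independent of every other.

```python
from concurrent.futures import ProcessPoolExecutor


def simulate(seed):
    """Hypothetical independent task: a small deterministic computation per seed."""
    total = 0
    for i in range(100_000):
        total = (total + i * seed) % 1_000_003
    return seed, total


def run_all(seeds, workers=4):
    """Map the task over every input in separate processes; results keep input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # on platforms that start workers by re-importing the script.
    results = run_all(range(8))
    # No task depends on another, so the answers match a plain sequential loop.
    assert results == [simulate(s) for s in range(8)]
    print(results[:2])
```

Because the tasks share no state, splitting them across workers needs no locks or coordination, which is exactly what makes a problem "embarrassingly" parallel.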

400 - Reproducible Scientific Data Pipelines

  • Fundamentals of continuous integration and pipelines

  • GitLab CI
