Jupyter at BNL

National Synchrotron Light Source II

Trial Deployment

July 2015, starting from JupyterHub v0.1

  • Hub and all user servers share one VM.
  • Kernels are distributed on instrument-specific compute resources.

Each kernel is designated to run on a specific host.

Example kernel.json, specifying a host:

{"argv":["python", "-m", "ipykernel_launcher",
         "-f", "{connection_file}"],
"display_name":"Python 3 on Remote",
"host":"<HOSTNAME>" }

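The "host" key is not part of the standard kernel spec format, so some piece of the deployment has to read it. As a minimal sketch of that idea (the helper name split_kernel_spec and the standard/extra split are illustrative assumptions, not the NSLS-II implementation), a kernel.json can be parsed and its site-specific keys separated out:

```python
import json

# Keys defined by the Jupyter kernel spec format; anything else is a site extension.
STANDARD_KEYS = {"argv", "display_name", "language", "interrupt_mode", "env", "metadata"}


def split_kernel_spec(text):
    """Return (standard, extra) dicts parsed from the text of a kernel.json."""
    spec = json.loads(text)
    standard = {k: v for k, v in spec.items() if k in STANDARD_KEYS}
    extra = {k: v for k, v in spec.items() if k not in STANDARD_KEYS}
    return standard, extra


# Same content as the example spec above; "<HOSTNAME>" is left as a placeholder.
EXAMPLE = """
{"argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
 "display_name": "Python 3 on Remote",
 "host": "<HOSTNAME>"}
"""

standard, extra = split_kernel_spec(EXAMPLE)
print(extra)
```

A launcher that honors "host" would look it up in the extra dict. How the kernel is actually started on that host is not specified here.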
Lessons Learned

  • Trade-offs of running Jupyter servers and kernels on separate hosts
  • Kernel environment alone is insufficient for reproducibility

New Deployment

a work in progress

Relocating both compute and data across the site to the Scientific Data & Computing Center (SDCC)

Centralize Kernels on Dedicated Nodes

  • Discarding "remote kernel" architecture
  • Using batchspawner on dedicated nodes
  • Data transferred to shared file system during/after acquisition

NSLS-II Tutorial Binder

EPICS IOCs in a Binder

  • Binder provides the hooks you need to do this.
  • Users can design and rehearse their experiments.

Rich Outputs Help User Interaction

Sharing Notebooks

Who is my audience?

  • Are the sender and recipient on the same Hub?
  • Is the notebook being "published" for long-term reuse by many people?
  • Is the unit being shared just one notebook or a collection of notebooks? Are there additional files necessary to run them?
  • Does the notebook have specialized resource requirements?

How Far Does your Code Go?

Should you take the trouble to...

  • Ensure cells execute top to bottom
  • Refactor large code blocks into modules
  • Use version control (nbdime!)
  • Specify software dependencies in requirements.txt or environment.yml

Effort (or Expertise) for Recipient vs Sender
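
Whether a notebook's cells were executed top to bottom can be checked mechanically from the saved file. A minimal sketch, assuming the standard .ipynb JSON layout (a "cells" list whose code cells carry an "execution_count"); the function name executed_in_order is hypothetical:

```python
def executed_in_order(notebook):
    """True if every non-empty code cell was run exactly once, in order 1..n.

    `notebook` is the dict obtained from json.load() on an .ipynb file.
    Unexecuted cells have execution_count None and make the check fail.
    """
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and cell.get("source")
    ]
    return counts == list(range(1, len(counts) + 1))


# Two tiny in-memory notebooks for illustration.
nb_linear = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["a = 1"], "execution_count": 1},
    {"cell_type": "code", "source": ["a + 1"], "execution_count": 2},
]}
nb_shuffled = {"cells": [
    {"cell_type": "code", "source": ["a + 1"], "execution_count": 2},
    {"cell_type": "code", "source": ["a = 1"], "execution_count": 1},
]}

print(executed_in_order(nb_linear), executed_in_order(nb_shuffled))
```

A sender could run this check before sharing. It says nothing about whether the recipient's environment has the needed dependencies.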

nbexamples

our first attempt to improve on emailing notebooks

github.com/danielballan/nbexamples

To share, a user clicks a button in the notebook toolbar

Users can browse all shared notebooks

A copy is made in the recipient's working directory

Drawbacks similar to emailing a notebook

  • The shared directory quickly becomes a junk drawer.
  • The notebooks do not know what environment they run in.
  • Recipient could have missing dependencies or incompatible versions.

Related Projects, similar drawbacks

jupyterhub-share-link

a better solution for low-effort short-term sharing

github.com/danielballan/jupyterhub-share-link

  • Works with local, container-based, and batch spawners.
  • Recipient is put into a server using the same env/container as the sender.
  • Copying happens via the Jupyter REST API. No shared file system is assumed.
  • No extra state, just a key pair controlled by the Hub Service, used to sign and verify the share links.
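
The "no extra state" point works because a signed link carries everything needed to validate it. The sketch below illustrates that idea only; it is not the jupyterhub-share-link implementation. As a loudly labeled substitution, it uses an HMAC with a shared secret from the Python standard library in place of the key pair, and the claim names ("user", "path", "exp") and function names are hypothetical:

```python
import base64
import hashlib
import hmac
import json
import time

# Stand-in for the service's private key (assumption: HMAC replaces the key pair).
SECRET = b"demo-secret-held-by-the-hub-service"


def make_share_token(user, path, ttl=3600, now=None):
    """Encode who shares what, plus an expiry, and append a signature."""
    now = time.time() if now is None else now
    payload = json.dumps({"user": user, "path": path, "exp": now + ttl},
                         sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_share_token(token, now=None):
    """Return the claims if the signature is valid and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    return claims if claims["exp"] >= now else None


token = make_share_token("alice", "analysis.ipynb", ttl=60, now=1000)
print(verify_share_token(token, now=1010))
```

Nothing is stored server-side: an expired or tampered token simply fails verification.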

What about long-term, "publish"-style sharing?

Binder

  • An opinionated JupyterHub distribution
  • Reproducible Execution Environment Spec
  • Uses existing standards and best practices (e.g. requirements.txt) and rewards them

Some of Binder's "opinions" aren't a great fit for us

  • Fully open, no authentication
  • No persistent storage, container is deleted after some timeout
  • Uses containers
  • Requires Kubernetes (unless images are pre-built)

What else could we assemble from the components of Binder?

Ideas...

A JupyterHub Service that builds an environment from a REES without requiring Kubernetes?

That is, a REST API to repo2docker

An alternate builder that builds a conda environment instead of a container, for the subset of REES where this is possible (i.e. no Dockerfile support of course)

That is, a variant on repo2docker that makes a conda-pack instead of an image
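
Such a builder would first need to decide whether a repository falls in the conda-buildable subset. A minimal sketch of that decision follows. The function name conda_buildable is hypothetical. Which files count as blockers is an assumption made here for illustration (a Dockerfile, and apt.txt because it requests system packages), not a rule taken from repo2docker:

```python
# Config files a conda-only builder could plausibly honor (assumption).
CONDA_OK = {"environment.yml", "requirements.txt"}
# Config files that imply a container or root access (assumption).
NEEDS_CONTAINER = {"Dockerfile", "apt.txt"}


def conda_buildable(filenames):
    """Decide from a repo's config file names whether a conda env could be built."""
    present = set(filenames)
    if present & NEEDS_CONTAINER:
        return False  # outside the supported subset
    return bool(present & CONDA_OK)  # need at least one dependency spec


print(conda_buildable({"environment.yml", "README.md"}))
print(conda_buildable({"environment.yml", "Dockerfile"}))
```

Repositories that fail the check would still need the container-based path.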

A gallery of REES containers published by other users, with an option to spawn in a container that mounts the recipient’s local storage to provide persistence

Related:

Thanks

  • Mizuki Karasawa, Will Strecker-Kellogg, and Ofer Rind at SDCC
  • The NSLS-II Controls Group
  • The Jupyter Community