References and Resources to Match Poster Sections¶
Background¶
Nature’s special collection: Challenges in Irreproducible Research
“In spite of much recent interest in many scientific areas, emphasis remains more on procedures, strictures and discussion, reflecting the inexperience of most scientific journals when it comes to software…”
A statistical definition for reproducibility and replicability
Why scientists must share their research code in Nature News.
Interactive notebooks: Sharing the code by Helen Shen. Nature. 2014 Nov 6;515(7525):151-2. doi: 10.1038/515151a. PMID: 25373681. This article is a couple of years old now but gives a good background on the issues and impetus for what are now Jupyter Notebooks.
How Jupyter Notebooks Will Improve Your Computational Life - a nice illustrated introduction to Jupyter Notebooks
IPython Notebook/Project Jupyter Half day Workshop
“Project Jupyter is a literate data analysis environment (similar to knitr and Shiny) that supports over 40 different programming languages, including R and Python both. It can be used to build reproducible analyses for publication, collaborate over distances on data analysis, and build interactive tutorials and homeworks around data analysis.”
A gallery of interesting IPython Notebooks; in particular, see the Reproducible academic publications section. Some other notebooks are highlighted here.
Ten Simple Rules for Effective Statistical Practice
“Modern reproducible research tools like Sweave [18], knitr [19], and iPython [20] notebooks take this a step further and combine the research report with the code. Reproducible research is itself an ongoing area of research and a very important area that we all need to pay attention to.”
Jupyter Notebook will next evolve into a platform that still runs in the browser but exposes more interface functionality while including more full-featured notebooks; see here for what is planned in the next couple of years as they roll out JupyterLab. Additionally, JupyterHub is being developed to serve multiple persistent, authenticated Jupyter Notebooks for teaching and collaborative uses; see slide 21 here as a guide to when you may need which implementation.
A sampling of scientific notebooks & extensions¶
Developments in next generation sequencing were plotted using a notebook
LIGO folks published a Python notebook along with the data to explain their analysis and findings (plots, audio files). A few hours later it became possible to use Binder to bring up Jupyter with that notebook and all the dependencies preloaded, so you can step through their analysis yourself
Bioconductor’s RNA-seq Workflow in Jupyter notebook format, with a related video here and the making-of described here
Exploratory bioinformatics with plot.ly and IPython notebook: Visualizing gene expression data features using a high-end plotting interface with bioinformatics data
Introduction to Applied Bioinformatics (or IAB) is a free, open source interactive text that introduces readers to core concepts of bioinformatics in the context of their implementation and application. It uses Jupyter Notebooks and mybinder.org.
Exploring proteomics data from TCGA/CPTAC breast cancer samples as described here
Dynamics and associations of microbial community types across the human body, by Tao Ding & Patrick D. Schloss. Notebook replicating results
Indication of family-specific DNA methylation patterns in developing oysters, Claire E. Olson, Steven B. Roberts doi: http://dx.doi.org/10.1101/012831. Notebook to generate results in the paper.
Transcriptome Sequencing Reveals Potential Mechanism of Cryptic 3’ Splice Site Selection in *SF3B1*-mutated Cancers by Christopher DeBoever et al. There are several notebooks to replicate results and make figures.
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data, by C.T. Brown et al. Full notebook
The Broad Institute built an extension for working with their GenePattern platform from within a Jupyter Notebook environment
Be sure to look at the list of example notebooks using the Github/Binder approach below as well.
Resources for Running Active Notebooks in the Cloud¶
Freeman Lab’s MyBinder.org site - where you’ll go to point their system at your Github repository with a Jupyter Notebook to make active notebooks available online
Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?
“The split that my lab has made here is to use a workflow engine (e.g. make, pydoit, or snakemake) for the compute & data intensive stuff, and then feed those intermediate results (assembly and mapping stats, quantification, etc.) into analysis notebooks. For mybinder purposes, there should be no problem saving those intermediate results into a github repo for us and everyone else to analyze and reanalyze.”
tmpnb.org or try.jupyter.org - launch active, transient Jupyter Notebooks in the cloud for basic development, see Instant Temporary IPython Notebooks
I have made a page to walk you through trying tmpnb.org or try.jupyter.org; find it here.
- Another alternative is to click here. I have not extensively run this site through its paces, so I can offer only a couple of points about it; beyond that, your mileage may vary. It was set up by the excellent Domino Data Lab to serve as a place to run an active notebook about differences between Python 2.x and 3.x without needing to sign into the Domino Data Lab service. I know the free tier for signed-in users is limited to 15 minutes, so the anonymous one may have this limitation as well. Also, as with tmpnb.org, unless a module is already installed, you won’t have access to it, and you won’t be able to scrape data from other sites.
The Binder/Github set-up allows you to designate other modules you need loaded when the instance is spun up, but that does mean some set-up steps, as discussed in the appendix. Contact me if you need help understanding how to set this up.
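As a rough illustration of the module issue described above, here is a minimal Python sketch that reports which modules a notebook needs are missing from the current environment and writes a list of packages to a dependency file. It assumes the dependencies are listed in a `requirements.txt` file at the top of the repository, which is one convention Binder has supported; the appendix remains the guide for the actual set-up steps. The function names and the example module list are hypothetical, made up for this sketch.

```python
import importlib.util
from pathlib import Path


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def write_requirements(packages, path="requirements.txt"):
    """Write one package name per line, the plain-text format pip reads.

    Note: the name used to install a package can differ from the name used
    to import it, so list the install names here.
    """
    text = "\n".join(packages) + "\n"
    Path(path).write_text(text)
    return text


# Hypothetical list of modules a notebook imports; replace with your own.
needed = ["json", "csv", "not_a_real_module_xyz"]
print("Missing in this environment:", missing_modules(needed))
```

Running `missing_modules` in the first cell of a notebook on a temporary server gives a quick read on whether the notebook can run there as-is, or whether a Binder set-up with the extra modules designated is the better route.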
Launching Active Versions of My Notebooks¶
Click the button at any of the following repositories for an active notebook:
Ammonium Sulfate Precipitation Screen Calculator
The Cell Density Estimator, where only a single sample is analyzed, may be easier for novices to follow, and an active notebook can be launched here.
Notebook designed as an active computing exercise for young students visiting the lab
The example notebook used for the introduction section of the poster can be found here
Contrast the transparency of the Ammonium Sulfate Precipitation Screen Calculator with a form-based Django site that performs the same calculation here.
Other Notable Notebooks Using Github/Binder approach¶
- Molecular Design Toolkit Demo - after hitting the button there I suggest the early parts of the Example 1. Build and simulate DNA.ipynb and Example 3. Simulating a crystal structure.ipynb notebooks.
- nglview is a Python package that makes it easy to visualize molecular systems, including trajectories, directly in the Jupyter Notebook. (Launch a Binder by clicking the Binder logo there.) See more about nglview here.
- VPython - Visual Python demos has a button at the bottom. Try Atomic solid for a simulation of interatomic interactions.
- Introduction to Applied Bioinformatics (or IAB) is a free, open source interactive text that introduces readers to core concepts of bioinformatics in the context of their implementation and application.
- **The LIGO notebook is the most famous Jupyter Notebook at present, and it is available in active form** - LIGO folks published a Python notebook along with the data to explain their analysis and findings of gravitational waves, and you can now use Binder to bring up Jupyter with that notebook and all the dependencies preloaded, and step through their analysis yourself.