Jupyter Tips, Tricks, Best Practices with Sample Code for Productivity Boost
Found useful by Nobel Laureates and more:
“…, this looks very helpful”
- Economics Nobel Laureate 2018, Dr. Paul Romer on Twitter
In this post: `set_trace()` debugging, `git diff` for Jupyter notebooks, and more.

Run your Jupyter server under `supervisor` or `tmux` instead of a direct `ssh` or `bash` session. This gives you a more stable Jupyter server which doesn't die unexpectedly. Consider writing the buffer logs to a file rather than stdout, e.g. under `docker attach`, where you might not see a lot of logs.

Write `%debug` in a new cell to activate the IPython debugger. The standard keyboard shortcuts apply: `c` for continue, `n` for next, `q` for quit.

Use `from IPython.core.debugger import set_trace` to set IPython debugger checkpoints, the same way you would set `pdb` breakpoints in PyCharm:

```python
from IPython.core.debugger import set_trace

def foobar(n):
    x = 1337
    y = x + n
    set_trace()  # this one triggers the debugger
    return y

foobar(3)
```
Returns:

```
> <ipython-input-9-04f82805e71f>(7)foobar()
      5     y = x + n
      6     set_trace()  # this one triggers the debugger
----> 7     return y
      8
      9 foobar(3)

ipdb> q
Exiting Debugger.
```
Preference note: if I already have an exception, I prefer `%debug`, because it lets me zero in on the exact line where the code breaks, whereas with `set_trace()` I have to traverse line by line.
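What `%debug` gives you is post-mortem access to the traceback of the last exception. As a rough, non-interactive sketch of the same information — the failing `foobar` variant below is a made-up example with a deliberate bug:

```python
import sys
import traceback

def foobar(n):
    x = 1337
    y = x + n
    return y / 0  # deliberate error, standing in for a real bug

try:
    foobar(3)
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # %debug would drop you into the last frame of this traceback;
    # here we just list the frames it would let you inspect.
    frames = traceback.extract_tb(tb)
    print([f.name for f in frames])  # the last frame is foobar, where the error occurred
```

In a notebook you would simply run `%debug` in the next cell instead of touching `traceback` yourself.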
Use `%load_ext autoreload; %autoreload 2`. The autoreload extension reloads modules automatically before executing code typed at the IPython prompt. This makes the following workflow possible:
```
In [1]: %load_ext autoreload
In [2]: %autoreload 2  # reload all modules every time before executing typed code
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
```
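Under the hood this is module reloading, which the standard library exposes as `importlib.reload`. A self-contained sketch of the session above — the `foo` module is created on the fly here purely for illustration:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid a stale .pyc being reused between the two versions

# Create a throwaway foo.py, mimicking the module you would edit.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "foo.py").write_text("def some_function():\n    return 42\n")
sys.path.insert(0, tmp)

import foo
print(foo.some_function())  # 42

# Simulate editing foo.py in an editor, then reload --
# this is the step that %autoreload 2 automates before every cell.
pathlib.Path(tmp, "foo.py").write_text("def some_function():\n    return 43\n")
importlib.reload(foo)
print(foo.some_function())  # 43
```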
Instead of calling `print(out_var)` on a nested list or dictionary, consider `print(json.dumps(out_var, indent=2))`. It will pretty-print the output string.

You can run shell commands straight from a cell: `!ls *.csv`, or even `!pwd` to check your current working directory.
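A minimal sketch of the difference pretty-printing makes (the dictionary is a made-up example):

```python
import json

out_var = {"model": "resnet", "metrics": {"acc": 0.91, "loss": [0.40, 0.22]}}

flat = str(out_var)                     # everything on one long line
pretty = json.dumps(out_var, indent=2)  # one key per line, nested keys indented

print(pretty)
```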
You can do `cd {PATH}` where `PATH` is a Python variable; similarly, you can do `PATH = !pwd` and then work with relative paths instead of absolute ones. Both `pwd` and `!pwd` work, with a mild preference for `!pwd` to signal to other code readers that this is a shell command. `cd ../../` in Jupyter can also be done with `os.chdir()`.
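The `os.chdir` route, sketched with a temporary directory standing in for a real project path:

```python
import os
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # stands in for some real {PATH}

os.chdir(target)  # the os.chdir() equivalent of `cd {PATH}` in a cell
print(os.getcwd())

os.chdir(start)   # os.chdir persists across cells (unlike a `!cd` subshell), so go back
```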
Shell commands run with `!` will have the same environment variables as the Jupyter server process. `!pip install foo` (or `!conda install bar`) will use the `pip` that is on the path of the `sh` shell, which might be different from whatever `bash` shell environment you use. If `!pip install foo` doesn't seem to do it, try:

```python
import sys
!{sys.executable} -m pip install foo  # sys.executable points to the Python running your kernel
```
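Outside a notebook, the same trick can be sketched with `subprocess`; the point is that `sys.executable` is guaranteed to be the interpreter behind your kernel, so packages land in the environment the notebook actually imports from. A harmless query stands in for the actual install:

```python
import subprocess
import sys

print(sys.executable)  # the exact interpreter running this kernel/script

# Equivalent shape to `!{sys.executable} -m pip install foo`,
# but querying the interpreter instead of installing anything.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.version_info.major)"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())
```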
Use the Search Magic file: no need to `pip install`, just download and use the file.

```
In [1]: from search_magic import SearchMagic
In [2]: get_ipython().register_magics(SearchMagic)
In [3]: %create_index
In [4]: %search tesseract
Out[4]: Cell Number -> 2
        Notebook -> similarity.ipynb
        Notebook Execution Number -> 2
```
Press `Shift + Tab` to display a function's docstring in a tooltip; the tooltip has options to expand in place or at the bottom of the screen. Use `?func_name` to view function and class docstrings. For example, `?str.replace()` returns:
```
Docstring:
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
Type:      method_descriptor
```
Wildcard search works too: `module_name.*?`, for instance `pd.*?`, lists matching attributes. This also works with prefixes and suffixes: `pd.read_*?` and `pd.*_csv?` are both valid. A double question mark, e.g. `pd.read_csv??`, shows the function's source code.
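The docstrings that `?` surfaces are the same ones the standard library's `inspect` module retrieves, which is handy outside IPython too:

```python
import inspect

# getdoc returns the cleaned docstring that ?str.replace() displays
doc = inspect.getdoc(str.replace)
print(doc.splitlines()[0])  # first line of the docstring
```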
In command mode, press `h` to view all keyboard shortcuts.

Prefer `conda` over `pip` plus `virtualenv` and similar setups, because it ensures package versions stay consistent. Note that `conda` is not just a Python package manager; see *Conda: Myths and Misconceptions* by Jake VanderPlas.

Selective diff/merge tool for Jupyter notebooks: nbdime.
Install it first:

```
pip install -e git+https://github.com/jupyter/nbdime#egg=nbdime
```

It should automatically configure itself for Jupyter Notebook; if something doesn't work, see the installation docs. Then put the following into `~/.jupyter/nbdime_config.json`:
```json
{
  "Extension": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "NbDiff": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "NbDiffDriver": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "NbMergeDriver": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "dummy": {}
}
```
Change the `outputs` value to `true` if you want to see output diffs too.
Including markdown in your code's output is very useful. Use this to highlight parameters, performance notes, and so on; it enables colors, bold text, etc.
```python
from IPython.display import Markdown, display

def printmd(string, color=None):
    colorstr = "<span style='color:{}'>{}</span>".format(color, string)
    display(Markdown(colorstr))

printmd("**bold and blue**", color="blue")
```
Add this snippet to the start of your notebook, then press `Alt+I` to jump to the cell being executed right now. (This does not work if you have enabled vim bindings.)
```javascript
%%javascript
// "Go to running cell" shortcut
Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-I', {
    help: 'Go to Running cell',
    help_index: 'zz',
    handler: function (event) {
        setTimeout(function () {
            // Find the running cell and scroll the first one into view
            if ($('.running').length > 0) {
                $('.running')[0].scrollIntoView();
            }
        }, 250);
        return false;
    }
});
```
Move mature code out of the notebook into `.py` files at regular intervals. Your notebook run should consist mainly of function calls.
Use the `%%time` cell magic as a cheap warning-plus-runtime logger.

Prefer f-strings: `f"This is iteration: {iter_number}"` is much more readable than the `.format()` syntax.

Collect common imports in a module and start notebooks with `from xxx_imports import *`.

Use `pathlib` instead of `os.path` wherever possible for more readable code; there are beginner-friendly tutorials as well as the official docs if you want a quick review.

Use `%matplotlib inline` to ensure that plots are rendered inside the notebook.

Refactor repeated `plt.plot` code into functions to avoid bloat; using `subplots` from the Matplotlib object-oriented API is usually neater than stacking up `plt.plot` calls:
```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_img(im, figsize=None, ax=None, title=None):
    if not ax:
        fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im, cmap='gray')
    if title is not None:
        ax.set_title(title)
    ax.get_xaxis().set_visible(True)
    ax.get_yaxis().set_visible(True)
    return ax

def draw_rect(ax, bbox):
    x, y, w, h = bbox
    ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', lw=2))
```
`show_img` is a reusable plotting function which can easily be extended to plot one-off images as well as to use subplots properly. In the example below, I use a single figure and add new images as subplots via the neater `axes.flat` syntax:

```python
fig, axes = plt.subplots(1, 2, figsize=(6, 2))
ax = show_img(char_img, ax=axes.flat[0], title='char_img_line_cropping:\n' + str(char_img.shape))
ax = show_img(char_bg_mask, ax=axes.flat[1], title='Bkg_mask:\n' + str(char_bg_mask.shape))
# For an image segmentation task, you can easily add red rectangles per subplot:
draw_rect(ax, char_bounding_boxes)  # adds a red bounding box for each character
```
Avoid leaving `%%timeit` in your code unless you need it: it does up to 100,000 runs of the cell and then returns the average of the best 3 runtimes, which is rarely necessary. Instead, use `%%time` or add average times in inline comments.
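The inline-comment alternative can be sketched with the standard library's `time.perf_counter`; the workload function here is a made-up stand-in for whatever the cell computes:

```python
import time

def slow_op(n):
    # stand-in for the cell's actual work
    return sum(i * i for i in range(n))

start = time.perf_counter()
result = slow_op(100_000)
elapsed = time.perf_counter() - start

print(f"slow_op took {elapsed:.4f}s")  # note this once in a comment instead of re-running %%timeit
```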