Jupyter Tips, Tricks, and Best Practices with Sample Code for a Productivity Boost
Found useful by Nobel Laureates and more:
“…, this looks very helpful”
- Economics Nobel Laureate 2018, Dr. Paul Romer on Twitter
Launch the Jupyter server via supervisor or tmux instead of direct ssh or bash. This gives you a more stable Jupyter server which doesn't die unexpectedly. Consider writing the buffer logs to a file rather than stdout, which helps in setups like docker attach where you might not see a lot of logs.
Use %debug in a new cell to activate the IPython debugger. Standard keyboard shortcuts such as c for continue, n for next, and q for quit apply.
Use from IPython.core.debugger import set_trace to add IPython debugger checkpoints, the same way you would use pdb breakpoints in PyCharm:
from IPython.core.debugger import set_trace
def foobar(n):
    x = 1337
    y = x + n
    set_trace()  # this one triggers the debugger
    return y

foobar(3)
Returns:
> <ipython-input-9-04f82805e71f>(7)foobar()
5 y = x + n
6 set_trace() #this one triggers the debugger
----> 7 return y
8
9 foobar(3)
ipdb> q
Exiting Debugger.
Preference note: If I already have an exception, I prefer %debug because I can zero in on the exact line where the code breaks, compared to set_trace() where I have to step through line by line.
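To make the %debug workflow concrete, here is a minimal sketch; buggy_division is a hypothetical function and the ipdb session is abbreviated:
In [1]: def buggy_division(a, b):
   ...:     return a / b  # raises ZeroDivisionError when b == 0
In [2]: buggy_division(1, 0)
ZeroDivisionError: division by zero
In [3]: %debug  # post-mortem: drops you into ipdb at the line that raised
ipdb> p a, b  # inspect local variables at the crash site
(1, 0)
ipdb> q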
Use %load_ext autoreload followed by %autoreload 2. The autoreload utility reloads modules automatically before executing code typed at the IPython prompt. This makes the following workflow possible:
In [1]: %load_ext autoreload
In [2]: %autoreload 2 # set autoreload flag to 2. Why? This reloads modules every time before executing the typed Python code
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43
Instead of calling print(out_var) on a nested list or dictionary, consider doing print(json.dumps(out_var, indent=2)) instead. It will pretty-print the output string.
Use !ls *.csv to list matching files, or even !pwd to check your current working directory.
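A minimal sketch of the pretty-printing tip above, with a hypothetical nested out_var:
import json

out_var = {"model": "resnet34", "metrics": {"acc": 0.91, "loss": 0.23}, "tags": ["cv", "baseline"]}
print(out_var)                        # one long, hard-to-read line
print(json.dumps(out_var, indent=2))  # pretty-printed, one key per line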
cd {PATH} works, where PATH is a Python variable; similarly, you can do PATH = !pwd to work with relative paths instead of absolute ones.
Both pwd and !pwd work, with a mild preference for !pwd to signal to other code readers that this is a shell command.
cd ../../ in Jupyter can also be done with os.chdir().
Shell commands launched with ! run in a sub-shell spawned by the kernel, so they will have the same environment variables as the kernel process, which may differ from those of your interactive terminal.
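A minimal sketch of these path tricks in a notebook cell:
import os

PATH = !pwd        # an IPython SList; PATH[0] holds the directory string as plain text
%cd ..             # go up one level
cd {PATH[0]}       # come back, interpolating the Python variable as described above
os.chdir(PATH[0])  # the plain-Python equivalent
!pwd               # confirm the current working directory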
!pip install foo (or !conda install bar) will use the pip which is on the PATH of the sh shell spawned by the kernel, which might be different from whatever bash shell environment you use. If !pip install foo doesn't seem to do it, try:
import sys
!{sys.executable} -m pip install foo # sys.executable points to the python that is running in your kernel
Use the Search Magic file: no need to pip install, just download the file and use it.
In [1]: from search_magic import SearchMagic
In [2]: get_ipython().register_magics(SearchMagic)
In [3]: %create_index
In [4]: %search tesseract
Out[4]: Cell Number -> 2
        Notebook -> similarity.ipynb
        Notebook Execution Number -> 2
Press Shift + Tab to display a function's docstring in a tooltip; it has options to expand the tooltip or expand it at the bottom of the screen.
Use ?func_name() to view function and class docstrings etc. For example: ?str.replace()
Returns:
Docstring:
S.replace(old, new[, count]) -> str
Return a copy of S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
Type: method_descriptor
Use wildcards to search a namespace: module_name.*?. For instance: pd.*?. Additionally, this works with partial names: pd.read_*? and pd.*_csv? will also work.
Use double question marks, e.g. pd.read_csv??, to view the source code as well.
Press h (in command mode) to view keyboard shortcuts.
Prefer conda over pip with virtualenv or similar, because that ensures package versions are consistent. conda is not just a Python package manager; check Conda: Myths and Misconceptions by Jake VanderPlas.
Selective Diff/Merge Tool for Jupyter notebooks: nbdime
Install it first:
pip install -e git+https://github.com/jupyter/nbdime#egg=nbdime
It should automatically configure itself for Jupyter Notebook. If something doesn't work, see the installation docs.
Then put the following into ~/.jupyter/nbdime_config.json:
{
  "Extension": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "NbDiff": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "NbDiffDriver": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "NbMergeDriver": {
    "source": true,
    "details": false,
    "outputs": false,
    "metadata": false
  },
  "dummy": {}
}
Change the outputs values to true if you want to see output diffs too.
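Once nbdime is installed you can also diff notebooks directly; a minimal sketch, with placeholder file names:
!nbdiff before.ipynb after.ipynb       # source-level diff in the terminal
!nbdiff-web before.ipynb after.ipynb   # rich side-by-side diff in the browser
!nbdime config-git --enable            # route git diff/merge for .ipynb files through nbdime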
Including Markdown in your code's output is very useful. Use it to highlight parameters, performance notes and so on. This enables colors, bold text, etc.
from IPython.display import Markdown, display

def printmd(string, color=None):
    colorstr = "<span style='color:{}'>{}</span>".format(color, string)
    display(Markdown(colorstr))

printmd("**bold and blue**", color="blue")
Add this snippet to the start of your notebook. Press Alt+I to find the cell being executed right now. This does not work if you have enabled vim bindings:
%%javascript
// Go to Running cell shortcut
Jupyter.keyboard_manager.command_shortcuts.add_shortcut('Alt-I', {
    help : 'Go to Running cell',
    help_index : 'zz',
    handler : function (event) {
        setTimeout(function() {
            // Find the running cell and scroll the first one into view
            if ($('.running').length > 0) {
                // alert("found running cell");
                $('.running')[0].scrollIntoView();
            }
        }, 250);
        return false;
    }
});
Move mature code from your notebooks into .py files at regular intervals. Your notebook run should be mainly function calls.
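A minimal sketch of that split, assuming a hypothetical preprocessing.py module sitting next to the notebook:
# preprocessing.py -- hypothetical helper module, edited in your IDE
import pandas as pd

def load_raw(path):
    """Read the raw CSV into a DataFrame."""
    return pd.read_csv(path)

def clean(df):
    """Drop empty rows; extend with your real cleaning steps."""
    return df.dropna()

# Notebook cell: the run is mainly function calls
from preprocessing import load_raw, clean

df = clean(load_raw("data/train.csv"))  # hypothetical path
df.head()
This pairs well with the %autoreload tip above: edits to preprocessing.py are picked up without restarting the kernel.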
Use the %%time cell magic as a warning and runtime logger for slow cells.
f"This is iteration: {iter_number}" is much more readable than the .format() syntax.
Consolidate repeated imports into one module so a single from xxx_imports import * sets up every notebook.
Use Pathlib instead of os.path wherever possible for more readable code. Here is a beginner-friendly tutorial; if you just want to review, refer to the crisp tutorial or the official docs. A short Pathlib sketch follows the plotting example below.
Use %matplotlib inline to ensure that the plots are rendered inside the notebook.
Wrap repeated plt.plot code in helper functions to avoid code bloat. Using subplots from the Matplotlib OO API is usually neater than using more plt.plots:
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def show_img(im, figsize=None, ax=None, title=None):
    if not ax: fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im, cmap='gray')
    if title is not None: ax.set_title(title)
    ax.get_xaxis().set_visible(True)
    ax.get_yaxis().set_visible(True)
    return ax

def draw_rect(ax, bbox):
    x, y, w, h = bbox
    patch = ax.add_patch(patches.Rectangle((x, y), w, h, fill=False, edgecolor='red', lw=2))
show_img is a reusable plotting function which can easily be extended to plot one-off images as well as to properly use subplots.
In the example below, I use a single figure and add new images as subplots using the neater axes.flat syntax:
fig, axes = plt.subplots(1, 2, figsize=(6, 2))
ax = show_img(char_img, ax=axes.flat[0], title='char_img_line_cropping:\n' + str(char_img.shape))
ax = show_img(char_bg_mask, ax=axes.flat[1], title='Bkg_mask:\n' + str(char_bg_mask.shape))
# If you are working on an image segmentation task, you can easily add red rectangles per subplot:
draw_rect(ax, char_bounding_boxes)  # adds a red bounding box for each character
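As referenced in the list above, here is a minimal Pathlib sketch; the data directory and file names are hypothetical:
from pathlib import Path

DATA = Path("data")                          # hypothetical project data directory
csv_files = sorted(DATA.glob("*.csv"))       # find CSVs without os.listdir plus manual filtering
out_path = DATA / "processed" / "train.csv"  # '/' joins path components readably
out_path.parent.mkdir(parents=True, exist_ok=True)
print(out_path.resolve(), out_path.suffix)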
Avoid %%timeit in your code. Why? Because it does up to 100,000 runs of the cell and then returns the average of the best 3 runtimes. That level of rigour is not always needed; instead use %%time or add average times in inline comments.
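A minimal sketch of the %%time alternative, with a stand-in computation instead of a real workload:
%%time
# Runs the cell once and prints both CPU time and wall time at the end
results = [i ** 2 for i in range(1_000_000)]  # stand-in for your actual slow step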