plotbox.plot module

plotbox.plot.dist_plot(distObj, saveAs=None, title='Probability Dashboard', logscale=False)

Generates a plot object which is either displayed or saved. The plot includes 4 subplots:

a probability plot generated using scipy.stats “probplot”, comparing the theoretical distribution and its data X
a scatter plot of the data sample
the CDF of the distribution
the PDF of the distribution
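
The four panels listed above can be approximated with scipy and matplotlib alone. The following is a minimal sketch, not the implementation of dist_plot: it assumes a normal distribution fitted to a synthetic sample, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Synthetic sample and a normal distribution fitted to it (both assumptions).
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)
mu, sigma = stats.norm.fit(sample)
grid = np.linspace(sample.min(), sample.max(), 200)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# 1. Probability plot: ordered sample values against theoretical quantiles.
stats.probplot(sample, dist="norm", sparams=(mu, sigma), plot=axes[0, 0])

# 2. Scatter plot of the raw sample against its index.
axes[0, 1].scatter(np.arange(sample.size), sample, s=10)
axes[0, 1].set_title("Data sample")

# 3. CDF of the fitted distribution.
axes[1, 0].plot(grid, stats.norm.cdf(grid, mu, sigma))
axes[1, 0].set_title("CDF")

# 4. PDF of the fitted distribution.
axes[1, 1].plot(grid, stats.norm.pdf(grid, mu, sigma))
axes[1, 1].set_title("PDF")

fig.suptitle("Probability Dashboard")
fig.tight_layout()
```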

plotbox.plot.hist_box(ar, perc=0, val=None, type='sym', annotation=True, distplot=False, trunc=False, bins=30, normed=True, alpha=0.6, color='b', kde=True, legend='', xmax=None)

Plots a histogram highlighting or filtering out the given percentage of data, according to the given type of filtering.

3 different types available:

‘left’: filter out the lower given percentage of data
‘right’: filter out the upper given percentage of data
‘sym’: both left and right filtering

Parameters:

ar (array-like) – Array of data
perc (float in [0,100] (defaults to 0)) – Percentage of data to filter
val (float) – An alternative to perc, gives the threshold value instead of percentage
type (string (defaults to 'sym')) – Type of filtering
annotation (bool (defaults to True)) – If True, writes complementary annotations on the plot
distplot (bool (defaults to False)) – If True, uses seaborn distplot
trunc (bool (defaults to False)) – If True, filters out the data and only displays the remaining data. Otherwise, only highlights the selected data.
bins (int (default is 30)) – Number of bins to use
normed (bool (defaults to True)) – If True, histogram is normed
alpha (float in [0,1] (default is 0.6)) – Transparency of the plot (0 is completely transparent, 1 is opaque)
color (string (default is 'b')) – Color of the plot
kde (bool (defaults to True)) – If using seaborn distplot, tells whether or not to plot the kernel density estimation of the distribution
legend (string (default is '')) – Legend to use
xmax (float) – Maximum x-value of the plot. If none is given, chooses the maximum value of the data.
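
The three filtering types can be illustrated with percentile thresholds. This is a sketch under stated assumptions rather than the code used by hist_box: the helper name filter_mask is hypothetical, and 'sym' is assumed to remove perc percent from each tail (the source does not say whether the percentage is split between the two sides).

```python
import numpy as np

def filter_mask(ar, perc=0.0, kind="sym"):
    """Return a boolean mask that is True for the values that are kept.

    Hypothetical helper. Assumption: 'sym' removes `perc` percent from
    each tail; 'left' and 'right' remove it from one tail only.
    """
    ar = np.asarray(ar, dtype=float)
    lower, upper = -np.inf, np.inf
    if kind in ("left", "sym"):
        lower = np.percentile(ar, perc)
    if kind in ("right", "sym"):
        upper = np.percentile(ar, 100 - perc)
    return (ar >= lower) & (ar <= upper)

data = np.arange(101, dtype=float)  # 0, 1, ..., 100
kept_sym = data[filter_mask(data, perc=10, kind="sym")]
kept_left = data[filter_mask(data, perc=10, kind="left")]
kept_right = data[filter_mask(data, perc=10, kind="right")]
```

With trunc=True the kept values alone would be plotted; otherwise the mask could be used to highlight the selected region on the full histogram.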

plotbox.plot.make_readable_ticks(type='x')

Turn your unreadable plot ticks (x-ticks or y-ticks) into nice, clean ticks.

Works only for floats for now.

Parameters:	type ('x' or 'y' (default is 'x')) – Choose ‘x’ if you want to change your x-ticks, ‘y’ for your y-ticks

Notes

Must be incorporated as part of a “regular” plot script (see example)

Examples

>>>
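
The original example is not reproduced here. As a stand-in, the sketch below shows one way to get cleaner float ticks inside a regular plot script using only matplotlib's FuncFormatter. It does not call make_readable_ticks, and the formatting rule (thousands separators, at most two decimals) is an assumption about what "readable" means.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def readable(value, pos=None):
    """Format a float with thousands separators and at most two decimals."""
    return f"{value:,.2f}".rstrip("0").rstrip(".")

fig, ax = plt.subplots()
ax.plot([0, 1250000.5, 2500000], [0.0, 0.5, 1.0])

# Apply the formatter to the x-axis; use ax.yaxis for the y-ticks instead.
ax.xaxis.set_major_formatter(FuncFormatter(readable))
```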

plotbox.plot.plotScatter(x, y, data=None, hue=None, bestfit=False, ci=95, alpha=1, size=20, xlab='x', ylab='y', axFontSize=9, title='', saveAs=None, figsize=(9.5, 6), snssize=5, label=None)

Robust scatter plot tool. If data is a DataFrame, x and y are column names and also supply the x/y labels; otherwise x and y are arrays. hue colors the points and facets the best fit. The confidence interval (ci), alpha, size, x/y labels, and title can also be set. If x is a date array, a simple scatter plot is drawn.
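
The two calling conventions (column names with a DataFrame, or plain arrays) can be sketched as follows. The helper resolve_xy is hypothetical and is not part of plotbox; a plain dict stands in for a DataFrame so that the block has no dependencies.

```python
def resolve_xy(x, y, data=None, xlab="x", ylab="y"):
    """Return (x_values, y_values, x_label, y_label).

    Hypothetical helper: with `data`, x and y are treated as column names
    and reused as axis labels; without it, they are treated as arrays.
    """
    if data is not None:
        return data[x], data[y], str(x), str(y)
    return x, y, xlab, ylab

# A dict of columns stands in for a DataFrame here.
table = {"miles": [10, 20, 30], "cost": [1.5, 2.5, 4.0]}

from_columns = resolve_xy("miles", "cost", data=table)
from_arrays = resolve_xy([10, 20, 30], [1.5, 2.5, 4.0])
```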

plotbox.plot.prepare_plot(xticks, yticks, figsize=(10.5, 6), hideLabels=False, gridColor='#999999', gridWidth=1.0)

Generates a clean plot layout.

plotbox.plot.save_plot(fig, path, filename=None, show=True, filetype='.png', dpi=270)

Saves a plot at high resolution with a clean crop. Requires a matplotlib figure object and a save path.
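
A comparable result can be obtained directly with matplotlib. This sketch does not call save_plot; it assumes that "clean crop" corresponds to bbox_inches="tight", and it reuses the default dpi and file type from the signature above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Write into a temporary directory so the example leaves no stray files.
out_dir = tempfile.mkdtemp()
out_file = os.path.join(out_dir, "example" + ".png")

# dpi=270 matches the documented default; bbox_inches="tight" trims margins.
fig.savefig(out_file, dpi=270, bbox_inches="tight")
plt.close(fig)
```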

plotbox.plot.scatter_plot(x, y, data=None, hue=None, title='Scatter Plot', xlabel='X-Values', ylabel='Y-Values', alpha=0.3, figsize=(14.25, 9.0), saveAs=None, vlines=None, hlines=None, xlim=(None, None), ylim=(None, None), legend=True, model=None, plotly=False, show=True)

scatter_plot uses matplotlib.pyplot.scatter in a seaborn-like functional paradigm.

Parameters:

x, y – Column names in data.
data (DataFrame) – Long-form (tidy) dataframe with variables in columns and observations in rows.
hue, col, row – Variable names to facet on the hue, col, or row dimensions (see FacetGrid docs for more information).
title, xlabel, ylabel – Title and axis labels of the scatter plot.
alpha (float) – opacity of scatter points.
figsize (tuple (width, height)) – Size of the figure.
saveAs (optional) – filename to save figure as
vlines (list) – list of x points to make vertical lines in the plot
xlim (tuple (xmin, xmax)) – horizontal boundaries of the plot
ylim (tuple (ymin, ymax)) – vertical boundaries of the plot
legend (boolean, optional) – Draw a legend for the data when using a hue variable.
model (str, optional) – regression on given data.
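
The hue parameter can be pictured as one matplotlib scatter call per distinct hue value. The sketch below uses only matplotlib and numpy and is not the body of scatter_plot; the data is synthetic, and the defaults for title, labels, alpha, and figsize are taken from the signature above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = rng.normal(size=300)
hue = np.array(["one"] * 150 + ["two"] * 90 + ["three"] * 60)

fig, ax = plt.subplots(figsize=(14.25, 9.0))

# One scatter call per hue level, so each level gets its own color and label.
for level in np.unique(hue):
    mask = hue == level
    ax.scatter(x[mask], y[mask], alpha=0.3, label=str(level))

# Reference lines, analogous to the vlines and hlines parameters.
ax.axvline(0.0, color="k", linewidth=0.8)
ax.axhline(0.0, color="k", linewidth=0.8)

ax.set_title("Scatter Plot")
ax.set_xlabel("X-Values")
ax.set_ylabel("Y-Values")
ax.legend()
```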

Notes

This function can be used in 2 different ways:

Using the arguments to generate titles, legends, etc., and then saving or displaying the plot

Incorporating the plot in a script and overriding the plotting features this way:

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>>
>>> f = 1000
>>> hue = ['one' for i in range(50*f)] + ['two' for i in range(30*f)] + ['three' for i in range(20*f)]
>>> rp.plotBox.scatter_plot(x = np.random.randn(100*f), y = np.random.randn(100*f), hue = hue, vlines = 0, alpha= .1, hlines = 0)
>>> plt.title('My title')
>>> plt.xlabel('X label I want')
>>>
>>> # To change the figure size :
>>> fig = plt.gcf()     # get the figure object
>>> fig.set_size_inches(5,10)
>>>
>>> plt.show()

Add arguments:

dropna (boolean, optional) – Drop missing values from the data before plotting.

Add regression:

>>> f, popt, pcov = rp.statBox.regression_model(x, y, model)
>>> plt.plot(np.linspace(0, max(x) + 100, 50), f(np.linspace(0, max(x) + 100, 50), *popt), 'r-', label="Fitted Curve")
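
The names popt and pcov suggest a fit in the style of scipy.optimize.curve_fit, although the source does not show how regression_model is implemented. A minimal sketch with curve_fit and a linear model (both assumptions) evaluates the fitted curve on the same kind of grid:

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(x, a, b):
    """Model function; stands in for the f returned by regression_model."""
    return a * x + b

# Noise-free synthetic data so the recovered parameters are easy to check.
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0

popt, pcov = curve_fit(linear, x, y)

# Evaluate the fitted curve on a grid that extends past the data, as above.
grid = np.linspace(0, x.max() + 100, 50)
fitted = linear(grid, *popt)
```

The arrays grid and fitted can then be passed to plt.plot to draw the fitted curve over the scatter plot.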

plotbox.plot.violinOne(X, col=None, subplot=111, alpha=0.2)

Given a sample of data (and optionally a boolean vector), returns a pyplot axis object which is the violin plot of the data.

Parameters:

X (array-like) – Vector
col (array-like) – Indicates suspension and failure times
subplot (int) – Subplot value for the axis to return
alpha (float in [0,1]) – Transparency of the dots

Returns:	ax – Violin plot
Return type:	pyplot axis

plotbox.plot.violinPlot(savefig=None, **kwargs)

Given a set of n data samples with their boolean vectors, returns n subplots with the violin plot of each sample.

Usage: violinPlot(sample1=(X1,Xbool1), sample2=(X2,Xbool2), etc...)

Parameters:

savefig (string (default is None)) – If given, save the figure using savefig as the file name.
kwargs (tuples) – For each sample, the data sample and the boolean vector must be provided in a tuple.
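
The keyword-argument convention (one name per sample, each mapped to a data/boolean pair) can be sketched with matplotlib's own violinplot. The function violin_grid is hypothetical and is not the plotbox implementation; coloring the overlaid points by the boolean vector is an assumption about how that vector is used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def violin_grid(**samples):
    """Draw one subplot per keyword argument; each value is (data, flags)."""
    fig, axes = plt.subplots(1, len(samples), squeeze=False)
    for ax, (name, (values, flags)) in zip(axes[0], samples.items()):
        values = np.asarray(values, dtype=float)
        ax.violinplot(values, showmedians=True)
        # Overlay the raw points, colored by the boolean flag (an assumption).
        colors = ["tab:blue" if flag else "tab:red" for flag in flags]
        ax.scatter(np.ones(values.size), values, c=colors, alpha=0.2)
        ax.set_title(name)
    return fig

rng = np.random.default_rng(3)
fig = violin_grid(
    sample1=(rng.normal(size=50), rng.random(50) > 0.5),
    sample2=(rng.normal(2.0, 1.0, size=80), rng.random(80) > 0.3),
)
```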

plotbox.plot.wblPlots(distList, title=None, labelList=None, saveAs=None, xlabel='Miles to Failure', min_x=None, max_x=None, ylim=(None, None), xlim=(None, None), use_sci=False, show=True)

Produces a Weibull probability plot from a list of dist objects. labelList is a list of strings, the same size and order as distList, that provides label information for each distribution.
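
For background, a Weibull probability plot linearizes the Weibull CDF: plotting ln(-ln(1 - F)) against ln(x) gives a straight line whose slope is the shape parameter. The sketch below computes those coordinates from a raw synthetic sample with numpy. It does not use plotbox dist objects, and the use of Bernard's median-rank approximation for F is an assumption, not something the source specifies.

```python
import numpy as np

# Synthetic "miles to failure" sample with known Weibull parameters.
rng = np.random.default_rng(42)
true_shape, true_scale = 2.0, 50000.0
miles = np.sort(true_scale * rng.weibull(true_shape, size=500))

# Empirical CDF estimate via Bernard's median-rank approximation.
n = miles.size
ranks = np.arange(1, n + 1)
median_rank = (ranks - 0.3) / (n + 0.4)

# Weibull plot coordinates: ln(x) against ln(-ln(1 - F)).
xs = np.log(miles)
ys = np.log(-np.log(1.0 - median_rank))

# A straight-line fit recovers the parameters: the slope is the shape,
# and the scale follows from the intercept.
slope, intercept = np.polyfit(xs, ys, 1)
shape_hat = slope
scale_hat = np.exp(-intercept / slope)
```

Plotting xs against ys for each distribution, with one label per entry, gives the kind of figure that wblPlots describes.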