The data module is designed to load and prepare arbitrary data sets for use in machine learning algorithms.
heron.data.
Data
(targets, labels, target_sigma=None, label_sigma=None, target_names=None, label_names=None, test_targets=None, test_labels=None, test_size=0.05)[source]¶Bases: object
The data class is designed to hold non-timeseries data, and is capable of automatically selecting test data from the provided dataset.
Future development will include the ability to add pre-selected test and verification data to the object.
Methods
add_data (self, targets, labels[, …]) |
Add new rows into the data object. |
calculate_normalisation (self, data, name) |
Calculate the offsets for the normalisation. |
copy (self) |
Return a copy of this data object. |
denormalise (self, data, name) |
Reverse the normalise() method’s effect on the data, and return it to the correct scaling. |
get_starting (self) |
Attempts to guess sensible starting values for the hyperparameter values. |
ix2name (self, name) |
Convert the index of a column to a column name. |
name2ix (self, name) |
Convert the name of a column to a column index. |
normalise (self, data, name) |
Normalise a given array of data so that the values of the data have a minimum at 0 and a maximum at 1. |
add_data
(self, targets, labels, target_sigma=None, label_sigma=None)[source]¶Add new rows into the data object.
calculate_normalisation
(self, data, name)[source]¶Calculate the offsets for the normalisation. We’ll normally want to normalise the training data, and then be able to normalise and denormalise new inputs according to that.
Parameters: |
|
---|
denormalise
(self, data, name)[source]¶Reverse the normalise() method’s effect on the data, and return it to the correct scaling.
Parameters: |
|
---|---|
Returns: |
|
get_starting
(self)[source]¶Attempts to guess sensible starting values for the hyperparameter values.
Returns: |
|
---|
normalise
(self, data, name)[source]¶Normalise a given array of data so that the values of the data have a minimum at 0 and a maximum at 1. This improves the computability of the majority of data sets.
Parameters: |
|
---|---|
Returns: |
|
Notes
In order to perform the normalisation we need two steps:
1) Subtract the “DC Offset”, which is the minimum of the data.
2) Divide by the range of the data.
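The two steps above can be sketched in plain Python. This is a minimal illustration and not heron's implementation: the helper names `fit_minmax`, `minmax_normalise`, and `minmax_denormalise` are hypothetical, and the real methods additionally take a column `name`, presumably so that the constants fitted on the training data can be reused for new inputs.

```python
def fit_minmax(values):
    """Return the (offset, span) pair fitted to a sequence of numbers."""
    offset = min(values)             # step 1: the "DC offset" is the minimum
    span = max(values) - offset      # step 2: the range of the data
    if span == 0:
        raise ValueError("cannot normalise constant data: the range is zero")
    return offset, span


def minmax_normalise(values, offset, span):
    """Map values so the fitted minimum becomes 0 and the fitted maximum 1."""
    return [(v - offset) / span for v in values]


def minmax_denormalise(values, offset, span):
    """Undo minmax_normalise using the same fitted constants."""
    return [v * span + offset for v in values]


training = [2.0, 4.0, 6.0]
offset, span = fit_minmax(training)
scaled = minmax_normalise(training, offset, span)
restored = minmax_denormalise(scaled, offset, span)
```

Because the constants come from the training data, a new input outside the training range maps outside [0, 1]; that is expected rather than an error.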
heron.data.
Timeseries
(targets, labels, target_names=None, label_names=None, test_size=0.05)[source]¶Bases: object
This is a class designed to hold timeseries data for machine learning algorithms.
Timeseries data needs to be handled differently from other datasets, as it is rarely advantageous to select individual points from a timeseries as either test data or verification data. Instead, the Timeseries class selects whole individual timeseries as the test and verification data.
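The idea of holding out whole series rather than individual samples can be sketched as follows. This is an illustration under stated assumptions, not the class's actual selection logic: the function `split_series`, its `seed` argument, and the rule of always holding out at least one series are all hypothetical.

```python
import random


def split_series(series, test_size=0.05, seed=0):
    """Split a list of timeseries into training and test lists.

    Whole series are held out; no series is ever cut into pieces.
    """
    if not series:
        raise ValueError("need at least one timeseries")
    # Hold out a fraction of the series, but never fewer than one.
    n_test = max(1, round(test_size * len(series)))
    indices = list(range(len(series)))
    random.Random(seed).shuffle(indices)
    test_ix = set(indices[:n_test])
    train = [s for i, s in enumerate(series) if i not in test_ix]
    test = [s for i, s in enumerate(series) if i in test_ix]
    return train, test


# Twenty toy series of five samples each.
all_series = [[float(i + t) for t in range(5)] for i in range(20)]
train_series, test_series = split_series(all_series, test_size=0.05)
```

With twenty series and `test_size=0.05`, one complete series is held out and the other nineteen are used for training.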
Matched filtering functions.
░█░█░█▀▀░█▀▄░█▀█░█▀█
░█▀█░█▀▀░█▀▄░█░█░█░█
░▀░▀░▀▀▀░▀░▀░▀▀▀░▀░▀
This code is designed for performing matched filtering using a Gaussian Process Surrogate model.
heron.filtering.
Filter
(gp, data, times)[source]¶Bases: object
This class builds the filtering machinery from a provided surrogate model and noisy data.
Methods
matched_likelihood (self, theta[, psd, srate]) |
Calculate the simple match of some data, given a template, and return its log-likelihood. |
matched_likelihood
(self, theta, psd=None, srate=16834)[source]¶Calculate the simple match of some data, given a template, and return its log-likelihood.
Parameters: |
|
---|
heron.filtering.
inner_product_noise
(x, y, sigma, psd=None, srate=16834)[source]¶Calculate the noise-weighted inner product of two random arrays.
Parameters: |
|
---|
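As a rough illustration of what a noise-weighted inner product means, one common time-domain definition weights each sample by the inverse noise variance, so that the result is the sum of x_i * y_i / sigma_i**2. The sketch below assumes that definition; this page does not say whether `inner_product_noise` uses this form or how `psd` and `srate` enter, and the name `weighted_inner_product` is hypothetical.

```python
def weighted_inner_product(x, y, sigma):
    """Sum of x_i * y_i / sigma_i**2 over equal-length sequences."""
    if not (len(x) == len(y) == len(sigma)):
        raise ValueError("x, y and sigma must have the same length")
    # Samples with larger noise contribute less to the total.
    return sum(a * b / (s * s) for a, b, s in zip(x, y, sigma))


value = weighted_inner_product([1.0, 2.0], [3.0, 4.0], [1.0, 2.0])
```

Here the second sample has twice the noise standard deviation of the first, so its product is down-weighted by a factor of four.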
Kernel functions for GPs.
heron.kernels.
ExponentialSineSq
(period=1, width=15, ax=0)[source]¶Bases: heron.kernels.Kernel
An implementation of the exponential sine-squared kernel.
Methods
distance (self, data1, data2[, hypers]) |
Calculate the squared distance to the point in parameter space. |
function (self, data1, data2, period) |
The functional form of the kernel inside the exponential. |
matrix (self, data1, data2) |
Produce a gram matrix based off this kernel. |
gradient | |
set_hyperparameters |
function
(self, data1, data2, period)[source]¶The functional form of the kernel inside the exponential.
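For orientation, the textbook exponential sine-squared (periodic) kernel is exp(-2 sin²(π|x − x′| / p) / w²). The sketch below implements that standard form for scalar inputs; heron's own parametrisation of `period` and `width` may differ, and the function name and default values here are assumptions made for illustration.

```python
import math


def exp_sine_squared(x1, x2, period=1.0, width=1.0):
    """Textbook periodic kernel for two scalar inputs."""
    # The part "inside the exponential": a squared sine of the scaled distance.
    inside = math.sin(math.pi * abs(x1 - x2) / period) ** 2
    return math.exp(-2.0 * inside / width ** 2)


same_point = exp_sine_squared(0.3, 0.3)
one_period_apart = exp_sine_squared(0.0, 1.0, period=1.0)
half_period_apart = exp_sine_squared(0.0, 0.5, period=1.0)
```

Points separated by a whole number of periods are maximally correlated, while points half a period apart are the least correlated.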
hyper
= [1, 1]¶matrix
(self, data1, data2)[source]¶Produce a gram matrix based off this kernel.
Parameters: |
|
---|---|
Returns: |
|
name
= 'Exponential sine-squared kernel'¶heron.kernels.
Kernel
[source]¶Bases: object
A generic factory for Kernel classes.
Methods
distance (self, data1, data2[, hypers]) |
Calculate the squared distance to the point in parameter space. |
matrix (self, data1, data2) |
Produce a gram matrix based off this kernel. |
set_hyperparameters |
distance
(self, data1, data2, hypers=None)[source]¶Calculate the squared distance to the point in parameter space.
matrix
(self, data1, data2)[source]¶Produce a gram matrix based off this kernel.
Parameters: |
|
---|---|
Returns: |
|
name
= 'Generic kernel'¶ndim
= 1¶heron.kernels.
Matern
(order=1.5, amplitude=100, width=15)[source]¶Bases: heron.kernels.Kernel
An implementation of the Matern Kernel.
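For the default `order=1.5`, the standard Matern 3/2 covariance is a (1 + √3 r / w) exp(−√3 r / w), where r is the distance between the inputs. The sketch below implements that textbook form for scalar inputs; whether heron scales `amplitude` and `width` in the same way is not stated on this page, so the function name and defaults are assumptions.

```python
import math


def matern32(x1, x2, amplitude=1.0, width=1.0):
    """Textbook Matern 3/2 kernel for two scalar inputs."""
    scaled = math.sqrt(3.0) * abs(x1 - x2) / width
    return amplitude * (1.0 + scaled) * math.exp(-scaled)


at_zero_distance = matern32(1.0, 1.0)
at_unit_distance = matern32(0.0, 1.0)
```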
Methods
distance (self, data1, data2[, hypers]) |
Calculate the squared distance to the point in parameter space. |
matrix (self, data1, data2) |
Produce a gram matrix based off this kernel. |
function | |
set_hyperparameters |
name
= 'Matern'¶order
= 1.5¶heron.kernels.
SquaredExponential
(ndim=1, amplitude=100, width=15)[source]¶Bases: heron.kernels.Kernel
An implementation of the squared-exponential kernel.
Attributes: |
|
---|
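To make the kernel and its Gram matrix concrete, the sketch below uses the standard squared-exponential form a exp(−r² / (2 w²)) and evaluates it for every pair of points. The function names and defaults are hypothetical, and heron's `matrix` method may scale its hyperparameters differently.

```python
import math


def squared_exponential(p, q, amplitude=1.0, width=1.0):
    """Textbook squared-exponential kernel for two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return amplitude * math.exp(-0.5 * sq_dist / width ** 2)


def gram_matrix(data1, data2, kernel=squared_exponential):
    """Evaluate the kernel for every pair drawn from data1 and data2."""
    return [[kernel(p, q) for q in data2] for p in data1]


points = [(0.0,), (1.0,), (2.0,)]
K = gram_matrix(points, points)
```

When both arguments are the same set of points, the resulting matrix is symmetric with the amplitude on its diagonal.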
Methods
distance (self, data1, data2[, hypers]) |
Calculate the squared distance to the point in parameter space. |
function (self, data1, data2) |
The functional form of the kernel. |
gradient (self, data1, data2) |
Calculate the gradient of the kernel. |
matrix (self, data1, data2) |
Produce a gram matrix based off this kernel. |
set_hyperparameters |
flat_hyper
¶hyper
= [1.0]¶name
= 'Squared exponential kernel'¶
Prior distributions for GP hyperpriors.
heron.priors.
Normal
(mean, std)[source]¶Bases: heron.priors.Prior
A normal prior probability distribution.
Methods
transform (self, x) |
Transform from unit normalisation to this prior. |
logp |
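One common reading of "transform from unit normalisation" is the inverse-CDF mapping used by nested samplers, which takes a value in the unit interval to the corresponding quantile of the prior. The sketch below assumes that reading; the class `NormalPrior` is a hypothetical stand-in and is not heron's `Normal` class.

```python
import math
from statistics import NormalDist


class NormalPrior:
    """Stand-in normal prior with a unit-interval transform and a log density."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std
        self._dist = NormalDist(mu=mean, sigma=std)

    def transform(self, u):
        """Map u in the open interval (0, 1) to the matching prior quantile."""
        return self._dist.inv_cdf(u)

    def logp(self, x):
        """Log of the normal density at x, computed analytically."""
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std * math.sqrt(2.0 * math.pi))


prior = NormalPrior(mean=2.0, std=3.0)
centre = prior.transform(0.5)
upper = prior.transform(0.9)
```

A value of 0.5 maps to the prior mean, and values closer to 1 map further into the upper tail.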
Functions and classes for constructing regression surrogate models.
heron.regression.
MultiTaskGP
(training_data, kernel, tikh=1e-06, solver=<class 'george.solvers.hodlr.HODLRSolver'>, hyperpriors=None)[source]¶Bases: heron.regression.SingleTaskGP
An implementation of a co-trained set of Gaussian processes which share the same hyperparameters, but which model differing data. The training of these models is described in RW pp115–116.
A multi-task GPR is capable of acting as a surrogate to a many-to-many function, and is trained by making the assumption that all of the outputs from the function share a common correlation structure.
The principal difference compared to a single task GP is the presence of multiple Gaussian Processes, with one to model each dimension of the output data.
Notes
The MultiTask GPR implementation is very much a work in progress at the moment, and not all methods implemented in the SingleTask GPR are implemented correctly yet.
Attributes: |
|
---|
Methods
active_learn (self, afunction, x, y[, iters, …]) |
Actively train the Gaussian process from a set of provided labels and targets using some acquisition function. |
add_data (self, target, label[, label_error]) |
Add data to the Gaussian process. |
correlation (self) |
Calculate the correlation between the model and the test data. |
entropy (self) |
Return the entropy of the Gaussian Process distribution. |
expected_improvement (self, x) |
Returns the expected improvement at the design vector X in the model. |
get_hyperparameters (self) |
Return the kernel hyperparameters. |
grad_neg_ln_likelihood (self, p) |
Return the negative of the gradient of the log likelihood for the GP when its hyperparameters have some specified value. |
hyperpriortransform (self, p) |
Return the true value in the desired hyperprior space, given an input of a unit-hypercube prior space. |
ln_likelihood (self, p) |
Provides a wrapper to the ln_likelihood functions for each component Gaussian process in the multi-task system. |
loghyperpriors (self, p) |
Calculate the log of the hyperprior distributions at a given point. |
neg_ln_likelihood (self, p) |
Returns the negative of the log-likelihood; designed for use with minimisation algorithms. |
nei (self, x) |
Calculate the negative of the expected improvement at a point x. |
prediction (self, new_datum) |
Produce a prediction at a new point, or set of points. |
rmse (self) |
Calculate the root mean squared error of the whole model. |
save (self, filename) |
Save the Gaussian Process to a file which can be reloaded later. |
set_bmatrix (self, values) |
Set the values of the B matrix from a vector. |
set_hyperparameters (self, hypers) |
Set the hyperparameters of the kernel function on each Gaussian process. |
test_predict (self) |
Calculate the value of the GP at the test targets. |
train (self[, method, metric, sampler]) |
Train the Gaussian process by finding the optimal values for the kernel hyperparameters. |
update (self) |
Update the stored matrices. |
get_hyperparameters
(self)[source]¶Return the kernel hyperparameters. Returns the hyperparameters of only the first GP in the network; the others /should/ all be the same, but there might be something to be said for checking this.
Returns: |
|
---|
ln_likelihood
(self, p)[source]¶Provides a wrapper to the ln_likelihood functions for each component Gaussian process in the multi-task system.
Notes
This is implemented in a separate function because of the mild peculiarities of how the pickle module needs to serialise functions, which means that instancemethods (which this would become) can’t be serialised.
prediction
(self, new_datum)[source]¶Produce a prediction at a new point, or set of points.
Parameters: |
|
---|---|
Returns: |
|
set_hyperparameters
(self, hypers)[source]¶Set the hyperparameters of the kernel function on each Gaussian process.
train
(self, method='MCMC', metric='loglikelihood', sampler='ensemble', **kwargs)[source]¶Train the Gaussian process by finding the optimal values for the kernel hyperparameters.
Parameters: |
|
---|
heron.regression.
Regressor
(training_data, kernel, tikh=1e-06, solver=<class 'george.solvers.hodlr.HODLRSolver'>, hyperpriors=None, **kwargs)[source]¶Bases: heron.regression.SingleTaskGP
Attributes: |
|
---|
Methods
active_learn (self, afunction, x, y[, iters, …]) |
Actively train the Gaussian process from a set of provided labels and targets using some acquisition function. |
add_data (self, target, label[, label_error]) |
Add data to the Gaussian process. |
correlation (self) |
Calculate the correlation between the model and the test data. |
entropy (self) |
Return the entropy of the Gaussian Process distribution. |
expected_improvement (self, x) |
Returns the expected improvement at the design vector X in the model. |
get_hyperparameters (self) |
Return the kernel hyperparameters. |
grad_neg_ln_likelihood (self, p) |
Return the negative of the gradient of the log likelihood for the GP when its hyperparameters have some specified value. |
hyperpriortransform (self, p) |
Return the true value in the desired hyperprior space, given an input of a unit-hypercube prior space. |
ln_likelihood (self, p) |
Provides a convenient wrapper to the ln likelihood function. |
loghyperpriors (self, p) |
Calculate the log of the hyperprior distributions at a given point. |
neg_ln_likelihood (self, p) |
Returns the negative of the log-likelihood; designed for use with minimisation algorithms. |
nei (self, x) |
Calculate the negative of the expected improvement at a point x. |
prediction (self, new_datum[, normalised]) |
Produce a prediction at a new point, or set of points. |
rmse (self) |
Calculate the root mean squared error of the whole model. |
save (self, filename) |
Save the Gaussian Process to a file which can be reloaded later. |
set_bmatrix (self, values) |
Set the values of the B matrix from a vector. |
set_hyperparameters (self, hypers) |
Set the hyperparameters of the kernel function. |
test_predict (self) |
Calculate the value of the GP at the test targets. |
train (self[, method, metric, sampler]) |
Train the Gaussian process by finding the optimal values for the kernel hyperparameters. |
update (self) |
Update the stored matrices. |
heron.regression.
SingleTaskGP
(training_data, kernel, tikh=1e-06, solver=<class 'george.solvers.hodlr.HODLRSolver'>, hyperpriors=None, **kwargs)[source]¶Bases: object
This is an implementation of a single-task Gaussian process regressor, that is, a GPR which is capable of acting as a surrogate to a many-to-one function. The Single Task GPR is the fundamental building block of the MultiTask GPR, which consists of multiple Single Tasks which are trained in tandem (but which do NOT share correlation information). Note that components of a Gaussian Process Regressor with multiple response outputs and multiple inputs are present in this code, but they need more thought before they will work efficiently.
Attributes: |
|
---|
Methods
active_learn (self, afunction, x, y[, iters, …]) |
Actively train the Gaussian process from a set of provided labels and targets using some acquisition function. |
add_data (self, target, label[, label_error]) |
Add data to the Gaussian process. |
correlation (self) |
Calculate the correlation between the model and the test data. |
entropy (self) |
Return the entropy of the Gaussian Process distribution. |
expected_improvement (self, x) |
Returns the expected improvement at the design vector X in the model. |
get_hyperparameters (self) |
Return the kernel hyperparameters. |
grad_neg_ln_likelihood (self, p) |
Return the negative of the gradient of the log likelihood for the GP when its hyperparameters have some specified value. |
hyperpriortransform (self, p) |
Return the true value in the desired hyperprior space, given an input of a unit-hypercube prior space. |
ln_likelihood (self, p) |
Provides a convenient wrapper to the ln likelihood function. |
loghyperpriors (self, p) |
Calculate the log of the hyperprior distributions at a given point. |
neg_ln_likelihood (self, p) |
Returns the negative of the log-likelihood; designed for use with minimisation algorithms. |
nei (self, x) |
Calculate the negative of the expected improvement at a point x. |
prediction (self, new_datum[, normalised]) |
Produce a prediction at a new point, or set of points. |
rmse (self) |
Calculate the root mean squared error of the whole model. |
save (self, filename) |
Save the Gaussian Process to a file which can be reloaded later. |
set_bmatrix (self, values) |
Set the values of the B matrix from a vector. |
set_hyperparameters (self, hypers) |
Set the hyperparameters of the kernel function. |
test_predict (self) |
Calculate the value of the GP at the test targets. |
train (self[, method, metric, sampler]) |
Train the Gaussian process by finding the optimal values for the kernel hyperparameters. |
update (self) |
Update the stored matrices. |
active_learn
(self, afunction, x, y, iters=1, afunc_args={})[source]¶Actively train the Gaussian process from a set of provided labels and targets using some acquisition function.
correlation
(self)[source]¶Calculate the correlation between the model and the test data.
Returns: |
|
---|
entropy
(self)[source]¶Return the entropy of the Gaussian Process distribution. This can be calculated directly from the covariance matrix, making this a nice, quick calculation to perform.
Returns: |
|
---|
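The entropy of an n-dimensional Gaussian with covariance K is n/2 (1 + ln 2π) + ½ ln det K, which is why only the covariance matrix is needed. The sketch below evaluates this with a small pure-Python Cholesky factorisation; it is an illustration of the formula rather than heron's code, and the function names are hypothetical.

```python
import math


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def gaussian_entropy(covariance):
    """Differential entropy (in nats) of a multivariate Gaussian."""
    n = len(covariance)
    lower = cholesky(covariance)
    # ln det K is twice the sum of the logs of the Cholesky diagonal.
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return 0.5 * n * (1.0 + math.log(2.0 * math.pi)) + 0.5 * log_det


identity_entropy = gaussian_entropy([[1.0, 0.0], [0.0, 1.0]])
correlated_entropy = gaussian_entropy([[2.0, 1.0], [1.0, 2.0]])
```

Working through the Cholesky factor avoids forming the determinant directly, which tends to be better behaved numerically for larger matrices.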
expected_improvement
(self, x)[source]¶Returns the expected improvement at the design vector X in the model.
Parameters: |
|
---|---|
Returns: |
|
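For reference, the usual closed form of expected improvement for a Gaussian prediction with mean μ and standard deviation σ, when maximising and with current best value f*, is (μ − f*) Φ(z) + σ φ(z) with z = (μ − f*) / σ. The sketch below implements that form; the sign convention and the choice of f* used by heron are not stated on this page, so both are assumptions here, as is the function signature.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a Gaussian prediction (maximising)."""
    if std <= 0.0:
        # With no predictive uncertainty the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * _STANDARD_NORMAL.cdf(z) + std * _STANDARD_NORMAL.pdf(z)


certain_gain = expected_improvement(mean=1.0, std=0.0, best=0.5)
uncertain_gain = expected_improvement(mean=1.0, std=2.0, best=1.0)
```

A point whose predicted mean equals the current best still has positive expected improvement when the prediction is uncertain, which is what drives exploration.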
grad_neg_ln_likelihood
(self, p)[source]¶Return the negative of the gradient of the log likelihood for the GP when its hyperparameters have some specified value.
Parameters: |
|
---|---|
Returns: |
|
hyperpriortransform
(self, p)[source]¶Return the true value in the desired hyperprior space, given an input of a unit-hypercube prior space.
Parameters: |
|
---|---|
Returns: |
|
km
= None¶ln_likelihood
(self, p)[source]¶Provides a convenient wrapper to the ln likelihood function.
Notes
This is implemented in a separate function because of the mild peculiarities of how the pickle module needs to serialise functions, which means that instancemethods (which this would become) can’t be serialised.
loghyperpriors
(self, p)[source]¶Calculate the log of the hyperprior distributions at a given point.
Parameters: |
|
---|
neg_ln_likelihood
(self, p)[source]¶Returns the negative of the log-likelihood; designed for use with minimisation algorithms.
Parameters: |
|
---|---|
Returns: |
|
prediction
(self, new_datum, normalised=False)[source]¶Produce a prediction at a new point, or set of points.
Parameters: |
|
---|---|
Returns: |
|
rmse
(self)[source]¶Calculate the root mean squared error of the whole model.
Returns: |
|
---|
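As a reminder of the quantity involved, the root mean squared error is the square root of the mean squared difference between predictions and true values. The sketch below assumes the comparison is made against held-out test labels, which this page does not confirm; the function name is hypothetical.

```python
import math


def root_mean_squared_error(predictions, truths):
    """Square root of the mean squared difference between two sequences."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, truths))
    return math.sqrt(squared / len(truths))


error = root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```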
save
(self, filename)[source]¶Save the Gaussian Process to a file which can be reloaded later.
Parameters: |
|
---|
Notes
In the current implementation the serialisation of the GP is performed by the Python pickle library, which isn’t guaranteed to be binary-compatible with all machines.
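A pickle round trip of the kind described in the note looks roughly like the sketch below. The helpers `save_model` and `load_model` are hypothetical, and a plain dictionary stands in for a trained Gaussian process so that the example runs without heron installed.

```python
import os
import pickle
import tempfile


def save_model(model, filename):
    """Serialise any picklable object to a file."""
    with open(filename, "wb") as handle:
        pickle.dump(model, handle)


def load_model(filename):
    """Load an object written by save_model. Only use this on trusted files."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# A plain dictionary stands in for a trained model.
stand_in = {"kernel": "Squared exponential kernel", "hyperparameters": [1.0, 15.0]}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")
    save_model(stand_in, path)
    reloaded = load_model(path)
```

Unpickling can execute arbitrary code, so saved models should only be reloaded from sources you trust.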
train
(self, method='MCMC', metric='loglikelihood', sampler='ensemble', **kwargs)[source]¶Train the Gaussian process by finding the optimal values for the kernel hyperparameters.
Parameters: |
|
---|
Code to simplify sampling the Gaussian process.
heron.sampling.
draw_samples
(gp, **kwargs)[source]¶Construct an array to pass to the Gaussian process to pull out a number of samples from a high dimensional GP.
Parameters: |
|
---|
These are functions designed to be used for training a Gaussian process made using heron.
heron.training.
cross_validation
(p, gp)[source]¶Calculate the cross-validation factor between the training set and the test set.
Parameters: |
|
---|---|
Returns: |
|
heron.training.
ln_likelihood
(p, gp)[source]¶Returns the log-likelihood of the Gaussian process, which can be used to learn the hyperparameters of the GP.
Parameters: |
|
---|---|
Returns: |
|
heron.training.
run_sampler
(sampler, initial, iterations)[source]¶Run the MCMC sampler for some number of iterations, but output a progress bar so you can keep track of what’s going on.
heron.training.
run_training_map
(gp, metric='loglikelihood', repeats=20, **kwargs)[source]¶Find the maximum a posteriori training values for the Gaussian Process.
Parameters: |
|
---|
Notes
The current implementation has no way of specifying the optimisation algorithm.
heron.training.
run_training_mcmc
(gp, walkers=200, burn=500, samples=1000, metric='loglikelihood', samplertype='ensemble')[source]¶Train a Gaussian process using an MCMC process to find the maximum evidence.
Parameters: |
|
---|---|
Returns: |
|
Notes
At present the algorithm assigns the median of the samples to the value of the kernel vector; this may not ultimately be the best way to do this, and so it should be possible to specify the desired value to be used from the distribution.
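The median-based point estimate described in the note can be sketched as a per-dimension median over the posterior samples. The layout of the chain (one row per sample, one column per hyperparameter) and the function name are assumptions made for illustration.

```python
from statistics import median


def median_hyperparameters(samples):
    """Per-dimension median of a list of hyperparameter sample vectors."""
    if not samples:
        raise ValueError("need at least one sample")
    # zip(*samples) turns rows of samples into columns of hyperparameters.
    return [median(column) for column in zip(*samples)]


chain = [[1.0, 10.0], [2.0, 30.0], [9.0, 20.0]]
point_estimate = median_hyperparameters(chain)
```

Taking the median of each dimension separately is simple and robust to outlying samples, although the resulting vector is not guaranteed to lie in a high-probability region when the posterior is multimodal or strongly correlated, which is consistent with the caveat in the note.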
heron.training.
run_training_nested
(gp, method='multi', maxiter=None, npoints=1000)[source]¶Train the Gaussian Process model using nested sampling.
Parameters: |
|
---|