Preprocess

The preprocess module contains methods for image selection and feature selection.

Image selection

class amp.preprocess.image_selection.FurthestPointSampling(images, k=None, encoder='gaussian', order=2, log=None, cutoff=None)

Bases: object

Hierarchical Furthest Point Sampling is a general selection technique that searches for informative points which are spread out and can largely represent the original dataset.
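The selection idea can be illustrated with a minimal sketch of plain greedy furthest point sampling. This is not Amp's implementation: it is the non-hierarchical variant, it works on ready-made feature vectors rather than encoded atomic configurations, and the function name `furthest_point_sampling` and its `start` argument are hypothetical names chosen for this example.

```python
import numpy as np


def furthest_point_sampling(points, k, order=2, start=0):
    """Greedily pick k row indices of `points` that are spread far apart.

    Hypothetical helper for illustration only; not part of Amp.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], ord=order, axis=1)

    while len(selected) < k:
        # The next pick is the point furthest from the current selection.
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], ord=order, axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)

    return selected


# Four points on a line; index 2 is the outlier and is picked right after the start.
demo_points = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.5, 0.0]]
demo_selection = furthest_point_sampling(demo_points, k=3)
```

Each iteration adds the point whose distance to the nearest selected point is largest, which is what keeps the chosen subset spread over the dataset.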

Parameters:
  • images (list or str) – List of ASE atoms objects with positions, symbols, energies, and forces in ASE format. This is the training set of data. This can also be the path to an ASE trajectory (.traj) or database (.db) file. Energies can be obtained from any reference, e.g. DFT calculations.

  • k (int) – Number of images to be selected.

  • encoder (str) – Method for encoding the atomic configurations. Available methods are ‘gaussian’ and ‘zernike’.

  • order ({non-zero int, inf, -inf, 'fro', 'nuc'}, optional) – Order of the norm (see table under Notes). inf means numpy’s inf object. The default is 2.

Notes

The following norms can be calculated:

=====  ============================  ==========================
ord    norm for matrices             norm for vectors
=====  ============================  ==========================
None   Frobenius norm                2-norm
‘fro’  Frobenius norm                –
‘nuc’  nuclear norm                  –
inf    max(sum(abs(x), axis=1))      max(abs(x))
-inf   min(sum(abs(x), axis=1))      min(abs(x))
0      –                             sum(x != 0)
1      max(sum(abs(x), axis=0))      as below
-1     min(sum(abs(x), axis=0))      as below
2      2-norm (largest sing. value)  as below
-2     smallest singular value       as below
other  –                             sum(abs(x)**ord)**(1./ord)
=====  ============================  ==========================
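The table matches the one in the documentation of `numpy.linalg.norm`, so a few entries can be checked directly with NumPy. The sketch below assumes the norm order is forwarded to that function; the source does not state this explicitly, and the arrays are made-up values.

```python
import numpy as np

x = np.array([3.0, -4.0])               # example vector
A = np.array([[1.0, 2.0], [3.0, 4.0]])  # example matrix

vector_norms = {
    "default": np.linalg.norm(x),           # 2-norm: sqrt(9 + 16)
    "1": np.linalg.norm(x, ord=1),          # sum(abs(x))
    "inf": np.linalg.norm(x, ord=np.inf),   # max(abs(x))
    "0": np.linalg.norm(x, ord=0),          # sum(x != 0)
}

matrix_norms = {
    "fro": np.linalg.norm(A, ord="fro"),    # sqrt of the sum of squared entries
    "1": np.linalg.norm(A, ord=1),          # max(sum(abs(A), axis=0))
    "inf": np.linalg.norm(A, ord=np.inf),   # max(sum(abs(A), axis=1))
}
```

For the vector, the four results are 5, 7, 4 and 2; for the matrix, the 1-norm is the largest column sum (6) and the inf-norm is the largest row sum (7).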

Return type:

Selected images saved in a trajectory file.

distance(x, y)
search(calculate_dev=False, save_traj=False)

Feature selection