Nft
The nft module contains methods for generating initial images, and the active learning protocol based on the nearsighted force-training approach.
Initialization
NFT active learning
- class amp.nft.activelearner.NFT(load=None, log=None, threshold=None, stop_delta=0.1, max_iterations=5, steps_not_improved=2, dblabel='amp-data')
Bases: object
An automatic protocol that trains a bootstrap ensemble calculator using the nearsighted force-training approach. Progress is logged with an amp.utilities.Logger instance.
If an ensemble is given, it will be loaded and used as the starting model for training the initial images.
- Parameters:
load (str) – If an ensemble model is given, it will be loaded as the initial model for training the initial images.
log (str or an amp.utilities.Logger instance) – Logging file.
threshold (float) – Controls the number of atomic chunks to be evaluated by single-point calculations. If threshold is positive, the chunks centered on atoms whose atomic uncertainty is above the threshold will be calculated. If threshold is in the range (-1.0, 0.0), 100*abs(1+threshold) percent of all possible atomic chunks will be calculated. For example, threshold=-0.9 indicates that the chunks with the top 10 percent of uncertainties will be calculated.
stop_delta (float) – Termination criterion: if the maximum atomic uncertainty falls below stop_delta, the NFT iteration is stopped.
max_iterations (int) – Termination criterion: if the number of NFT iterations exceeds max_iterations, the NFT iteration is stopped.
steps_not_improved (int) – Termination criterion: if the structure uncertainty has not improved for steps_not_improved consecutive steps, the NFT iteration is stopped.
dblabel (str) – Optional separate prefix/location for database files, including fingerprints, fingerprint derivatives, and neighborlists.
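The two regimes of threshold described above can be illustrated with a small standalone sketch. This is not the code used inside amp: the helper name select_chunk_centers, the plain-list input, and the rounding of the percentage to a whole number of atoms are all assumptions made for illustration.

```python
def select_chunk_centers(uncertainties, threshold):
    """Return indices of atoms whose chunks would get a single-point calculation.

    Hypothetical helper that mirrors the documented `threshold` semantics;
    it is not part of amp.
    """
    if threshold > 0:
        # Positive threshold: absolute cutoff on the atomic uncertainty.
        return [i for i, u in enumerate(uncertainties) if u > threshold]
    if -1.0 < threshold < 0.0:
        # Negative threshold: keep the top 100*abs(1+threshold) percent.
        fraction = abs(1.0 + threshold)
        # Rounding to the nearest whole atom is an assumption of this sketch.
        count = round(fraction * len(uncertainties))
        ranked = sorted(range(len(uncertainties)),
                        key=lambda i: uncertainties[i], reverse=True)
        return sorted(ranked[:count])
    raise ValueError("threshold must be positive or lie in (-1.0, 0.0)")


atomic_uncertainties = [0.05, 0.30, 0.10, 0.02, 0.25, 0.01, 0.08, 0.03, 0.04, 0.06]
print(select_chunk_centers(atomic_uncertainties, 0.2))   # [1, 4]
print(select_chunk_centers(atomic_uncertainties, -0.9))  # [1]
```

With ten atoms, threshold=-0.9 keeps one chunk (the top 10 percent), while threshold=0.2 keeps every atom whose uncertainty exceeds 0.2 regardless of how many there are.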
- run(images, target_image, n=10, calc_text="\nfrom amp import Amp\nfrom amp.descriptor.gaussian import Gaussian\nfrom amp.model.neuralnetwork import NeuralNetwork\ncalc = Amp(descriptor=Gaussian(),\n model=NeuralNetwork(),\n dblabel='../amp-db')\ncalc.model.lossfunction.parameters['weight_duplicates'] = False\n", headerlines='', start_command='python3 run.py &', train_line='calc.train(images=trainfile)', label='al', parent_calc=None, expired=600.0, cutoff=6.5, init_nft_ids=None, dft_cores=None, dft_memory=None)
Trains a bootstrap ensemble in the NFT framework.
Bootstrap training jobs can be submitted sequentially or in parallel, depending on start_command.
Single-point calculations are run sequentially when a simple calculator, for example EMT, is used. For DFT, each single-point calculation is submitted as an independent job.
The NFT iteration is terminated when any of the stopping criteria is met. The model with the lowest structure uncertainty is saved as a JSON file named ‘best.[label].ensemble’.
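How the three stopping criteria (stop_delta, max_iterations, steps_not_improved) might combine can be sketched as follows. This is a minimal illustration and not amp's implementation: the function should_stop, its arguments, and in particular the definition of "not improved" (no value in the last steps_not_improved steps beats the best earlier value) are assumptions.

```python
def should_stop(max_atomic_uncertainty, iteration, structure_uncertainties,
                stop_delta=0.1, max_iterations=5, steps_not_improved=2):
    """Return the name of the criterion that fires, or None to keep iterating.

    Hypothetical helper; `structure_uncertainties` is the history of the
    structure uncertainty, one value per completed NFT iteration.
    """
    if max_atomic_uncertainty < stop_delta:
        return "stop_delta"
    if iteration > max_iterations:
        return "max_iterations"
    k = steps_not_improved
    if len(structure_uncertainties) > k:
        # Assumed meaning of "not improved": none of the last k values is
        # lower than the best value recorded before them.
        best_before = min(structure_uncertainties[:-k])
        if min(structure_uncertainties[-k:]) >= best_before:
            return "steps_not_improved"
    return None


print(should_stop(0.05, 1, [0.5]))            # stop_delta
print(should_stop(0.5, 3, [0.5, 0.6, 0.55]))  # steps_not_improved
print(should_stop(0.5, 3, [0.5, 0.6, 0.4]))   # None
```

In the second call the uncertainty rose from 0.5 and stayed above it for two steps, so the sketch stops. Under this reading the best model would be the one from the first iteration, which matches the documented behavior of saving the model with the lowest structure uncertainty.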
- Parameters:
images (list or str) – List of ASE atoms objects with positions, symbols, energies, and forces in ASE format. This is the initial training data, for example simple bulk cells.
target_image (str or list) – A list containing exactly one ASE atoms object: the large target structure to be learned by NFT.
n (int) – Size of the ensemble (number of calculators to train).
calc_text (str) – Text used to initialize the Amp calculator; see the default value of calc_text for an example. It must produce a ‘calc’ object.
headerlines (str) – Lines at the top of the Python script that will be submitted. These would typically contain comment lines for the batching system, such as ‘#SBATCH -n=8…’.
start_command (str) – Command to start the job in the current queuing system, such as ‘sbatch run.py’ (‘run.py’ is the script name here). For serial operation use ‘python run.py’.
train_line (str) – Line used to train each Amp instance. The default is usually fine, but the user may want to use this to insert additional keywords such as train_forces=False.
label (str) – Label to give the final trained calculator.
parent_calc (instance) – A parent calculator instance, for example EMT().
expired (float) – When checking jobs, age (s) of log file at which to consider that the job is no longer running (timed out) and should be restarted.
cutoff (float) – Cutoff radius to extract atomic chunks.
init_nft_ids (list) – List of length-two tuples specifying the initial force-only ids. Supply this when the initial training data includes images that require nearsighted force training.
dft_cores (int) – Number of DFT cores to be requested for DFT jobs.
dft_memory (str) – Amount of DFT memory per node to be requested. For example, ‘40G’ indicates that 40 gigabytes of memory per node will be requested.
- Returns:
Whether the NFT converged on the target image.
- Return type:
boolean
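A call to run might be prepared as in the sketch below, which relies only on the signatures documented above. The helpers defines_calc and run_nft are hypothetical, the argument values are illustrative, and the amp import is deferred into the function so the pre-flight check can run without amp installed. The check simply confirms that a calc_text string parses as Python and assigns a top-level variable named calc, as the parameter description requires.

```python
import ast

# Same content as the documented default of calc_text.
calc_text = """
from amp import Amp
from amp.descriptor.gaussian import Gaussian
from amp.model.neuralnetwork import NeuralNetwork
calc = Amp(descriptor=Gaussian(),
           model=NeuralNetwork(),
           dblabel='../amp-db')
calc.model.lossfunction.parameters['weight_duplicates'] = False
"""


def defines_calc(text):
    """Hypothetical pre-flight check: does `text` assign a top-level `calc`?"""
    for node in ast.parse(text).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "calc":
                    return True
    return False


# Keyword arguments taken from the documented defaults of run().
run_kwargs = dict(n=10, calc_text=calc_text, headerlines='',
                  start_command='python3 run.py &', label='al', cutoff=6.5)


def run_nft(images, target_image, parent_calc):
    """Hypothetical wrapper: build an NFT learner and run it on the target."""
    from amp.nft.activelearner import NFT  # deferred; requires amp

    learner = NFT(threshold=-0.9, stop_delta=0.1, max_iterations=5,
                  steps_not_improved=2, dblabel='amp-data')
    return learner.run(images=images, target_image=target_image,
                       parent_calc=parent_calc, **run_kwargs)


assert defines_calc(calc_text)
```

Here images would be the initial training data (for example small bulk cells), target_image a list holding the single large structure, and parent_calc a calculator instance such as EMT(). The returned boolean reports whether NFT converged on the target.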