Data
The Data class creates a data object that handles reading inputs and outputs. For example, mydata = Data() creates a Data object named mydata. After the input and output data are read with read_inputdata() and read_outputdata(), they are stored as NumPy arrays in two dictionaries, which can be accessed as mydata.input_dict['key'] and mydata.output_dict['key'], where 'key' is the name of an input or output variable.
Input data dictionary keys:
'fingerprints': 3D array [num_images, num_atoms, num_fingerprints] of the fingerprints.
'atom_type': 2D array [num_images, num_atoms] of atom element types, numbered sequentially starting from 1.
'volume': 2D array [num_images, 1] of cell volume; the last dimension is used for matrix multiplication.
'dGdr': 4D array [num_images, num_der_pairs, num_fingerprints, 3] of the derivatives of fingerprints w.r.t. atom coordinates; the last dimension holds the three coordinates.
'neighbor_atom_coord': 4D array [num_images, num_der_pairs, 3, 1] of neighbor atom coordinates; the last dimension is added for the convenience of matrix multiplication in TensorFlow.
'center_atom_id': 2D array [num_images, num_der_pairs] of center atom IDs.
'neighbor_atom_id': 2D array [num_images, num_der_pairs] of neighbor atom IDs; note that the neighbors can be ghost atoms.
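The shapes listed above can be checked against a mock dictionary. The following is a minimal sketch that only uses NumPy; the sizes and the zero-filled arrays are hypothetical placeholders, not data produced by atomdnn.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_images, num_atoms, num_fingerprints, num_der_pairs = 2, 4, 5, 12

# Mock input dictionary with the documented keys and array shapes.
input_dict = {
    "fingerprints": np.zeros((num_images, num_atoms, num_fingerprints)),
    "atom_type": np.ones((num_images, num_atoms), dtype=int),
    "volume": np.ones((num_images, 1)),
    "dGdr": np.zeros((num_images, num_der_pairs, num_fingerprints, 3)),
    "neighbor_atom_coord": np.zeros((num_images, num_der_pairs, 3, 1)),
    "center_atom_id": np.zeros((num_images, num_der_pairs), dtype=int),
    "neighbor_atom_id": np.zeros((num_images, num_der_pairs), dtype=int),
}

# Print the dimensionality and shape of every entry.
for key, array in input_dict.items():
    print(f"{key:20s} ndim={array.ndim} shape={array.shape}")
```

Every array shares num_images as its first dimension, which is what allows the data to be sliced image by image.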
Output data dictionary keys:
'pe': 1D array [num_images] of potential energy.
'force': 3D array [num_images, num_atoms, 3] of atomic force.
'stress': 2D array [num_images, 9] of stress tensor with 9 components (6 independent).
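To illustrate why the stress entry has 9 components of which 6 are independent, the sketch below flattens a symmetric 3x3 tensor into a row of 9 values. The row-major component ordering and the numerical values are assumptions made for this example; the ordering atomdnn actually uses is not stated here.

```python
import numpy as np

# Hypothetical sizes and values, for illustration only.
num_images, num_atoms = 2, 4

# A symmetric 3x3 stress tensor: 3 diagonal + 3 off-diagonal independent values.
stress_3x3 = np.array([[1.0, 0.1, 0.2],
                       [0.1, 2.0, 0.3],
                       [0.2, 0.3, 3.0]])

output_dict = {
    "pe": np.zeros(num_images),
    "force": np.zeros((num_images, num_atoms, 3)),
    # Assumption: the 9 components are the row-major flattening of the tensor.
    "stress": np.tile(stress_3x3.reshape(9), (num_images, 1)),
}

# Recover the 3x3 tensor of the first image and check its symmetry.
recovered = output_dict["stress"][0].reshape(3, 3)
is_symmetric = np.allclose(recovered, recovered.T)
print(output_dict["stress"].shape, is_symmetric)
```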
Class
- class atomdnn.data.Data(descriptors_path=None, fp_filename=None, der_filename=None, xyzfile_path=None, xyzfile_name=None, format='extxyz', image_num=None, skip=0, verbose=False, silent=False, read_der=True, **kwargs)
Create a Data object, with an option to read inputs and outputs. Parameters are explained in read_inputdata().
- read_inputdata(descriptors_path, fp_filename, der_filename=None, image_num=None, skip=0, append=False, verbose=False, silent=False, read_der=True)
Read input data using read_fingerprints_from_lmpdump() and read_der_from_lmpdump().
- Parameters:
descriptors_path – directory containing the descriptor files
fp_filename – file name of the descriptors; use ‘*’ to match multiple files, which are ordered numerically
der_filename – file name of the derivatives; use ‘*’ to match multiple files, which are ordered numerically
image_num – number of images to read; if None, read all files given by fp_filename
skip (int) – number of images to skip
append (bool) – True to append the inputs to an already existing data object
verbose (bool) – True to show the names of all files being read
read_der (bool) – True to read derivatives
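Numerical ordering matters when a ‘*’ pattern matches several files, because a plain alphabetical sort would place a file numbered 10 before a file numbered 2. The sketch below shows one way such an ordering could be obtained. The helper numeric_sort and the file names are hypothetical, and this is not necessarily how atomdnn orders the files internally.

```python
import re


def numeric_sort(filenames):
    """Sort file names by the last integer that appears in each name."""
    def sort_key(name):
        numbers = re.findall(r"\d+", name)
        # Names without any digits are placed first.
        return int(numbers[-1]) if numbers else -1
    return sorted(filenames, key=sort_key)


# Hypothetical descriptor file names matched by a pattern such as 'dump_fp.*'.
names = ["dump_fp.10", "dump_fp.2", "dump_fp.1"]

alphabetical = sorted(names)   # plain string sort
ordered = numeric_sort(names)  # numerical sort
print(alphabetical)
print(ordered)
```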
- read_fingerprints_from_lmpdump(descriptors_path, fp_filename, image_num=None, skip=0, append=False, verbose=False, silent=False)
Read descriptors (fingerprints), atom_type and volume from the descriptor files created with LAMMPS, and save them into the data object.
- read_der_from_lmpdump(descriptor_path, der_filename, image_num=None, skip=0, append=False, verbose=False, silent=False)
Read derivatives of fingerprints w.r.t. coordinates (dGdr), neighbor_atom_coord, center_atom_id and neighbor_atom_id.
- read_outputdata(xyzfile_path, xyzfile_name, format='extxyz', image_num=None, skip=0, append=False, verbose=False, silent=False, read_force=True, read_stress=True, **kwargs)
Read outputs (energy, force and stress) from extxyz files.
- Parameters:
xyzfile_path – directory containing a series of input atomic structures
xyzfile_name – atomic structure file name; use the wildcard * for numerically ordered files
format – ‘lammps-data’, ‘extxyz’, ‘vasp’, etc. See the complete list at https://wiki.fysik.dtu.dk/ase/ase/io/io.html#ase.io.read. ‘extxyz’ is recommended.
read_force (bool) – if True, make sure the extxyz files contain force data
read_stress (bool) – if True, make sure the extxyz files contain stress data
image_num – number of images to use; if None, read all files specified by xyzfile_name
append (bool) – True to append the newly read data to the existing data object
verbose (bool) – True to print the extxyz file names
kwargs – used to pass optional file styles
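The two read methods are typically used together on one object. The following sketch wraps them in a helper, using only the keyword arguments listed in the signatures above. The helper name load_data and the paths in the usage comment are hypothetical, and atomdnn is imported inside the function so that the definition itself does not require the package to be installed.

```python
def load_data(descriptors_path, fp_filename, der_filename,
              xyzfile_path, xyzfile_name):
    """Read inputs and outputs into a single Data object (illustrative helper)."""
    # Imported here so the sketch can be defined without atomdnn installed.
    from atomdnn.data import Data

    mydata = Data()
    # Inputs: fingerprints and, because read_der=True, their derivatives.
    mydata.read_inputdata(descriptors_path=descriptors_path,
                          fp_filename=fp_filename,
                          der_filename=der_filename,
                          read_der=True)
    # Outputs: potential energy, forces and stress from extxyz files.
    mydata.read_outputdata(xyzfile_path=xyzfile_path,
                           xyzfile_name=xyzfile_name,
                           format="extxyz",
                           read_force=True,
                           read_stress=True)
    return mydata


# Hypothetical usage; the directory and file names are placeholders:
# mydata = load_data("descriptors", "dump_fp.*", "dump_der.*",
#                    "extxyz", "example_extxyz.*")
# print(mydata.input_dict["fingerprints"].shape)
```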
- slice(start=None, end=None)
Slice the data between image start and image end, and return both the input and output dictionaries. Index starts from 1.
- get_input_dict(start=None, end=None)
Return the input dictionaries from image start to image end. Index starts from 1. If end is not provided, return only the dictionary of image start.
- get_output_dict(start=None, end=None)
Return the output dictionaries from image start to image end. Index starts from 1. If end is not provided, return only the dictionary of image start.
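Because the image index starts from 1 while NumPy arrays are indexed from 0, the conversion is easy to get wrong. The sketch below shows the idea on a mock dictionary. The helper slice_dict is hypothetical, and treating end as inclusive is an assumption of this example, since the text above does not say whether image end is included.

```python
import numpy as np


def slice_dict(data_dict, start, end=None):
    """Return images start..end (1-based) from every array in data_dict.

    Assumption: `end` is inclusive. If `end` is None, only image `start`
    is returned.
    """
    if start < 1:
        raise ValueError("image index starts from 1")
    stop = start if end is None else end
    # 1-based inclusive [start, stop] maps to the 0-based slice [start-1:stop].
    return {key: array[start - 1:stop] for key, array in data_dict.items()}


# Mock output dictionary with 10 images; the values are placeholders.
demo = {"pe": np.arange(10.0)}

images_2_to_4 = slice_dict(demo, 2, 4)["pe"]
image_3 = slice_dict(demo, 3)["pe"]
print(images_2_to_4, image_3)
```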
Functions
These functions are used to manipulate TensorFlow datasets.
- atomdnn.data.split_dataset(dataset, train_pct, val_pct=None, test_pct=None, shuffle=False, data_size=None)
Split the TensorFlow dataset into training, validation and test datasets.
- Parameters:
dataset – tensorflow dataset
train_pct – the percentage of data used for training
val_pct – the percentage of data used for validation
test_pct – the percentage of data used for testing
shuffle (bool) – shuffle the dataset
data_size (int) – if None, then use all data in the dataset
- Returns:
training, validation and test dataset
- Return type:
tensorflow dataset
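The splitting logic can be illustrated on plain index arrays. This is a minimal NumPy sketch, not the atomdnn implementation: the helper split_indices is hypothetical, the percentages are assumed to be fractions between 0 and 1 (the unit is not stated above), and the test set is assumed to receive whatever remains after the training and validation portions.

```python
import numpy as np


def split_indices(data_size, train_pct, val_pct=0.0, shuffle=False, seed=0):
    """Split range(data_size) into train, validation and test index arrays.

    Assumption: train_pct and val_pct are fractions in [0, 1]; the test
    set takes the remaining indices.
    """
    indices = np.arange(data_size)
    if shuffle:
        # Seeded permutation so the split is reproducible.
        indices = np.random.default_rng(seed).permutation(data_size)
    train_size = int(round(train_pct * data_size))
    val_size = int(round(val_pct * data_size))
    train = indices[:train_size]
    val = indices[train_size:train_size + val_size]
    test = indices[train_size + val_size:]
    return train, val, test


train, val, test = split_indices(100, train_pct=0.7, val_pct=0.2, shuffle=True)
print(len(train), len(val), len(test))
```

With a tf.data.Dataset, the same boundaries could be applied with its take() and skip() methods instead of array slicing.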
- atomdnn.data.get_input_dict(dataset)
- Parameters:
dataset – TensorFlow dataset
- Returns:
input dictionary; see Data for the structure of the dictionary
- Return type:
dictionary