Data

The Data class creates a data object that handles reading inputs and outputs.

For example, mydata = Data() creates a Data object mydata. After reading input and output data with read_inputdata() and read_outputdata(), the data are stored as NumPy arrays in two dictionaries, which can be accessed as mydata.input_dict['key'] and mydata.output_dict['key'], where 'key' refers to one of the input or output variables listed below. A minimal usage sketch follows the two key lists.

Input data dictionary keys:

  • 'fingerprints': 3D array [num_images, num_atoms, num_fingerprints] of the fingerprints.

  • 'atom_type': 2D array [num_images, num_atoms] of the atom element types, numbered sequentially starting from 1.

  • 'volume': 2D array [num_images, 1] of the cell volume; the trailing dimension of size 1 is kept for matrix multiplication.

  • 'dGdr': 4D array [num_images, num_der_pairs, num_fingerprints, 3] of the derivatives of the fingerprints w.r.t. atom coordinates; the last dimension holds the three Cartesian components.

  • 'neighbor_atom_coord': 4D array [num_images, num_der_pairs, 3, 1] of the neighbor atom coordinates; the trailing dimension of size 1 is added for convenience of matrix multiplication in TensorFlow.

  • 'center_atom_id': 2D array [num_images, num_der_pairs] of the center atom IDs.

  • 'neighbor_atom_id': 2D array [num_images, num_der_pairs] of the neighbor atom IDs; note that the neighbors may be ghost atoms.

Output data dictionary keys:

  • 'pe': 1D array [num_images] of potential energy.

  • 'force': 3D array [num_images, num_atoms, 3] of atomic force.

  • 'stress': 2D array [num_images, 9] of the stress tensor with 9 components (6 of which are independent).
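As an illustration, a minimal sketch of the typical workflow is shown below. The directory names and wildcard file names ('./descriptors', 'dump_fp.*', 'dump_der.*', './xyz', 'structure.*.extxyz') are placeholders, not files shipped with the package.

    from atomdnn.data import Data

    mydata = Data()

    # read fingerprints and their derivatives written by LAMMPS
    mydata.read_inputdata('./descriptors', 'dump_fp.*', 'dump_der.*')

    # read energies, forces and stresses from extxyz files
    mydata.read_outputdata('./xyz', 'structure.*.extxyz', format='extxyz')

    # the data are stored as NumPy arrays in two dictionaries
    print(mydata.input_dict['fingerprints'].shape)   # (num_images, num_atoms, num_fingerprints)
    print(mydata.output_dict['force'].shape)         # (num_images, num_atoms, 3)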

Class

class atomdnn.data.Data(descriptors_path=None, fp_filename=None, der_filename=None, xyzfile_path=None, xyzfile_name=None, format='extxyz', image_num=None, skip=0, verbose=False, silent=False, read_der=True, **kwargs)[source]

Create a Data object, with the option to read inputs and outputs at construction time. The parameters are explained in read_inputdata() and read_outputdata().
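For example, the reading can also be done at construction time (a sketch; the paths and file names are placeholders):

    mydata = Data(descriptors_path='./descriptors', fp_filename='dump_fp.*',
                  der_filename='dump_der.*', xyzfile_path='./xyz',
                  xyzfile_name='structure.*.extxyz', format='extxyz')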

read_inputdata(descriptors_path, fp_filename, der_filename=None, image_num=None, skip=0, append=False, verbose=False, silent=False, read_der=True)[source]

Read input data using read_fingerprints_from_lmpdump() and read_der_from_lmpdump(). A usage sketch follows the parameter list.

Parameters:
  • descriptors_path – directory containing the descriptor files

  • fp_filename – file name(s) of the descriptor files; use the wildcard '*' for multiple, numerically ordered files

  • der_filename – file name(s) of the derivative files; use the wildcard '*' for multiple, numerically ordered files

  • image_num – number of images to read; if None, read all files matched by fp_filename

  • skip (int) – number of images to skip

  • append (bool) – if True, append the inputs to the already existing data object

  • verbose (bool) – if True, print the names of the files being read

  • read_der (bool) – if True, read the fingerprint derivatives
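For example (a sketch; the path and file name are placeholders), the derivatives may be skipped, e.g. when only energies are used:

    mydata.read_inputdata('./descriptors', 'dump_fp.*', read_der=False)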

read_fingerprints_from_lmpdump(descriptors_path, fp_filename, image_num=None, skip=0, append=False, verbose=False, silent=False)[source]

Read descriptors (fingerprints), atom_type and volume from the descriptor files created with LAMMPS, and save them into the data object.

read_der_from_lmpdump(descriptor_path, der_filename, image_num=None, skip=0, append=False, verbose=False, silent=False)[source]

Read the derivatives of the fingerprints w.r.t. atom coordinates (dGdr), along with neighbor_atom_coord, center_atom_id and neighbor_atom_id.
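The two lower-level readers can also be called directly on an existing Data object (a sketch; the path and file names are placeholders):

    mydata.read_fingerprints_from_lmpdump('./descriptors', 'dump_fp.*')
    mydata.read_der_from_lmpdump('./descriptors', 'dump_der.*')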

read_outputdata(xyzfile_path, xyzfile_name, format='extxyz', image_num=None, skip=0, append=False, verbose=False, silent=False, read_force=True, read_stress=True, **kwargs)[source]

Read outputs (energy, force and stress) from extxyz files. A usage sketch follows the parameter list.

Parameters:
  • xyzfile_path – directory containing a series of input atomic structure files

  • xyzfile_name – atomic structure file name; use the wildcard '*' for multiple, numerically ordered files

  • format – 'lammps-data', 'extxyz', 'vasp', etc. See the complete list at https://wiki.fysik.dtu.dk/ase/ase/io/io.html#ase.io.read. 'extxyz' is recommended.

  • read_force (bool) – if True, read forces; make sure the extxyz files contain force data

  • read_stress (bool) – if True, read stresses; make sure the extxyz files contain stress data

  • image_num – number of images to read; if None, read all files matched by xyzfile_name

  • append (bool) – if True, append the outputs to the already existing data object

  • verbose (bool) – if True, print the names of the extxyz files being read

  • kwargs – used to pass optional file styles to the reader
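For example (a sketch; the path and file name are placeholders), stresses can be left out if the extxyz files do not contain them:

    mydata.read_outputdata('./xyz', 'structure.*.extxyz', format='extxyz', read_stress=False)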

shuffle()[source]

Shuffle the data.

slice(start=None, end=None)[source]

Slice the data between image start and image end, and return both the input and output dictionaries. Indexing starts from 1.
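For example (a sketch, assuming the input dictionary is returned first):

    inputs, outputs = mydata.slice(1, 100)   # images 1 through 100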

get_input_dict(start=None, end=None)[source]

Return the input dictionaries from image start to image end. Indexing starts from 1. If end is not provided, return only the dictionary of image start.

get_output_dict(start=None, end=None)[source]

Return the output dictionaries from image start to image end. Indexing starts from 1. If end is not provided, return only the dictionary of image start.
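For example (a sketch):

    first_inputs = mydata.get_input_dict(1)         # inputs of image 1 only
    some_outputs = mydata.get_output_dict(1, 100)   # outputs of images 1 to 100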

convert_data_to_tensor()[source]

Convert the input and output data to TensorFlow tensors. This can speed up data manipulation with TensorFlow functions.
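A tf.data.Dataset can then be built from the two dictionaries. The from_tensor_slices call below is one possible way, shown as a sketch rather than the package's own helper:

    import tensorflow as tf

    mydata.convert_data_to_tensor()
    dataset = tf.data.Dataset.from_tensor_slices((mydata.input_dict, mydata.output_dict))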

check_data()[source]

Check the consistency of the input and output data.

append(apdata, read_force=True, read_stress=True)[source]

Append a second dataset (apdata) to this data object.
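For example, assuming data1 and data2 are two Data objects built from different sets of structures (a sketch):

    data1.append(data2)   # the images of data2 are appended to data1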

Functions

These functions are used to manipulate TensorFlow datasets.

atomdnn.data.split_dataset(dataset, train_pct, val_pct=None, test_pct=None, shuffle=False, data_size=None)[source]

Split the TensorFlow dataset into training, validation and test sets. A usage sketch follows the return description.

Parameters:
  • dataset – TensorFlow dataset

  • train_pct – the percentage of data used for training

  • val_pct – the percentage of data used for validation

  • test_pct – the percentage of data used for testing

  • shuffle (bool) – if True, shuffle the dataset

  • data_size (int) – number of data points to use; if None, use all data in the dataset

Returns:

training, validation and test datasets

Return type:

TensorFlow dataset
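For example, assuming dataset is a TensorFlow dataset built from the data (e.g. as sketched above), and assuming the split arguments are given as fractions (a sketch):

    from atomdnn.data import split_dataset

    train_ds, val_ds, test_ds = split_dataset(dataset, 0.7, val_pct=0.15, test_pct=0.15, shuffle=True)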

atomdnn.data.get_input_dict(dataset)[source]
Parameters:
  • dataset – TensorFlow dataset

Returns:

input dictionary, see Data for the structure of the dictionary

Return type:

dictionary

atomdnn.data.get_output_dict(dataset)[source]
Parameters:
  • dataset – TensorFlow dataset

Returns:

output dictionary, see Data for the structure of the dictionary

Return type:

dictionary
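For example, the input and output dictionaries of a split can be recovered afterwards (a sketch, assuming train_ds comes from the split above):

    from atomdnn.data import get_input_dict, get_output_dict

    train_inputs = get_input_dict(train_ds)
    train_outputs = get_output_dict(train_ds)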

atomdnn.data.slice_dataset(dataset, start, end)[source]

Get a slice of the dataset.

Parameters:
  • dataset – input dataset

  • start – starting index

  • end – ending index

Returns:

TensorFlow dataset
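For example (a sketch; the 0-based starting index shown is an assumption):

    from atomdnn.data import slice_dataset

    subset = slice_dataset(dataset, 0, 100)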