# data

## augmentations

### add_depth_dim(X, y)
Add an extra dimension at the tail of X only. This is trivial to do in-line; this helper is just slightly more convenient than writing a lambda.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
X | tf.tensor | | required |
y | tf.tensor | | required |

Returns:

Type | Description |
---|---|
tf.tensor, tf.tensor | X, y tuple, with X having a new trailing dimension. |
Source code in `indl/data/augmentations.py`

```python
import tensorflow as tf


def add_depth_dim(X, y):
    """
    Add extra dimension at tail for X only. This is trivial to do in-line.
    This is slightly more convenient than writing a lambda.

    Args:
        X (tf.tensor):
        y (tf.tensor):

    Returns:
        tf.tensor, tf.tensor: X, y tuple, with X having a new trailing dimension.
    """
    x_dat = tf.expand_dims(X, -1)  # Prepare as an image, with only 1 colour-depth channel.
    return x_dat, y
```
### cast_type(X, y, x_type=tf.float32, y_type=tf.uint8)
Cast input pair to new dtypes.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
X | tf.tensor | Input tensor | required |
y | tf.tensor | Input labels | required |
x_type | tf.dtypes | target data type for X | tf.float32 |
y_type | tf.dtypes | target data type for y | tf.uint8 |

Returns:

Type | Description |
---|---|
tf.tensor, tf.tensor | X, y tuple, each cast to its new type. |
Source code in `indl/data/augmentations.py`

```python
def cast_type(X, y, x_type=tf.float32, y_type=tf.uint8):
    """
    Cast input pair to new dtypes.

    Args:
        X (tf.tensor): Input tensor
        y (tf.tensor): Input labels
        x_type (tf.dtypes): target data type for X
        y_type (tf.dtypes): target data type for y

    Returns:
        tf.tensor, tf.tensor: X, y tuple, each cast to its new type.
    """
    x_dat = tf.cast(X, x_type)
    y_dat = tf.cast(y, y_type)
    return x_dat, y_dat
```
### random_slice(X, y, training=True, max_offset=0, axis=1)

Slice a tensor X along `axis`, beginning at a random offset up to `max_offset` and taking `(X.shape[axis] - max_offset)` samples. If `training == False`, this deterministically takes the last `N - max_offset` samples.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
X | tf.tensor | input tensor | required |
y | tf.tensor | input labels | required |
training | bool | whether the model is run in its training state | True |
max_offset | int | maximum random offset, in samples | 0 |
axis | int | axis along which to slice | 1 |

Returns:

Type | Description |
---|---|
tf.tensor, tf.tensor | X, y tuple randomly sliced. |
Source code in `indl/data/augmentations.py`

```python
def random_slice(X, y, training=True, max_offset=0, axis=1):
    """
    Slice a tensor X along axis, beginning at a random offset up to max_offset,
    taking (X.shape[axis] - max_offset) samples.
    If training==False, this will take the last N-max_offset samples.

    Args:
        X (tf.tensor): input tensor
        y (tf.tensor): input labels
        training (bool): whether the model is run in its training state
        max_offset (int): maximum random offset, in samples
        axis (int): axis along which to slice

    Returns:
        tf.tensor, tf.tensor: X, y tuple randomly sliced.
    """
    if training and max_offset > 0:
        # tf.random.uniform requires minval < maxval, so only draw when max_offset > 0.
        offset = tf.random.uniform(shape=[], minval=0, maxval=max_offset, dtype=tf.int32)
    else:
        offset = max_offset
    n_subsamps = X.shape[axis] - max_offset
    if axis == 0:
        if len(y.shape) > axis and y.shape[axis] == X.shape[axis]:
            y = tf.slice(y, [offset, 0], [n_subsamps, -1])
        X = tf.slice(X, [offset, 0], [n_subsamps, -1])
    else:  # axis == 1
        if len(y.shape) > axis and y.shape[axis] == X.shape[axis]:
            y = tf.slice(y, [0, offset], [-1, n_subsamps])
        X = tf.slice(X, [0, offset], [-1, n_subsamps])
    return X, y
```
## helper

### get_tf_dataset(X, Y, training=True, batch_size=8, max_offset=0, slice_ax=1)

Convert a pair of tf tensors into a tf.data.Dataset with some augmentations. The added augmentations are:

- `add_depth_dim` (with default params)
- `cast_type` (with default params)
- `random_slice`
Parameters:

Name | Type | Description | Default |
---|---|---|---|
X | tf.tensor | X data - must be compatible with the above augmentations. | required |
Y | tf.tensor | Y data - must be compatible with the above augmentations. | required |
training | bool or tuple | passed to `random_slice`; if a tuple (e.g. from `sklearn.model_selection.train_test_split`), training and test sets are returned. | True |
batch_size | int | only forwarded to the recursive train/test calls; batching uses the full data length. | 8 |
max_offset | int | passed to `random_slice` | 0 |
slice_ax | int | passed to `random_slice` as `axis` | 1 |
Returns:

Type | Description |
---|---|
tf.data.Dataset(, tf.data.Dataset) | A tensorflow dataset with extra augmentations. If `training` is a tuple then two datasets are returned: training set and test set. |
Source code in `indl/data/helper.py`

```python
from functools import partial

import tensorflow as tf

from indl.data.augmentations import add_depth_dim, cast_type, random_slice


def get_tf_dataset(X, Y, training=True, batch_size=8, max_offset=0, slice_ax=1):
    """
    Convert a pair of tf tensors into a tf.data.Dataset with some augmentations.
    The added augmentations are:

    - `add_depth_dim` (with default params)
    - `cast_type` (with default params)
    - `random_slice`

    Args:
        X (tf.tensor): X data - must be compatible with the above augmentations.
        Y (tf.tensor): Y data - must be compatible with the above augmentations.
        training (bool or tuple): passed to `random_slice`, or if a tuple
            (e.g. from sklearn.model_selection.train_test_split) then this
            function returns training and test sets.
        batch_size (int): only forwarded to the recursive train/test calls;
            batching below uses the full data length.
        max_offset (int): Passed to `random_slice`.
        slice_ax (int): Passed to `random_slice` as `axis`.

    Returns:
        tf.data.Dataset(, tf.data.Dataset): A tensorflow dataset with extra
        augmentations. If training is a tuple then two datasets are returned:
        training set and test set.
    """
    # TODO: trn_test as arg
    if isinstance(training, tuple):
        ds_train = get_tf_dataset(X[training[0]], Y[training[0]], training=True, batch_size=batch_size)
        ds_test = get_tf_dataset(X[training[1]], Y[training[1]], training=False, batch_size=batch_size)
        return ds_train, ds_test
    _ds = tf.data.Dataset.from_tensor_slices((X, Y))
    _ds = _ds.map(add_depth_dim)
    _ds = _ds.map(cast_type)
    slice_fun = partial(random_slice, training=training, max_offset=max_offset, axis=slice_ax)
    _ds = _ds.map(slice_fun)
    if training:
        # shuffle requires a buffer_size; use the full data length.
        _ds = _ds.shuffle(X.shape[0])
    _ds = _ds.batch(X.shape[0] + 1, drop_remainder=not training)
    return _ds
```