audioperm package

Submodules

audioperm.audioperm module

class audioperm.audioperm.AudioPerm(audio, sr=22050, **kwargs)

Bases: object

The main class for audioperm. Takes the path to an audio file (or to a batch of files) or a numpy array (int16 or float). Audio is represented internally as 16-bit PCM, which is not the same as the librosa default.

permute(n_permutations=1, interm_silence=1000)

Returns permutations of the segmented words. TODO: Use yield.

Parameters
  • n_permutations (int) – Maximum number of permutations to return.

  • interm_silence (int) – Intermediate silence between words (in ms).

Returns

Union[list of list of ndarray, list of ndarray]
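The generator form mentioned in the TODO could look roughly like the sketch below. It is not audioperm's implementation: the function name iter_permutations is hypothetical, plain Python lists stand in for ndarrays, and it assumes that interm_silence is realized as zero-valued samples inserted between consecutive words.

```python
import itertools
from typing import Iterator, List, Sequence


def iter_permutations(
    words: Sequence[List[int]],
    n_permutations: int = 1,
    interm_silence: int = 1000,
    sr: int = 22050,
) -> Iterator[List[int]]:
    """Yield up to n_permutations orderings of words, joined by silence.

    Hypothetical helper for illustration; not part of audioperm.
    """
    # Number of zero-valued samples that make up interm_silence milliseconds.
    gap = [0] * (sr * interm_silence // 1000)
    orderings = itertools.permutations(words)
    # islice stops early, so no more orderings are built than requested.
    for ordering in itertools.islice(orderings, n_permutations):
        joined: List[int] = []
        for i, word in enumerate(ordering):
            if i > 0:
                joined.extend(gap)  # silence only between words, not at the ends
            joined.extend(word)
        yield joined


# Three tiny "words" of PCM samples at a toy sampling rate of 1000 Hz.
words = [[1, 1], [2, 2], [3, 3]]
results = list(iter_permutations(words, n_permutations=2, interm_silence=2, sr=1000))
```

Because the permutations are produced lazily, asking for a few orderings of many words does not require enumerating all of them first.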

word_segments(silence_thresh=-60.0, min_silence_len=5, return_words=True)

Splits the audio into multiple segments (words). TODO: Improve word segmentation. Add label-wise segmentation (given n words as labels, find n appropriate words).

Parameters
  • silence_thresh (float) – Silence threshold (in dBFS) for segmenting the audio. Same as pydub.

  • min_silence_len (int) – Minimum silence length (in ms). Same as pydub.

Returns

Union[list of list of ndarray, list of ndarray]
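How the two parameters interact can be illustrated with a simplified, per-sample splitter. This is only a sketch under stated assumptions: the function name split_on_quiet_runs is hypothetical, the threshold is applied to individual sample amplitudes rather than to windowed loudness as pydub does, and plain lists of 16-bit integers stand in for ndarrays.

```python
from typing import List, Sequence

FULL_SCALE = 32768  # magnitude of the most negative 16-bit PCM sample


def split_on_quiet_runs(
    samples: Sequence[int],
    sr: int = 22050,
    silence_thresh: float = -60.0,
    min_silence_len: int = 5,
) -> List[List[int]]:
    """Split PCM 16 samples wherever a long enough quiet run occurs.

    Hypothetical helper for illustration; not part of audioperm or pydub.
    """
    # Convert the dBFS threshold to a linear amplitude: full scale * 10 ** (dB / 20).
    amp_thresh = FULL_SCALE * 10 ** (silence_thresh / 20)
    # Minimum number of consecutive quiet samples that counts as a gap.
    min_run = max(1, sr * min_silence_len // 1000)

    segments: List[List[int]] = []
    current: List[int] = []  # samples of the word being collected
    quiet: List[int] = []    # quiet samples seen since the last loud one
    for s in samples:
        if abs(s) < amp_thresh:
            quiet.append(s)
            if len(quiet) >= min_run and current:
                segments.append(current)  # the gap is long enough: close the word
                current = []
        else:
            if current:
                current.extend(quiet)  # a short pause stays inside the word
            quiet = []
            current.append(s)
    if current:
        segments.append(current)
    return segments


# Toy signal at 1000 Hz: a 1-sample pause inside the first word, a 4-sample gap after it.
signal = [0, 0, 9000, 8000, 0, 7000, 0, 0, 0, 0, 6000, 5000, 0]
segments = split_on_quiet_runs(signal, sr=1000, silence_thresh=-20.0, min_silence_len=3)
```

Raising min_silence_len merges words separated by short pauses, while raising silence_thresh (closer to 0 dBFS) treats more of the signal as silence.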

audioperm.utils module

Helper functions for audioperm.

audioperm.utils.max_min_heuristics(sig, max_perc=0.2, min_perc=0.2)

Calculates the average maximum and average minimum over a percentage of the sorted amplitudes. For audio signals, finding a single peak or valley is not enough, so the average over the top percentage of the population is taken instead.

Parameters
  • sig (ndarray) – A numpy array.

  • max_perc (float) – Population percentage for taking the max.

  • min_perc (float) – Population percentage for taking the min.

Returns

tuple containing:

  • max_p (float) – population max for positive signal

  • min_p (float) – population min for positive signal

  • max_n (float) – population max for negative signal

  • min_n (float) – population min for negative signal

Return type

(tuple)
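The idea of a "population" maximum and minimum can be sketched as follows. This is one possible reading, not the library's code: the names avg_extremes and max_min_sketch are hypothetical, plain lists stand in for ndarrays, and it assumes that positive and negative samples are handled separately and that "max" and "min" refer to signed values rather than magnitudes.

```python
from typing import Sequence, Tuple


def _mean(values: Sequence[float]) -> float:
    # Mean of a sequence, or 0.0 when there is nothing to average.
    return sum(values) / len(values) if values else 0.0


def avg_extremes(
    values: Sequence[float], max_perc: float, min_perc: float
) -> Tuple[float, float]:
    """Average the largest max_perc and the smallest min_perc fraction of values."""
    ordered = sorted(values)
    # Always average at least one sample, even for tiny populations.
    n_max = max(1, int(len(ordered) * max_perc))
    n_min = max(1, int(len(ordered) * min_perc))
    return _mean(ordered[-n_max:]), _mean(ordered[:n_min])


def max_min_sketch(
    sig: Sequence[float], max_perc: float = 0.2, min_perc: float = 0.2
) -> Tuple[float, float, float, float]:
    """Return (max_p, min_p, max_n, min_n) under the assumptions stated above."""
    positive = [s for s in sig if s > 0]
    negative = [s for s in sig if s < 0]
    max_p, min_p = avg_extremes(positive, max_perc, min_perc)
    max_n, min_n = avg_extremes(negative, max_perc, min_perc)
    return max_p, min_p, max_n, min_n


# Ten positive and ten negative samples; 20% of each half is two samples.
sig = list(range(1, 11)) + [-x for x in range(1, 11)]
result = max_min_sketch(sig)
```

Averaging over a fraction of the sorted samples makes the estimate less sensitive to a single spike than taking the raw maximum or minimum.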

audioperm.utils.noise_boundaries(sig, max_perc=0.2, min_perc=0.2)

Calculates maximum noise boundaries for a signal.

Parameters
  • sig (ndarray) – A numpy array.

  • max_perc (float) – Population percentage for taking the max.

  • min_perc (float) – Population percentage for taking the min.

Returns

tuple containing:

  • max_n (float) – maximum boundary for noise

  • min_n (float) – minimum boundary for noise

Return type

(tuple)

audioperm.utils.save_audio(sig, filename, sr=22050)

Takes a PCM 16 or float32 signal and saves the audio in PCM 16 format.

Parameters
  • sig (ndarray) – A numpy array.

  • filename (str) – Filepath and filename.

  • sr (int) – Sampling rate.
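The float-to-PCM 16 conversion implied here can be sketched with the standard library alone. This is not the library's implementation: the name save_pcm16 is hypothetical, the signal is assumed to be mono, and float samples are assumed to lie in [-1.0, 1.0].

```python
import os
import struct
import tempfile
import wave
from typing import Sequence, Union


def save_pcm16(sig: Sequence[Union[int, float]], filename: str, sr: int = 22050) -> None:
    """Write a mono signal to filename as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of audioperm.
    """
    if any(isinstance(s, float) for s in sig):
        # Assumption: floats lie in [-1.0, 1.0]; scale and clip to the int16 range.
        ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in sig]
    else:
        ints = [int(s) for s in sig]
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wf.setframerate(sr)
        # "<h" is a little-endian signed 16-bit integer, the WAV sample layout.
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))


# Write four float samples to a temporary file and read them back.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
save_pcm16([0.0, 0.25, -1.0, 1.0], path, sr=8000)
with wave.open(path, "rb") as wf:
    params = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
    frames = struct.unpack("<4h", wf.readframes(4))
```

Scaling by 32767 rather than 32768 keeps +1.0 inside the int16 range; the clip guards against inputs slightly outside [-1.0, 1.0].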

audioperm.utils.type_nested(iterable, tp)

Finds if array is of type tp (homogeneous).

Parameters
  • iterable (list) – A list.

  • tp (type) – Type of iterable.

Returns

True if all elements are of the same type.

Return type

bool
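A homogeneity check of this kind could be written as in the sketch below. It is an illustration only: the name is_homogeneous is hypothetical, and it assumes that nested lists or tuples are descended into and that only the leaf elements are compared against tp.

```python
from typing import Iterable


def is_homogeneous(iterable: Iterable, tp: type) -> bool:
    """Return True if every leaf element of a possibly nested list is of type tp.

    Hypothetical helper for illustration; not part of audioperm.
    """
    for item in iterable:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers instead of testing the container itself.
            if not is_homogeneous(item, tp):
                return False
        elif not isinstance(item, tp):
            return False
    return True


flat_ok = is_homogeneous([1, 2, 3], int)
nested_ok = is_homogeneous([[1, 2], [3]], int)
mixed = is_homogeneous([[1, 2], ["three"]], int)
```

A check like this is useful for telling a single input apart from a batch, for example a list of file paths versus a list of arrays.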

Module contents

A Python library for generating different permutations of audible segments from audio files.