audioperm package

Submodules

audioperm.audioperm module

class audioperm.audioperm.AudioPerm(audio, sr=22050, **kwargs)

Bases: object

The main class for audioperm. Takes the path to an audio file (or a batch of paths), or a numpy array (int16 or float). Audio is stored internally as PCM 16, which differs from the librosa default.
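The difference from librosa matters because librosa loads audio as floating-point samples normalized to [-1, 1], while PCM 16 stores integers in [-32768, 32767]. The following is a minimal, dependency-free sketch of converting between the two; the function names are hypothetical, plain lists stand in for numpy arrays, and the scaling factors are a common convention rather than something confirmed for audioperm.

```python
def float_to_pcm16(samples):
    """Scale floats in [-1.0, 1.0] to 16-bit PCM integers, clipping overshoot."""
    # Assumption: 32767 is used as the scale so that +1.0 maps to the int16 maximum.
    return [int(round(max(-1.0, min(1.0, x)) * 32767)) for x in samples]


def pcm16_to_float(samples):
    """Scale 16-bit PCM integers back to floats in [-1.0, 1.0)."""
    return [x / 32768.0 for x in samples]
```

Clipping before scaling keeps out-of-range floats from wrapping around when they are stored as 16-bit integers.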

permute(n_permutations=1, interm_silence=1000)

Returns permutations of the segmented words. TODO: Use yield.

Parameters

n_permutations (int) – Maximum number of permutations to return.
interm_silence (int) – Length of the silence inserted between words (in ms).

Returns

Union[list of list of ndarray, list of ndarray]
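As an illustration of the idea, here is a minimal sketch that reorders word segments and joins them with silence. It is not the library's implementation: `permute_words` is a hypothetical name, plain lists of PCM 16 samples stand in for ndarrays, and it assumes each permutation is returned as one joined signal. It is written as a generator, in the spirit of the TODO above.

```python
from itertools import islice, permutations


def permute_words(words, n_permutations=1, interm_silence=1000, sr=22050):
    """Yield up to n_permutations orderings of words, joined by silence.

    words: list of segments, each a list of PCM 16 samples (hypothetical input).
    """
    # Convert the silence length from milliseconds to a run of zero samples.
    gap = [0] * (sr * interm_silence // 1000)
    for order in islice(permutations(words), n_permutations):
        joined = []
        for i, word in enumerate(order):
            if i > 0:
                joined.extend(gap)  # silence goes between words only
            joined.extend(word)
        yield joined
```

Because `itertools.permutations` is lazy, only the requested orderings are built, which matters since the number of permutations grows factorially with the word count.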

word_segments(silence_thresh=-60.0, min_silence_len=5, return_words=True)

Splits the audio into multiple segments, or words. TODO: Improve word segmentation. Add label-wise segmentation (given n words as labels, find n appropriate words).

Parameters

silence_thresh (float) – Silence threshold for segmenting the audio. Same as in pydub.
min_silence_len (int) – Minimum silence length (in ms). Same as in pydub.

Returns

Union[list of list of ndarray, list of ndarray]
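To make the two parameters concrete, the sketch below splits a PCM 16 signal wherever the amplitude stays below the threshold for at least min_silence_len milliseconds. It is a per-sample simplification under stated assumptions, not pydub's detection (which measures loudness over windows) and not audioperm's code. The function name and the use of plain lists are hypothetical, and the threshold is assumed to be in dB relative to full scale.

```python
def split_on_quiet_runs(samples, sr=22050, silence_thresh=-60.0, min_silence_len=5):
    """Split PCM 16 samples wherever a quiet run lasts at least min_silence_len ms."""
    limit = 32768 * 10 ** (silence_thresh / 20)      # dB full scale -> linear amplitude
    min_run = max(1, sr * min_silence_len // 1000)   # ms -> number of samples
    segments, current, pending = [], [], []
    for s in samples:
        if abs(s) <= limit:
            pending.append(s)  # quiet sample, not yet known to be a real pause
            if len(pending) >= min_run and current:
                segments.append(current)  # a long silence closes the word
                current = []
        else:
            if current:
                current.extend(pending)  # a short pause stays inside the word
            pending = []
            current.append(s)
    if current:
        segments.append(current)
    return segments
```

Leading and trailing silence is dropped, and pauses shorter than min_silence_len are kept inside the surrounding word.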

audioperm.utils module

Helper functions for audioperm.

audioperm.utils.max_min_heuristics(sig, max_perc=0.2, min_perc=0.2)

Calculates the average maximum and average minimum over a percentage of the sorted amplitudes. For audio signals, finding a single peak or valley is not enough, so we take the average of the top perc percentage of the population.

Parameters

sig (ndarray) – A numpy array.
max_perc (float) – Population percentage for taking the max.
min_perc (float) – Population percentage for taking the min.

Returns

Tuple containing:

max_p (float) – Population max for the positive signal.
min_p (float) – Population min for the positive signal.
max_n (float) – Population max for the negative signal.
min_n (float) – Population min for the negative signal.

Return type

(tuple)
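One way to read this heuristic is sketched below: split the signal into positive and negative samples, sort each half, and average the largest and smallest fractions. This is an interpretation rather than the library's code. The function names are hypothetical, plain lists stand in for ndarrays, and "max" and "min" are assumed to refer to signed values, not magnitudes.

```python
def avg_extremes(values, max_perc=0.2, min_perc=0.2):
    """Return (mean of the largest max_perc fraction, mean of the smallest min_perc fraction)."""
    ordered = sorted(values)
    # Always average at least one sample so tiny inputs still give a result.
    k_max = max(1, int(len(ordered) * max_perc))
    k_min = max(1, int(len(ordered) * min_perc))
    return sum(ordered[-k_max:]) / k_max, sum(ordered[:k_min]) / k_min


def max_min_sketch(sig, max_perc=0.2, min_perc=0.2):
    """Return (max_p, min_p, max_n, min_n) for the positive and negative halves of sig."""
    positive = [x for x in sig if x > 0]
    negative = [x for x in sig if x < 0]
    max_p, min_p = avg_extremes(positive, max_perc, min_perc)
    max_n, min_n = avg_extremes(negative, max_perc, min_perc)
    return max_p, min_p, max_n, min_n
```

Averaging over a fraction of the population makes the estimate less sensitive to a single outlier sample than taking the raw maximum or minimum.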

audioperm.utils.noise_boundaries(sig, max_perc=0.2, min_perc=0.2)

Calculates the maximum noise boundaries for a signal.

Parameters

sig (ndarray) – A numpy array.
max_perc (float) – Population percentage for taking the max.
min_perc (float) – Population percentage for taking the min.

Returns

Tuple containing:

max_n (float) – Maximum boundary for noise.
min_n (float) – Minimum boundary for noise.

Return type

(tuple)

audioperm.utils.save_audio(sig, filename, sr=22050)

Takes a PCM 16 or float32 signal and saves the audio in PCM 16 format.

Parameters

sig (ndarray) – A numpy array.
filename (str) – File path, including the file name.
sr (int) – Sampling rate.
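For readers who want to see what saving as PCM 16 involves, here is a minimal sketch using only the standard library's `wave` and `struct` modules. It is not the library's implementation: `save_pcm16` is a hypothetical name, the signal is assumed to be mono, a plain list stands in for the ndarray, and float samples are assumed to be normalized to [-1.0, 1.0].

```python
import struct
import wave


def save_pcm16(sig, filename, sr=22050):
    """Write a mono signal to filename as a 16-bit PCM WAV file."""
    pcm = []
    for x in sig:
        if isinstance(x, float):
            # Assumption: floats are normalized to [-1.0, 1.0]; clip, then scale.
            x = int(round(max(-1.0, min(1.0, x)) * 32767))
        pcm.append(int(x))
    with wave.open(filename, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16 bits
        wav.setframerate(sr)
        # Pack the samples as little-endian signed 16-bit integers.
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

Integer samples are written unchanged, so values outside the int16 range raise `struct.error` instead of being silently wrapped.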

audioperm.utils.type_nested(iterable, tp)

Checks whether every element of iterable is of type tp (that is, whether the iterable is homogeneous).

Parameters

iterable (list) – A list.
tp (type) – Expected type of the elements.

Returns

True if all elements are of the same type.

Return type

bool
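A homogeneity check of this kind could look like the sketch below. It assumes, based only on the function's name, that nested lists are checked recursively; `all_of_type` is a hypothetical name and the actual behaviour of type_nested may differ.

```python
def all_of_type(iterable, tp):
    """Return True if every leaf element of a (possibly nested) list is an instance of tp."""
    for item in iterable:
        if isinstance(item, (list, tuple)):
            # Assumption: nested containers are descended into, not compared to tp.
            if not all_of_type(item, tp):
                return False
        elif not isinstance(item, tp):
            return False
    return True
```

For example, `all_of_type([[1, 2], [3]], int)` is True, while a list that mixes integers and strings is rejected.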

Module contents

A Python library for generating different permutations of audible segments from audio files.