
Extract meter levels from an audio file

I need to extract the meter levels from an audio file so that I can render the levels before playing the audio. I know that AVAudioPlayer can get this information while playing the audio file, via

func averagePower(forChannel channelNumber: Int) -> Float

But in my case I would like to obtain a [Float] of meter levels beforehand.

16 · Peter Warbo

Swift 4

On an iPhone, it takes:

  • 0.538 s to process an 8 MB mp3 file with a duration of 4 min 47 s and a sample rate of 44,100 Hz

  • 0.170 s to process a 712 KB mp3 file with a duration of 22 s and a sample rate of 44,100 Hz

  • 0.089 s to process the caf file created by converting the file above with the command afconvert -f caff -d LEI16 audio.mp3 audio.caf in Terminal

Let's get started:

A) Declare this class, which holds the necessary information about the audio asset:

/// Holds audio information used for building waveforms
final class AudioContext {

    /// The audio asset URL used to load the context
    public let audioURL: URL

    /// Total number of samples in loaded asset
    public let totalSamples: Int

    /// Loaded asset
    public let asset: AVAsset

    /// Loaded assetTrack
    public let assetTrack: AVAssetTrack

    private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
        self.audioURL = audioURL
        self.totalSamples = totalSamples
        self.asset = asset
        self.assetTrack = assetTrack
    }

    public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
        let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])

        guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
            fatalError("Couldn't load AVAssetTrack")
        }

        asset.loadValuesAsynchronously(forKeys: ["duration"]) {
            var error: NSError?
            let status = asset.statusOfValue(forKey: "duration", error: &error)
            switch status {
            case .loaded:
                guard
                    let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                    let audioFormatDesc = formatDescriptions.first,
                    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                    else { break }

                let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                completionHandler(audioContext)
                return

            case .failed, .cancelled, .loading, .unknown:
                print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
            }

            completionHandler(nil)
        }
    }
}

We will use its asynchronous load function and handle its result in a completion handler.

B) Import AVFoundation and Accelerate in your view controller:

import AVFoundation
import Accelerate

C) Declare the noise floor in your view controller (in dB):

let noiseFloor: Float = -80

For example, anything below -80 dB will be treated as silence.
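
If you later render these levels as bars, a clipped dB value in [noiseFloor, 0] maps neatly to a normalized bar height in [0, 1]. A minimal sketch (normalizedHeight(fromDecibels:) is an illustrative helper, not part of the original solution):

func normalizedHeight(fromDecibels dB: Float) -> Float {
    // noiseFloor (-80 dB) maps to 0, full scale (0 dB) maps to 1
    return (dB - noiseFloor) / -noiseFloor
}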

D) The following function takes an audio context and produces the desired dB powers. targetSamples defaults to 100; you can change it to suit your UI needs:

func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }

    // Note: this range covers only the first third of the file's samples; use 0..<audioContext.totalSamples to process the whole file
    let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples/3

    guard let reader = try? AVAssetReader(asset: audioContext.asset)
        else {
            fatalError("Couldn't initialize the AVAssetReader")
    }

    reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                   duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))

    let outputSettingsDict: [String : Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]

    let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                outputSettings: outputSettingsDict)
    readerOutput.alwaysCopiesSampleData = false
    reader.add(readerOutput)

    var channelCount = 1
    let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
    for item in formatDescriptions {
        guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
            fatalError("Couldn't get the format description")
        }
        channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
    }

    let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
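    // Box (averaging) filter: vDSP_desamp multiplies each chunk of samplesPerPixel samples by these equal weights and sums them, i.e. it takes the mean of each chunk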
    let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)

    var outputSamples = [Float]()
    var sampleBuffer = Data()

    // 16-bit samples
    reader.startReading()
    defer { reader.cancelReading() }

    while reader.status == .reading {
        guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
            let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                break
        }
        // Append audio sample buffer into our current sample buffer
        var readBufferLength = 0
        var readBufferPointer: UnsafeMutablePointer<Int8>?
        CMBlockBufferGetDataPointer(readBuffer, 0, &readBufferLength, nil, &readBufferPointer)
        sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
        CMSampleBufferInvalidate(readSampleBuffer)

        let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
        let downSampledLength = totalSamples / samplesPerPixel
        let samplesToProcess = downSampledLength * samplesPerPixel

        guard samplesToProcess > 0 else { continue }

        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // Process the remaining samples at the end which didn't fit into samplesPerPixel
    let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
    if samplesToProcess > 0 {
        let downSampledLength = 1
        let samplesPerPixel = samplesToProcess
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)

        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
    guard reader.status == .completed else {
        fatalError("Couldn't read the audio file")
    }

    return outputSamples
}

E) render uses this function to downsample the audio file's data and convert it to decibels:

func processSamples(fromData sampleBuffer: inout Data,
                    outputSamples: inout [Float],
                    samplesToProcess: Int,
                    downSampledLength: Int,
                    samplesPerPixel: Int,
                    filter: [Float]) {
    sampleBuffer.withUnsafeBytes { (samples: UnsafePointer<Int16>) in
        var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)

        let sampleCount = vDSP_Length(samplesToProcess)

        //Convert 16bit int samples to floats
        vDSP_vflt16(samples, 1, &processingBuffer, 1, sampleCount)

        //Take the absolute values to get amplitude
        vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)

        //get the corresponding dB, and clip the results
        getdB(from: &processingBuffer)

        //Downsample and average
        var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
        vDSP_desamp(processingBuffer,
                    vDSP_Stride(samplesPerPixel),
                    filter, &downSampledData,
                    vDSP_Length(downSampledLength),
                    vDSP_Length(samplesPerPixel))

        //Remove processed samples
        sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)

        outputSamples += downSampledData
    }
}

F) This in turn calls the following function, which computes the corresponding dB values and clips the results to [noiseFloor, 0]:

func getdB(from normalizedSamples: inout [Float]) {
    // Convert samples to a log scale
    var zero: Float = 32768.0
    vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)

    //Clip to [noiseFloor, 0]
    var ceil: Float = 0.0
    var noiseFloorMutable = noiseFloor
    vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
}
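
For reference, with its last argument set to 1, vDSP_vdbcon applies the amplitude formula dB = 20 · log10(x / zero); since zero is 32768 (full scale for signed 16-bit samples), a full-scale sample maps to 0 dB. A quick sanity check:

let halfScale: Float = 16384
print(20 * log10(halfScale / 32768))   // ≈ -6.02: halving the amplitude costs about 6 dB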

G) Finally, you can get the waveform of the audio like this:

guard let path = Bundle.main.path(forResource: "audio", ofType:"mp3") else {
    fatalError("Couldn't find the file path")
}
let url = URL(fileURLWithPath: path)
var outputArray : [Float] = []
AudioContext.load(fromAudioURL: url, completionHandler: { audioContext in
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    outputArray = self.render(audioContext: audioContext, targetSamples: 300)
})

Keep in mind that AudioContext.load(fromAudioURL:) is asynchronous.
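
So if you want to drive UI from the result, hop back to the main queue inside the completion handler. A minimal sketch, where updateWaveform(with:) stands in for whatever rendering method you have (it is not part of this solution):

AudioContext.load(fromAudioURL: url) { audioContext in
    guard let audioContext = audioContext else { return }
    let samples = self.render(audioContext: audioContext, targetSamples: 300)
    DispatchQueue.main.async {
        // UI updates must happen on the main thread
        self.updateWaveform(with: samples)
    }
}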

This solution is synthesized from this repo by William Entriken. All credit goes to him.


Swift 5

Here is the same code, updated to Swift 5:

import AVFoundation
import Accelerate

/// Holds audio information used for building waveforms
final class AudioContext {

    /// The audio asset URL used to load the context
    public let audioURL: URL

    /// Total number of samples in loaded asset
    public let totalSamples: Int

    /// Loaded asset
    public let asset: AVAsset

    /// Loaded assetTrack
    public let assetTrack: AVAssetTrack

    private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
        self.audioURL = audioURL
        self.totalSamples = totalSamples
        self.asset = asset
        self.assetTrack = assetTrack
    }

    public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
        let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])

        guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
            fatalError("Couldn't load AVAssetTrack")
        }

        asset.loadValuesAsynchronously(forKeys: ["duration"]) {
            var error: NSError?
            let status = asset.statusOfValue(forKey: "duration", error: &error)
            switch status {
            case .loaded:
                guard
                    let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                    let audioFormatDesc = formatDescriptions.first,
                    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                    else { break }

                let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                completionHandler(audioContext)
                return

            case .failed, .cancelled, .loading, .unknown:
                print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
            }

            completionHandler(nil)
        }
    }
}

let noiseFloor: Float = -80

func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }

    // Note: this range covers only the first third of the file's samples; use 0..<audioContext.totalSamples to process the whole file
    let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples/3

    guard let reader = try? AVAssetReader(asset: audioContext.asset)
        else {
            fatalError("Couldn't initialize the AVAssetReader")
    }

    reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                   duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))

    let outputSettingsDict: [String : Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]

    let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                outputSettings: outputSettingsDict)
    readerOutput.alwaysCopiesSampleData = false
    reader.add(readerOutput)

    var channelCount = 1
    let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
    for item in formatDescriptions {
        guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
            fatalError("Couldn't get the format description")
        }
        channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
    }

    let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
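    // Box (averaging) filter: vDSP_desamp multiplies each chunk of samplesPerPixel samples by these equal weights and sums them, i.e. it takes the mean of each chunk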
    let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)

    var outputSamples = [Float]()
    var sampleBuffer = Data()

    // 16-bit samples
    reader.startReading()
    defer { reader.cancelReading() }

    while reader.status == .reading {
        guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
            let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                break
        }
        // Append audio sample buffer into our current sample buffer
        var readBufferLength = 0
        var readBufferPointer: UnsafeMutablePointer<Int8>?
        CMBlockBufferGetDataPointer(readBuffer,
                                    atOffset: 0,
                                    lengthAtOffsetOut: &readBufferLength,
                                    totalLengthOut: nil,
                                    dataPointerOut: &readBufferPointer)
        sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
        CMSampleBufferInvalidate(readSampleBuffer)

        let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
        let downSampledLength = totalSamples / samplesPerPixel
        let samplesToProcess = downSampledLength * samplesPerPixel

        guard samplesToProcess > 0 else { continue }

        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // Process the remaining samples at the end which didn't fit into samplesPerPixel
    let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
    if samplesToProcess > 0 {
        let downSampledLength = 1
        let samplesPerPixel = samplesToProcess
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)

        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }

    // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
    guard reader.status == .completed else {
        fatalError("Couldn't read the audio file")
    }

    return outputSamples
}

func processSamples(fromData sampleBuffer: inout Data,
                    outputSamples: inout [Float],
                    samplesToProcess: Int,
                    downSampledLength: Int,
                    samplesPerPixel: Int,
                    filter: [Float]) {

    sampleBuffer.withUnsafeBytes { (samples: UnsafeRawBufferPointer) in
        var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)

        let sampleCount = vDSP_Length(samplesToProcess)

        //Create an UnsafePointer<Int16> from samples
        let unsafeBufferPointer = samples.bindMemory(to: Int16.self)
        let unsafePointer = unsafeBufferPointer.baseAddress!

        //Convert 16bit int samples to floats
        vDSP_vflt16(unsafePointer, 1, &processingBuffer, 1, sampleCount)

        //Take the absolute values to get amplitude
        vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)

        //get the corresponding dB, and clip the results
        getdB(from: &processingBuffer)

        //Downsample and average
        var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
        vDSP_desamp(processingBuffer,
                    vDSP_Stride(samplesPerPixel),
                    filter, &downSampledData,
                    vDSP_Length(downSampledLength),
                    vDSP_Length(samplesPerPixel))

        //Remove processed samples
        sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)

        outputSamples += downSampledData
    }
}

func getdB(from normalizedSamples: inout [Float]) {
    // Convert samples to a log scale
    var zero: Float = 32768.0
    vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)

    //Clip to [noiseFloor, 0]
    var ceil: Float = 0.0
    var noiseFloorMutable = noiseFloor
    vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
}

Old solution

Here is a function you can use to pre-render the meter levels of an audio file without playing it:

func averagePowers(audioFileURL: URL, forChannel channelNumber: Int, completionHandler: @escaping(_ success: [Float]) -> ()) {
    let audioFile = try! AVAudioFile(forReading: audioFileURL)
    let audioFilePFormat = audioFile.processingFormat
    let audioFileLength = audioFile.length

    //Set the size of frames to read from the audio file, you can adjust this to your liking
    let frameSizeToRead = Int(audioFilePFormat.sampleRate/20)

    //This is to how many frames/portions we're going to divide the audio file
    let numberOfFrames = Int(audioFileLength)/frameSizeToRead

    //Create a pcm buffer the size of a frame
    guard let audioBuffer = AVAudioPCMBuffer(pcmFormat: audioFilePFormat, frameCapacity: AVAudioFrameCount(frameSizeToRead)) else {
        fatalError("Couldn't create the audio buffer")
    }

    //Do the calculations in a background thread, if you don't want to block the main thread for larger audio files
    DispatchQueue.global(qos: .userInitiated).async {

        //This is the array to be returned
        var returnArray : [Float] = [Float]()

        //We're going to read the audio file, frame by frame
        for i in 0..<numberOfFrames {

            //Change the position from which we are reading the audio file, since each frame starts from a different position in the audio file
            audioFile.framePosition = AVAudioFramePosition(i * frameSizeToRead)

            //Read the frame from the audio file
            try! audioFile.read(into: audioBuffer, frameCount: AVAudioFrameCount(frameSizeToRead))

            //Get the data from the chosen channel
            let channelData = audioBuffer.floatChannelData![channelNumber]

            //This is the array of floats
            let arr = Array(UnsafeBufferPointer(start:channelData, count: frameSizeToRead))

            //Calculate the mean value of the absolute values
            let meanValue = arr.reduce(0, {$0 + abs($1)})/Float(arr.count)

            //Calculate the dB power (You can adjust this), if average is less than 0.000_000_01 we limit it to -160.0
            let dbPower: Float = meanValue > 0.000_000_01 ? 20 * log10(meanValue) : -160.0

            //append the db power in the current frame to the returnArray
            returnArray.append(dbPower)
        }

        //Return the dBPowers
        completionHandler(returnArray)
    }
}

And you can call it like this:

let path = Bundle.main.path(forResource: "audio.mp3", ofType:nil)!
let url = URL(fileURLWithPath: path)
averagePowers(audioFileURL: url, forChannel: 0, completionHandler: { array in
    //Use the array
})
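
Note that completionHandler is invoked on the background queue, so dispatch back to the main queue before updating any UI; a minimal sketch:

averagePowers(audioFileURL: url, forChannel: 0, completionHandler: { array in
    DispatchQueue.main.async {
        // Update your meter UI with `array` here, on the main thread
    }
})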

Profiled with Instruments, this solution shows high CPU usage for about 1.2 seconds, takes around 5 seconds to return to the main thread with the returnArray, and up to 10 seconds when the device is in Low Power Mode.

17 · ielyamani

First of all, this is a heavy operation, so it will take some time and OS resources to accomplish. In the example below I will use standard frame rates and sampling, but you should really sample far less if, for example, you only want to display bars as a rough indication.

OK, so you don't need to play the sound in order to analyze it. In this case I won't use AVAudioPlayer at all; I will assume I take the track as a URL:

    let path = Bundle.main.path(forResource: "example3.mp3", ofType:nil)!
    let url = URL(fileURLWithPath: path)

Then I will use AVAudioFile to get the track information into an AVAudioPCMBuffer. Once you have it in a buffer, you have all the information about your track:

func buffer(url: URL) {
    do {
        let track = try AVAudioFile(forReading: url)
        let format = AVAudioFormat(commonFormat: .pcmFormatFloat32, sampleRate: track.fileFormat.sampleRate, channels: track.fileFormat.channelCount, interleaved: false)
        let buffer = AVAudioPCMBuffer(pcmFormat: format!, frameCapacity: UInt32(track.length))!
        try track.read(into: buffer, frameCount: UInt32(track.length))
        let levels = self.analyze(buffer: buffer)
        print("Analyzed \(levels.first?.count ?? 0) samples per channel")
    } catch {
        print(error)
    }
}
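
You can then call it with the url from the previous snippet:

buffer(url: url)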

As you can see, there is an analyze method. Your buffer exposes the raw samples through its floatChannelData property; it's plain data, so you'll need to parse it. I will post a method and explain it below:

func analyze(buffer: AVAudioPCMBuffer) -> [[Float]] {
    let channelCount = Int(buffer.format.channelCount)
    let frameLength = Int(buffer.frameLength)
    var result = Array(repeating: [Float](repeating: 0, count: frameLength), count: channelCount)
    for channel in 0..<channelCount {
        for sampleIndex in 0..<frameLength {
            // abs() guards against negative samples, whose square root would otherwise be NaN
            let sqrtV = sqrt(abs(buffer.floatChannelData![channel][sampleIndex * buffer.stride]) / Float(buffer.frameLength))
            let dbPower = 20 * log10(sqrtV)
            result[channel][sampleIndex] = dbPower
        }
    }
    return result
}

There are some (heavy) calculations involved here. When I was working on similar solutions a few months ago, I came across this tutorial: https://www.raywenderlich.com/5154-avaudioengine-tutorial-for-ios-getting-started. It has an excellent explanation of this calculation, as well as parts of the code I pasted above and also use in my project, so I want to credit the author here: Scott McAlister 👏
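
As noted at the top of this answer, you should sample far less than one value per frame if you only need bars. Here is a minimal sketch of a chunked RMS variant, assuming the deinterleaved buffer created by buffer(url:) above (chunkedPowers, its chunkSize of 4410 frames, i.e. a tenth of a second at 44.1 kHz, and the -160 dB floor are illustrative choices, not part of the original answer):

import AVFoundation

/// One RMS-based dB value per chunk of frames
func chunkedPowers(buffer: AVAudioPCMBuffer, channel: Int = 0, chunkSize: Int = 4410) -> [Float] {
    guard let channelData = buffer.floatChannelData else { return [] }
    let samples = UnsafeBufferPointer(start: channelData[channel], count: Int(buffer.frameLength))
    var powers: [Float] = []
    var start = 0
    while start < samples.count {
        let end = min(start + chunkSize, samples.count)
        // Root mean square of the chunk, converted to dB and floored at -160 dB
        let meanSquare = samples[start..<end].reduce(0) { $0 + $1 * $1 } / Float(end - start)
        let rms = meanSquare.squareRoot()
        powers.append(rms > 0.000_000_01 ? 20 * log10(rms) : -160.0)
        start = end
    }
    return powers
}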

10 · Jakub