Convertir une image en CVPixelBuffer pour Swift d'apprentissage automatique

Question

J'essaie de faire en sorte que les exemples de modèles Core ML d'Apple présentés au WWDC 2017 fonctionnent correctement. J'utilise GoogLeNet pour essayer de classer les images (voir la page Apple Machine Learning ). Le modèle prend un CVPixelBuffer en entrée. J'ai une image appelée imageSample.jpg que j'utilise pour cette démo. Mon code est ci-dessous:

 var sample = UIImage(named: "imageSample")?.cgImage let bufferThree = getCVPixelBuffer(sample!) let model = GoogLeNetPlaces() guard let output = try? model.prediction(input: GoogLeNetPlacesInput.init(sceneImage: bufferThree!)) else { fatalError("Unexpected runtime error.") } print(output.sceneLabel)

Je reçois toujours l'erreur d'exécution inattendue dans la sortie plutôt qu'une classification d'image. Mon code pour convertir l'image est ci-dessous:

func getCVPixelBuffer(_ image: CGImage) -> CVPixelBuffer? { let imageWidth = Int(image.width) let imageHeight = Int(image.height) let attributes : [NSObject:AnyObject] = [ kCVPixelBufferCGImageCompatibilityKey : true as AnyObject, kCVPixelBufferCGBitmapContextCompatibilityKey : true as AnyObject ] var pxbuffer: CVPixelBuffer? = nil CVPixelBufferCreate(kCFAllocatorDefault, imageWidth, imageHeight, kCVPixelFormatType_32ARGB, attributes as CFDictionary?, &pxbuffer) if let _pxbuffer = pxbuffer { let flags = CVPixelBufferLockFlags(rawValue: 0) CVPixelBufferLockBaseAddress(_pxbuffer, flags) let pxdata = CVPixelBufferGetBaseAddress(_pxbuffer) let rgbColorSpace = CGColorSpaceCreateDeviceRGB(); let context = CGContext(data: pxdata, width: imageWidth, height: imageHeight, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(_pxbuffer), space: rgbColorSpace, bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue) if let _context = context { _context.draw(image, in: CGRect.init(x: 0, y: 0, width: imageWidth, height: imageHeight)) } else { CVPixelBufferUnlockBaseAddress(_pxbuffer, flags); return nil } CVPixelBufferUnlockBaseAddress(_pxbuffer, flags); return _pxbuffer; } return nil }

J'ai reçu ce code d'un précédent article de StackOverflow (dernière réponse ici ). Je reconnais que le code peut ne pas être correct, mais je ne sais pas comment faire cela moi-même. Je crois que c'est la section qui contient l'erreur. Le modèle appelle pour le type d'entrée suivant: Image<RGB,224,224>

rickster · Accepted Answer

Vous n'avez pas besoin de faire beaucoup d'images pour utiliser un modèle Core ML avec une image - le nouveau framework Vision peut le faire pour vous.

import Vision import CoreML let model = try VNCoreMLModel(for: MyCoreMLGeneratedModelClass().model) let request = VNCoreMLRequest(model: model, completionHandler: myResultsMethod) let handler = VNImageRequestHandler(url: myImageURL) handler.perform([request]) func myResultsMethod(request: VNRequest, error: Error?) { guard let results = request.results as? [VNClassificationObservation] else { fatalError("huh") } for classification in results { print(classification.identifier, // the scene label classification.confidence) } }

La session WWDC17 sur Vision devrait contenir un peu plus d’informations - c’est demain après-midi.

coldfire · Answer

Vous pouvez utiliser un CoreML pur, mais vous devez redimensionner une image en (224,224).

 DispatchQueue.global(qos: .userInitiated).async { // Resnet50 expects an image 224 x 224, so we should resize and crop the source image let inputImageSize: CGFloat = 224.0 let minLen = min(image.size.width, image.size.height) let resizedImage = image.resize(to: CGSize(width: inputImageSize * image.size.width / minLen, height: inputImageSize * image.size.height / minLen)) let cropedToSquareImage = resizedImage.cropToSquare() guard let pixelBuffer = cropedToSquareImage?.pixelBuffer() else { fatalError() } guard let classifierOutput = try? self.classifier.prediction(image: pixelBuffer) else { fatalError() } DispatchQueue.main.async { self.title = classifierOutput.classLabel } } // ... extension UIImage { func resize(to newSize: CGSize) -> UIImage { UIGraphicsBeginImageContextWithOptions(CGSize(width: newSize.width, height: newSize.height), true, 1.0) self.draw(in: CGRect(x: 0, y: 0, width: newSize.width, height: newSize.height)) let resizedImage = UIGraphicsGetImageFromCurrentImageContext()! UIGraphicsEndImageContext() return resizedImage } func cropToSquare() -> UIImage? { guard let cgImage = self.cgImage else { return nil } var imageHeight = self.size.height var imageWidth = self.size.width if imageHeight > imageWidth { imageHeight = imageWidth } else { imageWidth = imageHeight } let size = CGSize(width: imageWidth, height: imageHeight) let x = ((CGFloat(cgImage.width) - size.width) / 2).rounded() let y = ((CGFloat(cgImage.height) - size.height) / 2).rounded() let cropRect = CGRect(x: x, y: y, width: size.height, height: size.width) if let croppedCgImage = cgImage.cropping(to: cropRect) { return UIImage(cgImage: croppedCgImage, scale: 0, orientation: self.imageOrientation) } return nil } func pixelBuffer() -> CVPixelBuffer? { let width = self.size.width let height = self.size.height let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue, kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary var pixelBuffer: CVPixelBuffer? let status = CVPixelBufferCreate(kCFAllocatorDefault, Int(width), Int(height), kCVPixelFormatType_32ARGB, attrs, &pixelBuffer) guard let resultPixelBuffer = pixelBuffer, status == kCVReturnSuccess else { return nil } CVPixelBufferLockBaseAddress(resultPixelBuffer, CVPixelBufferLockFlags(rawValue: 0)) let pixelData = CVPixelBufferGetBaseAddress(resultPixelBuffer) let rgbColorSpace = CGColorSpaceCreateDeviceRGB() guard let context = CGContext(data: pixelData, width: Int(width), height: Int(height), bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(resultPixelBuffer), space: rgbColorSpace, bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else { return nil } context.translateBy(x: 0, y: height) context.scaleBy(x: 1.0, y: -1.0) UIGraphicsPushContext(context) self.draw(in: CGRect(x: 0, y: 0, width: width, height: height)) UIGraphicsPopContext() CVPixelBufferUnlockBaseAddress(resultPixelBuffer, CVPixelBufferLockFlags(rawValue: 0)) return resultPixelBuffer } }

La taille d’image attendue pour les entrées que vous pouvez trouver dans le fichier mimodel:

Un projet de démonstration qui utilise à la fois des variantes CoreML et Vision pures se trouve ici: https://github.com/handsomecode/iOS11-Demos/tree/coreml_vision/CoreML/CoreMLDemo

Presen · Answer

Si l'entrée est UIImage, plutôt qu'une URL, et que vous souhaitez utiliser VNImageRequestHandler, vous pouvez utiliser CIImage.

func updateClassifications(for image: UIImage) { let orientation = CGImagePropertyOrientation(image.imageOrientation) guard let ciImage = CIImage(image: image) else { return } let handler = VNImageRequestHandler(ciImage: ciImage, orientation: orientation) }

De Classification des images avec Vision et Core ML