Converting a Caffe model to CoreML with coremltools gives inconsistent predictions
I trained a model using Caffe and NVIDIA DIGITS. Testing it in DIGITS on the following images gives the following results:
When I download the model from DIGITS, I get snapshot_iter_24240.caffemodel along with deploy.prototxt, mean.binaryproto and labels.txt (as well as solver.prototxt and train_val.prototxt, which I don't think are relevant).
I use coremltools to convert the caffemodel to an mlmodel by running the following:
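```python
import coremltools

# Convert a Caffe model to a classifier in Core ML.
# The tuple is (trained weights, network definition, mean image).
coreml_model = coremltools.converters.caffe.convert(
    ('snapshot_iter_24240.caffemodel', 'deploy.prototxt', 'mean.binaryproto'),
    image_input_names='data',
    class_labels='labels.txt')

# Now save the model
coreml_model.save('food.mlmodel')
```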
The code outputs the following:
```
(/anaconda/envs/coreml) bash-3.2$ python run.py
================= Starting Conversion from Caffe to CoreML ======================
Layer 0: Type: 'Input', Name: 'input'. Output(s): 'data'.
Ignoring batch size and retaining only the trailing 3 dimensions for conversion.
Layer 1: Type: 'Convolution', Name: 'conv1'. Input(s): 'data'. Output(s): 'conv1'.
Layer 2: Type: 'ReLU', Name: 'relu1'. Input(s): 'conv1'. Output(s): 'conv1'.
Layer 3: Type: 'LRN', Name: 'norm1'. Input(s): 'conv1'. Output(s): 'norm1'.
Layer 4: Type: 'Pooling', Name: 'pool1'. Input(s): 'norm1'. Output(s): 'pool1'.
Layer 5: Type: 'Convolution', Name: 'conv2'. Input(s): 'pool1'. Output(s): 'conv2'.
Layer 6: Type: 'ReLU', Name: 'relu2'. Input(s): 'conv2'. Output(s): 'conv2'.
Layer 7: Type: 'LRN', Name: 'norm2'. Input(s): 'conv2'. Output(s): 'norm2'.
Layer 8: Type: 'Pooling', Name: 'pool2'. Input(s): 'norm2'. Output(s): 'pool2'.
Layer 9: Type: 'Convolution', Name: 'conv3'. Input(s): 'pool2'. Output(s): 'conv3'.
Layer 10: Type: 'ReLU', Name: 'relu3'. Input(s): 'conv3'. Output(s): 'conv3'.
Layer 11: Type: 'Convolution', Name: 'conv4'. Input(s): 'conv3'. Output(s): 'conv4'.
Layer 12: Type: 'ReLU', Name: 'relu4'. Input(s): 'conv4'. Output(s): 'conv4'.
Layer 13: Type: 'Convolution', Name: 'conv5'. Input(s): 'conv4'. Output(s): 'conv5'.
Layer 14: Type: 'ReLU', Name: 'relu5'. Input(s): 'conv5'. Output(s): 'conv5'.
Layer 15: Type: 'Pooling', Name: 'pool5'. Input(s): 'conv5'. Output(s): 'pool5'.
Layer 16: Type: 'InnerProduct', Name: 'fc6'. Input(s): 'pool5'. Output(s): 'fc6'.
Layer 17: Type: 'ReLU', Name: 'relu6'. Input(s): 'fc6'. Output(s): 'fc6'.
Layer 18: Type: 'Dropout', Name: 'drop6'. Input(s): 'fc6'. Output(s): 'fc6'.
WARNING: Skipping training related layer 'drop6' of type 'Dropout'.
Layer 19: Type: 'InnerProduct', Name: 'fc7'. Input(s): 'fc6'. Output(s): 'fc7'.
Layer 20: Type: 'ReLU', Name: 'relu7'. Input(s): 'fc7'. Output(s): 'fc7'.
Layer 21: Type: 'Dropout', Name: 'drop7'. Input(s): 'fc7'. Output(s): 'fc7'.
WARNING: Skipping training related layer 'drop7' of type 'Dropout'.
Layer 22: Type: 'InnerProduct', Name: 'fc8_food'. Input(s): 'fc7'. Output(s): 'fc8_food'.
Layer 23: Type: 'Softmax', Name: 'prob'. Input(s): 'fc8_food'. Output(s): 'prob'.
================= Summary of the conversion: ===================================
Detected input(s) and shape(s) (ignoring batch size):
'data' : 3, 227, 227
Size of mean image: (H,W) = (256, 256) is greater than input image size: (H,W) = (227, 227). Mean image will be center cropped to match the input image dimensions.
Network Input name(s): 'data'.
Network Output name(s): 'prob'.
(/anaconda/envs/coreml) bash-3.2$
```
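The mean-image line in the summary says the 256×256 mean from mean.binaryproto is cropped around its center to the 227×227 network input. A minimal sketch of that center-crop arithmetic in plain Python, using a nested list as a stand-in for one channel of the mean image; `center_crop_offsets` and `center_crop` are hypothetical helpers, and the floor rounding used for the odd size difference (256 − 227 = 29) is an assumption about what the converter does.

```python
def center_crop_offsets(src_h, src_w, dst_h, dst_w):
    """Return (top, left) of a dst_h x dst_w window centered in src_h x src_w.

    Uses floor division when the size difference is odd; the converter's
    exact rounding is an assumption here.
    """
    if dst_h > src_h or dst_w > src_w:
        raise ValueError("target is larger than source")
    return (src_h - dst_h) // 2, (src_w - dst_w) // 2


def center_crop(channel, dst_h, dst_w):
    """Crop one 2-D channel (a list of rows) to dst_h x dst_w around its center."""
    src_h, src_w = len(channel), len(channel[0])
    top, left = center_crop_offsets(src_h, src_w, dst_h, dst_w)
    return [row[left:left + dst_w] for row in channel[top:top + dst_h]]


# Dummy 256x256 "mean" channel whose value encodes its (row, col) position.
mean_channel = [[r * 1000 + c for c in range(256)] for r in range(256)]
cropped = center_crop(mean_channel, 227, 227)
print(center_crop_offsets(256, 256, 227, 227))  # (14, 14)
print(len(cropped), len(cropped[0]))            # 227 227
print(cropped[0][0])                            # 14014, i.e. source pixel (14, 14)
```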
After about 45 seconds, food.mlmodel is generated. I import it into an iOS project using Xcode Version 9.0 beta 3 (9M174d) and run the following code in a single-view iOS project:
```swift
//
//  ViewController.swift
//  SeeFood
//
//  Created by Reza Shirazian on 7/23/17.
//  Copyright © 2017 Reza Shirazian. All rights reserved.
//

import UIKit
import CoreML
import Vision

class ViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()
        var images = [CIImage]()
        // guard let ciImage = CIImage(image: #imageLiteral(resourceName: "pizza")) else {
        //     fatalError("couldn't convert UIImage to CIImage")
        // }
        images.append(CIImage(image: #imageLiteral(resourceName: "pizza"))!)
        images.append(CIImage(image: #imageLiteral(resourceName: "spaghetti"))!)
        images.append(CIImage(image: #imageLiteral(resourceName: "burger"))!)
        images.append(CIImage(image: #imageLiteral(resourceName: "sushi"))!)
        images.forEach { detectScene(image: $0) }
        // Do any additional setup after loading the view, typically from a nib.
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }

    func detectScene(image: CIImage) {
        // `food` is the class Xcode generates from food.mlmodel
        guard let model = try? VNCoreMLModel(for: food().model) else {
            fatalError("can't load the food Core ML model")
        }

        // Create a Vision request with completion handler
        let request = VNCoreMLRequest(model: model) { request, error in
            guard let results = request.results as? [VNClassificationObservation],
                  !results.isEmpty else {
                fatalError("unexpected result type from VNCoreMLRequest")
            }

            // Update UI on main queue
            DispatchQueue.main.async {
                // Print every class whose truncated confidence is above 1%
                results.forEach { result in
                    if Int(result.confidence * 100) > 1 {
                        print("\(Int(result.confidence * 100))% it's \(result.identifier)")
                    }
                }
            }
        }

        let handler = VNImageRequestHandler(ciImage: image)
        DispatchQueue.global(qos: .userInteractive).async {
            do {
                try handler.perform([request])
            } catch {
                print(error)
            }
        }
    }
}
```
which outputs the following:
```
22% it's cup cakes
8% it's ice cream
5% it's falafel
5% it's macarons
3% it's churros
3% it's gyoza
3% it's donuts
2% it's tacos
2% it's cannoli

35% it's cup cakes
22% it's frozen yogurt
8% it's chocolate cake
7% it's chocolate mousse
6% it's ice cream
2% it's donuts

38% it's gyoza
7% it's falafel
6% it's tacos
4% it's hamburger
3% it's oysters
2% it's peking duck
2% it's hot dog
2% it's baby back ribs
2% it's cannoli

7% it's hamburger
6% it's pork chop
6% it's steak
6% it's peking duck
5% it's pho
5% it's prime rib
5% it's baby back ribs
4% it's mussels
4% it's grilled salmon
2% it's filet mignon
2% it's foie gras
2% it's pulled pork sandwich
```
These predictions are completely off and inconsistent with how the model performed in DIGITS. I'm not sure what I'm doing wrong or whether I've missed a step. I also tried creating the model without mean.binaryproto, but it made no difference.
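The percentages printed by the Swift loop are truncated with `Int(...)` and only values strictly greater than 1 are kept, so any class scoring below 2% never appears in the output. A small Python sketch of the same filter, run on made-up observations (not real model output) that are assumed to be already sorted by confidence:

```python
def format_predictions(observations):
    """Mimic the Swift loop: truncate confidence * 100 to an int, keep values > 1."""
    lines = []
    for identifier, confidence in observations:
        percent = int(confidence * 100)  # like Swift's Int(...), truncates toward zero
        if percent > 1:
            lines.append(f"{percent}% it's {identifier}")
    return lines


# Hypothetical (label, confidence) pairs; 0.0199 and 0.012 both truncate to 1%.
sample = [("pizza", 0.731), ("lasagna", 0.104), ("ravioli", 0.0199), ("risotto", 0.012)]
for line in format_predictions(sample):
    print(line)
```

Only the first two labels are printed; a class at 1.99% is dropped just like one at 0.5%.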
In case it helps, here is the deploy.prototxt:
```
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8_food"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_food"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  inner_product_param {
    num_output: 101
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "prob"
  type: "Softmax"
  bottom: "fc8_food"
  top: "prob"
}
```
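Tracing the feature-map size through the layers above is a quick sanity check that the 3×227×227 input reported by the converter is consistent with the fully connected layers. A minimal sketch using the usual Caffe output-size formulas (floor for convolution, ceil for pooling); the `LAYERS` table is transcribed by hand from this prototxt, and LRN, ReLU and Dropout are left out because they do not change the spatial size (neither does `group`).

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    # Caffe pooling rounds up: ceil((size + 2*pad - kernel) / stride) + 1
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


# (name, kind, kernel_size, stride, pad, num_output) copied from deploy.prototxt
LAYERS = [
    ("conv1", "conv", 11, 4, 0, 96),
    ("pool1", "pool", 3, 2, 0, 96),
    ("conv2", "conv", 5, 1, 2, 256),
    ("pool2", "pool", 3, 2, 0, 256),
    ("conv3", "conv", 3, 1, 1, 384),
    ("conv4", "conv", 3, 1, 1, 384),
    ("conv5", "conv", 3, 1, 1, 256),
    ("pool5", "pool", 3, 2, 0, 256),
]


def trace_shapes(size=227):
    """Return {layer name: (channels, height, width)} for a square input."""
    shapes = {}
    for name, kind, kernel, stride, pad, channels in LAYERS:
        out = conv_out if kind == "conv" else pool_out
        size = out(size, kernel, stride, pad)
        shapes[name] = (channels, size, size)
    return shapes


shapes = trace_shapes()
for name, shape in shapes.items():
    print(name, shape)
c, h, w = shapes["pool5"]
print("fc6 input features:", c * h * w)
```

The same prototxt sets `num_output: 101` for `fc8_food`, so labels.txt should contain exactly 101 lines, one per class, in the order used during training.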
The discrepancy between the predictions DIGITS made with the caffemodel and those Core ML made was caused by Core ML interpreting the input image differently from DIGITS, specifically its channel order (RGB vs. BGR) and pixel scaling. Calling convert with the following extra parameters solved the problem:
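```python
import coremltools

coreml_model = coremltools.converters.caffe.convert(
    ('snapshot_iter_24240.caffemodel', 'deploy.prototxt', 'mean.binaryproto'),
    image_input_names='data',
    class_labels='labels.txt',
    is_bgr=True,       # the Caffe network expects BGR channel order
    image_scale=255.)  # multiplier applied to the input pixel values
coreml_model.save('food.mlmodel')
```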
With these parameters, the output becomes:

```
99% it's spaghetti bolognese

73% it's pizza
10% it's lasagna
7% it's spaghetti bolognese
2% it's spaghetti carbonara

97% it's sushi

97% it's hamburger
```
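Caffe models are conventionally trained on BGR images, while the image handed to Core ML is RGB; `is_bgr=True` tells the converter the network expects BGR, and `image_scale` multiplies each pixel value before it reaches the network. The per-pixel sketch below shows why a wrong channel order skews predictions: a red pixel lands in the network's blue slot. `preprocess` is a hypothetical helper, and it deliberately leaves out the mean-image subtraction, whose ordering relative to scaling inside Core ML is not modeled here.

```python
def preprocess(rgb_pixel, is_bgr=False, image_scale=1.0):
    """Hypothetical per-pixel sketch of the input transform described above.

    rgb_pixel: (r, g, b) values in 0..255 as delivered to Core ML.
    is_bgr: reorder to (b, g, r) for a network trained on BGR input.
    image_scale: multiplier applied to every channel value.
    """
    r, g, b = rgb_pixel
    channels = (b, g, r) if is_bgr else (r, g, b)
    return tuple(image_scale * v for v in channels)


tomato_red = (220, 40, 30)
print(preprocess(tomato_red))               # (220.0, 40.0, 30.0)
print(preprocess(tomato_red, is_bgr=True))  # (30.0, 40.0, 220.0)
```

Without `is_bgr=True`, a network trained on BGR data reads the first value (220, the red intensity) as blue, so tomato sauce effectively looks blue to it.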
As things stand, coremltools tends to change input/output types and value ranges to suit its own internal optimizations. I strongly recommend loading your newly created .mlmodel file back into Python and checking which data types it expects.
For example, it converts Int values to Float (which become Double in Swift) and Bool values to Int (True: 1, False: 0).
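Once you know what the converted model expects (for instance by inspecting `coremltools.models.MLModel('your.mlmodel').get_spec().description.input`), the Python-side inputs can be coerced to match before calling predict. A minimal sketch, assuming the expected types have already been copied into a plain dict; `EXPECTED_TYPES`, its feature names, and `coerce_inputs` are hypothetical and not part of coremltools.

```python
# Hypothetical feature names and the types the converted model reports for them.
EXPECTED_TYPES = {"age": "double", "is_member": "int64"}


def coerce_inputs(features, expected=EXPECTED_TYPES):
    """Convert Python values to the types the converted model expects."""
    coerced = {}
    for name, value in features.items():
        kind = expected.get(name)
        if kind == "double":
            coerced[name] = float(value)  # Int features were converted to Double
        elif kind == "int64":
            coerced[name] = int(value)    # Bool features became Int: True -> 1, False -> 0
        else:
            coerced[name] = value         # unknown feature: pass through unchanged
    return coerced


print(coerce_inputs({"age": 42, "is_member": True}))
# {'age': 42.0, 'is_member': 1}
```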