Text recognition from a live video feed using ML Kit (with CMSampleBuffer)

Question


I am trying to modify the on-device text recognition example provided by Google so that it works with a live camera feed.

When I hold the camera over text (text that is recognized correctly in the sample still image), my console prints the following in a stream before the app eventually runs out of memory:

2018-05-16 10:48:22.129901+1200 TextRecognition[32138:5593533] An empty result returned from from GMVDetector for VisionTextDetector.

This is my video capture method:

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

        if let textDetector = self.textDetector {

            let visionImage = VisionImage(buffer: sampleBuffer)
            let metadata = VisionImageMetadata()
            metadata.orientation = .rightTop
            visionImage.metadata = metadata

            textDetector.detect(in: visionImage) { (features, error) in
                guard error == nil, let features = features, !features.isEmpty else {
                    // Error. You should also check the console for error messages.
                    // ...
                    return
                }

                // Recognized and extracted text
                print("Detected text has: \(features.count) blocks")
                // ...
            }

        }

    }

Is this the correct way to do this?


ios swift firebase firebase-mlkit


user1969888 May 15 '18 at 23:03

2 Answers

Solution

ML Kit is still in the process of adding sample code for CMSampleBuffer usage to the Firebase Quick Start.

In the meantime, the code below works for CMSampleBuffer.

Set up AV capture (use kCVPixelFormatType_32BGRA for kCVPixelBufferPixelFormatTypeKey):

@property(nonatomic, strong) AVCaptureSession *session;
@property(nonatomic, strong) AVCaptureVideoDataOutput *videoDataOutput;

- (void)setupVideoProcessing {
  self.videoDataOutput = [[AVCaptureVideoDataOutput alloc] init];
  NSDictionary *rgbOutputSettings = @{
      (__bridge NSString*)kCVPixelBufferPixelFormatTypeKey :  @(kCVPixelFormatType_32BGRA)
  };
  [self.videoDataOutput setVideoSettings:rgbOutputSettings];

  if (![self.session canAddOutput:self.videoDataOutput]) {
    [self cleanupVideoProcessing];
    NSLog(@"Failed to setup video output");
    return;
  }
  [self.videoDataOutput setAlwaysDiscardsLateVideoFrames:YES];
  [self.videoDataOutput setSampleBufferDelegate:self queue:self.videoDataOutputQueue];
  [self.session addOutput:self.videoDataOutput];
}
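Since the question is in Swift, here is a rough Swift sketch of the same output setup. The function names `makeVideoDataOutput` and `attach` are made up for illustration, and the delegate and queue are assumed to come from your own view controller; only the AVFoundation calls themselves mirror the Objective-C above.

```swift
import AVFoundation

/// Builds a video data output that delivers 32BGRA frames and drops late ones.
/// `delegate` and `queue` are supplied by the caller (hypothetical parameters).
func makeVideoDataOutput(
    delegate: AVCaptureVideoDataOutputSampleBufferDelegate?,
    queue: DispatchQueue?
) -> AVCaptureVideoDataOutput {
    let output = AVCaptureVideoDataOutput()
    // Same pixel format as in the Objective-C setup above.
    output.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
    ]
    // Drop frames that arrive while the delegate is still busy.
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(delegate, queue: queue)
    return output
}

/// Adds the output to the session if the session accepts it.
/// Returns false when the output cannot be added.
func attach(_ output: AVCaptureVideoDataOutput, to session: AVCaptureSession) -> Bool {
    guard session.canAddOutput(output) else { return false }
    session.addOutput(output)
    return true
}
```

If `attach` returns false, you would clean up and log the failure, as `setupVideoProcessing` does above.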

Consume the CMSampleBuffer and run detection:

- (void)runDetection:(AVCaptureOutput *)captureOutput
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
           fromConnection:(AVCaptureConnection *)connection {

  CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
  size_t imageWidth = CVPixelBufferGetWidth(imageBuffer);
  size_t imageHeight = CVPixelBufferGetHeight(imageBuffer);

  AVCaptureDevicePosition devicePosition = self.isUsingFrontCamera ? AVCaptureDevicePositionFront : AVCaptureDevicePositionBack;

  // Calculate the image orientation.
  UIDeviceOrientation deviceOrientation = [[UIDevice currentDevice] orientation];
  FIRVisionDetectorImageOrientation orientation =
      [ImageUtility imageOrientationFromOrientation:deviceOrientation
                        withCaptureDevicePosition:devicePosition
                         defaultDeviceOrientation:[self deviceOrientationFromInterfaceOrientation]];
  // Invoke text detection.
  FIRVisionImage *image = [[FIRVisionImage alloc] initWithBuffer:sampleBuffer];
  FIRVisionImageMetadata *metadata = [[FIRVisionImageMetadata alloc] init];
  metadata.orientation = orientation;
  image.metadata = metadata;

  FIRVisionTextDetectionCallback callback =
      ^(NSArray<id<FIRVisionText>> *_Nullable features, NSError *_Nullable error) {
     ...
  };

 [self.textDetector detectInImage:image completion:callback];
}
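The question also mentions memory running out while frames stream in. Nothing in this answer diagnoses that, so treat the following as a guess: if a new detection is started for every frame while earlier ones are still running, work can pile up. One possible mitigation is to skip frames while a detection is in flight. `FrameGate` below is a hypothetical helper, not part of ML Kit or AVFoundation.

```swift
import Foundation

/// Lets at most one unit of work run at a time. Callers that arrive while
/// work is in flight are told to skip. Hypothetical helper for illustration.
final class FrameGate {
    private let lock = NSLock()
    private var busy = false

    /// Returns true if the caller may start work, false if it should drop the frame.
    func tryEnter() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if busy { return false }
        busy = true
        return true
    }

    /// Marks the in-flight work as finished so the next frame can enter.
    func leave() {
        lock.lock()
        busy = false
        lock.unlock()
    }
}

// Intended use inside the capture callback (sketch only):
//
//   guard gate.tryEnter() else { return }        // a detection is still running
//   textDetector.detect(in: visionImage) { features, error in
//       defer { gate.leave() }                   // release on success or failure
//       ...
//   }
```

The lock is there because the capture callback and the detection completion handler may run on different queues.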

The ImageUtility helper function used above to determine the orientation:

+ (FIRVisionDetectorImageOrientation)imageOrientationFromOrientation:(UIDeviceOrientation)deviceOrientation
                             withCaptureDevicePosition:(AVCaptureDevicePosition)position
                              defaultDeviceOrientation:(UIDeviceOrientation)defaultOrientation {
  if (deviceOrientation == UIDeviceOrientationFaceDown ||
      deviceOrientation == UIDeviceOrientationFaceUp ||
      deviceOrientation == UIDeviceOrientationUnknown) {
    deviceOrientation = defaultOrientation;
  }
  FIRVisionDetectorImageOrientation orientation = FIRVisionDetectorImageOrientationTopLeft;
  switch (deviceOrientation) {
    case UIDeviceOrientationPortrait:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationLeftTop;
      } else {
        orientation = FIRVisionDetectorImageOrientationRightTop;
      }
      break;
    case UIDeviceOrientationLandscapeLeft:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationBottomLeft;
      } else {
        orientation = FIRVisionDetectorImageOrientationTopLeft;
      }
      break;
    case UIDeviceOrientationPortraitUpsideDown:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationRightBottom;
      } else {
        orientation = FIRVisionDetectorImageOrientationLeftBottom;
      }
      break;
    case UIDeviceOrientationLandscapeRight:
      if (position == AVCaptureDevicePositionFront) {
        orientation = FIRVisionDetectorImageOrientationTopRight;
      } else {
        orientation = FIRVisionDetectorImageOrientationBottomRight;
      }
      break;
    default:
      orientation = FIRVisionDetectorImageOrientationTopLeft;
      break;
  }

  return orientation;
}
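For Swift readers, the same mapping can be sketched as follows. To keep the snippet compilable without UIKit or Firebase, it uses three stand-in enums (`DeviceOrientation`, `CameraPosition`, `ImageOrientation`) that are hypothetical; in a real app you would switch over `UIDeviceOrientation` and `AVCaptureDevice.Position` and return the orientation type that `VisionImageMetadata` expects.

```swift
/// Stand-ins for UIDeviceOrientation, AVCaptureDevice.Position and the ML Kit
/// image orientation enum. These are hypothetical types for illustration.
enum DeviceOrientation {
    case unknown, portrait, portraitUpsideDown, landscapeLeft, landscapeRight, faceUp, faceDown
}
enum CameraPosition { case front, back }
enum ImageOrientation {
    case topLeft, topRight, bottomRight, bottomLeft, leftTop, rightTop, rightBottom, leftBottom
}

/// Mirrors the Objective-C helper above: flat or unknown device orientations
/// fall back to `defaultOrientation`, then each case picks a value per camera.
func imageOrientation(
    from deviceOrientation: DeviceOrientation,
    cameraPosition: CameraPosition,
    defaultOrientation: DeviceOrientation
) -> ImageOrientation {
    var orientation = deviceOrientation
    switch orientation {
    case .faceUp, .faceDown, .unknown:
        orientation = defaultOrientation
    default:
        break
    }

    let isFront = cameraPosition == .front
    switch orientation {
    case .portrait:
        return isFront ? .leftTop : .rightTop
    case .landscapeLeft:
        return isFront ? .bottomLeft : .topLeft
    case .portraitUpsideDown:
        return isFront ? .rightBottom : .leftBottom
    case .landscapeRight:
        return isFront ? .topRight : .bottomRight
    case .faceUp, .faceDown, .unknown:
        // The fallback itself was flat or unknown: use the same default as above.
        return .topLeft
    }
}
```

For the back camera in portrait this yields `.rightTop`, which is the value hard-coded in the question.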


user5827947 May 17 '18 at 21:53


user9797193 Jun 18 '18 at 22:32 · Accepted Answer

A Quick Start sample app in Swift that shows how to recognize text from a live video feed using ML Kit (with CMSampleBuffer) is now available here:

https://github.com/firebase/quickstart-ios/tree/master/mlvision/MLVisionExample

The live feed handling is implemented in CameraViewController.swift:

https://github.com/firebase/quickstart-ios/blob/master/mlvision/MLVisionExample/CameraViewController.swift