Image Deep Learning in Practice

WRITTEN BY YOUNGJAE KIM

Development Environment Setup
• git pull skp_edu_docker
• docker-compose up -d
• docker exec -it <CONTAINER ID> bash
• /run_vnc.sh
• chrome://apps
• PyCharm: File > Settings > Project Interpreter > Add Local > "/opt/conda/bin/python3.5"

Deep Learning Process
• Data Preprocessing
• Training: network algorithm
• Evaluation
• Service

Image Data Preprocessing
Reference: https://github.com/TensorMSA/tensormsa/blob/master/cluster/data/data_node_image.py

• Viewpoint variation. A single instance of an object can look different depending on the
camera's viewpoint.
• Scale variation. Visual classes often vary in size (their size in the real world, not only
their extent in the image).
• Deformation. Many objects have no fixed shape and can be deformed in extreme ways.
• Occlusion. Objects may not be fully visible. Sometimes only a very small portion of the
object (very few pixels) can be seen.
• Illumination conditions. Lighting changes the pixel values.
• Background clutter. Objects can blend into their surroundings, making them hard to identify.
• Intra-class variation. Many of the classes to be recognized are broad. A chair, for example,
comes in many different shapes.
A good image classifier must stay invariant to all of these variations while remaining
sensitive to the differences between classes.

• Resizing
• Labeling
• Store
• Augmentation

CIFAR-10
• http://www.cs.toronto.edu/~kriz/cifar.html

Data Store
Reasons to use HDF5:
• Simple format to read/write.
Reasons to use LMDB:
● LMDB uses memory-mapped files, giving much better I/O
performance.

Data Store: HDF5
• apt-get update
• apt-get install hdfview

Augmentation
● Random Cropping
● Horizontal Flip
● Vertical Flip
● Random Brightness
● Random Contrast
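
The five augmentations listed above can be sketched with NumPy as follows. This is only an illustrative sketch, not the code used in the course: the function names, the [0, 1] pixel range, and the parameter values (`max_delta`, `lower`, `upper`, the 24x24 crop) are all assumptions chosen for the example.

```python
import numpy as np

def random_crop(img, size, rng):
    """Cut a random size x size window out of an H x W x C image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def horizontal_flip(img):
    """Mirror left-right by reversing the width axis."""
    return img[:, ::-1]

def vertical_flip(img):
    """Mirror top-bottom by reversing the height axis."""
    return img[::-1]

def random_brightness(img, max_delta, rng):
    """Add one random offset in [-max_delta, max_delta] to every pixel."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def random_contrast(img, lower, upper, rng):
    """Scale each channel's distance from its mean by a random factor."""
    factor = rng.uniform(lower, upper)
    mean = img.mean(axis=(0, 1), keepdims=True)
    return np.clip((img - mean) * factor + mean, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # stand-in for one 32x32 RGB image in [0, 1]

cropped = random_crop(image, 24, rng)
flipped = horizontal_flip(cropped)
brightened = random_brightness(flipped, 0.2, rng)
augmented = random_contrast(brightened, 0.5, 1.5, rng)
print(augmented.shape)
```

In training, each augmentation would typically be applied with some probability per sample, so the network sees a slightly different image each epoch.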

Image Data Deep Learning: Applied Theory

Image Classification Network History
Source: http://laonple.blog.me
• LeNet-5
LeNet-5 is the classic CNN. Professor LeCun built it in 1998 using 6 hidden layers, and it
introduced the basic concepts of CNNs to a wide audience.
1. Input - size 32x32x1, grayscale image. (Next operation: 5x5 filter, stride 1)
2. C1 - size 28x28x6, 6 feature maps. (Next operation: 2x2 filter, subsampling)
3. S2 - size 14x14x6, 6 feature maps. (Next operation: 5x5 filter, stride 1)
4. C3 - size 10x10x16, 16 feature maps.
5. S4 - size 5x5x16, 16 feature maps.
6. C5 - FC (fully connected), 120 units.
7. F6 - FC (fully connected), 84 units.
8. Output - GC (Gaussian connections), 10 units. (Final classification)
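
The feature-map sizes in the list above follow from simple arithmetic, which the short sketch below traces. It assumes no zero-padding, and it assumes that each 2x2 subsampling step uses stride 2 and that S4 also uses a 2x2 window (the slide states the 2x2 filter only for S2). The helper name `output_size` is made up for this example.

```python
def output_size(size, filter_size, stride):
    """Spatial size after sliding a filter with no zero-padding."""
    return (size - filter_size) // stride + 1

# (layer name, filter size, stride, number of feature maps)
layers = [
    ("C1", 5, 1, 6),    # 5x5 convolution
    ("S2", 2, 2, 6),    # 2x2 subsampling, stride 2 assumed
    ("C3", 5, 1, 16),   # 5x5 convolution
    ("S4", 2, 2, 16),   # 2x2 subsampling, assumed same as S2
]

size = 32  # 32x32x1 input
shapes = {}
for name, filter_size, stride, maps in layers:
    size = output_size(size, filter_size, stride)
    shapes[name] = (size, size, maps)
    print(f"{name}: {size}x{size}x{maps}")
```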

• AlexNet
AlexNet, named after Alex Krizhevsky, was the overwhelming winner of ILSVRC 2012. After it,
many neural-network models such as ZFNet, NIN, VGGNet, GoogLeNet, and ResNet went on to
outperform earlier results on ILSVRC and other datasets, so AlexNet can be said to have
started this wave. To address the vanishing gradient problem, it used the ReLU activation
function instead of sigmoid or tanh.

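• VGGNet
VGGNet performed strongly at ILSVRC 2014 alongside GoogLeNet. Because of its simple structure
and its good performance as a single network, it is widely used as a base network in many
applications.
Like AlexNet, VGGNet takes 224x224 images as input.
AlexNet used 11x11 filters with s=4, and ZFNet used 7x7 filters with s=2.
VGGNet uses 3x3 filters with s=1, p=1. The trend is toward progressively smaller filters.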
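
One commonly cited reason for the move to small filters is that a stack of 3x3 convolutions covers the same receptive field as one larger filter while using fewer weights. The sketch below illustrates that arithmetic only. The channel count of 64 and both helper names are assumptions for the example, and biases are ignored.

```python
def receptive_field(num_layers, filter_size=3):
    """Receptive field of a stack of stride-1 convolutions."""
    return 1 + num_layers * (filter_size - 1)

def conv_weights(filter_size, channels, num_layers=1):
    """Weight count for num_layers conv layers with `channels` inputs and outputs."""
    return num_layers * filter_size * filter_size * channels * channels

channels = 64  # hypothetical channel count

two_3x3 = conv_weights(3, channels, num_layers=2)
one_5x5 = conv_weights(5, channels)
three_3x3 = conv_weights(3, channels, num_layers=3)
one_7x7 = conv_weights(7, channels)

print(receptive_field(2), two_3x3, one_5x5)     # two 3x3 layers vs. one 5x5
print(receptive_field(3), three_3x3, one_7x7)   # three 3x3 layers vs. one 7x7
```

The stacked version also places a nonlinearity between the layers, which a single large filter does not have.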

• ResNet
ResNet, built by MSRA, took first place in the Classification, Detection, and Localization
tracks of ImageNet 2015, and is known for its very deep stack of layers.

But does stacking layers deeper always give better results?
Deepening a network can cause vanishing/exploding gradients, so the network may saturate from
the earliest stage of training. For networks of up to a few dozen layers, however, this
problem is largely solved by techniques such as Batch Normalization and Xavier initialization.
As networks get deeper still, a problem called degradation appears. ResNet addresses it with
a training method called deep residual learning, which made it possible to build a deep
network of as many as 152 layers.
deep residual learning

CNN : Convolutional Neural Network
• Problems with applying a conventional multi-layered neural network to vision
It ignores the topology of the characters and processes the raw data directly, so it needs an
enormous amount of training data and pays for that with correspondingly long training time.

Source: http://operatingsystems.tistory.com

ReLU(x) = max(0, x)

max pooling

local feature → global feature
Invariance to changes in topology
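
As a small illustration of the two operations named above, the sketch below applies ReLU and then 2x2 max pooling to a hand-written 4x4 feature map. The pooling stride of 2, the helper names, and the example values are assumptions for this example.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an H x W map (H and W must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([
    [ 1, -2,  0,  3],
    [-1,  5, -4,  2],
    [ 0, -3,  7, -1],
    [ 2,  1, -6,  4],
])

activated = relu(feature_map)     # negative responses become 0
pooled = max_pool_2x2(activated)  # each 2x2 block keeps only its strongest response
print(pooled)
```

Because each output keeps only the maximum of its window, moving a strong response by one pixel inside the same window leaves the pooled map unchanged, which is one way to read the invariance mentioned above.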

Filter

For the actual classification

• hyper-parameter
1. Number of filters
Compute time of a convolution layer = number of pixels x number of filters x compute time
per filter
→ Keep the compute time of each layer roughly constant to balance the system
2. Filter size
For small inputs such as 32x32 or 28x28, 5x5 filters are typical, while large filters such
as 11x11 are sometimes used for large images.
One large filter < several small filters (stacking small filters is generally preferred)
3. Stride
A stride of 1 is generally recommended
4. Zero-padding
Zero-padding preserves information at the borders, so using it tends to give better results
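
To make the effect of stride and zero-padding concrete, the sketch below computes the output width of a convolution from the usual formula (W - F + 2P) / S + 1. The function name is hypothetical, and rounding down when the filter does not tile the input evenly is a choice made for this sketch.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output width of a convolution: (W - F + 2P) // S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

no_padding = conv_output_size(32, 5)                 # border pixels are lost
same_padding = conv_output_size(32, 5, padding=2)    # zero-padding keeps the size
strided = conv_output_size(32, 5, stride=2, padding=2)
vgg_style = conv_output_size(224, 3, stride=1, padding=1)

print(no_padding, same_padding, strided, vgg_style)  # 28 32 16 224
```

With stride 1 and padding (F - 1) / 2, the spatial size is unchanged, so depth can be added without shrinking the feature map.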

TensorFlow Tutorials
• https://www.tensorflow.org/tutorials
• https://tensorflowkorea.gitbooks.io/tensorflow-kr/content/g3doc/tutorials
• https://github.com/tensorflow/models/tree/master/tutorials

cifar10.py Convolution, max pool

cifar10.py Fully Connected Layer

MNIST
• https://github.com/youngjaekim0129/DeepLearning_Tutorial/blob/cdebf190d099b1562199f849824c4112a6a00387/jupyter/CNN/CNN.ipynb

Development Example Using Keras ResNet

Keras
• Keras Documentation : https://keras.io/
• Keras: The Python Deep Learning library
• Keras is a high-level neural networks API, written in Python and capable of
running on top of either TensorFlow, CNTK or Theano. It was developed with a
focus on enabling fast experimentation. Being able to go from idea to result with
the least possible delay is key to doing good research.

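Development Example Using Keras ResNet
• https://github.com/raghakot/keras-resnet
Problems with deep networks
• Vanishing/exploding gradient: when a CNN updates its parameters, gradients can saturate at
very large or very small values and stop moving, so training becomes ineffective or very
slow.
• As a network gets deeper, the number of parameters grows proportionally, and the error can
actually increase even when overfitting is not the cause.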

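Residual Learning
Goal: learn H(x) - x instead of H(x)
F(x) = H(x) - x
H(x) = F(x) + x
The shortcut connects directly with no parameters and only adds one addition
In the optimal case F(x) is close to 0
Small changes in the input are easy to detect
The shortcut connection simplifies the forward and backward paths
→ Even deep networks are easy to optimize, and the added depth improves accuracy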
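
The relation H(x) = F(x) + x can be sketched in a few lines of NumPy. This is a toy, fully connected version for illustration only (ResNet itself uses convolutional layers). The two-layer form of F, the shapes, and the function names are assumptions for this example.

```python
import numpy as np

def residual_branch(x, w1, w2):
    """F(x): two weight layers with a ReLU in between."""
    return np.maximum(0, x @ w1) @ w2

def residual_block(x, w1, w2):
    """H(x) = F(x) + x. The shortcut adds x and has no parameters of its own."""
    return residual_branch(x, w1, w2) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # batch of 4 vectors with 8 features each

# When the residual branch outputs 0, the block is exactly the identity mapping.
zeros = np.zeros((8, 8))
identity_out = residual_block(x, zeros, zeros)

# Small weights give a small correction on top of the identity.
w1 = rng.normal(scale=0.01, size=(8, 8))
w2 = rng.normal(scale=0.01, size=(8, 8))
out = residual_block(x, w1, w2)
print(np.abs(out - x).max())
```

The layers only have to learn the small difference F(x), and the unchanged x term gives gradients a direct path back through the addition.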

Keras API
• Model Save : keras.models.save_model(model, filepath)
→ tf.contrib.keras.models.save_model (TF 1.2)
• Model Load : keras.models.load_model(filepath)
→ tf.contrib.keras.models.load_model (TF 1.2)
• Predict : Model.predict(x)

Keras API
• Callbacks
• ModelCheckpoint : keras.callbacks.ModelCheckpoint(filepath,
monitor='val_loss', verbose=0, save_best_only=False,
save_weights_only=False, mode='auto', period=1)
→tf.contrib.keras.callbacks.ModelCheckpoint(TF1.2)
• Create a callback

AutoEncoder
Input and output have the same dimension
Training goal: make the output as close to the input as possible
Encoding: dimensionality reduction, feature extraction
unsupervised learning

Stacked AutoEncoder

Denoising AutoEncoder

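k-Sparse AutoEncoder
• Applies a sparsity constraint by limiting the hidden-layer activations to at most k
• The k largest activations are kept unchanged and all the others are set to 0
• During back-propagation, paths whose activation is 0 are ignored, and only the weights on
the non-zero paths are updated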
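
The top-k selection described above can be sketched as a mask over the hidden activations. This minimal example handles a single 1-D activation vector. The function name and the sample values are assumptions, and a full autoencoder would apply the same mask inside its training loop.

```python
import numpy as np

def k_sparse(activations, k):
    """Keep the k largest activations of a 1-D vector and zero the rest."""
    keep = np.argsort(activations)[-k:]  # indices of the k largest values
    mask = np.zeros_like(activations)
    mask[keep] = 1.0
    return activations * mask, mask

hidden = np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.5])
sparse, mask = k_sparse(hidden, k=2)
print(sparse)  # only 0.9 and 0.7 survive

# In back-propagation the same mask gates the gradient, so only the
# weights feeding the surviving units receive an update.
upstream_grad = np.ones_like(hidden)
hidden_grad = upstream_grad * mask
print(hidden_grad)
```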

k-Sparse AutoEncoder
(Figure labels: Local Feature, Best!, Global Feature)

Convolutional AutoEncoder

AutoEncoder Development Example
• https://github.com/golbin/TensorFlow-Tutorials/tree/master/06%20-%20Autoencoder

YOLO
• You Only Live Once
• You Only Look Once
https://pjreddie.com/darknet/yolo/
Model          FPS    mAP
Faster R-CNN   7      73.2
YOLO           45     63.4
Source: https://m.blog.naver.com/sogangori

Faster R-CNN Architecture

How object bounding boxes are found
Proposal approach
Grid approach: number of grid cells = number of proposals

YOLO Network
7 x 7 x 2 = 98 bounding boxes (line thickness: probability)
Which class the object inside the bounding box proposed by each region belongs to

YOLO Network
NMS(Non-maximal suppression)
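
Non-maximal suppression can be sketched as a greedy loop: take the highest-scoring box, then drop any remaining box that overlaps a kept box too much. The sketch below is a generic, single-class version, not the YOLO implementation. The (x1, y1, x2, y2) box format, the 0.5 IoU threshold, and the sample boxes are assumptions for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # the second box overlaps the first heavily and is suppressed
```

In a detector with several classes, this step is usually run once per class on that class's scores.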

YOLO Network
bb15: dog 0.2, cat 0.2, bike 0.3

YOLO Usage Example
• https://github.com/nilboy/tensorflow-yolo

GAN (Generative Adversarial Network)

<Goal 1>
Generator: produces counterfeits that look as real as possible (generated data)
Discriminator: tells the genuine items (real data) from the counterfeits
<Goal 2>
Competitive improvement

D tries to lower (min) the probability of making a mistake, while G tries to raise (max) the
probability that D makes a mistake. This can be called a "minimax two-player game" or a
"minimax problem".
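
For reference, the game described above is usually written as the value function below, in the standard formulation of the GAN objective. The symbols V, p_data (the data distribution), and p_z (the noise prior on z) do not appear on the slide and are introduced here as notation.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

In this form D maximizes V, which corresponds to making fewer classification mistakes, and G minimizes V by producing samples G(z) that D scores as real.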

Black dotted line: data distribution
Blue dotted line: discriminator distribution
Green line: generative distribution
x = G(z), D(x) = 1/2

DCGAN (Deep Convolutional GAN) Example
• https://github.com/carpedm20/DCGAN-tensorflow
