Linear Regression(3) - ex.1

2020. 7. 28. 12:53

This post records what I have learned on my own through lectures, books, and other materials. All information in it is based on the sources listed at the bottom.

Let's implement the linear regression we have studied so far.

The language is Python, and the tool is TensorFlow (~~v1,~~ v2).

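TensorFlow is an open-source library released by Google that helps implement machine learning.

TensorFlow v2 solves problems with far less code than v1.

However, without understanding where this convenience comes from, you will struggle to build and debug your own models, so let's use this first example to absorb as much background as possible before moving on.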

Now let's take a quick look at a basic implementation of Linear Regression through an easy example.

Given the following data set, we want to find a prediction line using machine learning.

x = [1,2,3,4,5,6]

y = [2,3,4,5,6,7]
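
Before training anything, it can help to know what answer to expect. The sketch below fits a line by ordinary least squares in plain Python; the helper name `fit_line` is made up for illustration and does not appear in the original lecture code. Since every point here lies on y = x + 1, a trained model should end up close to w = 1, b = 1.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w*x + b (illustrative helper)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

w, b = fit_line([1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7])
print(w, b)  # 1.0 1.0
```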

Full code

import numpy as np
import tensorflow as tf

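# NumPy arrays are used because some TensorFlow 2.x versions reject plain Python lists in fit()
x_train = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y_train = np.array([2, 3, 4, 5, 6, 7], dtype=float)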

tf.model = tf.keras.Sequential()
# units == output shape, input_dim == input shape
tf.model.add(tf.keras.layers.Dense(units=1, input_dim=1))

sgd = tf.keras.optimizers.SGD(learning_rate=0.01)  # SGD == stochastic gradient descent; `lr` is a deprecated alias of learning_rate
tf.model.compile(loss='mse', optimizer=sgd)  # mse == mean_squared_error, 1/m * sum((y' - y)^2)

# prints summary of the model to the terminal
tf.model.summary()

# fit() executes training
tf.model.fit(x_train, y_train, epochs=200)

# predict() returns predicted value
y_predict = tf.model.predict(np.array([5, 4]))
print(y_predict)

First, Keras is an open-source library that TensorFlow adopted as part of its core API; as of version 2.x it is the recommended high-level interface.

(https://ko.wikipedia.org/wiki/%EC%BC%80%EB%9D%BC%EC%8A%A4)

tf.model = tf.keras.Sequential()

# units == output shape, input_dim == input shape
tf.model.add(tf.keras.layers.Dense(units=1, input_dim=1))

The Sequential model is an API for building a model by stacking layers one after another. (https://keras.io/guides/sequential_model/)

For now, think of it as the scaffolding that holds our model together.

To the Sequential model created this way, we add a dense layer (= fully-connected layer).

The example here is a single-layer network, so only this one layer is used.
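
As a rough mental model (not how Keras is actually implemented), a Dense layer with `units=1` and `input_dim=1` holds one weight and one bias and computes w*x + b for each input. The class name `TinyDense` below is hypothetical and exists only for this sketch.

```python
class TinyDense:
    """Hypothetical stand-in for Dense(units=1, input_dim=1): one weight, one bias."""

    def __init__(self, w=0.0, b=0.0):
        self.w = w  # kernel (weight)
        self.b = b  # bias

    def __call__(self, xs):
        # No activation function: the output is a linear function of the input.
        return [self.w * x + self.b for x in xs]

    def count_params(self):
        return 2  # one weight + one bias

layer = TinyDense(w=1.0, b=1.0)
print(layer([5, 4]))  # [6.0, 5.0]
```

This is also why the model summary for this example should report two trainable parameters.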

At this point I wondered how the number of layers and nodes in a model is chosen; the following links are a good starting point.

https://machinelearningmastery.com/how-to-configure-the-number-of-layers-and-nodes-in-a-neural-network/

How to Configure the Number of Layers and Nodes in a Neural Network

https://www.heatonresearch.com/2017/06/01/hidden-layers.html

The Number of Hidden Layers

sgd = tf.keras.optimizers.SGD(learning_rate=0.01)  # SGD == stochastic gradient descent; `lr` is a deprecated alias of learning_rate
tf.model.compile(loss='mse', optimizer=sgd)  # mse == mean_squared_error, 1/m * sum((y' - y)^2)

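A Keras optimizer is the algorithm that updates the model's weights to reduce the loss, and it is one of the arguments needed to compile a model.

compile in Keras amounts to configuring the training setup.

When compiling a model, keep in mind that the following three parameters can be supplied:

1. loss (cost) function

2. optimizer

3. metrics (what to measure during training; we will use this later for Classification problems)

Here the model is trained with MSE as the cost function and gradient descent (SGD) as the optimizer.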
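
To make the loss concrete, here is a minimal plain-Python sketch of the mean squared error for a line y = w*x + b. The function name `mse` and the candidate values of w and b are chosen only for illustration; the optimizer's job is to move w and b toward values where this number is smaller.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]

print(mse(0.0, 0.0, xs, ys))  # a poor guess gives a large loss
print(mse(1.0, 0.0, xs, ys))  # 1.0 (every prediction is off by exactly 1)
print(mse(1.0, 1.0, xs, ys))  # 0.0 (the line passes through every point)
```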

# fit() executes training
tf.model.fit(x_train, y_train, epochs=200)

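Finally, this is the part that trains the model we built.

fit() takes the input training set, the output training set, the batch size (the number of training samples used for each weight update), and epochs (the number of passes over the whole training set). In this example the batch size is left unspecified.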
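
The relationship between batch size and epochs can be sketched in plain Python. The helper names `updates_per_fit` and `iter_batches` are hypothetical. When `batch_size` is not passed, Keras defaults to 32 for array inputs, so with only six samples each epoch is a single batch. Keras also shuffles the data by default, which this sketch leaves out.

```python
import math

def updates_per_fit(num_samples, batch_size, epochs):
    """Number of weight updates: one per mini-batch, repeated every epoch."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs

def iter_batches(xs, ys, batch_size):
    """Yield consecutive (x_batch, y_batch) slices of the training set."""
    for start in range(0, len(xs), batch_size):
        yield xs[start:start + batch_size], ys[start:start + batch_size]

xs, ys = [1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]
print(updates_per_fit(len(xs), batch_size=32, epochs=200))  # 200
print(updates_per_fit(len(xs), batch_size=2, epochs=200))   # 600
print(list(iter_batches(xs, ys, batch_size=4)))
```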

(For reference)

Tensorflow version 1.x

import tensorflow as tf

# data set
x_set = [1,2,3,4,5,6]
y_set = [2,3,4,5,6,7]

# w (weight), b (bias)
w = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

hypothesis = x_set * w + b

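This is the declaration part.

The values we want to find, $w, b$, are declared as TensorFlow variables.

These differ a little from ordinary variables; simply put, think of them as values that keep changing as training proceeds.

(For now, just note that random_normal([1]) produces a one-dimensional tensor holding a single value drawn from a normal distribution.)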

cost = tf.reduce_mean(tf.square(hypothesis-y_set))

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(cost)

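This expresses the cost function and the gradient descent algorithm using TensorFlow operations.

reduce_mean : computes the mean

square : squares each element

GradientDescentOptimizer : stands in for the formulas of the gradient descent algorithm (learning_rate is the step size of each update)

minimize : minimizes the given value (more precisely, it sets the direction of training so that this value decreases)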

sess = tf.Session()
sess.run(tf.global_variables_initializer())

Session is the class that executes the nodes defined above as a graph.

global_variables_initializer : initializes the declared variables before they are used; it must be run before the training code.

for step in range(2001):
    sess.run(train)
    if step % 200 == 0:
        print(step, sess.run(cost), sess.run(w), sess.run(b))

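This is where the actual training happens.

The train node, which minimizes cost, is run 2,001 times (steps 0 through 2000), and every 200 steps the current values of the variables are printed so that their change during training can be observed.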
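
For readers without a TensorFlow 1.x environment, the same loop can be approximated in plain Python. This is only a sketch of what the graph computes, under two assumptions: w and b start at 0.0 instead of random values, and the gradients of the mean squared error are written out by hand.

```python
xs, ys = [1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]
w, b = 0.0, 0.0        # assumption: fixed start instead of tf.random_normal
learning_rate = 0.01
m = len(xs)

for step in range(2001):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    cost = sum(e * e for e in errors) / m
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / m
    grad_b = 2 * sum(errors) / m
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    if step % 200 == 0:
        print(step, cost, w, b)
```

With these settings w and b should both approach 1, and the printed cost should shrink toward 0.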

[Figure: execution result]

We have now solved the most basic Linear Regression example using machine learning.

※ Implementation using Placeholder

The data set can be declared as a Placeholder, a kind of empty slot, and the desired set can then be fed in at execution time.

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None])
Y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(tf.random_normal(([1])), name='weight')
b = tf.Variable(tf.random_normal(([1])), name='bias')

hypothesis = X*w + b
cost = tf.reduce_mean(tf.square(hypothesis-Y))

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for step in range(2001):
    cost_, w_, b_,train_ = sess.run([cost,w,b,train], feed_dict={X:[1,2,3,4,5,6], Y:[2,3,4,5,6,7]})
    if step % 200 == 0:
        print(step, cost_, w_, b_)
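
As a loose analogy only (placeholders and `feed_dict` are TensorFlow 1.x graph concepts, and this is not how they are implemented), a placeholder behaves much like a function parameter: the computation is defined once and the data is supplied when it runs. The function name `run_cost` and the second data set are made up for this sketch.

```python
def run_cost(w, b, feed):
    """Evaluate the cost on whatever data is supplied at call time via `feed`."""
    xs, ys = feed["X"], feed["Y"]
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# The same definition evaluated on two different data sets.
first = run_cost(1.0, 1.0, {"X": [1, 2, 3, 4, 5, 6], "Y": [2, 3, 4, 5, 6, 7]})
second = run_cost(1.0, 1.0, {"X": [1, 2, 3], "Y": [3, 5, 7]})
print(first, second)
```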

※ Mathematical implementation of gradient descent

# using gradient descent optimizer
"""
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(cost)
"""

# using gradient descent formula
learning_rate = 0.1
gradient = tf.reduce_mean((w * x_set - y_set) * x_set)
descent = w - learning_rate*gradient
update = w.assign(descent)

$W := W - \alpha\frac{1}{m}\sum_{i=1}^{m}(Wx_{i}-y_{i})x_{i}$

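This implements the gradient descent formula directly as an expression. Note that the formula uses the simplified hypothesis $H(x) = Wx$, without the bias term.

Writing it out by hand is feasible here because the derivative of the cost is simple.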
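
The update rule can be checked numerically without TensorFlow. The sketch below assumes the cost $\frac{1}{2m}\sum_{i=1}^{m}(Wx_{i}-y_{i})^{2}$, whose derivative with respect to $W$ is $\frac{1}{m}\sum_{i=1}^{m}(Wx_{i}-y_{i})x_{i}$, and an assumed starting value of 0.0 for w.

```python
x_set = [1, 2, 3, 4, 5, 6]
y_set = [2, 3, 4, 5, 6, 7]
m = len(x_set)

w = 0.0              # assumed starting value
learning_rate = 0.1  # same value as in the snippet above

for _ in range(100):
    # gradient = (1/m) * sum((W*x_i - y_i) * x_i)
    gradient = sum((w * x - y) * x for x, y in zip(x_set, y_set)) / m
    w = w - learning_rate * gradient

print(w)
```

Because the bias is left out, W does not settle at 1 for this data; it converges to $\sum x_{i}y_{i} / \sum x_{i}^{2} = 112/91 \approx 1.23$, the best slope for a line through the origin.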

References

1. Sung Kim Youtube channel : https://www.youtube.com/channel/UCML9R2ol-l0Ab9OXoNnr7Lw

2. Andrew Ng Coursera class : https://www.coursera.org/learn/machine-learning

3. Cho, Taeho (2017). 모두의 딥러닝 (Deep Learning for Everyone). Seoul: Gilbut.
