Torch dates back to 2002; Torch7 followed in 2011, but its Lua language constrained Torch's growth. PyTorch shipped its first official stable release (1.0) in December 2018, using Caffe2 as its backend. This tutorial uses version 1.1, released in May 2019.
Framework lineages by vendor:
· Google: Theano -> TensorFlow 1 -> TensorFlow 2 + Keras
· Facebook: Torch7 -> Caffe -> PyTorch + Caffe2
· Amazon: MXNet
· Microsoft: CNTK
· Chainer
TensorFlow is static-graph-first, while PyTorch is dynamic-graph-first.
Summary: for researchers, PyTorch is recommended; for engineers, TensorFlow 2 is recommended.
Environment: Python 3.7 + Anaconda 5.3.1, CUDA 10.0, PyCharm Community
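A quick sanity check of the setup, as a minimal sketch (it assumes PyTorch has already been installed into this environment), is to query the version and CUDA availability:

```python
import torch

# confirm the installed PyTorch version (this tutorial targets 1.1)
print(torch.__version__)
# True if this build can see a CUDA-capable GPU
print(torch.cuda.is_available())
```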
Example loss: f(x) = x^2 * sin(x), with derivative f'(x) = 2x*sin(x) + x^2*cos(x)
Gradient descent update: x' = x - α * f'(x)
α is the learning rate; it controls the step size of gradient descent and is typically set to 0.0001 or 0.01.
Because of noise in the data, the loss can be written in the form (W * X + b - y)^2; the values of W and b that minimize this loss function are exactly the parameters we are looking for.
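As a concrete illustration of the update rule, here is a minimal sketch of gradient descent on f(x) = x^2 * sin(x); the starting point x = 1.0 and the iteration count are arbitrary assumptions, not values from the notes:

```python
import math

def f(x):
    return x ** 2 * math.sin(x)

def f_prime(x):
    # f'(x) = 2x*sin(x) + x^2*cos(x)
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)

x = 1.0        # arbitrary starting point (assumption)
alpha = 0.01   # learning rate
for _ in range(1000):
    x = x - alpha * f_prime(x)  # x' = x - alpha * f'(x)

print(x, f(x))  # x has descended toward a local minimum of f
```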
Linear regression produces a result y ∈ R. Logistic regression adds an activation function that constrains the final result to [0, 1], turning it into a probability problem. A defining property of classification problems: the probabilities of all outcomes sum to 1.
Experiment code for the regression problem:
```python
import numpy as np

def compute_error_for_line_given_points(b, w, points):
    # mean squared error of the line y = w*x + b over all points
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        totalError += (y - (w * x + b)) ** 2
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, learningRate):
    # one gradient-descent step on the MSE loss
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2 / N) * (y - ((w_current * x) + b_current))
        w_gradient += -(2 / N) * x * (y - ((w_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)
    return [new_b, new_w]

def gradient_descent_runner(points, starting_b, starting_w, learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    for i in range(num_iterations):
        b, w = step_gradient(b, w, np.array(points), learning_rate)
    return [b, w]

def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    learning_rate = 0.0001
    initial_b = 0
    initial_w = 0
    num_iterations = 1000
    print("Starting gradient descent at b = {0}, w = {1}, error = {2}"
          .format(initial_b, initial_w,
                  compute_error_for_line_given_points(initial_b, initial_w, points)))
    print("Running...")
    [b, w] = gradient_descent_runner(points, initial_b, initial_w, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, w = {2}, error = {3}"
          .format(num_iterations, b, w,
                  compute_error_for_line_given_points(b, w, points)))

if __name__ == '__main__':
    run()
```

Result:

```
Starting gradient descent at b = 0, w = 0, error = 5565.107834483211
Running...
After 1000 iterations b = 0.08893651993741346, w = 1.4777440851894448, error = 112.61481011613473
```
A sample of the data (data.csv):
MNIST dataset: recognition of handwritten digits 0~9
· each digit has 7k images
· 60k training images, 10k test images
The pred result uses one-hot encoding. A linear model struggles with problems that have nonlinear characteristics, so the activation function ReLU is introduced as a nonlinear factor, and gradient descent is then applied to minimize the cost function. For each new X (a handwritten digit), run it forward through pred and take the label with the highest probability as the prediction.
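As a small illustration of the encoding and the prediction step, here is a minimal PyTorch sketch; the 10 classes follow from the digits 0~9, while the example label and probability values are made up for illustration:

```python
import torch

# one-hot encode label 3 out of 10 classes (digits 0~9)
label = torch.tensor([3])
one_hot = torch.zeros(1, 10).scatter_(1, label.unsqueeze(1), 1)
print(one_hot)  # tensor([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]])

# given model outputs interpreted as probabilities,
# the prediction is the label with the largest value
probs = torch.tensor([[0.05, 0.02, 0.03, 0.70, 0.05,
                       0.05, 0.03, 0.03, 0.02, 0.02]])
pred = probs.argmax(dim=1)
print(pred)     # tensor([3])
```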
H1 = relu(X W1 + b1)
H2 = relu(H1 W2 + b2)
H3 = f(H2 W3 + b3)
Steps (a code sketch follows the list):
1. Load the data
2. Build the model
3. Train
4. Test
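Below is a minimal PyTorch sketch of the three-layer network and the load/build/train/test loop above. The hidden-layer sizes (256 and 64), batch size, learning rate, and epoch count are illustrative assumptions, not values from the notes:

```python
import torch
import torch.nn.functional as F
from torch import nn, optim
from torchvision import datasets, transforms

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)    # flatten the 28x28 image to a 784-vector
        h1 = F.relu(self.fc1(x))     # H1 = relu(X W1 + b1)
        h2 = F.relu(self.fc2(h1))    # H2 = relu(H1 W2 + b2)
        return self.fc3(h2)          # H3 = H2 W3 + b3 (raw logits)

# 1. load the data
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

# 2. build the model
model = MLP()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# 3. train
for epoch in range(3):
    for x, y in train_loader:
        logits = model(x)
        loss = F.cross_entropy(logits, y)  # softmax + NLL over the logits
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('epoch', epoch, 'last batch loss', loss.item())

# 4. test / predict: take the label with the largest probability
# pred = model(x_new).argmax(dim=1)
```

Note that no activation is applied to the last layer: F.cross_entropy combines log-softmax and negative log-likelihood internally, so the network returns raw logits, and argmax over logits gives the same prediction as argmax over the softmax probabilities.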