TensorFlow

Tensorflow2 -- MNIST

TensorFlow 2.x differs from 1.x in many ways, including how it is used. Today I use TF2 to implement the MNIST classification problem.

MNIST
MNIST is a very standard handwritten-digit classification problem. The dataset can be downloaded in many ways; this time I use the version provided directly by the tf API: 28 * 28 single-channel grayscale images.

Development
Start jupyter lab locally and first check whether the GPU is enabled:

%matplotlib widget
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# check gpu
tf.config.list_physical_devices('GPU')
tf.test.is_built_with_cuda()
# output
True

Method 1: subclass tf.keras.Model

class MLP(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(units=100, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(units=20, activation=tf.nn.leaky_relu)
        self.dense3 = tf.keras.layers.Dense(units=10)

    @tf.function
    def call(self, inputs):            # [batch_size, 28, 28, 1]
        flat1 = self.flatten(inputs)   # [batch_size, 784]
        dens1 = self.dense1(flat1)     # [batch_size, 100]
        dens2 = self.dense2(dens1)     # [batch_size, 20]
        dens3 = self.dense3(dens2)     # [batch_size, 10]
        output = tf.nn.softmax(dens3)  # class probabilities
        return output

Training with tf.GradientTape

# model, optimizer, summary_writer, data_loader, num_epochs, num_batches,
# batch_size, log_dir and model_name are defined elsewhere (not shown in this excerpt)

# @tf.function
def one_batch_step(X, y, step):
    with tf.GradientTape() as tape:
        y_pred = model(X)
        # the model already outputs softmax probabilities, so from_logits stays False (the default)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
        loss = tf.reduce_mean(loss)
    tf.print(step, "loss", loss)
    with summary_writer.as_default():
        tf.summary.scalar("loss", loss, step=step)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_variables))

for epoch_index in range(num_epochs):
    for batch_index in range(num_batches):
        X, y = data_loader.get_batch(batch_size)
        # use a global step so each epoch does not overwrite the previous epoch's TensorBoard points
        one_batch_step(X, y, step=epoch_index * num_batches + batch_index)

# trace_export only has something to write if tf.summary.trace_on(...) was called before training
with summary_writer.as_default():
    tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=log_dir)
tf.saved_model.save(model, f"saved/{model_name}")

Method 2: use a Keras pipeline, stacking the function used by each layer. It is less flexible, but very well suited to simple models ...
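The MLP above ends in a softmax and is trained with sparse categorical cross-entropy on integer labels. As a rough NumPy sketch of what that loss computes (the helper names are hypothetical and this is not the Keras implementation), each sample contributes -log of the probability assigned to its true class, and tf.reduce_mean averages over the batch:

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs):
    # y_true: integer class ids, shape (batch,); probs: (batch, num_classes)
    # pick the probability of the true class in each row and take -log
    return -np.log(probs[np.arange(len(y_true)), y_true])

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y = np.array([0, 1])
per_sample = sparse_categorical_crossentropy(y, softmax(logits))
loss = per_sample.mean()  # the equivalent of tf.reduce_mean(loss)
print(per_sample, loss)
```

If the last Dense layer returned raw logits instead of applying tf.nn.softmax, the Keras loss would be called with from_logits=True.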

LSTM

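Original article: Imagine that every time a person thinks or reads a passage, they do not start from scratch; they keep their memory of what came before. RNNs are designed for this kind of problem: at each step they keep information from the past and keep passing it along. LSTM is a special form of RNN.

The Problem of Long-Term Dependencies
In many cases more context is needed, and the relevant information may lie very far back in the sequence. Propagating gradients over such long distances leads to vanishing or exploding gradients, so a plain RNN struggles to learn these dependencies.

LSTM
LSTM, short for Long Short-Term Memory networks, is a special kind of RNN. Whereas a plain RNN cell contains only a single tanh layer, an LSTM adds an input gate, an output gate and a forget gate, all of which control how the data is handled. Each gate uses a sigmoid, which can be read as how much is remembered or read out: 0 means nothing passes, 1 means everything passes. For the detailed derivation see the original article, which also introduces the GRU, a simpler variant of the LSTM with fewer gates. Notably, nearly all of the good results obtained with RNNs today are obtained with LSTMs.

In TensorFlow you can simply use the built-in LSTM layer. The figure below shows the result of learning the black line (x*sin(x)) with the red dashed line.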
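To make the gate description concrete, here is a minimal NumPy sketch of a single LSTM time step, assuming the standard formulation with the four gate pre-activations packed into one weight matrix. The packing order (input, forget, output, candidate), the function name lstm_step and the random weights are assumptions of this sketch, not TensorFlow's internal layout. It runs an untrained cell over an x*sin(x) sequence only to show the shapes and the state update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step (hypothetical helper).

    W: (input_dim, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)
    Assumed packing order: input gate, forget gate, output gate, candidate.
    """
    z = x_t @ W + h_prev @ U + b
    i, f, o, g = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1): 0 blocks, 1 lets everything through
    g = np.tanh(g)                                # candidate values to write
    c_t = f * c_prev + i * g                      # forget part of the old memory, write new content
    h_t = o * np.tanh(c_t)                        # read out part of the memory
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden = 1, 8
W = rng.standard_normal((input_dim, 4 * hidden)) * 0.1
U = rng.standard_normal((hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)

xs = np.linspace(0.0, 10.0, 50)
seq = (xs * np.sin(xs)).reshape(-1, 1, 1)  # (time, batch=1, input_dim=1)

h = np.zeros((1, hidden))
c = np.zeros((1, hidden))
for x_t in seq:
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)
```

Because the output gate and tanh both lie strictly between -1 and 1, every entry of h stays inside that range no matter how large the cell state c grows.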

Kaggle Titanic

Kaggle: The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class. In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.

From the problem statement we can tell this is a binary classification problem. My first thoughts were SVM and the perceptron. Given the data provided, a Decision Tree or Random Forest would probably be the more reasonable choice, but here I want to try Logistic Regression (sigmoid + cross entropy). I convert everything in the training data into numbers between 0 and 1 and leave the rest to the NN. Because the activation function of the last layer is a sigmoid, its output can get arbitrarily close to 0 or 1, and log(0) in the cross entropy would produce NaN. So when computing the cross entropy I clip the predictions to the range 0.00001 to 0.99999, which keeps training from hitting NaN ...
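A minimal NumPy sketch of the approach described above: min-max scale every feature into [0, 1], fit logistic regression by gradient descent, and clip predictions to [0.00001, 0.99999] before taking logs. The toy rows (columns loosely named sex, pclass, age), the labels, the learning rate and the step count are all made up for illustration and are not the Kaggle data or the author's code:

```python
import numpy as np

EPS = 1e-5  # clip bounds 0.00001 and 0.99999, as in the post

def min_max_scale(X):
    # map each column into [0, 1]; constant columns are left at 0
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clipped_bce(y, p):
    # clipping keeps log() finite, so the loss never becomes inf or NaN
    p = np.clip(p, EPS, 1.0 - EPS)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_logistic(X, y, lr=0.5, steps=3000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y  # derivative of cross entropy w.r.t. the pre-sigmoid value
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Made-up rows: [sex, pclass, age]; the label here simply equals column 0
X_raw = np.array([[0, 3, 22], [1, 1, 38], [1, 3, 26], [1, 1, 35],
                  [0, 3, 35], [0, 1, 54], [1, 2, 4], [0, 2, 30]], dtype=float)
y = X_raw[:, 0].copy()

X = min_max_scale(X_raw)
w, b = train_logistic(X, y)
p = sigmoid(X @ w + b)
accuracy = np.mean((p > 0.5) == (y == 1))
print(clipped_bce(y, p), accuracy)
```

Without the clip, clipped_bce(np.array([1.0]), np.array([0.0])) would evaluate log(0); with it, the loss is a finite -log(0.00001).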

Tensorflow Exercise 4: word2vec

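Converting words into word embeddings. The goal is to capture some relationship between words, instead of representing them as scattered, meaningless symbols. The idea behind the approach is the assumption that if two words in different sentences appear in the same context, they have similar meanings. Today I use the skip-gram model, which works somewhat like a binary classification (similar or not similar).

First, as in the previous problems, preprocess the data: count occurrences into [(most count word1, n1),(second word2, n2)], then turn the words into vectors. For the sentence "The actual code for this tutorial is very short", the (context, target) pairs are ([the, code], actual), ([actual, for], code), …, and the skip-gram pairs are (actual, the), (actual, code), (code, actual), … Every word is assigned an index along the way, so the pairs take the form (10,20),(10,30),(30,10),(30,40),...

Then the NCE loss is applied. I am not very familiar with it yet; roughly, it pushes the probability of the target as high as possible while keeping the probabilities of K other words, the negative samples, low.

king - queen = man - woman ==> king - queen + woman = man
Giving queen a minus sign and subtracting what we don't want: I think that is the intuition??

As a result, similar words end up close to each other. The original TensorFlow tutorial uses sklearn's TSNE for dimensionality reduction, which works better than PCA in many situations; I'd like to read up on it and try it. My Github ...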
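A small pure-Python sketch of the preprocessing steps above (count words, assign indices, emit skip-gram pairs with a window of 1), plus a NumPy negative-sampling loss. Negative sampling is a simplified relative of the NCE loss mentioned in the post, not tf.nn.nce_loss itself; the helper names, the window size and the zero/one test vectors are assumptions for illustration:

```python
from collections import Counter

import numpy as np

def build_vocab(words):
    # [(most common word, count), (second, count), ...]; the most frequent word gets id 0
    counts = Counter(words).most_common()
    return {w: i for i, (w, _) in enumerate(counts)}, counts

def skip_gram_pairs(ids, window=1):
    # (center, context) for every neighbour within `window` positions
    pairs = []
    for i, center in enumerate(ids):
        for j in range(max(0, i - window), min(len(ids), i + window + 1)):
            if j != i:
                pairs.append((center, ids[j]))
    return pairs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_sampling_loss(v_center, u_target, u_negs):
    # -log sigma(u_target . v) - sum_k log sigma(-u_k . v)
    # push the true context's score up and the K negative samples' scores down
    return (-np.log(sigmoid(u_target @ v_center))
            - np.sum(np.log(sigmoid(-(u_negs @ v_center)))))

sentence = "the actual code for this tutorial is very short".split()
word_to_id, counts = build_vocab(sentence)
ids = [word_to_id[w] for w in sentence]
pairs = skip_gram_pairs(ids, window=1)
id_to_word = {i: w for w, i in word_to_id.items()}
print([(id_to_word[a], id_to_word[b]) for a, b in pairs[:4]])
# [('the', 'actual'), ('actual', 'the'), ('actual', 'code'), ('code', 'actual')]
```

With all-zero vectors every score is sigma(0) = 0.5, so the loss with K negatives is (K + 1) * log 2, a handy sanity check.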

Tensorflow Exercise 3: 'FizzBuzz'

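Joel Grus: FizzBuzz in tensorflow. A humorous problem I came across online. It is a fun use case and a good follow-up after doing classification.

The input processing is the same as in the original code, since it is quite intuitive:
1 – 000000001 – [0 0 0 0 0 0 0 0 1]
2 – 000000010 – [0 0 0 0 0 0 0 1 0]
………
The output uses [1 0 0 0] [0 1 0 0] [0 0 1 0] [0 0 0 1] to represent the four classes, so both input and output are matrices. The network has two hidden layers of 512 and 256 units with relu activations; the rest is left to TensorFlow.

Classification results: at first the network could not separate the classes and kept getting stuck labeling every sample as the same class (accuracy 0.533). After I reduced the amount of data fed in per training step it worked (I had forgotten that when I first did classification I also fed in only a small amount at a time). Being stuck at 0.533 means the model is dominated by the numbers that are multiples of neither 3 nor 5: 8 of every 15 integers fall in that class (8/15 ≈ 0.533), so it is the most probable class. This can be seen as a local minimum, and mini-batches are what get the model out of it. This is the result with a dropout of 0.8 added: training and test accuracy are similar, and both quickly reach 1.0 ...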
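A NumPy sketch of the data encoding described above: 9-digit binary inputs, most significant bit first as in the 1 -> 000000001 example, and four one-hot output classes. The class order [number, fizz, buzz, fizzbuzz] and the training range 101 to 511 are assumptions of this sketch. It also checks the 0.533 plateau: predicting "number" for everything already scores about 8/15:

```python
import numpy as np

NUM_DIGITS = 9  # matches the 9-digit codes shown above

def binary_encode(i, num_digits=NUM_DIGITS):
    # most significant bit first: 1 -> [0 0 0 0 0 0 0 0 1]
    return np.array([(i >> d) & 1 for d in reversed(range(num_digits))])

def fizz_buzz_label(i):
    # assumed one-hot order: [number, fizz, buzz, fizzbuzz]
    if i % 15 == 0:
        return np.array([0, 0, 0, 1])
    if i % 5 == 0:
        return np.array([0, 0, 1, 0])
    if i % 3 == 0:
        return np.array([0, 1, 0, 0])
    return np.array([1, 0, 0, 0])

train_range = range(101, 2 ** NUM_DIGITS)  # assumed training numbers: 101..511
X = np.array([binary_encode(i) for i in train_range])
Y = np.array([fizz_buzz_label(i) for i in train_range])

# accuracy of always predicting the "number" class: roughly 8/15
baseline = Y[:, 0].mean()
print(X.shape, Y.shape, round(baseline, 3))
```

A model that has collapsed onto the majority class sits at this baseline, which is why the accuracy plateaued near 0.533 before the batch size was reduced.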

Tensorflow Exercise 2: CNN

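Using a CNN to predict digits (MNIST). The input image is a 28*28 grayscale digit from 0 to 9, and the output is a 1*10 matrix holding our prediction for each digit 0 to 9. The pipeline is input -> convolution -> pooling -> convolution -> pooling -> hidden layer -> output. Shapes such as [None,xx,xx] and [-1,XX,xx] mean we do not fix the size of the batch dimension; it follows the size of the input. Max pooling keeps the maximum value within each kernel-size window. Dropout is also added to avoid overfitting. For a detailed analysis, see Tensorflow.

Results: the upper figure is without dropout and the lower figure is with dropout. For this example the difference is small, but you can still see that in the upper figure training accuracy is higher than test accuracy. Accuracy falls between 97% and 99% after 1000 training steps (using GradientDescent; other optimizers should do better). Code: My GitHub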
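A NumPy sketch of two details from the paragraph above: reshaping with -1 so the batch size is inferred, and 2x2 max pooling (stride 2) that keeps the largest value in each window. The convolutions are omitted; the sketch assumes they use 'SAME' padding, so only pooling changes the spatial size (28 -> 14 -> 7). It also keeps a single channel, whereas a real convolution layer would produce more. The helper name max_pool_2x2 is hypothetical:

```python
import numpy as np

def max_pool_2x2(x):
    # x: (batch, H, W, C) with even H and W; max over each non-overlapping 2x2 window
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

batch = np.random.default_rng(0).random((5, 784))  # 5 flattened 28*28 images
images = batch.reshape(-1, 28, 28, 1)               # -1: batch size inferred (5)
p1 = max_pool_2x2(images)                           # (5, 14, 14, 1)
p2 = max_pool_2x2(p1)                               # (5, 7, 7, 1)
flat = p2.reshape(-1, 7 * 7 * 1)                    # (5, 49), input to the hidden layer

# a tiny example: each 2x2 block of 0..15 keeps its largest value
demo = np.arange(16).reshape(1, 4, 4, 1)
print(max_pool_2x2(demo)[0, :, :, 0])
# [[ 5  7]
#  [13 15]]
```

Passing None in a placeholder shape plays the same role as -1 here: the first dimension is left open so any batch size can be fed.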

Using GPUs in Tensorflow

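Tensorflow supports computation on both the CPU and the GPU:
"/cpu:0": The CPU of your machine.
"/gpu:0": The GPU of your machine, if you have one.
"/gpu:1": The second GPU of your machine, etc.
Use with tf.device() to choose the device for the operations inside that block. log_device_placement = True logs which device each operation is placed on. allow_soft_placement = True handles the case where the specified device does not exist, letting TensorFlow place the operation on an existing device that can run it. I don't have multiple CPUs, so I haven't tried the rest of the syntax yet.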
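In TF1 the two options above are passed as tf.ConfigProto(log_device_placement=True, allow_soft_placement=True) when creating the Session. As a sketch of the TF2 equivalents (assuming TensorFlow 2.x is installed), tf.debugging.set_log_device_placement and tf.config.set_soft_device_placement cover the same two settings. To stay runnable on a CPU-only machine, this example picks the GPU only when one is visible:

```python
import tensorflow as tf

# TF2 counterpart of log_device_placement=True: log the device each op runs on
tf.debugging.set_log_device_placement(True)
# TF2 counterpart of allow_soft_placement=True: fall back to a device that exists
tf.config.set_soft_device_placement(True)

device = "/GPU:0" if tf.config.list_logical_devices("GPU") else "/CPU:0"
with tf.device(device):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    identity = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    c = tf.matmul(a, identity)

print(device, c.numpy())
```

With placement logging on, TensorFlow prints the device chosen for the MatMul op, which is a quick way to confirm the GPU is actually being used.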

Tensorflow Exercise 1: Polynomial Regression

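A basic exercise in regression with Tensorflow: using a Neural network to fit a 2-D quartic polynomial.

First define the input and output formats; None means we do not restrict the number of rows. In Tensorflow, a value must be declared explicitly as a constant, a variable or an external input with tf.constant(), tf.Variable() or tf.placeholder() respectively. To evaluate anything in Tensorflow you must run it with sess.run(), where sess = tf.Session(); otherwise you only get a Tensor object.

Define a linear equation Y = W*x + b, and transform it in the hidden layer with an activation function. Common measures of model quality are squared error and cross_entropy; here squared error is used as the loss. Plain gradient descent is used to minimize the loss, with a learning rate smaller than 1. Then generate the training values from the target function (remember to add some noise).

W1 shape = (in_dim, hidden_units) = (1, 10), W2 shape = (hidden_units, out_dim) = (10, 1)
predictions shape = (200,1)*(1,10)*(10,1) = (200,1)

Train for 1000 steps and check the results every 50 steps, both visually and numerically. Data for a placeholder is passed as a dictionary: Session.run(*****, feed_dict={a:a_data, b:b_data, …..})

Final result: My GitHub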
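The same setup can be sketched in plain NumPy to make the shapes and the update rule explicit: one hidden layer (1 -> 10 -> 1), squared-error loss, plain gradient descent with a learning rate below 1, 1000 steps with a check every 50. The target polynomial x**4 - 0.5*x**2, the noise level, the tanh activation and the learning rate 0.1 are all assumptions for illustration; this is not the original Tensorflow code, and the gradients are written out by hand instead of using an optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: a quartic polynomial plus Gaussian noise
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)               # (200, 1)
y = x**4 - 0.5 * x**2 + 0.05 * rng.standard_normal(x.shape)  # (200, 1)

in_dim, hidden_units, out_dim = 1, 10, 1
W1 = rng.standard_normal((in_dim, hidden_units)) * 0.5       # (1, 10)
b1 = np.zeros((1, hidden_units))
W2 = rng.standard_normal((hidden_units, out_dim)) * 0.5      # (10, 1)
b2 = np.zeros((1, out_dim))
lr = 0.1  # learning rate smaller than 1

def forward(x):
    h = np.tanh(x @ W1 + b1)  # hidden layer: (200, 10)
    return h, h @ W2 + b2     # predictions: (200, 1)

losses = []
n = x.shape[0]
for step in range(1000):
    h, pred = forward(x)
    err = pred - y
    loss = np.mean(err**2)                 # squared error
    losses.append(loss)
    # backpropagation of the mean squared error
    d_pred = 2.0 * err / n                 # (200, 1)
    dW2 = h.T @ d_pred                     # (10, 1)
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_h = (d_pred @ W2.T) * (1.0 - h**2)   # tanh'(z) = 1 - tanh(z)**2, (200, 10)
    dW1 = x.T @ d_h                        # (1, 10)
    db1 = d_h.sum(axis=0, keepdims=True)
    # plain gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
    if step % 50 == 0:
        print(step, loss)

_, pred = forward(x)
print(pred.shape)
```

The data flow (200,1) @ (1,10) @ (10,1) -> (200,1) is the same product of shapes as in the text; in the TF1 version, x and y would instead be fed through placeholders with feed_dict.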