Python

Python Chunks

Here are a few ways to split a list into chunks.

Using yield:

    def chunks1(input_list, n):
        # Yield successive slices of length n; the last one may be shorter.
        for i in range(0, len(input_list), n):
            yield input_list[i:i + n]

    input_list = [i for i in range(0, 15)]
    print(list(chunks1(input_list, 4)))
    ## [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]]

Using a one-line for loop (list comprehension):

    input_list = [i for i in range(0, 15)]
    n = 3
    output_list = [input_list[i:i + n] for i in range(0, len(input_list), n)]
    print(output_list)
    ## [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]

For any iterable:

    from itertools import islice

    def chunks2(input_iter, n):
        input_list = iter(input_iter)
        # iter(callable, sentinel) keeps calling the lambda until it returns ().
        return iter(lambda: tuple(islice(input_list, n)), ())

    input_list = [i for i in range(0, 15)]
    n = 4
    print(list(chunks2(input_list, n)))
    ## [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14)]

Using NumPy:

    import numpy as np

    input_list = [i for i in range(0, 15)]
    # The second argument is the number of sections, not the chunk size.
    np.array_split(input_list, 5)
    ## [array([0, 1, 2]),
    ##  array([3, 4, 5]),
    ##  array([6, 7, 8]),
    ##  array([ 9, 10, 11]),
    ##  array([12, 13, 14])]

Any of the simple approaches above gets the job done ...
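As a variation on the iterable approach, here is a minimal sketch that uses an explicit generator loop instead of the iter(callable, sentinel) trick. The name chunks3 is made up for this illustration; it yields lists rather than tuples.

```python
from itertools import islice


def chunks3(iterable, n):
    """Yield lists of up to n items from any iterable (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        # islice pulls at most n items from the shared iterator.
        chunk = list(islice(iterator, n))
        if not chunk:
            # The iterator is exhausted, so stop the generator.
            return
        yield chunk


result = list(chunks3(range(7), 3))
print(result)
```

Because it never calls len() or slices the input, this version also works on generators and other one-shot iterables.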

Scraping Daily Stock Prices with Python (2)

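The previous post, Scraping Daily Stock Prices with Python (1), showed how to locate the data we need and how to fetch it. The next step is to save the data in xlsx format. The data source is the Taiwan Stock Exchange (TWSE).

First install pandas and xlsxwriter:

    pip install pandas
    pip install xlsxwriter

On Colab, use !pip install xlsxwriter.

The previous post found the data under the key "data9" and observed that it is stored per day. We therefore process the data one day at a time, storing the stock information we need, such as the opening and closing prices.

    import requests
    import pandas as pd
    from pprint import pprint

    date = "20210827"
    url = f"https://www.twse.com.tw/exchangeReport/MI_INDEX?response=json&date={date}&type=ALLBUT0999&_=1630244648174"
    res = requests.get(url)
    data = res.json()

    data_list = data["data9"]    # one row per stock
    columns = data["fields9"]    # column names for those rows
    df = pd.DataFrame(data_list, columns=columns)

    # The context manager saves the file on exit; writer.save() was removed in pandas 2.0.
    with pd.ExcelWriter('twse_data.xlsx', engine='xlsxwriter') as writer:
        df.to_excel(writer, sheet_name=date, index=False)
    # pprint(data_list)

See also: f-strings in Python, PEP 498.

Open the saved "twse_data.xlsx" to see the daily closing quotes.

By changing the date we can fetch past data and save it to separate sheets or files. The storage format can also be adapted to how the data will be used later.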
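To fetch several days, the date in the URL has to be generated for each day. The sketch below only builds the URLs and does no network access. The helper names build_url and weekdays are made up for this illustration, the query string is copied from the request above, and skipping weekends is an assumption (market holidays are not handled).

```python
from datetime import date, timedelta

BASE_URL = "https://www.twse.com.tw/exchangeReport/MI_INDEX"


def build_url(day):
    """Build the daily-quotes URL for one date (hypothetical helper)."""
    # {day:%Y%m%d} formats the date as e.g. 20210827.
    return (
        f"{BASE_URL}?response=json&date={day:%Y%m%d}"
        "&type=ALLBUT0999&_=1630244648174"
    )


def weekdays(start, end):
    """Yield every Monday-to-Friday date from start to end, inclusive."""
    current = start
    while current <= end:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            yield current
        current += timedelta(days=1)


days = list(weekdays(date(2021, 8, 27), date(2021, 8, 31)))
urls = [build_url(day) for day in days]
for url in urls:
    print(url)
```

Each URL can then be fetched with requests.get and written to its own sheet by passing a different sheet_name to df.to_excel.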

How to scrape Yahoo Finance stock data with Python

This time we get hands-on practice scraping Yahoo Finance data.

Set up the Python environment.

Yahoo Finance page: for example, Alphabet Inc. (GOOG).

Scrape and parse the page. This is the page we will scrape. The JSON data does not show up in the "Network" tab, but all of it is embedded in a script in the page source:

View page source - script - root.App.main

    from bs4 import BeautifulSoup
    import re
    import json
    import requests

    response = requests.get("https://finance.yahoo.com/quote/GOOG?p=GOOG&.tsrc=fin-srch")
    soup = BeautifulSoup(response.text, "html.parser")
    ## print(soup.prettify())

    # Find the <script> that mentions root.App.main, then parse the JSON object assigned to it.
    script = soup.find('script', string=re.compile(r'root\.App\.main')).string
    data = json.loads(re.search(r"root\.App\.main\s+=\s+({.*})", script).group(1))
    stores = data["context"]["dispatcher"]["stores"]
    print(stores)

Response data ...
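The extraction step can be tried offline. The sketch below runs the same regular expression on a small made-up script string, so the store name ExampleStore and its contents are purely illustrative and are not the real structure of the Yahoo page.

```python
import json
import re

# Made-up stand-in for the text of the <script> tag (not real Yahoo data).
sample_script = (
    "(function (root) {\n"
    'root.App.main = {"context": {"dispatcher": {"stores": '
    '{"ExampleStore": {"symbol": "GOOG"}}}}};\n'
    "}(this));"
)

# The greedy ({.*}) captures from the first "{" after the "=" to the last "}"
# on that line; "." does not match newlines, so the match stays on one line.
match = re.search(r"root\.App\.main\s+=\s+({.*})", sample_script)
data = json.loads(match.group(1))
stores = data["context"]["dispatcher"]["stores"]
print(stores)
```

If the page layout changes and the pattern no longer matches, re.search returns None, so checking match before calling group(1) is a sensible guard in real code.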

Scraping Daily Stock Prices with Python (1)

How to get daily stock price data

Go to the TWSE daily closing quotes (每日收盤行情) page and select 全部(不含…) ("all, excluding ..."); there are many options to choose from. Find the daily closing quotes.

(Figure: daily closing quotes for 110.08.27, i.e. 2021-08-27 in the ROC calendar)

Press F12 to open the developer tools, click Network, and look for the data we want.

(Figure: Python real-time stock prices)

Select XHR to find the request that carries the data.

    import requests

    url = "https://www.twse.com.tw/exchangeReport/MI_INDEX?response=json&date=20210827&type=ALLBUT0999&_=1630244648174"
    res = requests.get(url)
    res.json()

This returns JSON; all the stock data is under data9.

    {'alignsStyle1': [['center', 'center', 'center', 'center', 'center', 'center'],
     ...
     ...
     'data9': [['0050', '元大台灣50', '16,875,047', '9,673', '2,328,482,421',
                '136.70', '138.50', '136.45', '138.15', '+', '1.15',
                '138.15', '4', '138.20', '103', '0.00'],
               ['0051', '元大中型100', '17,810', '63', '1,003,448',
                '56.20', '56.60', '56.20', '56.60', '+', '0.35',
                '56.55', '1', '56.60', '9', '0.00'],
     ...
     ...
     'subtitle9': '110年08月27日每日收盤行情(全部(不含權證、牛熊證))'}

A few important fields:

    ['0050',           # stock code
     '元大台灣50',
     '16,875,047',     # trade volume (shares)
     '9,673',
     '2,328,482,421',
     '136.70',         # opening price
     '138.50',         # highest price
     '136.45',         # lowest price
     '138.15',         # closing price
     '+',
     '1.15',
     '138.15',
     '4',
     '138.20',
     '103',
     '0.00'],

Done with little effort; the scraping itself is quite simple, ...
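Every value in a data9 row is a string, and large numbers carry thousands separators. The sketch below turns the fields annotated above into numbers, using the 0050 row from the response. The helper names to_number and parse_row are made up for this illustration, and returning None for values that cannot be parsed is an assumption about how missing data should be handled.

```python
def to_number(text, cast=float):
    """Convert a string such as '16,875,047' to a number, or None on failure."""
    try:
        return cast(text.replace(",", ""))
    except ValueError:
        # Assumption: non-numeric placeholders are treated as missing data.
        return None


def parse_row(row):
    """Pick the annotated fields out of one data9 row (hypothetical helper)."""
    return {
        "code": row[0],
        "volume": to_number(row[2], int),
        "open": to_number(row[5]),
        "high": to_number(row[6]),
        "low": to_number(row[7]),
        "close": to_number(row[8]),
    }


row = ['0050', '元大台灣50', '16,875,047', '9,673', '2,328,482,421',
       '136.70', '138.50', '136.45', '138.15', '+', '1.15',
       '138.15', '4', '138.20', '103', '0.00']
quote = parse_row(row)
print(quote)
```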

Deep Reinforcement learning

Reinforcement learning (RL) is a framework in which agents learn to perform actions in an environment so as to maximize a reward. In effect, the AI is trained to learn from every mistake and find the correct path without any labels. The two main components are the environment and the agent. Deep reinforcement learning (DRL), which combines RL with deep learning, is even more powerful. AlphaGo is a typical application of deep reinforcement learning. ...
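The agent-environment loop can be illustrated with a toy problem. The sketch below is plain (not deep) reinforcement learning: an epsilon-greedy agent learns which of two actions pays a reward. The TwoArmedBandit environment, the reward values, and the train function are all made up for this illustration.

```python
import random


class TwoArmedBandit:
    """Hypothetical toy environment: action 1 pays 1.0, action 0 pays 0.0."""

    def step(self, action):
        return 1.0 if action == 1 else 0.0


def train(env, steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that keeps a running average reward per action."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore: try a random action
        else:
            # exploit: pick the action with the higher estimated reward
            action = 0 if estimates[0] >= estimates[1] else 1
        reward = env.step(action)
        counts[action] += 1
        # Incremental update of the average reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = train(TwoArmedBandit())
best_action = estimates.index(max(estimates))
print(estimates, best_action)
```

No labels are involved: the agent only sees the reward returned by the environment after each action. A deep RL method would replace the two-entry estimates table with a neural network.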

Tensorflow2 -- MNIST

TensorFlow 2.x differs from 1.x in many ways, including how it is used. Today we use TF2 to implement the MNIST classification problem.

MNIST

MNIST is a standard handwritten-digit classification problem. There are many ways to download the dataset; this time we use the one provided by the TF API directly: 28 * 28 images in black and white only.

Development

Start Jupyter Lab locally and first check whether the GPU is enabled.

    %matplotlib widget
    import matplotlib.pyplot as plt
    import tensorflow as tf
    import numpy as np

    # check gpu
    tf.config.list_physical_devices('GPU')
    tf.test.is_built_with_cuda()
    # output True

Method 1: subclass tf.keras.Model

    class MLP(tf.keras.Model):
        def __init__(self):
            super().__init__()
            self.flatten = tf.keras.layers.Flatten()
            self.dense1 = tf.keras.layers.Dense(units=100, activation=tf.nn.relu)
            self.dense2 = tf.keras.layers.Dense(units=20, activation=tf.nn.leaky_relu)
            self.dense3 = tf.keras.layers.Dense(units=10)

        @tf.function
        def call(self, inputs):           # [batch_size, 28, 28, 1]
            flat1 = self.flatten(inputs)  # [batch_size, 784]
            dens1 = self.dense1(flat1)    # [batch_size, 100]
            dens2 = self.dense2(dens1)    # [batch_size, 20]
            dens3 = self.dense3(dens2)    # [batch_size, 10]
            output = tf.nn.softmax(dens3)
            return output

Train with tf.GradientTape

    # model, optimizer, summary_writer, data_loader, num_epochs, num_batches,
    # batch_size, log_dir and model_name are assumed to be defined beforehand.

    # @tf.function
    def one_batch_step(X, y, batch_index):
        with tf.GradientTape() as tape:
            y_pred = model(X)
            loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
            loss = tf.reduce_mean(loss)
            tf.print(batch_index, "loss", loss)
        with summary_writer.as_default():
            tf.summary.scalar("loss", loss, step=batch_index)
        # Compute the gradients outside the tape context, then apply them.
        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))

    for epoch_index in range(num_epochs):
        for batch_index in range(num_batches):
            X, y = data_loader.get_batch(batch_size)
            one_batch_step(X, y, batch_index=batch_index)

    with summary_writer.as_default():
        tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=log_dir)
    tf.saved_model.save(model, f"saved/{model_name}")

Method 2: use a Keras pipeline to stack the function for each layer. It is less flexible but very well suited to simple models ...
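To make the last layer and the loss concrete, here is a small pure-Python sketch of what softmax and sparse categorical cross-entropy compute for a single sample. It illustrates the math only and is not TensorFlow's implementation; both function names are local helpers written for this example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(y_true, probs):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[y_true])


# An untrained model that scores all 10 digits equally.
logits = [0.0] * 10
probs = softmax(logits)
loss = sparse_categorical_crossentropy(3, probs)
print(probs[3], loss)
```

With ten equally likely classes the loss equals log(10), roughly 2.30, which is a useful reference point: a training loss near that value means the model is still guessing.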

Scraping Real-Time Stock Prices with Python

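How to get real-time stock price data

Go to the basic market information site provided by the TWSE and enter a stock code at the top right, using 2330 as the example. The page shows the day's high and low, the traded price and volume, the best five bid and ask levels, and so on. Right-click on the page, choose Inspect to open DevTools, switch to the Network tab, and watch the page being scraped.

The page keeps sending GET requests to a URL whose name starts with getStockInfo, which should be the information we want.

    import requests

    url = "https://mis.twse.com.tw/stock/api/getStockInfo.jsp?ex_ch=tse_2330.tw"
    res = requests.get(url)
    res.json()

This returns a neatly organized JSON object:

    {'queryTime': {'stockInfoItem': 4329,
                   'sessionKey': 'tse_2330.tw_20200908|',
                   'sessionStr': 'UserSession',
                   'sysDate': '20200908',
                   'sessionFromTime': -1,
                   'stockInfo': 2084673,
                   'showChart': False,
                   'sessionLatestTime': -1,
                   'sysTime': '12:05:35'},
     'referer': '',
     'rtmessage': 'OK',
     'exKey': 'if_tse_2330.tw_zh-tw.null',
     'msgArray': [{'n': '台積電',
                   'g': '281_174_260_166_385_',
                   'u': '468.5000',
                   'mt': '060262',
                   'o': '428.0000',
                   'ps': '593',
                   'tk0': '2330.tw_tse_20200908_B_9998775018',
                   'a': '430.5000_431.0000_431.5000_432.0000_432.5000_',
                   'tlong': '1599537930000',
                   't': '12:05:30',
                   'it': '12',
                   'ch': '2330.tw',
                   'b': '430.0000_429.5000_429.0000_428.5000_428.0000_',
                   'f': '143_239_162_400_391_',
                   'w': '383.5000',
                   'pz': '428.0000',
                   'l': '427.5000',
                   'c': '2330',
                   'v': '16843',
                   'd': '20200908',
                   'tv': '-',
                   'tk1': '2330.tw_tse_20200908_B_9998774678',
                   'ts': '0',
                   'nf': '台灣積體電路製造股份有限公司',
                   'y': '426.0000',
                   'p': '0',
                   'i': '24',
                   'ip': '0',
                   'z': '-',
                   's': '-',
                   'h': '433.0000',
                   'ex': 'tse'}],
     'userDelay': 5000,
     'rtcode': '0000',
     'cachedAlive': 7891}

When scraping a URL, do not casually strip the query parameters at the end unless you have confirmed what difference they make. If the request fails, the request headers are the place to look.

Understanding and an experimental spirit

Compare the fields to see which ones carry the information we want.

u: limit-up price
w: limit-down price
v: accumulated trade volume
z: latest traded price; sometimes missing
s: latest traded volume; sometimes also missing. When cleaning the data, you can filter on whether z and s are present.
a: best five ask prices
f: best five ask volumes
l: day low
h: day high
…..

You can explore the other fields yourself. If you want to focus on the state of a single stock, for example whether there is heavy volume during the session, simply GET the URL repeatedly and check the JSON. If you want to collect and store the current data for many stocks, you will need a DataFrame. Below is one way to fetch multiple stock prices during trading hours. ...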
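The price fields need a little cleanup before use: the five-level fields are underscore-separated strings with a trailing underscore, and z or s may be '-' when there is no trade. The sketch below parses a few fields taken from the sample response above. The helper names parse_levels and parse_price are made up for this illustration, and mapping '-' to None is an assumption.

```python
def parse_levels(text):
    """Split '430.5000_431.0000_' into [430.5, 431.0] (hypothetical helper)."""
    # The trailing underscore produces an empty piece, which is skipped.
    return [float(piece) for piece in text.split("_") if piece]


def parse_price(text):
    """Return the price as a float, or None when the field is '-'."""
    return None if text == "-" else float(text)


# A few fields copied from the sample msgArray entry.
msg = {
    "a": "430.5000_431.0000_431.5000_432.0000_432.5000_",
    "f": "143_239_162_400_391_",
    "z": "-",
    "h": "433.0000",
}

ask_prices = parse_levels(msg["a"])
ask_volumes = [int(v) for v in parse_levels(msg["f"])]
last_price = parse_price(msg["z"])
day_high = parse_price(msg["h"])
print(ask_prices, ask_volumes, last_price, day_high)
```

Rows whose last_price is None can then be dropped or carried forward, depending on how the data will be used.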

Before Data processing: ELT

Before ELT: ETL

ETL stands for Extract, Transform, and Load. Historically, ETL has been the most reliable way to migrate data from one database to another. Besides moving the data, it also converts the source databases into a single format that can be used at the destination.

Extract: Collect data from different databases, sometimes by way of a staging table.
Transform: The critical step. Convert the newly extracted data into the correct form so that it can be placed into another database. Other kinds of transformation are sometimes involved in this step as well.
Load: Load the data into the target database or storage. ...
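The three steps can be sketched with two in-memory SQLite databases. Everything here is illustrative: the table names raw_prices and prices, the sample rows, and the cleanup rules are made up, and a real pipeline would read from and write to actual databases.

```python
import sqlite3


def extract(conn):
    """Extract: read the raw rows from the source database."""
    return conn.execute("SELECT name, price FROM raw_prices").fetchall()


def transform(rows):
    """Transform: normalize names and turn '2,891.01' into a float."""
    return [
        (name.strip().upper(), float(price.replace(",", "")))
        for name, price in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the target database."""
    conn.execute("CREATE TABLE IF NOT EXISTS prices (name TEXT, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    conn.commit()


# Made-up source data with inconsistent formatting.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE raw_prices (name TEXT, price TEXT)")
source.executemany(
    "INSERT INTO raw_prices VALUES (?, ?)",
    [(" goog ", "2,891.01"), ("tsm", "118.50")],
)

target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))
result = target.execute("SELECT name, price FROM prices ORDER BY name").fetchall()
print(result)
```

In ELT the order of the last two steps is swapped: the raw rows are loaded first and transformed inside the target system.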

Python Comments

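Adding comments during development helps describe your thought process and helps you and others understand the intent. It makes it easier to spot mistakes, improve the program, and reuse the code elsewhere.

Single-line comments

A comment starts with #:

    # defining the start code
    startCode = 50

It can also follow the code on the same line, where it is ignored:

    startCode = 50 # defining the start code

Take care not to add useless descriptions, just as you would not give a variable a meaningless name.

Multi-line comments

Use this form when there is a lot to explain, or when writing documentation, feature descriptions, and the like. PEP 8 recommends that a line not exceed 79 characters; in practice this usually follows the conventions of the company or team.

Several lines that each start with #:

    # PythonComments version 1.0.3
    # -a (--all): show all features
    # -h (--help): show the help
    # .....

Or wrap the text in """:

    """
    PythonComments version 1.0.3
    -a (--all): show all features
    -h (--help): show the help
    .....
    """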
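One difference between the two forms is worth seeing in code: a # comment is discarded, while a triple-quoted string placed first in a function body becomes its docstring and stays available at runtime. A minimal sketch, with a made-up function name:

```python
def start_code():
    """Return the start code used by the examples."""
    # This '#' comment is discarded and cannot be read back at runtime.
    return 50


# The triple-quoted string is kept as the function's docstring.
print(start_code.__doc__)
print(start_code())
```

Tools such as help() read the docstring, which is why triple-quoted text is the usual choice for documenting modules, classes, and functions.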

Python f-string

Since Python 3.6, strings have one more formatting method: PEP 498 – Literal String Interpolation.

The examples below compare f-strings with the %-formatting and str.format() syntax we commonly used before.

    >>> # %-formatting
    ...
    >>> text = "Hello"
    >>> number1 = 10
    >>> number2 = 20
    >>> print("%s, test numbers are %s and %s" % (text, number1, number2))
    Hello, test numbers are 10 and 20

    >>> # str.format()
    ...
    >>> text = "Hello"
    >>> number1 = 10
    >>> number2 = 20
    >>> print("{}, test numbers are {} and {}".format(text, number1, number2))
    Hello, test numbers are 10 and 20
    >>> print("{0}, test numbers are {2} and {1}".format(text, number1, number2))
    Hello, test numbers are 20 and 10

    >>> # f-string
    ...
    >>> text = "Hello"
    >>> number1 = 10
    >>> number2 = 20
    >>> print(f"{text}, test numbers are {number1} and {number2}")
    Hello, test numbers are 10 and 20

f-strings look more Pythonic, and they solve problems we used to run into, such as the argument restrictions of %. With many variables they are also easier to read and to change.

Trying a few more operations:

    >>> f"{3 + 8}"
    '11'
    >>> text = "Literal String Interpolation"
    >>> f"{text.upper()}"
    'LITERAL STRING INTERPOLATION'
    >>> f"{1/3:.2f}"
    '0.33'

You can also put a lambda expression inside. ...
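A few more f-string features, as a short sketch. Note that a lambda has to be wrapped in parentheses, because a bare colon inside the braces would be read as the start of a format spec. The variable names are reused from the examples above.

```python
text = "Hello"
number1 = 10

# A lambda must be parenthesized, then called, inside the braces.
doubled = f"{(lambda x: x * 2)(3)}"

# Format specs follow a colon: thousands separator and right alignment.
with_commas = f"{1234567:,}"
padded = f"{number1:>5}"

# !r applies repr() to the value before formatting.
quoted = f"{text!r}"

print(doubled, with_commas, padded, quoted)
```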