12/29/2019

Prediction Machines: The Simple Economics of Artificial Intelligence

Ajay Agrawal, Joshua Gans, and Avi Goldfarb, Prediction Machines: The Simple Economics of Artificial Intelligence, Harvard Business Review Press, 2018.
Artificial intelligence does the seemingly impossible, magically bringing machines to life--driving cars, trading stocks, and teaching children. But facing the sea change that AI will bring can be paralyzing. How should companies set strategies, governments design policies, and people plan their lives for a world so different from what we know? In the face of such uncertainty, many analysts either cower in fear or predict an impossibly sunny future. 

12/13/2019

台灣數據百閱

Re-lab團隊 ,臺灣數據百閱:100個重要議題,從圖表開啟對話、培養公民思辨力,時報,2019
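We have all had this experience: chatting with friends, one moment Taiwan feels like the beacon of Asia, so free and open; the next, the sense of national doom is heavy and getting by seems hard. On every matter and every issue, a society holds so many different views and positions. Is there a more objective, more comprehensive way to understand society, so that we can tell whether Taiwan is actually doing well or badly?

We cannot remember everything, but data can. Data is the basis on which we can discuss problems objectively.

The government has long compiled all sorts of statistics, but the data is either scattered across agencies or buried deep in outdated web pages. Never mind understanding it; just obtaining it is difficult, and ordinary people have neither the time nor the energy to look.

So Re-lab X 資訊改造實驗室, ten people working in their spare time, combed through public-sector data on their own initiative and used their design expertise to simplify and visualize it. To avoid a biased stance, they also invited a number of experts to review the work and, within limited space, wrote incisive introductions, producing a book themed on Taiwan's data.

What we want to make is not a book that lets you wave data around to prove others wrong. Rather, we hope data will spark curiosity about the issues, so that people dig into why the data looks the way it does and what problems lie behind it, and discuss and think more.

This book is not the end point either, but a milestone in our effort to redesign public-sector data, so that the project can go further.

Support the 《數據百閱》 project: https://redesigninfo.cc/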

12/08/2019

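Studying abroad: professionalism and dedication

From the standpoint of professionalism and dedication alone, studying abroad (at a top-ranked US school) is worthwhile. It also helps one understand some of the reasons this country has stayed strong.

In the first semester of my PhD program, before I took Prof. Tapley's course (Kalman filtering), senior students had already told us that he was a member of the National Academy of Engineering (comparable to an Academician of Academia Sinica in Taiwan) and that the Center for Space Research he directed managed quite a few (ten-something?) satellites. I expected him to ramble about this and that; yet over the whole semester he lectured on the course material and nothing else, without a single digression! On one occasion, a PhD applying for a faculty position in the controls group gave a talk. Nearly all the department's professors attended, including several National Academy of Engineering members, along with us doctoral students in the controls group. Prof. Tapley, who belonged to the Orbital Mechanics group, sat behind me and raised his hand to ask two or three questions during the talk. Because he came from a different group, the answers to some of his questions were obvious to the controls PhD students; but Prof. Tapley, then already in his sixties, kept asking. That thirst for knowledge remains unforgettable.

This reminds me of a teacher of a friend of mine, a very famous man of letters in Taiwan, who said in the first class: "Taking my class is your good fortune." As for the process of interviewing new faculty, it was truly dismal and unprofessional, so I will not describe it here.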

11/08/2019

Nine Algorithms That Changed the Future (改變世界的九大演算法)


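陳正芬 譯,改變世界的九大演算法:讓今日電腦無所不能的最強概念,經濟新潮社,2014
The nine algorithms the book covers are: search engine indexing, PageRank, public-key cryptography, error-correcting codes, pattern recognition (such as handwriting, voice, and face recognition), data compression, databases, digital signatures, and a great algorithm that would be remarkable if it existed. The book also explores the limits of what computers can do.
The author explains the principles behind the computer functions we use every day in plain terms, so readers without a computer science background can follow. Pleasingly, each algorithm is a creative idea and a clue for solving a problem, and also offers a glimpse of what modern mathematicians and computer scientists have achieved. Facing an increasingly technological life and workplace, these basic principles and concepts are worth understanding and absorbing, to prepare for the world ahead.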

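China Steel (中鋼) uses VR to pass on veteran workers' experience and cut costs

An employee sits wearing a VR headset, one hand on a joystick and the other tapping in mid-air, thoroughly busy. A screen beside him shows the simulated environment he sees: the high-temperature setting of a steel mill with flowing molten metal, where the melt stays at 1500 degrees and demands full concentration.
The trainee is using VR to learn how to pour molten steel from a converter, a process that demands extensive experience and is hard for automated machinery to learn. It is currently handled mostly by veteran workers, but as they retire, new staff must be trained, and on-site conditions such as the temperature and the converter cannot easily be moved into a classroom. This virtual reality platform addresses exactly that pain point.
"Converter tapping in a steel mill is dangerous work that requires well-practiced operating skill," said ITRI (工研院). In the first stage, using information provided by China Steel, it developed VR training content for the converter-tapping SOP (standard operating procedure); the software issues a warning immediately on an operating error. By mid-2020 ITRI will develop a simulation course on responding to unexpected situations, and the third-stage goal is multi-user interactive teaching, meaning that veteran workers can also give guidance inside the simulated scenario....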

11/03/2019

盲人律師 (Invisible Justice)

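"A blind lawyer with no litigation experience, 李政鴻, must fight a cross-border compensation suit for workers injured on the job."
李政鴻 (played by 張哲豪) refuses to accept that, being blind, he can only be assigned paperwork at the law firm. He fights hard to take part in litigation but keeps hitting a wall.
One day he finally gets a civil compensation case, only to find that the defense counsel is his own boss at the firm, 趙定邦 (played by 班鐵翔). If he helps the plaintiffs, how is he to face his boss?
Harder still, this is no ordinary lawsuit: it is a cross-border class action with as many as 531 plaintiffs. 李政鴻 has never handled even a single-plaintiff case, and he is blind. How can he win?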
呂柔其,家道中落、天生全盲,落榜12次仍立志當律師…全台首位盲人律師靠不服輸鬥志撂倒跨國財團,風傳媒,2019-10-21 

11/01/2019

10 Applications of Machine Learning in Finance

K.C. Cheung, 10 Applications of Machine Learning in Finance, October 30, 2019
Portfolio Management – Robo-Advisors
Algorithmic Trading
High-Frequency Trading (HFT)
Fraud Detection
Loan/ Insurance Underwriting
Risk Management
Chatbots
Document Analysis
Trade Settlements
Money-Laundering Prevention
Future Applications of Artificial Intelligence in Finance
Nice overview of the problems, present technologies and trends, and some companies. 

10/19/2019

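Artificial intelligence in Taiwan (人工智慧在台灣)

陳昇瑋、溫怡玲,人工智慧在台灣:產業轉型的契機與挑戰,天下雜誌,2019
Co-author 陳昇瑋 is one of the few scientists in Taiwan who works across industries, with deep background in both academia and industry. He is also an enthusiastic evangelist for AI technology and a cultivator of talent. With the perspective unique to someone who crosses domains, he has devoted himself to deepening AI applications and innovation-driven transformation across industries, with particular insight into manufacturing, finance, retail, and healthcare.
In 2017, at the invitation of Academia Sinica President 廖俊智 and Academician 孔祥重, he co-led a team that within half a year helped more than ten Taiwanese companies use AI to solve or ease major problems holding back their development, helped industry upgrade across AI technology and applications, and also saw the systemic problems industry faces when adopting AI.
Talent, data, and finding the right problem: none can be missing
Rather than worry about being replaced, we need to learn proactively and act now to shape the future
Drawing on localized hands-on and consulting experience, he founded 台灣人工智慧學校 (Taiwan AI Academy) for Taiwan. Within a year it has trained more than 3,000 AI practitioners, aiming to solve the key shortage of AI talent and lay the groundwork for the next challenge Taiwan's industries face.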

10/15/2019

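Why US business schools are strong

In a talk at National Tsing Hua University, Morris Chang (張忠謀) remarked that "Taiwan's science and engineering are not far behind the United States, but its business schools are far worse." In some respects Taiwan's business schools are not that bad, but measured against the international standing of Taiwan's science and engineering schools, they are really poor. Perhaps what we should ask first is: what is a business school?...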

10/14/2019

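Computer speed and linear algebra

To illustrate how fast a computer calculates, I designed an inner-product problem: the inner product of two vectors of ten million numbers each takes less than a second. Recommender systems come in many varieties, and one of them uses the inner product. Moreover, modern computers solve linear algebra problems quickly. Working through such problems quietly builds computational thinking, and the material can be used widely in junior and senior high school teaching.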

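import time
import numpy as np

t = time.time()  # start time (the timing below includes generating the random numbers)
a = np.random.rand(10**7)  # 10**7 random numbers
b = np.random.rand(10**7)
result = np.dot(a, b)  # inner product
print("Jobs done in:", time.time() - t, "seconds")  # current time minus start time

Pairing this with Google Colab solves the problem of lacking hardware and software.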
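
As a rough illustration of how an inner product can drive a recommendation, the sketch below scores each item by the inner product of a user's taste vector with the item's feature vector and returns the highest-scoring items. The vectors, the three feature dimensions, and the function names are all invented for this example; it shows the idea only, not how any particular recommender system is built.

```python
def inner_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def recommend(user_vector, item_vectors, top_n=2):
    """Return the names of the top_n items with the largest inner product."""
    scores = {name: inner_product(user_vector, vec)
              for name, vec in item_vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical taste profile over (action, comedy, romance), invented for this sketch.
user = [0.9, 0.1, 0.3]
movies = {
    "Movie A": [1.0, 0.0, 0.2],
    "Movie B": [0.1, 0.9, 0.1],
    "Movie C": [0.2, 0.1, 1.0],
}

print(recommend(user, movies))  # highest-scoring item first
```

With real data the same scoring is usually done for all items at once as a matrix-vector product, which is where fast linear algebra routines such as np.dot come in.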


10/10/2019

Network science reveals the secrets of the world’s best soccer team

Emerging Technology from the arXiv, Network science reveals the secrets of the world’s best soccer team, Oct 4, 2019.
One of the best soccer teams in history is widely acknowledged to be the Barcelona side that played during the 2009-10 season. Under the inspirational leadership of manager Pep Guardiola, this team won six major competitions including the Spanish football league, known as La Liga, and the UEFA Champions League, the most prestigious competition in world football. No other team has accumulated so many trophies in such a short period.... 

10/09/2019

Optimization methods in finance

Gerard Cornuejols, Javier Peña, and Reha Tutuncu, Optimization Methods in Finance, Cambridge University Press, 2nd Edition, 2018.

The methods covered are linear programming, nonlinear programming, quadratic programming, conic optimization, integer programming, dynamic programming, stochastic programming, and robust optimization.
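
To give a flavor of one method on the list, here is a minimal sketch of a tiny quadratic program: choosing the weight of asset 1 in a two-asset portfolio so that portfolio variance is smallest. The example is not taken from the book; the variances and covariance are invented numbers, short selling is left unconstrained, and the closed form follows from setting the derivative of the variance with respect to the weight to zero.

```python
def min_variance_weight(var1, var2, cov12):
    """Weight on asset 1 that minimizes the variance of a two-asset portfolio.

    Minimizes w^2*var1 + (1-w)^2*var2 + 2*w*(1-w)*cov12 over w,
    which gives w = (var2 - cov12) / (var1 + var2 - 2*cov12).
    """
    denom = var1 + var2 - 2.0 * cov12
    if denom <= 0:
        raise ValueError("no unique minimizer for these inputs")
    return (var2 - cov12) / denom


def portfolio_variance(w1, var1, var2, cov12):
    """Variance of a portfolio holding w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    return w1 * w1 * var1 + w2 * w2 * var2 + 2.0 * w1 * w2 * cov12


# Hypothetical inputs: return variances 0.04 and 0.09, covariance 0.01.
w = min_variance_weight(0.04, 0.09, 0.01)
print("weight on asset 1:", w)
print("portfolio variance:", portfolio_variance(w, 0.04, 0.09, 0.01))
```

Larger problems with many assets and constraints have no such closed form and are handed to a quadratic programming solver instead.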

10/08/2019

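McKinsey's approach to problem solving

McKinsey & Company (麥肯錫) is a management consulting firm founded in Chicago by James O. McKinsey, an accounting professor at the University of Chicago. Its focus is advising senior leaders of companies and governments and proposing suitable solutions to complex management problems; it has been called "the Goldman Sachs of consulting."
高杉尚孝 著,鄭舜瓏 譯,麥肯錫問題分析與解決技巧:為什麼他們問完問題,答案就跟著出現了?,大是文化,2019
1. When you find a problem, classify it first rather than assigning blame.
2. Turn the problem into a concrete issue to be addressed.
3. Identify the alternative options that could resolve the issue.
4. Then use scenario analysis to evaluate the alternatives.
5. Choose the "most suitable" (not necessarily the best) solution and act on it (follow through on execution).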

10/06/2019

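Morris Chang's (張忠謀) talk "A General Manager's Learning"

To do the general manager's job well, Morris Chang says one must not overlook the importance of sales and marketing, must have a keen sense of the market and lead the team in the right direction, and must also be able to bring the team together....
Talent must learn in a "slash" (multi-disciplinary) way to connect with the world
Chang opened by describing where Taiwan's science and engineering skills and its MBA capabilities stand internationally. He believes Taiwan's strength in science and engineering is nearly on par with leading schools such as MIT and Harvard, but the quality of its MBA (Master of Business Administration) graduates lags considerably. One reason is that Taiwan has no world-class corporations based there, so campus talent has fewer chances to learn and to make the transition into industry. In addition, he has given MBA talks with open Q&A at Stanford and Harvard as well as at National Taiwan University and National Chengchi University, and he feels the depth of the questions asked in Taiwan is a main reason its talent falls a notch below that of top international schools....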

10/02/2019

Soft skills: the software developer's life manual

John Sonmez, Soft Skills: The software developer's life manual, Manning Publications, 2014.
For most software developers, coding is the fun part. The hard bits are dealing with clients, peers, and managers, staying productive, achieving financial security, keeping yourself in shape, and finding true love. This book is here to help. 
Soft Skills: The software developer's life manual is a guide to a well-rounded, satisfying life as a technology professional. In it, developer and life coach John Sonmez offers advice to developers on important "soft" subjects like career and productivity, personal finance and investing, and even fitness and relationships. Arranged as a collection of 71 short chapters, this fun-to-read book invites you to dip in wherever you like. A Taking Action section at the end of each chapter shows you how to get quick results. Soft Skills will help make you a better programmer, a more valuable employee, and a happier, healthier person.

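Why understanding AI robots has to start with warehouse automation

Bastiane Huang,為什麼瞭解AI機器人必須從倉庫自動化開始?,吐納商業評論,10/01/2019
On average, a car has ten to twenty thousand or more individual parts. If that already sounds like a lot and very complex, imagine a typical warehouse, which usually holds millions of different products in all kinds of packaging.
This degree of variety makes automating with robotic arms much harder. With traditional machine vision and programming, one would have to register millions of products in advance and write programs teaching the robot to handle each one differently, which is not only time-consuming but close to impossible.
Yet this once-impossible task now has an opening thanks to deep reinforcement learning (DRL), because DRL helps machines recognize and respond to their surroundings and learn on their own to handle varied products and tasks.
Given enough data and practice, a DRL robot can teach itself new skills and steadily improve. Just as we learn through trial or through someone's demonstration, machines can learn to recognize images and win video games, or, like AlphaGo Zero developed by DeepMind, use DRL to teach themselves and eventually beat the world Go champion.
Every grasp and every trial makes the robot smarter and better at the task; in addition, cloud-connected robots can learn from one another. This shift makes robotic solutions more dexterous, flexible, and efficient....
Tasks in a warehouse tend to be very similar, order picking accounts for more than 40% of operating costs in most warehouses, and labor costs make up as much as 70% of a warehouse's total budget. So, spurred by e-commerce companies such as Amazon cutting costs and pursuing fast delivery, retailers are all seeking warehouse automation, which makes it the leading use case for AI robots.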

10/01/2019

Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom

Louis Deslauriers, Logan S. McCarty, Kelly Miller, Kristina Callaghan, and Greg Kestin, Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom, PNAS September 24, 2019 116 (39) 19251-19257; first published September 4, 2019 https://doi.org/10.1073/pnas.1821936116.
Despite active learning being recognized as a superior method of instruction in the classroom, a major recent survey found that most college STEM instructors still choose traditional teaching methods. This article addresses the long-standing question of why students and faculty remain resistant to active learning. Comparing passive lectures with active learning using a randomized experimental approach and identical course materials, we find that students in the active classroom learn more, but they feel like they learn less. We show that this negative correlation is caused in part by the increased cognitive effort required during active learning. Faculty who adopt active learning are encouraged to intervene and address this misperception, and we describe a successful example of such an intervention.


Butterfly Lovers (梁祝) erhu concerto

陳鋼、何占豪, Butterfly Lovers (《梁祝》) erhu concerto
Conductor: 鍾耀光
Erhu: 楊雪
Taipei Chinese Orchestra (臺北市立國樂團)


9/30/2019

Robust Classification by Bertsimas, et al.

Dimitris Bertsimas, Jack Dunn, Colin Pawlowski, and Ying Daisy Zhuo, Robust Classification, INFORMS Journal on Optimization, Vol. 1, No. 1, Winter 2019, pp. 2–34.
Motivated by the fact that there may be inaccuracies in features and labels of training data, we apply robust optimization techniques to study in a principled way the uncertainty in data features and labels in classification problems and obtain robust formulations for the three most widely used classification methods: support vector machines, logistic regression, and decision trees. We show that adding robustness does not materially change the complexity of the problem and that all robust counterparts can be solved in practical computational times. We demonstrate the advantage of these robust formulations over regularized and nominal methods in synthetic data experiments, and we show that our robust classification methods offer improved out-of-sample accuracy. Furthermore, we run large-scale computational experiments across a sample of 75 data sets from the University of California Irvine Machine Learning Repository and show that adding robustness to any of the three nonregularized classification methods improves the accuracy in the majority of the data sets. We observe the most significant gains for robust classification methods on high-dimensional and difficult classification problems, with an average improvement in out-of-sample accuracy of robust versus nominal problems of 5.3% for support vector machines, 4.0% for logistic regression, and 1.3% for decision trees.
Complement to the previous paper Optimal classification trees: Table 10. Solver Time for Selected University of California Irvine Data Sets in Seconds

9/28/2019

Optimal classification trees (最佳分類樹)

D. Bertsimas and J. Dunn, Optimal classification trees, Machine Learning, July 2017, Volume 106, Issue 7, pp 1039–1082.
State-of-the-art decision tree methods apply heuristics recursively to create each split in isolation, which may not capture well the underlying characteristics of the dataset. The optimal decision tree problem attempts to resolve this by creating the entire decision tree at once to achieve global optimality. In the last 25 years, algorithmic advances in integer optimization coupled with hardware improvements have resulted in an astonishing 800 billion factor speedup in mixed-integer optimization (MIO). Motivated by this speedup, we present optimal classification trees (1), a novel formulation of the decision tree problem using modern MIO techniques that yields the optimal decision tree for axes-aligned splits. We also show the richness of this MIO formulation by adapting it to give optimal classification trees with hyperplanes (2) that generates optimal decision trees with multivariate splits. Synthetic tests demonstrate that these methods recover the true decision tree more closely than heuristics, refuting the notion that optimal methods overfit the training data. We comprehensively benchmark these methods on a sample of 53 datasets from the UCI machine learning repository. We establish that these MIO methods are practically solvable on real-world datasets with sizes in the 1000s, and give average absolute improvements in out-of-sample accuracy over CART of 1–2 and 3–5% for the univariate and multivariate cases, respectively. Furthermore, we identify that optimal classification trees are likely to outperform CART by 1.2–1.3% in situations where the CART accuracy is high and we have sufficient training data, while the multivariate version outperforms CART by 4–7% when the CART accuracy or dimension of the dataset is low.

9/17/2019

The ML Test Score by Google

Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, and D. Sculley, The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction, Proceedings of IEEE Big Data, 2017.
Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt.
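A follow-up to Hidden Technical Debt in Machine Learning Systems. The tests fall into four areas (feature tests, model tests, ML infrastructure tests, and production monitoring), and the authors interviewed 36 Google teams to see how far each area is put into practice.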

9/13/2019

AI transforming the enterprise by KPMG

Steve Hill, AI transforming the enterprise, KPMG, 2019. (Big Four firm; PDF file)
We conducted the KPMG 2019 Enterprise AI Adoption Study to gain insight into the state of AI and automation deployment efforts at select large cap companies. This involved in-depth interviews with senior leaders at 30 of the world’s largest companies, as well as secondary research on job postings and media coverage. These 30 highly influential, Global 500 companies represent significant global economic value – collectively, they employ approximately 6.2 million people, with aggregate revenues of $3 trillion. Together, they also represent a significant component of the AI market.
The trends

  1. Rapid shift from experimental to applied technology
  2. Automation, AI, analytics and low-code platforms are converging
  3. Enterprise demand is growing
  4. New organizational capabilities are critical
  5. Internal governance emerging as key area
  6. The need to manage AI
  7. Rise of AI-as-a-service
  8. AI could shift the competitive landscape

9/11/2019

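Google Colab, a handy tool for Python beginners

少數派,推薦Python初學者的好用工具:Google Colab,2019.03.20

The file used in class is lec08 MNIST-GPU.ipynb (run environment: GPU NVIDIA GTX 1070, RAM 24450 MB, Win10 64-bit), which I uploaded to Google Drive.

Because the related image files were not uploaded, Image(filename='data/05-Chollet-MNIST-sample.jpg') and Image(filename='data/05-MNIST.png') cannot be run. For how to set this up, see the new file. The runtime can be set to TPU, which takes a little time to initialize; you can compare the computation time on the local machine with that on a Google Cloud TPU.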

9/10/2019

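Destined for War? Can the US and China escape Thucydides's Trap (注定一戰?中美能否避免修昔底德陷阱)

包淳亮 譯,注定一戰?中美能否避免修昔底德陷阱,八旗文化,2018
Graham Allison, Destined for War: Can America and China Escape Thucydides’s Trap?, Mariner Books, 2018
◎ From ancient Greece to the US-Soviet Cold War: starting from two thousand years of the history of war, a forecast of the uneasy future of the United States and China!
  In the fifth century BC, the Greek historian Thucydides chronicled the Peloponnesian War, which devastated the entire Greek world, and summed up its cause thus: "the rise of Athens, and Sparta's lingering fear, made war inevitable." Author Graham Allison calls the predicament Sparta and Athens faced the "Thucydides's Trap": when the existing balance of power is shifting, the ruling power may move to discipline or stifle the rising challenger in order to defend its position, and the challenger may refuse to stay subordinate and try to change the rules of the game in a bid for supremacy. Over the past 500 years there have been 16 cases in which a rising power challenged a ruling power, and 12 of them ended in war. Like a ghost, the "Thucydides's Trap" has pushed great powers toward the abyss of destruction again and again. Bismarck challenged France, the dominant power on the European continent, in the Franco-Prussian War; Kaiser Wilhelm II challenged the British navy in World War I; Japan, convinced it deserved equal dignity, launched the Russo-Japanese War, and later attacked Pearl Harbor for fear that a US economic blockade would choke off its development. All these seemingly blind and irrational acts can be explained through the "Thucydides's Trap."
  ◎ South China Sea clashes, Taiwan independence, cyberattacks, a North Korean collapse, a trade war... which will trigger a US-China war, and how can it be avoided?
  In the early 21st century, China and the United States have once again fallen into the pattern of the "Thucydides's Trap," as if unable to escape being "destined for war." China's rapid rise poses a serious challenge to the US-led postwar international order and to US military hegemony. After World War II the United States accounted for 50% of the global economy; that has since fallen to 16%. Over the same period, China's share soared from 2% in 1980 to 18% in 2016. To make matters worse, Xi Jinping with his "Chinese Dream" and Trump with "America First" both vow to restore national greatness and both see the other as an obstacle to that goal. No other pair of leaders is more likely than Xi and Trump to lead the United States and China into war.
  Graham Allison is a world-renowned scholar of international relations whose penetrating 1970s study of the Cuban Missile Crisis established his unshakable standing as a master of the field. Through a concise analysis of wars across the ages he lays the theoretical foundation for the "Thucydides's Trap" and uses it to forecast the paths by which the United States and China might come into conflict. The book lists 5 ways war could break out and 12 clues for peace, and offers earnest advice to the US government: it urges the United States to take seriously both the fact of China's rise and its determination to restore national glory, and it counsels the US foreign-policy community to recover the broad strategic thinking of the Cold War era in order to face an unprecedented security threat. Even before release the book caused a stir in political, academic, and journalistic circles worldwide, and 《注定一戰?》 has become a talking point for everyone concerned about the future of the United States and China. Even Xi Jinping himself has said: "We must all strive to avoid falling into the Thucydides's Trap!"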

9/06/2019

Globalization in transition: The future of trade and value chains

Susan Lund, James Manyika, Jonathan Woetzel, Jacques Bughin, Mekala Krishnan, Jeongmin Seong, and Mac Muir, Globalization in transition: The future of trade and value chains, McKinsey Global Institute, January 2019.
Although output and trade continue to increase in absolute terms, trade intensity (that is, the share of output that is traded) is declining within almost every goods-producing value chain. Flows of services and data now play a much bigger role in tying the global economy together. Not only is trade in services growing faster than trade in goods, but services are creating value far beyond what national accounts measure. Using alternative measures, we find that services already constitute more value in global trade than goods. In addition, all global value chains are becoming more knowledge-intensive. Low-skill labor is becoming less important as factor of production. Contrary to popular perception, only about 18 percent of global goods trade is now driven by labor-cost arbitrage. 

Learning Scheduling Algorithms for Data Processing Clusters

Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, Mohammad Alizadeh, Learning Scheduling Algorithms for Data Processing Clusters, SIGCOMM '19 Proceedings of the ACM Special Interest Group on Data Communication, Pages 270-288. 
Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load.
Codes and more information. 

9/03/2019

Food Discovery with Uber Eats

Ferras Hamad, Isaac Liu, and Xian Xing Zhang, Food Discovery with Uber Eats: Building a Query Understanding Engine, Uber, June 10, 2018
Choice is fundamental to the Uber Eats experience. At any given location, there could be thousands of restaurants and even more individual menu items for an eater to choose from. Many factors can influence their choice. For example, the time of day, their cuisine preference, and current mood can all play a role. At Uber Eats, we strive to help eaters find the exact food they want as effortlessly as possible....

8/29/2019

Does democracy stifle economic growth? (民主會窒礙經濟增長嗎 ?)

Yasheng Huang (黃亞生), Does democracy stifle economic growth? (民主會窒礙經濟增長嗎 ?), TEDGlobal 2011, July 2011

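When AI becomes your coworker (當AI變同事)

天下雜誌 (CommonWealth Magazine), issue 680, 2019-08-28


Cover story
A tsunami affecting 6.63 million people in the workforce
When AI becomes your coworker
Text: 鍾張涵; Research: 施逸筠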

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Yuanzhi Li, Tengyu Ma and Hongyang Zhang, Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, COLT 18, Best Paper Award.
We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.  
Concretely, we show that given $\tilde{O}(dr^{2})$ random linear measurements of a rank $r$ positive semidefinite matrix $X^{\star}$, we can recover $X^{\star}$ by parameterizing it by $UU^{\top}$ with $U \in \mathbb{R}^{d \times d}$ and minimizing the squared loss, even if $r \ll d$. We prove that starting from a small initialization, gradient descent recovers $X^{\star}$ in $\tilde{O}(\sqrt{r})$ iterations approximately. The results solve the conjecture of Gunasekar et al. [16] under the restricted isometry property.
The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.

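GOLF: the Gap of Learning & Field alliance (學用接軌聯盟)

顏和正,科技大老不談生意 打一場沒有輸家、只有贏家的「GOLF」,天下雜誌,680 期,2019-08-27
This is not a charity golf match but the "Gap of Learning & Field" alliance, GOLF (學用接軌聯盟), founded jointly last year by three major technology companies: AU Optronics (友達), Wistron (緯創資通), and Compal Electronics (仁寶電腦). This O2O platform, which offers students online advance coursework in professional subjects and offline internship opportunities, brings together 21 companies and 42 universities across Taiwan in order to create a "triple win": linking industry and academia, empowering young people, and enlarging companies' talent pools.
"It has nothing to do with playing golf. The name is meant to highlight the idea of linking industry and academia. Companies' needs are shown on the platform, so schools and students know them in advance, which makes course design and future employment easier," said 彭双浪, pointing out that the winners of this "industry-academia golf" are the schools, the students, and the companies.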

8/23/2019

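The driverless car revolution

Hod Lipson and Melba Kurman, Driverless: Intelligent Cars and the Road Ahead, MIT Press, 2016.
In recent years autonomous driving has become a field where major automakers and technology giants compete, ranging from semi-autonomous (advanced driver assistance) to fully autonomous (completely driverless). The technologies involved include sensing, robotics, machine perception, machine learning, artificial intelligence, algorithms, and intelligent transportation systems; knowledge once confined to academia is steadily being put to practical use and commercialized.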

8/20/2019

Automated Machine Learning: Methods, Systems, Challenges

Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren, Editors, Automated Machine Learning: Methods, Systems, Challenges, The Springer Series on Challenges in Machine Learning, 2019.
This open access book presents the first comprehensive overview of general methods in Automated Machine Learning (AutoML), collects descriptions of existing systems based on these methods, and discusses the first series of international challenges of AutoML systems. The recent success of commercial ML applications and the rapid growth of the field has created a high demand for off-the-shelf ML methods that can be used easily and without expert knowledge. However, many of the recent machine learning successes crucially rely on human experts, who manually select appropriate ML architectures (deep learning architectures or more traditional ML workflows) and their hyperparameters. To overcome this problem, the field of AutoML targets a progressive automation of machine learning, based on principles from optimization and machine learning itself. This book serves as a point of entry into this quickly-developing field for researchers and advanced students alike, as well as providing a reference for practitioners aiming to use AutoML in their work. 

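Two moves for "credit newbies" to earn a high score from banks

According to 2018 World Bank statistics, 47.9% of people in Taiwan have never used a credit card or consumer credit loan. On how "credit newbies" (信用小白) can build and raise a credit score, experts advise building a long-term relationship with a bank and, above all, not applying repeatedly to different banks for credit cards or loans.
A "credit newbie" is someone who has never applied for a credit card or loan and therefore cannot give a bank any credit record. A consumer-finance executive at a private bank explained that before a bank lends or offers a preferential rate it must first assess the customer's ability to repay: besides a stable source of income, it also looks at repayment records.
On how credit newbies can build and raise a credit score, 周純如, managing director of Money101 Taiwan, said in a press release that, first, one can build an ongoing relationship with a bank through a credit card: once a card has been held for more than 3 months and used for purchases throughout, credit transactions are on record. More important still, paying the bill in full and on time builds a good credit track record.
Second, 周純如 said, personal credit scores range from 200 to 800, and five factors affect the score: payment behavior, degree of credit expansion, the risk level of the types of debt, length of credit history, and the status of applications for new credit. When applying for a financial product without being 100% sure the bank will approve it or what rate will apply, do not apply repeatedly to different financial institutions for credit cards or loans, since that lowers your credit score instead.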

8/19/2019

Best Subset Selection via a Modern Optimization Lens

Dimitris Bertsimas, Angela King, and Rahul Mazumder, Best Subset Selection via a Modern Optimization Lens, Annals of Statistics, 2016, Vol. 44, No. 2, 813–852.
In the period 1991–2015, algorithmic advances in Mixed Integer Optimization (MIO) coupled with hardware improvements have resulted in an astonishing 450 billion factor speedup in solving MIO problems. We present a MIO approach for solving the classical best subset selection problem of choosing k out of p features in linear regression given n observations. We develop a discrete extension of modern first-order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with n in the 1000s and p in the 100s in minutes to provable optimality, and finds near optimal solutions for n in the 100s and p in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than Lasso and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power.

8/15/2019

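Data labeling at Scale AI

Training an AI model generally requires large numbers of labeled images, often hundreds of thousands or millions in total, and the labeling is mostly split among several annotators who follow a common set of rules. Once everything is labeled, one person reviews the results, picks out the images that meet the standard (the ground-truth data), and uses them to train the AI model.
Scale AI is only three years old, yet a Google subsidiary and General Motors are already customers
Seeing this demand for AI model training, Alexandr Wang, only 22, founded Scale AI Inc. in 2016 to streamline the image-labeling process. The software system it built first touches up and labels each image, and if the image cannot be recognized it is handed to an outsourced annotator, which greatly shortens the later work....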

8/11/2019

Visa Is Combatting Fraud at Nearly the Speed of Light

By using artificial intelligence (AI), Visa Inc. helped issuers prevent an estimated $25 billion in annual fraud, the company announced on June 17. The company accomplished this using Visa Advanced Authorization (VAA), a comprehensive risk management tool that monitors transaction authorization on the Visa global network, VisaNet, in real time. 

8/09/2019

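Programming flat knitting machines

On the flow from taking an order to production, 蘇瀅歡 explained that work is scheduled according to how urgent the order is and how complex the programming is, and then programming begins. Once programming is finished comes machine tuning and knitting a sample garment. If the sample has defects, the code must be corrected or the knitting parameters adjusted and the sample knitted again. The whole cycle may repeat two or three times and take 3 days to a week. The sample is then sent to the customer for confirmation, and revisions continue until both sides agree.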

8/07/2019

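Four breakthroughs in "3D printing" are overturning the rules of making things

How exactly does additive manufacturing differ from traditional manufacturing? Here are the differences at a glance.
1. A wide choice of materials
2. Design and manufacturing come together quickly
3. Finer, more detailed products become possible
4. Less wasted raw material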

8/06/2019

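"The country has let you down": forest rangers in high-risk jobs earn NT$25K to NT$33K a month

Beyond catching illegal loggers and fighting off timber thieves ("mountain rats"), forest rangers must carry 40-kilogram packs and hike uphill to fight forest fires; missions lasting more than 5 or 10 days are routine. The work even includes raising seedlings and reforestation. The duties are many and varied, and the danger is high.
Yet for standing on the front line to protect Taiwan's mountain forests, they are paid only NT$25,000 to NT$33,000 a month. "The country has let you down," Council of Agriculture Minister 陳吉仲 said publicly yesterday at the Forest Ranger Forum, adding that even though the Executive Yuan approved a "mountain patrol operations allowance" in March this year, raising monthly pay by NT$3,000 to NT$7,000, compensation is still out of proportion to the danger of the work. He further promised to submit a more complete benefits scheme to the Executive Yuan before the end of the year.
"Coming to the Forestry Bureau really has meant risking my life," said 吳國禎, a forest ranger at the Pingtung Forest District Office with 14 years on the job who has long pursued illegal logging. He recalled once running into armed timber thieves: "They fired 30-odd shots from hunting rifles. What could we do?" All he could do at the time, he said, was cover his head and run. Only after coming through that hail of bullets did he remember his superiors' instruction to bring a bulletproof vest on duty, and the memory still leaves him shaken.
Technology can solve part of the problem; see the use of drones in humanitarian relief.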

8/03/2019

How To Improve Supply Chains With Machine Learning: 10 Proven Ways

Machine learning-based algorithms are the foundation of the next generation of logistics technologies, with the most significant gains being made with advanced resource scheduling systems. 
The wide variation in data sets generated from the Internet of Things (IoT) sensors, telematics, intelligent transport systems, and traffic data have the potential to deliver the most value to improving supply chains by using machine learning.  

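Predictive maintenance for elevators

王茜穎,電梯界的「關鍵報告」:在故障發生前,先下手為強,若水 Flow,2019/7
They fitted elevators with sensors, collected real-time data, connected it to the cloud, and applied IBM Watson's IoT platform and predictive-maintenance software for machine learning, cross-checking the technical documents and repair records in the database for correlations and building predictive models from trends in the data. The aim is to foresee which elevator is likely to fail and when, and to act before the fault occurs; it could be called the elevator industry's "Minority Report" (關鍵報告).
On an elevator trip of a few dozen seconds, KONE (通力) collects more than 200 items of real-time data, including start and end times, door opening/closing times, stops/acceleration, distance traveled, temperature, noise, vibration, humidity, air pressure, lighting, and electricity use....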

8/02/2019

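Handset makers building their own chips

On July 25 US time, Intel announced it would sell its smartphone modem (baseband) chip business to Apple for US$1 billion, an important piece of the puzzle in Apple's strategy of making its own phone chips.
What few noticed is a report in June this year that Samsung has quietly moved into selling phone chips: besides supplying its own brand, Samsung has sent samples to Oppo and Vivo, China's second- and third-largest phone brands. 高鴻翔, senior research manager at IDC, estimates that Vivo phones with Samsung chips inside will appear by the first half of next year at the latest.
Of these two major handset makers, Samsung and Apple, one makes a growing share of its chips in-house, while the other not only makes its own chips but also sells them to others. According to the Semiconductor Industry Association (SIA), communications applications, chiefly phones, accounted for 32.4% of global semiconductor output value last year, the largest source of demand. In other words, when the semiconductor industry's biggest customers buy less from outside or go into the chip business themselves, it signals a crucial new shift in the semiconductor ecosystem.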

7/24/2019

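Seeing through countless disguised faces with heavily made-up Lady Gaga photos: National Taiwan University beats the US, China, and Russia to take first place worldwide

The training was designed in two stages. In the first, facial features are extracted from other datasets to train a deep convolutional neural network (DCNN) model that can recognize faces. Next, using the disguised-face data supplied by the organizers, an additional neural network block is built specifically to learn to recognize disguised faces. 徐宏民 added that the network design does not rely on deep learning alone: during feature extraction it also incorporates PCA (principal component analysis), an earlier machine learning technique, to project the face data into a new feature space and learn the main features of the disguised faces. The learned facial structures and features are then used for matching, to determine whether the person in front of the camera is the real person in disguise or someone else impersonating them. After this training, the disguised-face model can recognize even Lady Gaga in heavy makeup.
Of the competition result, 徐宏民 said it further proves that Taiwan's face recognition research is every bit as strong as that of top foreign universities. "Later, when we met the organizer, even he was curious how we did it and how we reached such a high recognition rate," he said proudly.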

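Chips are the key to winning in AI

工業技術與資訊月刊,AI 決勝關鍵在於晶片,2019-05-20
AI operation can be roughly divided into two stages, "learning" (training) and "inference." The former uses machine learning techniques to train an algorithm on large amounts of sample data; the latter runs the algorithm to interpret real-world data in end applications.
Looking at how the major international players are positioned, the CPU and GPU chips used in cloud computing are already dominated by international giants. 吳志毅 believes Taiwan need not fight for the high-performance computing chip design market where those giants excel; moreover, the giants will rely on TSMC's advanced process technology, which underscores TSMC's importance in this AI war. In addition, Google's data centers use many components and products from Taiwanese vendors, so Taiwan has advantages to build on as well.
吳志毅 pointed out that if Taiwan is to enter the AI industry, the potential opportunity lies in edge computing. As AI technology advances, its move from the cloud to the device is inevitable, and the key to on-device AI is a high-performance AI chip. "Taiwan has advantages in chips, end devices, and systems, and is highly agile and flexible. If the software industry can work alongside, there is a lot of room to grow."...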

Solving the Rubik's Cube with Approximate Policy Iteration

Stephen McAleer, Forest Agostinelli, Alexander Shmakov, Pierre Baldi, Solving the Rubik's Cube with Approximate Policy Iteration, ICLR, 2019.
Recently, Approximate Policy Iteration (API) algorithms have achieved superhuman proficiency in two-player zero-sum games such as Go, Chess, and Shogi without human data. These API algorithms iterate between two policies: a slow policy (tree search), and a fast policy (a neural network). In these two-player games, a reward is always received at the end of the game. However, the Rubik’s Cube has only a single solved state, and episodes are not guaranteed to terminate. This poses a major problem for these API algorithms since they rely on the reward received at the end of the game. We introduce Autodidactic Iteration: an API algorithm that overcomes the problem of sparse rewards by training on a distribution of states that allows the reward to propagate from the goal state to states farther away. Autodidactic Iteration is able to learn how to solve the Rubik’s Cube without relying on human data. Our algorithm is able to solve 100% of randomly scrambled cubes while achieving a median solve length of 30 moves — less than or equal to solvers that employ human domain knowledge.
Forest Agostinelli, Stephen McAleer, Alexander Shmakov, Pierre Baldi. Solving the Rubik’s cube with deep reinforcement learning and search. Nature Machine Intelligence, 2019; DOI: 10.1038/s42256-019-0070-z (Data availability)

Airlines are finally fixing the middle seat

Mark Wilson, Airlines are finally fixing the middle seat, Fast Company, 07.18.19

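How does Spotify recommend new songs?

極客公園,越聽越上癮,Spotify是如何推薦新歌的?,2019.07.18
Music recommendation is not one concrete feature in Spotify; it is woven into many finer-grained modules. We can start with the time dimension. Made For You is a collection of Spotify's recommended playlists, different for every user, which recommends music to you daily, weekly, and even yearly.
Cold start
Finally, if you are new to Spotify, your playlists and library are probably still thin, so here is a quick way to fill them. First find a song you love most, right-click, and choose Song Radio; Spotify recommends a playlist of 50 songs based on it, which you can save to your library. Go through the playlist and save the songs you like to your library (the heart icon), then right-click again into Song Radio. Repeat this several times and your playlists and library will be much richer, and Spotify will know your taste in music better.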

Mapping roads through deep learning and weakly supervised training

Saikat Basu, Derrick Bonafilia, James Gill, Danil Kirsanov, and David Yang, Mapping roads through deep learning and weakly supervised training, Facebook, July 23, 2019.
We collected our training data as a set of 2,048-by-2,048-pixel tiles, with a resolution of approximately 24 inches per pixel. We discarded tiles where fewer than 25 roads had been mapped, because we found that they often included only major roads (with no examples of smaller roads that would be more challenging to label correctly). For each remaining tile, we rasterized the road vectors and used the resulting mask as our training label. To work at the same resolution as the DeepGlobe data set, we randomly cropped each image to 1,024 by 1,024 pixels, thereby producing roughly 1.8 million tiles covering more than 700,000 square miles of terrain. The result was 1,000x more than the roughly 630 square miles that the DeepGlobe data set covered. To create segmentation masks from these road vectors, we simply rasterized each road vector to five pixels. Semantic segmentation labels tend to be pixel-perfect, but the labels we create with this heuristic are not. Roads vary in width and contour in ways that these rasterized vectors could not capture perfectly. Furthermore, roads in different regions around the globe are mapped from different satellite imagery sources and thus do not always align completely with the imagery we use for our training data. 

7/20/2019

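Break-even point

While chatting with a student, I learned that he wanted to open a restaurant after graduation. I asked whether he knew how to calculate the break-even point from the production management course he had just taken, which prompted me to set a homework problem on it. Another time I was chatting with a friend who had been working for many years and wanted to open a coffee shop; the break-even point I calculated with this method was close to my friend's own estimate.

For a more detailed analysis, see
果漾水豬, Sharing what I learned from two years of running a coffee shop: come in if you are serious about opening one (分享一下開了兩年咖啡店的心得 有心想開咖啡店的進來)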
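
The break-even calculation mentioned in this post can be sketched as follows. It is the standard textbook formula (fixed cost divided by the contribution margin per unit); the coffee-shop figures are made-up assumptions for illustration, not numbers from the homework or from my friend.

```python
def break_even_units(fixed_cost, price, variable_cost):
    """Units that must be sold per period for revenue to cover all costs."""
    margin = price - variable_cost  # contribution margin per unit
    if margin <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return fixed_cost / margin

# Hypothetical coffee shop: NT$150,000 monthly rent and wages,
# NT$100 per cup, NT$40 of beans, milk, and cup per serving.
cups_per_month = break_even_units(150_000, 100, 40)
```

With these assumed figures the shop would need to sell 2,500 cups a month, or roughly 84 a day over 30 days, before it makes any profit.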

7/06/2019

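The impact of Japan's export restrictions on South Korea's semiconductor industry

陳達誠 (compiled and translated), Why does Japan hold the lifeline of South Korea's semiconductor industry? (日本為何能掌握韓國半導體命脈?), 鉅亨網, 2019/07/03
The Japanese government will impose controls on advanced semiconductor-related materials exported to South Korea. Starting July 4, it will revoke South Korea's most-favored-nation treatment: exports of fluorinated polyimide, photoresist, and high-purity hydrogen fluoride, three materials indispensable to OLED panel and semiconductor production, will change from being exempt from export license applications to case-by-case review. The review process can take up to 90 working days.
The news caused an uproar in South Korea. According to South Korean media, the country's semiconductor industry depends heavily on Japan. For manufacturing equipment, domestically made products account for less than 20%; the remaining 80% or so is imported from Japan, the United States, the Netherlands, and other countries.
South Korea is the world's largest semiconductor producer, yet it depends heavily on imports for both materials and equipment. Its self-sufficiency rate for semiconductor production equipment is only 18.2%, while for materials it reaches 50.3% (as of 2017). Nearly 50% of its semiconductor-related materials are imported from Japan.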

7/02/2019

LIS online teaching platform

LIS, The Transmission of Sound: Boyle (聲音的傳遞-波以耳), August 28, 2018 (more)


Charging a phone with a one-dollar battery




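林秀姿, With only NT$1,000 left in his account, he holds to his ideals and wants to work in education for life (戶頭剩1千元 他堅持理念要做一輩子的教育), 聯合報, 2018/12/18
嚴天浩 was admitted to the Department of Chemistry at National Cheng Kung University ... 
"I once ate instant noodles for three months straight to get where I am today," said 陳爸, who has strongly supported LIS since it started and is a great benefactor to the whole team. Hearing this, 嚴天浩 had a sudden realization: "That makes sense. At worst we just eat instant noodles. If we believe in this value, and if over these few months the few of us can influence a great many children, then eating instant noodles is worth it."
嚴天浩 often thinks that if someone had offered him learning resources like these when he was a student, his schooling would have been happier. Because of this conviction, the team refused a buyout offer from a cram school and still wants to provide the videos to disadvantaged learners. "Many people say that today's society reflects the education of 20 years ago, because the students of that time grew up to be what our society is now. What we want to change is the society of 20 years from now."
Science teaching materials crowdfunding project: please help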

7/01/2019

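The age of AIQ

何玉方 (trans.), AIQ: Like it or not, this is now an era in which AIQ matters more than IQ or EQ (AIQ:不管你願不願意,現在已是AIQ比IQ、EQ更重要的時代), 商業周刊, 2019
Nick Polson and James Scott, AIQ: How People and Machines Are Smarter Together, St. Martin's Press, 2018.
1. Seven stories from key moments in AI's history of how human intelligence shaped artificial intelligence
2. An explanation of the 4 major elements driving AI's development
3. A new take on machine intelligence: using AI to "amplify" human intelligence
The book explains the current state and applications of AI through stories of historical figures and basic mathematics. It is suitable as a textbook for a university general-education course, or as supplementary reading for high-school mathematics.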

6/30/2019

鬼島 (Ghost Island)

黃明志Namewee ft.大支Dwagie【鬼島 Ghost Island】@亞洲通話 Calling Asia



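Ministry of Foreign Affairs scholarships for foreign students attending university in Taiwan

A message received from a Line group
There are 1,250 places each year for foreign students coming to Taiwan for university. They receive a living allowance of NT$25,000 per month for 1 year of language school and NT$30,000 per month for 4 years of university, so they can happily collect for 5 years in total, plus a free round-trip air ticket.
Page 8 shows the scholarship quota for Taiwanese students under the "Taiwan first" slogan: 34 people at NT$25,000 each (one-time, not monthly).
With 1,250 foreign students each year receiving it for 5 years, at full quota there are 6,250 foreign students in Taiwan each collecting NT$25,000-30,000 per month. (6,250 x 12 = 75,000)
The scholarships they receive are equivalent to 75,000 of the Ministry of Foreign Affairs scholarship places for Taiwanese students. So the ministry's principle toward Taiwanese students is 75,000:34.
Is our minimum wage even NT$25,000-30,000? 6,250 people x NT$30,000 = 187,500,000
Is NT$180 million a month really being spent like this to support a group of foreign students, while Taiwanese students have to work part-time and carry student loans?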

6/27/2019

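Flexport wants to be the major platform for cross-border logistics

Olima, International trade logistics evolves again! More than the Uber of freight: Flexport wants to be the major platform for cross-border logistics (國貿物流再進化!不只貨運界 Uber!Flexport 要做跨境物流的大平台), bnext, 2017.03.08
Flexport was founded in 2013 and is headquartered in Silicon Valley in the United States, with additional offices in San Francisco, New York, Amsterdam, Hong Kong, and Shenzhen. Flexport aims to provide a "people-centered, integrated international freight forwarding service," using technology and software to track and manage freight activity in real time and to help customers analyze data on international routes, shipping prices, warehousing costs, order fulfillment, and import and export customs declarations. The system the company has developed uses big data to analyze cargo handling and technology-enabled equipment to track cargo flows in real time, streamlining logistics operations and lowering transaction costs; it also uses digitized processes to reduce manual errors and greatly improve efficiency.
Flexport's main customers at present are listed companies and startups, which typically need to ship raw materials, components, or semi-finished goods to other countries. Flexport wants to build a full-service, transparent platform that integrates every link in the freight process and offers a "one-click service" (that is, everything can be handled by clicking buttons on the platform), so that customers can complete the formerly tedious import and export process simply, clearly, and at a reasonable price. Consider that these complicated international trade procedures (customs declaration, booking space, container consolidation, taxes, insurance, and so on) used to be handled separately through specialized agencies, leaving customers run ragged; now, through Flexport's platform, it is as easy as buying something on a 24-hour shopping website, done from home with a few clicks!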

6/26/2019

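The Courage of a Heretic: The Life of 韋政通

韋政通, The Courage of a Heretic: The Life of 韋政通 (異端的勇氣:韋政通的一生), 水牛, 2018
Over ninety-one years, 韋政通 showed how a person under all kinds of pressure can not only hold to his own ideas but also live a remarkably rich life. Reading about his life reminds us that the times keep moving on, and that the unfinished journey still ahead is your life and mine.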

6/25/2019

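Industry 3.5

簡禎富, Tsing Hua Chair Professor and Micron Chair Professor, has spent more than twenty years on the front line of industry-academia collaboration, working with leading companies across Taiwan's industries. Drawing on his in-depth research in smart manufacturing and big data analytics, he points out that among the three major visions of the Industry 4.0 revolution, big data and cyber-physical systems are only the infrastructure and tool-level goals; the fundamental goal is to master the core capability of flexible decision-making.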

6/19/2019

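Hit Refresh (刷新未來)

Satya Nadella, Hit Refresh: The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future
A leader's inner notes on guiding a company's transformation
  Unlike typical CEOs who recount past victories, Nadella candidly shares how a professional manager taking the baton searched for the organization's soul, blended new thinking into unchanging values, and launched change from the senior team, along with his deep insights on empathy and leadership.
An industry leader's analysis of where technology is heading
  To avoid repeating past mistakes, Microsoft is investing early in three key technologies: artificial intelligence, mixed reality, and quantum computing. The power of these three technologies will compound, breaking past Moore's law, changing how people interact with their environment, and letting us reimagine what is possible.
A framework for thinking through the impact of future technology
  With all information flowing across borders in the cloud, whose responsibility are personal privacy and public safety?
  As AI rapidly enters every field and job turnover becomes unavoidable, how can we soften the impact of replacement by machines?
  How can the economic growth that technology brings benefit more people?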

6/16/2019

6/06/2019

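若水 reveals the secrets of data annotators' work

簡季婕 explained that although more raw data is better, the real challenge is collecting usable data. Take a model that lets a self-driving car react to road conditions: data on normal road conditions is plentiful and easy to collect, but crash data, which is what actually trains the model to react, is harder to obtain. The data-collection stage therefore has two priorities. The first is variety: collecting sufficiently comprehensive data, including static, dynamic, and different-environment data. The second is complexity: collecting data at different levels of complexity, since dim lighting, rain, and the number or size of the objects to be annotated all make a scene more complex.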

Alan Turing, Condemned Code Breaker and Computer Visionary

Alan Cowell, Overlooked No More: Alan Turing, Condemned Code Breaker and Computer Visionary, The New York Times, June 5, 2019.
His genius embraced the first visions of modern computing and produced seminal insights into what became known as “artificial intelligence.” As one of the most influential code breakers of World War II, his cryptology yielded intelligence believed to have hastened the Allied victory. 

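LinkedIn's job recommendation system

LinkedIn uses a standard made up of three factors to describe the goals that its search and recommendation models need to achieve.
1. Relevance: search results need not only to return relevant candidates, but also to surface candidates who may be interested in the target position.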

6/03/2019

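US Stock Research Lab (美股研究室)

This book reports, for nearly 60 stock-selection methods, the investment return, standard deviation, Sharpe ratio, annualized excess return, and systematic risk coefficient (β) from 1999 to 2017. The conclusions were obtained from data on all US stocks from the paid website Portfolio123, after more than a thousand statistical runs.
The book uses data to show that if you select stocks on the single criterion of price-to-earnings ratio, you get an annualized return of 15.8% with a standard deviation of 21.9%. If you use Peter Lynch's famous PEG method (price/earnings-to-growth ratio = P/E ratio ÷ net profit growth rate), the return is about 7.6%, only slightly better than the market. The highest return, as much as 21%, comes from selecting on five factors: P/B, P/S, EY, ROE, and ROC.
The US stock market is the most worthwhile market in the world to invest in; despite occasional pullbacks, its steady upward trend is unchanged. Investing in US stocks is smart, but how do you invest in them smartly? This book provides the most precise reference.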
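
The PEG formula quoted in this post (P/E ratio divided by net profit growth rate) can be written out as a small sketch. The convention of expressing the growth rate in percent is the usual one and is my assumption here, as are the function name and the sample figures; none of them come from the book.

```python
def peg_ratio(price, earnings_per_share, growth_pct):
    """PEG = (price / earnings per share) / annual net profit growth in percent."""
    if earnings_per_share <= 0 or growth_pct <= 0:
        raise ValueError("PEG is only meaningful for positive earnings and growth")
    pe = price / earnings_per_share
    return pe / growth_pct

# Hypothetical stock: $50 price, $2.50 EPS, 20% expected profit growth
peg = peg_ratio(50.0, 2.5, 20.0)
```

Here the P/E is 20 and the growth rate is 20%, giving a PEG of 1.0, the level commonly read as fairly priced.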

6/02/2019

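Restoring Taiwan's economic justice and vitality

For anyone who cares about Taiwan's development, this is a good book worth reading again and again. It could serve as general-education material, and it should also push politicians to change.

朱敬一, Restoring Taiwan's Economic Justice and Vitality (找回台灣經濟正義與活力), 天下雜誌出版, 2015
1. Cross-strait economic and trade issues are political issues and cannot be reduced to questions of liberalization and internationalization!
2. Globalization cannot be avoided, but signing FTAs is no economic cure-all
3. Reckless tax cuts do not attract investment; they widen the wealth gap and create social division!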

6/01/2019

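Wild thoughts on 12-year basic education

Having read a great many opinions for and against, let me tell a few stories.

Back then I did my reserve officer training at the signal school. The 100-plus people in my company were all electrical engineering students from public and private universities and five-year junior colleges. Someone had to take a turn staying on duty during each weekly leave, so everyone agreed in advance that the extra week would be covered by those who drew certain "cushier" unit assignments. At the coordination meeting a week before the promise came due, everyone who raised a hand to back out was from National Taiwan, Tsing Hua, or Chiao Tung University; the higher the school's ranking, the more people backed out. In the end, after I spoke up, the lots were redrawn. Is this the kind of talent our society wants, produced by a joint entrance exam that values only a single test (1)?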

If You Don’t Know, Now You Know: 5G

The Daily Show with Trevor Noah, If You Don’t Know, Now You Know: 5G, 2019/5/22



5/26/2019

How IBM Watson Overpromised and Underdelivered on AI Health Care

Eliza Strickland, How IBM Watson Overpromised and Underdelivered on AI Health Care, IEEE Spectrum, 2 Apr 2019. 
In many attempted applications, Watson’s NLP struggled to make sense of medical text—as have many other AI systems. “We’re doing incredibly better with NLP than we were five years ago, yet we’re still incredibly worse than humans,” says Yoshua Bengio, a professor of computer science at the University of Montreal and a leading AI researcher. In medical text documents, Bengio says, AI systems can’t understand ambiguity and don’t pick up on subtle clues that a human doctor would notice. Bengio says current NLP technology can help the health care system: “It doesn’t have to have full understanding to do something incredibly useful,” he says. But no AI built so far can match a human doctor’s comprehension and insight. “No, we’re not there,” he says....

Reinforcement Learning and Optimal Control by Bertsekas

Dimitri P. Bertsekas,  Reinforcement Learning and Optimal Control, MIT, 2019.
The purpose of the book is to consider large and challenging multistage decision problems, which can be solved in principle by dynamic programming and optimal control, but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also by alternative names such as approximate dynamic programming, and neuro-dynamic programming.

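Two Sigma's hedge fund

紀茗仁, 譚偉晟, and 黃亞琪, Trawling tens of thousands of data sources at light speed: how a hedge fund finds its targets (光速撈上萬資料 避險基金靠它找標的), 今周刊, 2019-01-23
In October 2015, Forbes reported on this hedge fund's ability to use AI to trawl vast amounts of data. At that time the company already drew on as many as 10,000 data sources for its investment analysis and used 75,000 CPUs (central processing units). The data fell roughly into four categories: fundamentals, technicals, "special event" information such as mergers and acquisitions, and, most distinctive of all, what is called "first-hand data."
What is "first-hand data"? It is all kinds of information that seems to have no direct connection to stock prices. For example, Two Sigma's AI system pulls complaints about a given retailer from posts on Twitter and other social media and analyzes whether consumers' resentment might affect the stock price.

Of course, this has to be combined with analysis of the other three categories. For example, even if consumer resentment is considerable, if the retailer's stock price has broken above its 200-day moving average from a low and company executives have quietly bought more of their own stock, the overall analysis may still yield a buy conclusion. In fact, the data sources Two Sigma uses to analyze stock prices are too numerous to list, and even the effect of weather on individual stocks falls within the scope of its data collection.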
INSIGHTS at  Two Sigma: Forecasting Factor Returns

5/24/2019

AI curriculum for K-12 education

Welcome to the ai4k12 wiki! This interim site is being used to organize the AI for K-12 initiative jointly sponsored by AAAI and CSTA. This page will help us get started on the dialog that will eventually result in (1) national guidelines for AI education for K-12, and (2) an online, curated Resource Directory to facilitate AI instruction. To join the AI for K-12 mailing list, send mail to ai4k12@aaai.org. To read about the initiative, see these slides.
Five Big Ideas in AI (pages 37-42), Overview of the Resource Library (pages 59-77).

The Use of UAVs in Humanitarian Relief (無人機在人道主義救濟中的應用)

Raissa Zurli Bittencourt Bravo, Adriana Leiras, and Fernando Luiz Cyrino Oliveira, The Use of UAVs in Humanitarian Relief: An Application of POMDP-Based Methodology for Finding Victims, Production and Operations Management,  Vol. 28, No. 2, February 2019, pp. 421–440.
Researchers have proposed the use of unmanned aerial vehicles (UAVs) in humanitarian relief to search for victims in disaster-affected areas. Once UAVs must search through the entire affected area to find victims, the path-planning operation becomes equivalent to an area coverage problem. In this study, we propose an innovative method for solving such problem based on a Partially Observable Markov Decision Process (POMDP), which considers the observations made from UAVs. The formulation of the UAV path planning is based on the idea of assigning higher priorities to the areas that are more likely to have victims. We applied the method to three illustrative cases, considering different types of disasters: a tornado in Brazil, a refugee camp in South Sudan, and a nuclear accident in Fukushima, Japan. The results demonstrated that the POMDP solution achieves full coverage of disaster-affected areas within a reasonable time span. We evaluate the traveled distance and the operation duration (which were quite stable), as well as the time required to find groups of victims by a detailed multivariate sensitivity analysis. The comparisons with a Greedy Algorithm showed that the POMDP finds victims more quickly, which is the priority in humanitarian relief, whereas the performance of the Greedy focuses on minimizing the traveled distance. We also discuss the ethical, legal, and social acceptance issues that can influence the application of the proposed methodology in practice.

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Jonathan Frankle and Michael Carbin, The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, ICLR 2019 (best paper).
We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
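
One pruning round of the kind the paper describes can be sketched as follows: train, drop the smallest-magnitude surviving weights, and rewind the survivors to their original initialization. This is a simplified single-layer illustration based on my reading of the paper, not the authors' code; the function names and the toy weights are assumptions, and the training step itself is omitted.

```python
import numpy as np

def prune_by_magnitude(trained, mask, fraction):
    """Return a new mask that drops the `fraction` smallest-magnitude surviving weights."""
    alive = np.abs(trained[mask])
    k = int(fraction * alive.size)
    if k == 0:
        return mask.copy()
    threshold = np.sort(alive)[k - 1]
    # Ties at the threshold are pruned too, so slightly more than k may go
    return mask & (np.abs(trained) > threshold)

def winning_ticket(initial, trained, mask, fraction=0.2):
    """Prune on trained magnitudes, then rewind surviving weights to their initial values."""
    new_mask = prune_by_magnitude(trained, mask, fraction)
    return initial * new_mask, new_mask

# Toy layer with five weights (made-up values)
initial = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
trained = np.array([0.1, -0.5, 0.05, 2.0, -1.0])
ticket, mask = winning_ticket(initial, trained, np.ones(5, dtype=bool), fraction=0.4)
```

In the iterative variant this round would be repeated, retraining the rewound subnetwork each time, until the target sparsity is reached.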

Multimedia content generated with AI

James Vincent, This AI-generated Joe Rogan fake has to be heard to be believed, The Verge, May 17, 2019.



5/23/2019

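Inventec's smart manufacturing

This March, 陳維超 spoke at NVIDIA's developer conference in the United States about how Inventec has used Edge AI for industrial defect inspection over the past year or so. Smart manufacturing has two main layers of application: first, process automation, including automated inspection and production scheduling; second, predictive analytics, such as order forecasting and preventive maintenance.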

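The fabric search engine from Frontier (前線國際)

王志鈞, Welcoming a new era of tech textiles (迎接科技紡織新紀元), 自由時報, 2019-05-22
Frontier digitizes fabric swatch patterns on the supply side and uses four AI engines to classify them into valuable data, greatly simplifying decision-making and purchasing for apparel brands and buyers. Besides saving 80% of the time spent searching for swatches, it lets designers quickly find and select hidden long-tail swatches in the database.
With the digital system, the textile industry's development process for each season's new samples can be shortened from about 60-90 days to as little as 15 days. Each brand can therefore launch more seasons of clothing per year, generating higher revenue and marketing and producing more precisely.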

5/21/2019

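She taught "hopeless" students until every one went to college, writing a teaching legend

郝廣才, She taught "hopeless" students until every one went to college, writing a teaching legend (把廢柴教到全部上大學 她寫鮮師傳奇), 今周刊, 2019-05-15
Facing the high wall of her students' passivity, Collins broke through it with positivity. On the first day of class she always told her students: "What you need to build is confidence. I believe you can succeed and can take responsibility for your lives. Stop blaming society, teachers, and parents; happiness lies within yourselves!" She nurtured her students' strengths, just as her father had encouraged her from childhood.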

5/19/2019

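Delta Electronics' transformation and smart manufacturing

At Delta's Dongguan and Wujiang plants, the smart-automation model production lines integrate every machine into one seamless flow, from manual insertion, testing, packaging, and screw fastening to handover. "Every plant now has a model line, and each can replace 90% of the labor, no problem. The hard part comes next: rolling it out horizontally," 鄭平 said confidently in an interview with 《天下》...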

5/18/2019

Are adversarial examples inevitable?

Ali Shafahi, W. Ronny Huang, Christoph Studer, Soheil Feizi, and Tom Goldstein, Are adversarial examples inevitable?, ICLR, 2019. (open review)
Theorem 1 (Existence of Adversarial Examples)
Theorem 2 (Adversarial examples on the cube)
Theorem 3 (Sparse adversarial examples)
Theorem 4 (Condition for existence of adversarial examples) 

5/13/2019

14 Grand Challenges for Engineering in the 21st Century

National Academy of Engineering, 14 Grand Challenges for Engineering in the 21st Century.
Make Solar Energy Economical
Provide Energy from Fusion
Develop Carbon Sequestration Methods
Manage the Nitrogen Cycle
Provide Access to Clean Water
Restore and Improve Urban Infrastructure
Advance Health Informatics
Engineer Better Medicines
Reverse-Engineer the Brain
Prevent Nuclear Terror
Secure Cyberspace
Enhance Virtual Reality
Advance Personalized Learning
Engineer the Tools of Scientific Discovery
PingWest, The 14 grand engineering challenges that bear on humanity's survival are to be solved with AI (事關人類存亡的 14 大工程難題,要靠 AI 來搞定了), TechNews, May 13, 2019

5/12/2019

Leveraging Comparables for New Product Sales Forecasting

Lennart Baardman, Igor Levin, Georgia Perakis, and Divya Singhvi, Leveraging Comparables for New Product Sales Forecasting, Production and Operations Management, Volume 27, Issue 12, 06 December 2018, Pages: 2340-2343. (First Prize of the 3rd POMS Applied Research Challenge)
This work develops an accurate, scalable and interpretable forecasting tool calibrated with our industry partners’ data. These characteristics are important to our two major industry partners, one being Johnson & Johnson Consumer Companies Inc., a consumer healthcare manufacturer, the other being a large fashion retailer. In building our tool we are motivated by an approach that has been used by industry practitioners: identify a set of products comparable to the new product, average their historical sales, and use this as a forecast. In line with this approach, we devise a model that uses analytics to jointly cluster products while estimating a regularized regression model for each cluster’s sales.... 
The joint cluster-while-regress model is formulated as a non-linear integer optimization problem that is proven to be NP-hard. However, we use the practical interpretation of our problem to devise a fast algorithm whose iterative steps mimic industry practice.... 
Working in collaboration with two large industry partners, we show that our algorithm results in a 20–70% MAPE improvement and 10–60% WMAPE improvement over several benchmarks used in practice.
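
For readers unfamiliar with the two error measures quoted above, here is a minimal sketch using the definitions I assume are intended (the common ones): MAPE averages each product's percentage error, while WMAPE divides total absolute error by total actual sales, so high-volume products weigh more. The paper's exact definitions may differ, and the sales figures are made up.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error: mean of |error| / |actual| per item."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)))

def wmape(actual, forecast):
    """Weighted MAPE: total |error| divided by total |actual|."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sum(np.abs(actual - forecast)) / np.sum(np.abs(actual)))

# Hypothetical first-period sales for three new products
actual = [100, 200, 50]
forecast = [110, 180, 60]
print(mape(actual, forecast), wmape(actual, forecast))
```

The small product with a 20% miss pulls MAPE above WMAPE here, which is why reports on forecasts across products of very different volumes often quote both.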

5/10/2019

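Do the mock interviews that high schools race to arrange with university professors really help?

顏聖紘, Do the mock interviews that high schools race to arrange with university professors really help? (高中競相找大學教授模擬面試真的有用嗎?), 03 Apr, 2018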

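Business experiments (企業實驗)

While chatting with a manager in a bank's big data department about how companies can run marketing experiments, I suggested a few papers for reference: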

Eric Almquist and Gordon Wyner, Boost Your Marketing ROI with Experimental Design, Harvard Business Review, Oct 01, 2001.

Thomas H. Davenport, How to Design Smart Business Experiments,  Harvard Business Review, Feb 01, 2009.

Eric T. Anderson and Duncan Simester, A Step-By-Step Guide to Smart Business Experiments,  Harvard Business Review, Mar 01, 2011.

5/01/2019

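Data applications at 杏一醫療

王姿琳, 1.5 million member records in a mess: how did 杏一 turn it around? (150 萬會員資料亂糟糟 杏一如何翻轉它?), 商業周刊, 2019.04.17
Accurately forecasting market demand
Stocking up while others pull products, avoiding a wave of stockouts
杏一醫療 is one of the few domestic companies that can forecast market demand and improve the efficiency of purchasing, sales, and inventory management. For example, this spring most retailers followed past experience and prepared to pull hand warmers from the shelves in February, to avoid unsold stock as the weather warmed. But by combining member data, weather data, and news events, 杏一 predicted demand for hand warmers in late February and early March, stocked up in advance, and avoided stockouts that would have hurt sales...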

4/20/2019

Artificial Intelligence: A Modern Approach

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3/e, Prentice Hall, 2009.

A comprehensive, classic book. Python code

On a Formal Model of Safe and Scalable Self-driving Cars (自駕車)

Shai Shalev-Shwartz, Shaked Shammah, and Amnon Shashua, On a Formal Model of Safe and Scalable Self-driving Cars, arXiv:1708.06374, Mobileye, 2017.
In order to gain perspective over the typical values for such probabilities, consider public accident statistics in the United States. The probability of a fatal accident for a human driver in 1 hour of driving is 10^−6. From the Lemma above, if we want to claim that an AV meets the same probability of a fatal accident, one would need more than 10^6 hours of driving. Assuming that the average speed in 1 hour of driving is 30 miles per hour, the AV would need to drive 30 million miles to have enough statistical evidence that the AV under test meets the same probability of a fatal accident in 1 hour of driving as a human driver.... (*) 
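
The arithmetic in this excerpt can be reproduced in a few lines. The sketch only restates the quoted figures (a fatality probability of 10^−6 per hour and 30 miles per hour); the function name is hypothetical, and the rule that roughly 1/p hours are needed is taken from the excerpt's reading of the Lemma, not derived here.

```python
def miles_required(p_fatal_per_hour, avg_speed_mph):
    """Order-of-magnitude test mileage needed to support a per-hour fatality claim."""
    hours_needed = 1.0 / p_fatal_per_hour  # on the order of 1/p hours
    return hours_needed * avg_speed_mph

miles = miles_required(1e-6, 30)  # about 30 million miles
```

The excerpt states that more than 10^6 hours are needed, so this figure is a lower bound rather than a sufficient amount of testing.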

4/19/2019

Risk-based policies for airport security checkpoint screening (機場安檢檢查站檢查)

L.A. McLay, A.J. Lee, and S.H. Jacobson, Risk-based policies for airport security checkpoint screening, Transportation Science, Volume 44, Issue 3, August 2010, pp. 333-349. (Informs 2018 Impact Prize) 
Passenger screening is an important component of aviation security that incorporates real-time passenger screening strategies designed to maximize effectiveness in identifying potential terrorist attacks. This paper identifies a methodology that can be used to sequentially and optimally assign passengers to aviation security resources. An automated prescreening system determines passengers' perceived risk levels, which become known as passengers check in. The levels are available for determining security class assignments sequentially as passengers enter security screening. A passenger is then assigned to one of several available security classes, each of which corresponds to a particular set of screening devices. The objective is to use the passengers' perceived risk levels to determine the optimal policy for passenger screening assignments that maximize the expected total security, subject to capacity and assignment constraints. The sequential passenger assignment problem is formulated as a Markov decision process, and an optimal policy is found using dynamic programming. The general result from the sequential stochastic assignment problem is adapted to provide a heuristic for assigning passengers to security classes in real time. A condition is provided under which this heuristic yields the optimal policy. The model is illustrated with an example that incorporates data extracted from the Official Airline Guide.

Temporal Big Data for Tactical Sales Forecasting in the Tire Industry

Yves R. Sagaert, El-Houssaine Aghezzaf, Nikolaos Kourentzes, and Bram Desmet, Temporal Big Data for Tactical Sales Forecasting in the Tire Industry, Interfaces, Volume 48, Issue 2, March-April 2018, pp. 121–129.
We propose a forecasting method to improve the accuracy of tactical sales predictions for a major supplier to the tire industry. This level of forecasting, which serves as direct input to the demand-planning process and steers the global supply chain, is typically done up to a year in advance. The product portfolio of the company for which we did our research is sensitive to external events. Univariate statistical methods, which are commonly used in practice, cannot be used to anticipate and forecast changes in the market; and forecasts by human experts are known to be biased and inconsistent. The method we propose allows us to automate the identification of key leading indicators, which drive sales, from a massive set of macroeconomic indicators, across different regions and markets; thus, we can generate accurate forecasts. Our method also allows us to handle the additional complexity that results from short-term and long-term dynamics of product sales and external indicators. For the company we study, accuracy improved by 16.1 percent over its current practice. Furthermore, our method makes the market dynamics transparent to company managers, thus allowing them to better understand the events and economic variables that affect the sales of their products.