2/02/2023

Learning Big Data Skills

Learn some tools, or get a degree

For reference, see DS Examiner, Data Scientist Foundations: The Hard and Human Skills You Need, November 8, 2013.

Alternatively, the Insight Data Science Fellows Program describes the tools you are likely to use:
  1. Software Engineering Best Practices: Learn how to contribute to a large code-base and instrument a web application to collect data. Tools you may learn: Python, Git, LAMP web stack, Javascript, Flask.
  2. Storing and Retrieving Data: How to clean data, store it in the appropriate database or distributed data storage system and then run queries to retrieve the information needed for analysis. Tools you may learn: MySQL, Hadoop, Hive.
  3. Statistical Analysis & Machine Learning: Learn industry best practices for doing basic and advanced statistical analysis on large data sets. Tools you may learn: R, NumPy & SciPy, Mahout.
  4. Visualizing and Communicating Results: Learn how to effectively communicate your findings visually and verbally. Tools you may learn: D3 Javascript library, visualization and presentation best practices. 
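Of course, if your undergraduate foundation is solid enough, you can also teach yourself.
You can also pursue a master's degree. Taking UT Austin as an example, tuition and fees are US$32,000 for Texas residents and US$38,000 for everyone else (including international students) (Note 1), and the program is completed in one year with 12 courses, for example: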
MIS 381N: Data Analytics Programming
STA 380.17: Introduction to Predictive Modeling
MIS 381N: Decision Analysis
BA 385T: Financial Management
MIS 381N.1: Introduction to Database Management
MIS 382N: Advanced Predictive Modeling
BA 191: Career Services Strategies- MSBA
MIS 381N: Stochastic Control & Optimization
STA 380.18: Learning Structures & Time Series
MIS 382N.11: Business Analytics Capstone
MIS 182N: Data Visualization
Or UC Berkeley's Master of Information and Data Science:
Research Design and Application for Data and Analysis
Exploring and Analyzing Data
Storing and Retrieving Data
Applied Machine Learning
Visualizing and Communicating Data
Experiments and Experimentation with Data
Privacy, Security, and Ethics of Data
Really Big Data: Scaling up and Parallelism
Synthetic Capstone Course
Or Columbia's Certification of Professional Achievement in Data Sciences:
Algorithms for Data Science
Probability & Statistics
Machine Learning for Data Science
Exploratory Data Analysis and Visualization
Or Stanford's Data Mining and Applications Graduate Certificate:
STATS202 Data Mining and Analysis
STATS216 Introduction to Statistical Learning
STATS290 Paradigms for Computing with Data
STATS315B Modern Applied Statistics: Data Mining
Or CMU's Master of Science in Machine Learning:
Machine Learning (10-701)
Statistical Machine Learning (10-702)
Intermediate Statistics (10-705)
Choose 2 of 5: Multimedia Databases (15-826), Algorithms (15-750) or Algorithms in the Real World (15-853), Optimization (10-725), or Graphical Models (10-708).
3 electives
Or MIT's Master of Business Analytics, completed in one year with tuition of US$75,000:
15.093J/6.255J: Optimization Methods
15.079: Introduction to Applied Probability or 6.431: Applied Probability
15.062: Data Mining: Finding the Data and Models that Create Value
15.572: Analytics Lab
15.071: The Analytics Edge
Analytics Project Course
Three focused electives in E-Commerce, Finance, Managerial Economics, Marketing, OM, OR/Statistics
Or search for "data" on Coursera, where you can find many free courses, such as Introduction to Data Science and Web Intelligence and Big Data.
There are many excellent courses on MOOC platforms. I personally recommend The Analytics Edge by Dimitris Bertsimas and Allison O'Hair on edX, Machine Learning by Andrew Ng on Coursera, and Statistical Learning by Trevor Hastie and Rob Tibshirani at Stanford.

(Note 1) Using the Central Bank's NT$/US$ interbank closing rate of 29, the cost of one course is NT$91,833 (38,000 * 29 / 12).
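
The arithmetic in the note can be checked with a few lines of Python. This is only a quick sketch: the function name `cost_per_course_twd` and its parameter names are made up for illustration, and the inputs are the figures quoted above (US$38,000 tuition, an exchange rate of 29, 12 courses).

```python
def cost_per_course_twd(tuition_usd: float, twd_per_usd: float, num_courses: int) -> float:
    """Convert total tuition to NT$ and split it evenly across the courses."""
    return tuition_usd * twd_per_usd / num_courses


# Figures from the note: US$38,000 tuition, rate of 29, 12 courses.
per_course = cost_per_course_twd(38_000, 29, 12)
print(f"NT${per_course:,.0f} per course")  # NT$91,833 per course
```

Swapping in a different tuition figure or exchange rate gives the per-course cost under other assumptions.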
