丘成桐 (Shing-Tung Yau): Fundamental Math, AI and Big Data

AI and Big Data are twins; their mother is Math.

Today's "AI 3.0", although impressive with "Deep Learning", still uses "primitive" high-school math, namely statistics, probability, and calculus (gradient descent).

AI has not yet taken advantage of the power of post-modern math invented since WW II, especially the IT-related fields: propositional "Type" logic, topology (homology, homotopy), linear algebra, and Category Theory.

That is the argument of the Harvard math professor Shing-Tung Yau 丘成桐 (the first Chinese Fields Medalist), who predicts that a future "AI 4.0" can be smarter and more powerful.


Current AI deals with Big Data with four weaknesses:

  1. A purely statistical, experience-oriented approach, not one built on Big Data's inherent mathematical structures (e.g. Homology or Homotopy).
  2. Analytical results are environment-specific and lack portability to other environments.
  3. A lack of effective algorithms, e.g. Algebraic Topology computes Homology or Co-homology using Linear Algebra (matrices).
  4. Limited by hardware speed (e.g. GPU), AI is reduced to a layered-structure problem-solving approach: a simplified analysis, not the real Boltzmann Machine, which finds the true optimum.
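Point 3 above can be made concrete: homology really is computable with plain matrix algebra. Below is a minimal sketch (my own toy example, not from the text) that computes the Betti numbers of a hollow triangle, topologically a circle, from its boundary matrix using NumPy.

```python
import numpy as np

# Hollow triangle (a topological circle): 3 vertices, 3 edges, no filled face.
# The boundary matrix d1 maps edges to vertices: edge e_ij has boundary v_j - v_i.
# Columns: e01, e12, e20; rows: v0, v1, v2.
d1 = np.array([
    [-1,  0,  1],   # v0
    [ 1, -1,  0],   # v1
    [ 0,  1, -1],   # v2
])

rank_d1 = np.linalg.matrix_rank(d1)

# Betti numbers from ranks alone (no 2-simplices, so rank d2 = 0):
betti0 = 3 - rank_d1              # number of connected components
betti1 = (3 - rank_d1) - 0        # independent loops: dim ker d1 - rank d2

print(betti0, betti1)  # 1 connected component, 1 loop -> prints: 1 1
```

The circle's one component and one loop fall straight out of a rank computation, which is the sense in which algebraic topology reduces to linear algebra.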


AI 1.0 : 1950s. Alan Turing and MIT's John McCarthy (who coined the term "AI" and invented the Lisp language).

AI 2.0 : 1970s/80s. “Rule-Based Expert Systems” using Fuzzy Logic.

[AI Winter : 1990s/2000s. The ambitious Japanese "5th Generation Computer" project, based on Prolog's predicate logic, failed.]

AI 3.0 : 2010s to now. "Deep Learning" by Prof Geoffrey Hinton, using primitive math (statistics, probability, calculus: gradient descent).

AI 4.0 : Future. Using propositional "Type" logic, topology (homology, homotopy), linear algebra, and Category Theory.

Math for AI : Gradient Descent

Simplest explanation by Cheh Wu:

(4-part video: auto-plays after each part)

The math theory behind Gradient Descent: "Multivariable Calculus", developed by Augustin-Louis Cauchy (19th century, France).

1. Revision: Dot Product of Vectors


2. Directional Derivative

3. Gradient Descent (opposite = Ascent)
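The three parts above fit together in a few lines of code. Here is a minimal sketch on a toy function of my choosing, f(x, y) = x² + y² (not from the videos): the gradient collects the partial derivatives, the directional derivative along a unit vector is the dot product of the gradient with that vector (maximal along the gradient), so descent steps the opposite way.

```python
# Gradient descent on f(x, y) = x^2 + y^2, a toy convex bowl chosen for illustration.
# grad f = (2x, 2y); the directional derivative along unit vector u is grad_f . u,
# which is largest along the gradient -- so we step against it ("descent").

def grad_f(x, y):
    return (2 * x, 2 * y)

def gradient_descent(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy   # move opposite to the gradient
    return x, y

x, y = gradient_descent(3.0, 4.0)
print(abs(x) < 1e-6, abs(y) < 1e-6)  # converged to the minimum at (0, 0) -> True True
```

Deep Learning does exactly this, only on millions of parameters at once, with the "cost" function playing the role of f.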


Deep Learning with Gradient Descent:

In Math We Trust: Quantum Computing, AI and Blockchain – The Future of IT

In memory of Prof Zhang Shoucheng 张首晟教授 who passed away on 1 Dec 2018.

Key Points :

  1. Quantum computing with the "Angel Particle" (no anti-particle). [Analogy] Complex number a + i·b; its 'anti' (conjugate) is a - i·b; 'no anti' means a real number a.
  2. AI natural-language algorithm "Word to Vector": e.g. King / Queen (words that frequently appear together), etc.
  3. Data privacy in Big Data analytics with AI: Homomorphic Encryption, i.e. compute on data without revealing privacy (e.g. the Millionaires' Problem).
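The "Word to Vector" analogy in point 2 can be sketched in a few lines. The vectors below are hand-crafted for illustration (NOT the output of a trained word2vec model); they only demonstrate the famous vector arithmetic king - man + woman landing nearest to queen under cosine similarity.

```python
import math

# Toy, hand-made 2-D "word vectors"; dimensions loosely encode (royalty, maleness).
# A real word2vec model learns hundreds of dimensions from a corpus.
vectors = {
    "king":  (0.9, 0.9),
    "queen": (0.9, 0.1),
    "man":   (0.1, 0.9),
    "woman": (0.1, 0.1),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman: remove "maleness", keep "royalty"
target = tuple(k - m + w for k, m, w in
               zip(vectors["king"], vectors["man"], vectors["woman"]))

best = max((w for w in vectors if w != "king"),
           key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```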

Machine Learning is Fun! – Adam Geitgey – Medium


(Chinese version):


Unsupervised learning is the future of ML (Machine Learning), itself a branch of AI; the latest algorithm, Deep Learning, is showing only 5% of its potential (more has yet to be invented).

Singapore has recently launched an AI programme to educate 10,000 students and workers (in partnership with Microsoft and IBM; a free 3-hour lesson).

The world’s 4 AI gurus :

  1. (UK/Canada) Prof Geoffrey Hinton (*), the inventor of Deep Learning, and
  2. his post-doctoral associate, (France) Prof Yann LeCun,
  3. the ex-Google and ex-Baidu AI chief Prof Andrew Ng 吴恩达,
  4. the AlphaGo creator Demis Hassabis.

Andrew and Demis both studied in Singapore secondary schools (Ng at Raffles Institution) before pursuing university at Stanford and Cambridge, respectively.

Note (*) : Prof Geoffrey Hinton was involved in the 1980s Expert Systems era, when the rule-based knowledge engine was the AI (2.0). That AI failed because its fixed-rule knowledge base relied on "supervised learning" from human domain experts, who differed from one another in opinion, making unbiased "weights" (rule probabilities from 0 to 1) impossible. Prof Hinton continued his AI research after moving from the UK to Canada, where he developed the Deep Learning algorithm: unsupervised learning from Big Data training feeds, calculating the "costs" (i.e. deviations of the AI result from the actual result) using Cauchy's calculus, e.g. gradient descent.


Best AI Programming Languages


The above author recommends the 5 best AI languages:

  1. Python
  2. Java (compatible: Google Kotlin*)
  3. Lisp (#modern ‘clone’: Clojure*)
  4. Prolog (#)
  5. C++

Note 1: I have reservations about #3 and #4, which are obsolete 1970s/1980s languages, due to performance issues, the lack of "practical" platforms (in the mobile-phone age), the absence of major SW/HW vendor support (Google, Oracle, Microsoft, etc.), and a small user community, unlike the other three languages.

Note 2: Functional Programming (FP) is a must-have feature for a modern AI language; of the languages above, only Kotlin and Clojure are "FP".

Top 20 Math Books on Machine Learning / AI


    The first 4 books (by Strang, Lang, etc.) are masterpieces.

    • Linear Algebra by Strang. He writes math like few folks do, no endless paragraphs of definitions and theorems. He tells you why something is important. He wears his heart on his sleeve. If you want to spend a lifetime doing ML, sleep with this book under your pillow. Read it when you go to bed and wake up in the morning. Repeat to yourself: “eigen do it if I try.”

      Strang’s MIT OpenCourse:



      • Introduction to Applied Math by Strang. You’ll need to understand differential equations at some point, even to understand the dynamics of deep learning models, so you’ll benefit from Strang’s tour de force of a survey through a vast landscape of ideas, from numerical analysis to Fourier transforms.
      • Algebra by Lang. This legendary Yale professor has written more "yellow-jacketed" tomes in the Springer math series than anyone else. I secretly think he's a fictitious person actually made up of the entire Yale math faculty. Yes, it's a long book. Yes, it's hard going. No, it's about as far from Strang as you can get. You want out? Be my guest. Mount Everest cannot be climbed by everyone. Here's a nice phrase: "Today we Strang. Tomorrow we will Lang." Meaning: ML today uses basic linear-algebraic ideas like eigenvectors, singular value decomposition, etc.; in the coming decades, the far more powerful machinery in Lang's book will come into use. Want to be a leader in the ML of tomorrow? This is what it might require.
      • Computational Homology by Kaczynski. Many of the books above cover some basic topology, the abstract study of shapes. You know, the subfield of math that shows why a coffee cup is the same as a doughnut. Most ML methods assume smoothness of the underlying space. Can one learn anything in a space that has no smoothness metrics defined on it? This subfield of topology studies how to extract geometric structure from datasets without assuming any continuity or smoothness.

      • In All Likelihood by Yudi Pawitan. Fisher’s concept of likelihood is the most important idea in statistics you need to understand and no book I’ve read explains this core idea better than this gem of a book by Yudi. Likelihoods are not probabilities. Repeat to yourself. Yudi wisely avoids complex examples and sticks to simple 1 dimensional examples for the most part. You’ll come away with a much deeper appreciation of statistics from this fine book.
      • Convex Optimization by Boyd. Much of modern ML is couched in the language of optimization. The separating line between tractable and intractable problems is not linear vs. nonlinear but convex vs. nonconvex. Boyd leaves out a lot of important modern ideas but he covers the basics well. Hint: his Stanford lecture notes cover a lot of what is not in the book.
      • Optimization in Vector Spaces by Luenberger. At some point in reading ML papers, you'll start encountering phrases like "inner product spaces" or "Hilbert spaces". The latter was popularized by the founder of computer science, John von Neumann, to formalize quantum mechanics. The joke is that he gave a talk at Göttingen on Hilbert spaces with the great mathematician David Hilbert in the audience, and Hilbert asked a colleague afterwards: what in the world are these so-called Hilbert spaces? Luenberger covers optimization in infinite-dimensional spaces. He explains the most important and profound theorem in optimization: the Hahn-Banach theorem. Why do neural nets with sigmoid nonlinear activations represent any smooth function? The Hahn-Banach theorem is the reason. A slim book, but a tough one to master.
      • Causal Representations in Statistics by Judea Pearl. For the past 25 years, Pearl has single-handedly pursued this problem. To anyone who listens, he will tell you that causality is, after likelihood, the most important idea in statistics, and yet it cannot be expressed in the language of probabilities. For all its power, probability theory cannot express a concept as basic as "diseases cause symptoms", not the other way around. Correlation is symmetric; causality is fundamentally asymmetric. Pearl explains when and whether one can go from the former to the latter. Pearl is the Isaac Newton of modern AI.
      • Group Representations in Probability and Statistics by Persi Diaconis. Persi is a world famous mathematician who started his career as a magician. He ran away from home when he was young and joined a traveling circus, inventing some very cool card tricks that caught the attention of none other than Martin Gardner who used to write the famous “puzzle column” in Scientific American. When Persi decided to learn math more seriously so he could invent better tricks, he had a problem that he barely had what anyone would call an education. Martin Gardner wrote him a recommendation to Harvard that simply read: “here’s a magician who wants to be a mathematician” and explained why Persi would one day be a famous one. Harvard took the chance and the rest is history. In this slim book, Persi elegantly explains why the mathematics of symmetries — group theory and group representations— can shed deeper light into statistics.
      • Linear Statistical Models by C. R. Rao. For most of you who haven't heard of this "living god" of statistics, your statistics professor's PhD advisor likely learned statistics from this book. The famous Rao-Blackwell theorem is at the heart of the foundational concept of sufficient statistics. The equally famous Cramér-Rao theorem relates the ability to learn effectively from samples to the curvature of the likelihood function. In a dazzling paper written in his 20s, he showed that the space of probability distributions is not Euclidean, but a curved Riemannian manifold. This idea shows up in machine learning in a hundred different ways today. Rao invented multivariate statistics as a young postdoctoral researcher at Cambridge. Hard to believe, but this "Gauss" of statistics is still alive, in his 90s, teaching at a university in India named after him.
      • Convex Analysis by Rockafellar. Unlike Boyd's book, this one has no pictures. You can instantly tell a serious math book from a more elementary one: the serious one has no pictures. If you want to dig deep into the geometry of convex functions and convex sets, Rockafellar is your guide.
      • The Symmetric Group by Sagan. Group theory comes in two flavors: finite groups and continuous infinite groups. Sagan digs deep into finite groups and their linear-algebraic representations in this slim, beautiful tome. Think you really understand linear algebra? Reading the first few pages of this book will have you scurrying back to Strang when you realize what you haven't yet mastered. You might read this along with Persi's more chatty and less refined presentation. The beautiful concept of the character of a group is explained here. Unlike their linear-algebraic cousins, group characters are basis-independent (like the trace of a matrix, which is the same in any basis).
      • Analysis of Incomplete Multivariate Data by J. L. Schafer. The book to learn EM from: the famous expectation-maximization algorithm, presented the way statisticians developed it, not the confusing way it is presented in ML textbooks using mixture models and HMMs. General advice: the statistics you need for ML is best learned from statistics books, not ML textbooks.
      • Neuro-Dynamic Programming by Bertsekas and Tsitsiklis. Still the most authoritative treatment of reinforcement learning. Valuable in many other ways, including a superb treatment of nonlinear function approximation by neural network models. The most enjoyable bus ride of my life was in the company of these two eminent MIT professors a decade ago, going to a workshop in a remote region of Mexico. If you really want to understand why Q-learning works, this is your salvation. You'll quickly discover how weak your math background is, and why you need to understand the deep concept of martingales, which capture the notion of a fair betting game.
      • Non-cooperative Games by John Nash. Yes, the guy Russell Crowe plays in A Beautiful Mind. This slim 25-page Princeton math PhD thesis earned its author a well-deserved Nobel prize in economics. Legend has it von Neumann dismissed the idea when he heard of it as "just another fixed point theorem". Von Neumann's own massive tome on games and economic decisions focused entirely on simpler, weaker models of games. Nash's concept has proved more enduring. If you want to understand GAN models more deeply, you need to understand Nash equilibria.
      • Best Approximation in Inner Product Spaces by Deutsch. If you want to see how mathematicians think of machine learning, you need to read this book. Mathematicians tend to think in generalities. This book captures beautifully the way mathematicians think of learning from data, e.g. least squares methods as projections in Hilbert spaces. Even more beautiful ideas like von Neumann’s famous algorithm using alternating projections, the most rediscovered and reinvented algorithm in history, is explained here. Yes, you’ll find that many ideas you thought that came from ML or statistics can all be viewed as special cases of von Neumann’s work (EM, non-negative matrix approximation, and a dozen other ideas). This book teaches you the power of abstraction.
      • The “Lord of the Rings” trilogy on manifolds by Lee. I’m getting to the end of my list of 20 math books for ML, and like most humans, I’m going to start cheating by including “course packs”. You need to really grok manifolds at some point in your quest to study the foundations of ML. Lee’s trilogy on “Topological Manifolds”, “Smooth Manifolds” and “Riemannian manifolds” is the definitive modern guide to understanding curved spaces, like space time (four dimensions), string theory, and probability spaces.
      • Set Theory and Measure Theory by Paul Halmos. PH wasn’t a great mathematician, but he was a great writer. ML is deeply based on being able to measure distances between objects and measure theory is the abstract theory of how to define metrics on sets. Ultimately, probability is just a measure on a set with some special properties.
      • Probability Theory: Independence, Exchangeability, Martingales by Chow and Teicher. Yes, probability is just a measure on sets, but this tour de force of a book explains the unique measure-theoretic properties of probability. This book shows you how mathematicians think of probability. I'm guessing you know all about independent random variables. Do you know about exchangeability? Ever used bag-of-words representations in NLP or computer vision? Why do they work? Why does Q-learning converge? You need to understand the other two foundations of probability theory.
      • For my last book, I’ll choose The Topology of Fiber Bundles by Steenrod. These are ways of parameterizing spaces, and manifolds and Euclidean geometry are special types of fiber bundles. Let’s take the Earth’s surface as a fiber bundle. At each point on the surface, the set of tangents form a second space. The first space, the surface of the Earth, parameterizes the second space of tangents at each point. Ergo, we have a tangent bundle, a special case of fiber bundles. Today’s ML heavily uses the concept of manifolds. Tomorrow’s ML will likely build on fiber bundles.

      AI – DeepLearning – Machine Learning

      3 Waves of AI Evolution:

      1st Wave (1950s) : Alan Turing, "the Father of AI", and his Princeton professor Alonzo Church (Lambda Calculus); MIT Prof John McCarthy's "Lisp" Functional (a.k.a. Symbolic or Declarative) Programming Language.

      2nd Wave (1980s - 1990s) : Knowledge-Based Rule-Engine Expert Systems.
      These failed because the knowledge-acquisition process was too difficult, with limited, rigid rules.

      3rd Wave (2010s - now): Deep Learning is the latest AI tool for Machine Learning, made famous when "AlphaGo", created by Demis Hassabis (UK/Greek father and a Singaporean Chinese mother, a teacher; as a kid he hung around Singapore's Funan Centre), beat two "Go" world champions: Korea's Lee Sedol 李世乭 in 2016 and China's Ke Jie 柯洁 in 2017.

      Great Books Recommended

      1. Learn Everything in 《Deep Learning》:

      • Math (e.g. Gradient Descent, by the French grandmaster Cauchy, 1847),
      • Linear Algebra (e.g. Matrix, Eigen-decomposition),
      • Probability (e.g. Bayesian, etc.),
      • Key Deep Learning techniques.
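As a minimal illustration of the eigen-decomposition mentioned above (a sketch on an arbitrary 2×2 symmetric matrix of my choosing, not an example from the book): a symmetric matrix A factors as A = Q·diag(λ)·Qᵀ, the workhorse behind PCA and much ML analysis.

```python
import numpy as np

# Eigen-decomposition A = Q diag(lambda) Q^T of a small symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)        # eigh: specialized for symmetric matrices
reconstructed = Q @ np.diag(eigvals) @ Q.T

print(np.round(eigvals, 6))           # eigenvalues of A -> [1. 3.]
print(np.allclose(reconstructed, A))  # the factorization rebuilds A -> True
```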

      Note: Available at Singapore National Library (LKC Reference #006.31).

      Order at Amazon: https://www.amazon.com/gp/aw/d/0262035618/ref=mp_s_a_1_3?ie=UTF8&qid=1516412628&sr=8-3&pi=AC_SX236_SY340_FMwebp_QL65&keywords=deep+learning

      2. 《The Master Algorithm》 (Book or Audio)


      5 “Tribes” of Machine Learning, all with 3 layers (Representation, Evaluation/Scoring, Optimisation) :

      1. Connectionists (“Deep Learning”, Neural Network)
      2. Bayesians (Probability, Inference Rule)
      3. Analogizers (Similar Pattern)
      4. Symbolists (Logic)
      5. Evolutionaries (Survival of the fittest)


      1. Bill Gates recommended this excellent book for 2018 reading; it was also spotted on Chinese President Xi Jinping's office bookshelf during his 1 Jan 2018 New Year speech.


      2. All available copies of this book at the Singapore National Library Board have been loaned out!! (Unusual in low-readership Singapore.) Please reserve it via the online queue.

      3. The audio version (10 CDs) is excellent for in-car listening while driving, or for busy people travelling by plane/train/bus.


      In 15 years, AI-driven driverless cars will change the transport/work/environment landscape... it is real, not futuristic... behind AI is advanced math that teaches computers to learn without a fixed algorithm, by analysing BIG DATA patterns using Algebraic Topology!


      Thanks to the development of Artificial Intelligence (AI), paired with ever-faster integrated circuits, technology is advancing at an accelerating pace. Reportedly, within a short 5-10 years, healthcare, self-driving cars, education and the service industries will all face the risk of being disrupted.

      1. Uber is a software company. It owns no cars, yet it lets you hail a ride on demand; it is now the world's largest taxi company.
      2. Airbnb is also a software company. It owns no hotels, but its software lets you stay in rooms for rent around the world; it is now the world's largest hotel business.
      3. In May this year, Google's computer beat the world's strongest South Korean Go master. Google built an AI computer with software that can "learn by itself", so its AI improves at an accelerating rate, reaching this milestone 10 years earlier than experts had predicted.
      4. In the US, using IBM's Watson software, you can get legal advice within seconds at 90% accuracy, versus 70% accuracy for human lawyers: both faster and cheaper.
      5. Watson can already help screen patients for cancer, 4 times more accurately than doctors.
      6. Facebook also has AI software that can recognise (identify) faces more accurately than humans, and it is everywhere.
      7. By 2030, AI computers will be smarter than any expert or scholar in the world.
      8. From 2017, self-driving cars can be used in public.


      9. In the future you will no longer need to own a car, or spend time refuelling, parking, queueing for a driving test, or paying insurance. Cities in particular will become quiet and safe to walk in, because 90% of cars will be gone; yesterday's car parks will become public parks.
      10. Today there is on average one accident per 100,000 km driven, causing about 1.2 million deaths worldwide each year.


      11. Most traditional car companies will face bankruptcy. The revolutionary software of Tesla, Apple and Google will be used in every car.

      Reportedly, Volkswagen and Audi engineers are deeply worried about Tesla's revolutionary battery and AI software technology.
      12. Real-estate companies will undergo enormous changes.

      13. Electric cars are very quiet and will become mainstream by 2020, so cities will become quiet and the air clean.
      14. Solar power has also advanced rapidly over the past 30 years. Last year, the global increase in solar-energy production exceeded the increase in oil production.


      15. Healthcare: this year, medical-device makers will supply a Tricorder like the one in Star Trek, letting your phone scan your eyes and test your breath and blood chemistry. With 54 "biomarkers", it can detect the early signs of almost any disease.

      16. 3D printing: within 10 years, 3D printers are expected to fall from nearly USD 20,000 to USD 400, while becoming 100 times faster.

      17. By the end of this year, your phone will have a 3D-scanning function: you can measure your feet and order "personalised" shoes. Reportedly, in China such equipment has already been used to build a 6-storey office building; by 2027, 10% of products are expected to be made with 3D printers.
      18. Industry opportunities:

      a. Jobs: within 20 years, 70-80% of today's jobs will disappear. Even with many new job opportunities, they will not be enough to make up for the jobs taken over by intelligent machines.
      b. Agriculture: there will be $100 farming robots that need no food, no housing and no salary, only cheap batteries. Farmers in developed countries will become managers of robots. Greenhouse buildings can run on very little water.

      c. By 2020, your phone will be able to tell from facial expressions whether the person speaking to you is lying. If a politician (e.g. a presidential candidate) lies, it will be exposed on the spot.
      d. Money in the digital age will be Bitcoin: "data" inside intelligent computers.
      e. Education: the cheapest smartphone in Africa costs USD 10.
      f. By 2020, 70% of humanity will own a phone and will thus be able to go online for a world-class education, but most teachers will be replaced by intelligent computers. Every primary-school pupil will need to know how to code; those who can't will be like indigenous inhabitants of the Amazon forest, unable to function in society. Are your country and your children ready?


      For reference: this is also what Silicon Valley VCs, innovators and entrepreneurs are discussing.

      Math Education Evolution: From Function to Set to Category

      Interestingly, math education has been evolving since the 19th century.

      "Elementary Mathematics from an Advanced Standpoint" (3 volumes) was proposed by Felix Klein of the German Göttingen school (19th century):
      1) Math teaching based on the Function (and its graph), which is visible to students. This has influenced secondary-school math worldwide.

      2) Geometry = Group (his "Erlangen Program").

      After WW1, the French felt they had fallen behind the German school. The "Bourbaki" group of École Normale Supérieure students rewrote all math teaching, aka "Abstract Math", taking the structure "Set" as the foundation on which to build further algebraic structures (group, ring, field, vector space, ...) and all of math.

      After WW2, the American professors Mac Lane and Eilenberg summarised all these Bourbaki structures into one super-structure, the "Category" (范畴), with "morphisms" (aka 'relations') between them.

      Grothendieck proposed rewriting Bourbaki's Abstract Math with 'Category' instead of 'Set' as the foundation, but was rejected by the jealous Bourbaki founder André Weil.

      Category Theory is still graduate-level math, also nicknamed "Abstract Nonsense"! It is very useful in IT Functional Programming for "Artificial Intelligence": the next revolution in "our human brain"!
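The link between Category Theory and Functional Programming can be hinted at in code: functions are morphisms, and function composition is associative with an identity, which are exactly the two laws defining a category. A small sketch of my own (toy functions, not from any particular library):

```python
from functools import reduce

# Functions as morphisms: composition is associative and has an identity,
# the two laws that define a Category.

def compose(*fs):
    """Compose functions right-to-left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs, lambda x: x)

double = lambda x: x * 2
inc = lambda x: x + 1

# Associativity: (double . inc) . double == double . (inc . double)
lhs = compose(compose(double, inc), double)
rhs = compose(double, compose(inc, double))
print(lhs(5), rhs(5))  # both give 22

# Identity law: composing with the identity changes nothing
identity = lambda x: x
print(compose(double, identity)(5) == double(5))  # True
```

Typed functional languages (Haskell, and the Kotlin/Clojure mentioned earlier) build directly on this idea of composing morphisms.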