6/06/2026

Use LLM to learn *** and its impact

Research and development

  • Marina Favaro and Jack Clark, When AI builds itself, Anthropic (new)
    • As of May 2026, more than 80% of the code we merge into Anthropic’s codebase was authored by Claude.
    • In the second quarter of 2026, the typical engineer was merging 8× as much code per day as they were in 2024.
    • On the most open-ended tasks, Claude’s success rate reached 76% in May 2026, up 50 percentage points in six months.
    • In this world, the pace of progress in AI development becomes determined entirely by the availability of compute (or the speed of discovering various efficiencies in algorithmic training or inference) for AI systems. Humans play a substantially diminished role in their development, likely moving most of our effort towards oversight, validation, and verification of an expanding “virtual lab” run by AI systems. We expect that systems capable of automated AI research and development would have skills that would transfer to the rest of science, allowing them to begin to revolutionize other fields.
  • Dimitris Bertsimas and Georgios Margaritis, Robust and Adaptive Optimization under a Large Language Model Lens, arXiv:2501.00568. 
  • Tony Feng et al., Aletheia tackles FirstProof autonomously, arXiv:2602.21201. (Prompt)
  • Don Knuth, Claude’s Cycles, Stanford Computer Science Department (28 February 2026; revised 04 March 2026) (Introduction by Valeriy Manokhin) 
  • Thang Luong and Vahab Mirrokni, Accelerating Mathematical and Scientific Discovery with Gemini Deep Think, Google DeepMind, February 11, 2026.
  • David P. Woodruff et al., Accelerating Scientific Research with Gemini: Case Studies and Common Techniques, arXiv:2602.03837. (8.3 Machine Learning Optimization)

Be careful

Courses and information 

I am studying the following code from the book Mastering Reinforcement Learning with Python by E. Bilgin:

def first_visit_return(returns, trajectory, gamma):
    G = 0
    T = len(trajectory) - 1
    for t, sar in enumerate(reversed(trajectory)):
        s, a, r = sar
        G = r + gamma * G
        first_visit = True
        for j in range(T - t):
            if s == trajectory[j][0]:
                first_visit = False
        if first_visit:
            if s in returns:
                returns[s].append(G)
            else:
                returns[s] = [G]
    return returns

I typed in "Please comment the code" (請用繁中註解, that is, "please write the comments in Traditional Chinese") and here is the (amazing) result.
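Separately from the LLM output mentioned above, a small hand-checkable run may help confirm what the function computes. The sketch below repeats the book's function with brief English comments and applies it to a made-up trajectory. The state names "A" and "B", the action "go", the rewards, and gamma = 0.5 are all assumptions chosen only to keep the arithmetic easy.

```python
def first_visit_return(returns, trajectory, gamma):
    """Add first-visit discounted returns from one episode to `returns`."""
    G = 0
    T = len(trajectory) - 1
    # Walk the episode backwards so G accumulates the discounted return.
    for t, sar in enumerate(reversed(trajectory)):
        s, a, r = sar
        G = r + gamma * G
        # The current step has index T - t; look for s at any earlier step.
        first_visit = True
        for j in range(T - t):
            if s == trajectory[j][0]:
                first_visit = False
        # Record G only for the earliest occurrence of s in the episode.
        if first_visit:
            if s in returns:
                returns[s].append(G)
            else:
                returns[s] = [G]
    return returns


# Hypothetical episode: (state, action, reward) tuples, invented for illustration.
episode_1 = [("A", "go", 1.0), ("B", "go", 2.0), ("A", "go", 4.0)]
returns = first_visit_return({}, episode_1, gamma=0.5)
print(returns)  # {'B': [4.0], 'A': [3.0]}

# A second hypothetical episode appends to the same dictionary.
episode_2 = [("B", "go", 1.0)]
returns = first_visit_return(returns, episode_2, gamma=0.5)
print(returns)  # {'B': [4.0, 1.0], 'A': [3.0]}
```

In the first episode, state "A" appears at steps 0 and 2, so only the return from step 0 is recorded: 1 + 0.5 × (2 + 0.5 × 4) = 3. State "B" is first visited at step 1, giving 2 + 0.5 × 4 = 4. Averaging each list in the dictionary would then give a first-visit Monte Carlo estimate of the state values.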
