Code and Results for Chapter 5:

Introduction:
These are results and code for the problems and examples found in Chapter 5 (Monte Carlo Methods) of Sutton and Barto's Reinforcement Learning: An Introduction.

Various Figures and Problems:

Computing the State-Value Function Using the First-Visit Monte Carlo Method (for Blackjack):
- cmpt_bj_value_fn.m (Monte Carlo computation of the state-value function)
- determineReward.m (computes the reward for this play)
- shufflecards.m (returns a shuffled deck of cards)
- handValue.m (returns the value (12-21) of a hand of cards)
- stateFromHand.m (returns the state given a hand of cards)
- sample output using the above code (results obtained when running cmpt_bj_value_fn.m)
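
As a rough illustration of what a hand-value helper and the first-visit averaging step compute, here is a minimal sketch in Python. It is not the MATLAB code listed above: the names `hand_value` and `first_visit_mc`, the card encoding (ace as 1, face cards as 10), and the episode format are all assumptions made for this example.

```python
from collections import defaultdict


def hand_value(cards):
    """Return (total, usable_ace) for a hand; cards are ints 1..10, ace = 1.

    The card encoding is an assumption of this sketch.
    """
    total = sum(cards)
    # Count one ace as 11 when doing so does not bust the hand.
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def first_visit_mc(episodes):
    """Average the return observed after the first visit to each state.

    Each episode is (states, reward): the states visited in order and the
    single terminal reward. With no discounting and no intermediate rewards,
    the return from every state in the episode equals that terminal reward.
    """
    returns_sum = defaultdict(float)
    visits = defaultdict(int)
    for states, reward in episodes:
        for state in set(states):  # each state counts once per episode
            returns_sum[state] += reward
            visits[state] += 1
    return {s: returns_sum[s] / visits[s] for s in returns_sum}
```

Averaging over many simulated games in this way gives an estimate of the state-value function for the policy that generated the episodes.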
Exploring Starts to compute the optimal policy for Blackjack:
- mc_es_bj_Script.m (Monte Carlo computation of the optimal policy)
- sample output using the above code (results obtained when running mc_es_bj_Script.m)
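
The two ingredients of Monte Carlo with exploring starts can be sketched as follows, in Python rather than MATLAB: each episode begins from a randomly chosen state-action pair, and the policy is then made greedy with respect to the current action-value estimates. The function names and the dictionary layout of `q` are hypothetical and are not taken from mc_es_bj_Script.m.

```python
import random


def exploring_start(states, actions, rng=random):
    """Pick the first state-action pair of an episode uniformly at random."""
    return rng.choice(states), rng.choice(actions)


def greedy_policy(q):
    """Map each state to the action with the largest estimated value.

    q is assumed to be a dict of the form {state: {action: value}}.
    """
    return {s: max(av, key=av.get) for s, av in q.items()}
```

Starting every episode from a random pair ensures that all state-action pairs keep being sampled even though the policy itself is deterministic.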
Soft Policy Evaluation to compute the optimal policy for Blackjack:
- soft_policy_bj_Script.m (Monte Carlo computation of the optimal policy using soft policy evaluation)
- sample output using the above code (results obtained when running soft_policy_bj_Script.m)
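
A soft policy gives every action a nonzero probability. One common choice is the epsilon-greedy form, sketched below in Python; whether soft_policy_bj_Script.m uses exactly this form is an assumption, and `epsilon_soft_probs` is a hypothetical name.

```python
def epsilon_soft_probs(q_values, epsilon):
    """Return action probabilities for an epsilon-greedy (soft) policy.

    q_values is a list of action-value estimates for one state. Every action
    receives probability epsilon / n, and the greedy action receives the
    remaining 1 - epsilon on top of that.
    """
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs
```

Because no action ever has probability zero, exploration continues without needing random starting states.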
Exercise 5.4 (The Race Track Example):
- ex_5_4_Script.m (Monte Carlo computation of the optimal policy using soft policy evaluation)
- mk_rt.m (creates a race track)
- gen_rt_episode.m (generates a race track episode)
- init_unif_policy.m (initializes the policy)
- mcEstQ.m (updates the estimate of the action-value function)
- rt_pol_mod.m (modifies the policy for the race track problem)
- velState2PosActions.m (returns the possible velocity actions we could take from a given velocity state)
- sample output using the above code (results obtained when running ex_5_4_Script.m)
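
To make the idea of "possible velocity actions" concrete, here is a minimal Python sketch. It assumes the usual race track rules: each velocity component changes by -1, 0, or +1 per step, both components stay between 0 and an upper bound, and the car may not come to a complete stop. The bound `v_max=4` and the name `possible_velocity_actions` are assumptions for this example and are not read from velState2PosActions.m.

```python
def possible_velocity_actions(vx, vy, v_max=4):
    """List the (dx, dy) velocity increments allowed from velocity (vx, vy)."""
    actions = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = vx + dx, vy + dy
            # Both components must stay within [0, v_max] (assumed bounds).
            if not (0 <= nx <= v_max and 0 <= ny <= v_max):
                continue
            # The car is not allowed to have zero velocity.
            if nx == 0 and ny == 0:
                continue
            actions.append((dx, dy))
    return actions
```

Restricting the action set per velocity state keeps the policy from ever selecting a move that the track rules forbid.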

John Weatherwax