Below are the final optimal state value function and deterministic policy
  obtained using Monte Carlo approximation with exploring starts.  In this
  implementation I took the "dealer card showing" component of the state to be a
  number between 1 and 13, while the book chooses to collapse this to between 1
  and 10.  The results are equivalent since the cards 11-13 all have the same
  value as the 10 card (namely ten).  Correspondingly, the optimal policy and
  value function have the same values for dealer cards 10 through 13 (as they
  should).  The optimal policy agrees with that shown in the book.  These
  results were obtained with 5 million Monte Carlo trials.
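
  The procedure described above can be sketched roughly as follows.  This is a
  minimal Python illustration, not the code that produced the results reported
  here.  All names (card_value, add_card, dealer_total, mc_es) are hypothetical,
  and the sketch assumes cards drawn with replacement, starting states sampled
  uniformly, an initial policy that sticks only on 20 or 21, a dealer who hits
  below 17, and no special treatment of naturals.

```python
import random
from collections import defaultdict

def card_value(card):
    """Map a card rank in 1..13 to its blackjack value in 1..10."""
    return min(card, 10)

def add_card(total, usable_ace, card):
    """Add a card to a hand, tracking whether an ace is counted as 11."""
    v = card_value(card)
    if v == 1 and total + 11 <= 21:
        return total + 11, True
    total += v
    if total > 21 and usable_ace:
        return total - 10, False   # demote the ace from 11 to 1
    return total, usable_ace

def dealer_total(showing, rng):
    """Assumed dealer rule: hit until the total is 17 or more."""
    total, usable = add_card(0, False, showing)
    while total < 17:
        total, usable = add_card(total, usable, rng.randint(1, 13))
    return total

def mc_es(n_episodes, seed=0):
    """Monte Carlo control with exploring starts (sketch)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # action-value estimates Q(s, a)
    n = defaultdict(int)     # visit counts for each (s, a)
    policy = {}              # greedy policy, filled in as states are visited
    for _ in range(n_episodes):
        # Exploring start: random state and random first action.
        total = rng.randint(12, 21)
        usable = rng.random() < 0.5
        showing = rng.randint(1, 13)      # dealer card kept as 1..13
        action = rng.randint(0, 1)        # 0 = stick, 1 = hit
        visited = []
        while True:
            visited.append(((total, showing, usable), action))
            if action == 0:
                break
            total, usable = add_card(total, usable, rng.randint(1, 13))
            if total > 21:
                break
            # Follow the current policy; default is to stick on 20 or 21.
            action = policy.get((total, showing, usable), int(total < 20))
        if total > 21:
            reward = -1
        else:
            d = dealer_total(showing, rng)
            reward = 1 if (d > 21 or total > d) else (0 if total == d else -1)
        # Incremental average of returns, then greedy policy improvement.
        for state, a in visited:
            n[state, a] += 1
            q[state, a] += (reward - q[state, a]) / n[state, a]
            policy[state] = max((0, 1), key=lambda b: q[state, b])
    return q, policy

if __name__ == "__main__":
    q, policy = mc_es(200_000)
    # The state value is the larger of the two action values.
    for showing in (10, 11, 12, 13):
        s = (20, showing, False)
        print(showing, policy.get(s), max(q[s, 0], q[s, 1]))
```

  With enough episodes, the rows printed for dealer cards 10 through 13 should
  approach one another, since card_value maps all four ranks to ten.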
  We first present the optimal state value function and then the optimal policy.
  The optimal state value function looks like

  [Figure: optimal state value function]

  The optimal policies look like

  [Figure: optimal policies]

  John Weatherwax