case class Gradient[Obs, A, R, T, S[_]](config: Config[R, T], valueFn: ActionValueFn[Obs, A, Item[T]])(implicit evidence$1: Equiv[A], evidence$2: ToDouble[R], evidence$3: ToDouble[T]) extends Policy[Obs, A, R, Cat, S] with Product with Serializable

This policy needs to track its average reward internally. If the gradient baseline is set, that tracked average is then used to generate the notes.

T is the "average" type.
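
As an illustration of the idea described above, here is a minimal sketch of a gradient-bandit update that keeps a running average reward and optionally uses it as a baseline. It is not the code in Gradient.scala: `GradientBanditSketch`, `Learner`, `softmax`, `learn`, `stepSize` and `useBaseline` are all hypothetical names, and plain `Double` values stand in for the library's `R` and `T` types. The update rule is the textbook softmax-preference one, assumed here rather than confirmed by the source.

```scala
object GradientBanditSketch {
  // Hypothetical stand-in for the policy's internal state; not the library's types.
  final case class Learner[A](
      prefs: Map[A, Double],   // one preference per action (must be non-empty)
      avgReward: Double,       // running average of all rewards seen so far
      n: Long,                 // number of rewards seen so far
      stepSize: Double,
      useBaseline: Boolean
  )

  // Numerically stable softmax: turns preferences into action probabilities.
  def softmax[A](prefs: Map[A, Double]): Map[A, Double] = {
    val m = prefs.values.max
    val exps = prefs.map { case (a, h) => a -> math.exp(h - m) }
    val z = exps.values.sum
    exps.map { case (a, e) => a -> e / z }
  }

  // One learning step for the chosen action and the reward it produced.
  def learn[A](l: Learner[A], action: A, reward: Double): Learner[A] = {
    val pi = softmax(l.prefs)
    // With the baseline off, the raw reward drives the update.
    val baseline = if (l.useBaseline) l.avgReward else 0.0
    val delta = reward - baseline
    val newPrefs = l.prefs.map { case (a, h) =>
      val indicator = if (a == action) 1.0 else 0.0
      a -> (h + l.stepSize * delta * (indicator - pi(a)))
    }
    // Incremental mean, updated whether or not the baseline is in use.
    val n2 = l.n + 1
    val avg2 = l.avgReward + (reward - l.avgReward) / n2
    l.copy(prefs = newPrefs, avgReward = avg2, n = n2)
  }
}
```

In this sketch a reward equal to the current average leaves the preferences untouched, which is the practical effect of the baseline.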

Source
Gradient.scala
Linear Supertypes
Serializable, Serializable, Product, Equals, Policy[Obs, A, R, Cat, S], AnyRef, Any

Instance Constructors

  1. new Gradient(config: Config[R, T], valueFn: ActionValueFn[Obs, A, Item[T]])(implicit arg0: Equiv[A], arg1: ToDouble[R], arg2: ToDouble[T])

Type Members

  1. type This = Policy[Obs, A, R, Cat, S]
    Definition Classes
    Policy

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def aToDouble(obs: Obs): ToDouble[A]

    Let's try out this style for a bit. This gives us a way to convert an action directly into a probability, using our actionValue Map above.

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def choose(state: State[Obs, A, R, S]): Cat[A]
    Definition Classes
    GradientPolicy
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. val config: Config[R, T]
  9. def contramapObservation[P](f: (P) ⇒ Obs)(implicit S: Functor[S]): Policy[P, A, R, Cat, S]
    Definition Classes
    Policy
  10. def contramapReward[T](f: (T) ⇒ R)(implicit S: Functor[S]): Policy[Obs, A, T, Cat, S]
    Definition Classes
    Policy
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. def learn(sars: SARS[Obs, A, R, S]): This
    Definition Classes
    GradientPolicy
  16. def mapK[N[_]](f: FunctionK[Cat, N]): Policy[Obs, A, R, N, S]

    Just an idea to see if I can make stochastic deciders out of deterministic deciders. We'll see how this develops.

    Definition Classes
    Policy
  17. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  19. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  20. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  21. val valueFn: ActionValueFn[Obs, A, Item[T]]
  22. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
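
To make the shape of contramapObservation and mapK above more concrete, the following is a simplified sketch of how the two transformations compose. Everything in it is a hypothetical stand-in: `SimplePolicy`, this `FunctionK` trait, `Id`, this `Cat` and `pointMass` are defined locally and are much smaller than the library's Policy, Cat and FunctionK. It also omits the S[_] parameter and the Functor constraint from the real signatures.

```scala
object PolicyTransformSketch {
  // Hypothetical natural transformation from F to G; a local stand-in only.
  trait FunctionK[F[_], G[_]] { def apply[X](fx: F[X]): G[X] }

  // Identity wrapper: a "deterministic" choice is just the action itself.
  type Id[X] = X

  // Hypothetical categorical distribution: a probability per outcome.
  final case class Cat[A](pmf: Map[A, Double])

  // A policy reduced to its decision function only.
  final case class SimplePolicy[Obs, A, M[_]](choose: Obs => M[A]) {
    // Adapt the policy to a new observation type by converting first.
    def contramapObservation[P](f: P => Obs): SimplePolicy[P, A, M] =
      SimplePolicy[P, A, M](p => choose(f(p)))

    // Change the container that the decision comes back in.
    def mapK[N[_]](f: FunctionK[M, N]): SimplePolicy[Obs, A, N] =
      SimplePolicy[Obs, A, N](obs => f.apply[A](choose(obs)))
  }

  // Lifts a deterministic choice into a distribution with all mass on it.
  val pointMass: FunctionK[Id, Cat] = new FunctionK[Id, Cat] {
    def apply[X](x: Id[X]): Cat[X] = Cat(Map(x -> 1.0))
  }
}
```

Under these assumptions, a deterministic `SimplePolicy[Int, String, Id]` becomes a stochastic one with `mapK[Cat](pointMass)`, and `contramapObservation[String](_.length)` then lets it accept strings as observations.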
