The agent Walverine that participated in the 2003 TAC Classic tournament is an incremental revision of the agent by the same name that played in TAC-02. Walverine 2002 is thoroughly described in an article appearing in Decision Support Systems. Here we describe the changes instituted for 2003.
  1. Expected holdings. In calculating marginal value for hotel rooms, Walverine takes into account current holdings of flights, closed hotels, and entertainment tickets. For 2003, we added to this list the expected holdings of open hotels. The expected holding for a particular hotel is simply the number of units for which we have outstanding bids at or above our current price estimate for that hotel. By construction, our estimate is always at least the ASK quote.
     1a. Walverine 2003 also takes account of expected holdings in its interim price predictions for open hotels. In calculating the equilibrium price, we bound Walverine's own demand for the hotel to be at least the expected holdings.
  2. Hedging. Walverine 2002 generated point predictions for hotel prices, then "hedged" these estimates by assuming some probability that each individual hotel would be an outlier and cost much more. In 2003, we omitted this hedging function.
  3. Delayed flight purchases. In 2002, Walverine always bought a full set of flights for all clients at the very beginning of the game. With sufficiently accurate hotel price prediction, this is optimal, since flight prices rise in expectation. However, there is intrinsic uncertainty about hotel prices, which is partially resolved when hotels start closing. The benefit of retaining flexibility in flight decisions in the face of updated hotel price information often exceeds the cost of delaying flight purchases. In 2003, Walverine delayed some flights based on a rough estimate of the tradeoff between flexibility and increased cost. First, it picks at most two flights to delay, among those that result in three or more clients staying on the same night and for which there is a different flight that would still satisfy the corresponding client and reduce this demand. This creates the flexibility to shorten client trips and thereby reduce our demand for potentially expensive hotels. Once Walverine decides to delay a flight, it needs to monitor the flight price to decide how long to delay it. Roughly, a rapidly increasing flight price is evidence that the rate of increase will get even steeper, which argues for buying soon. If the price has remained flat, then it will stay flat in expectation, suggesting a continued delay to maintain flexibility. Walverine employs a simple heuristic rule to resolve this tradeoff. Specifically, before the first hotel closing, Walverine aborts its plan to delay any flight whose price has increased by 75 or more, checking once per minute. Once the first hotel closes (when we have better hotel price estimates), Walverine continues to delay a flight only as long as its price is less than 20 above its initial price. This typically results in Walverine sticking with its initial choice of flights to delay until after the first hotel closing, at which point it buys them.
  4. Entertainment values. In its equilibrium calculations, Walverine employs an estimated entertainment value for a given trip, based on average prices for entertainment tickets. In each year Y (2002 and 2003), Walverine used the averages from the TAC finals of year Y-1.
  5. Entertainment trading policy. Walverine employs Q-learning to induce an entertainment trading policy from game experience. The actions are bid prices for buying and selling a single unit, defined in terms of offset from marginal values. In 2002, Walverine considered 15 candidate actions, extended to 30 in 2003. The 2002 agent represented value functions as explicit Q-tables based on a coarse discretization of the state space. Walverine 2003 learned an implicit representation of the value function encoded in a set of 30 neural networks, one for each action. Walverine's 2002 training data included experience from a mixture of preliminary-round TAC-02 tournament games and self-play. The 2003 policy was derived solely from preliminary-round TAC-03 tournament games.
  6. Hotel bid shading. Walverine determines an "optimal" bid shading based on a decision-theoretic calculation involving its own marginal values and a model of other agents' bids. In 2002, Walverine bid these shaded values in every game. In 2003, Walverine employed bid shading in a game with probability 0.11, and chose to bid unshaded marginal values with probability 0.89.
  7. Pricelines. Walverine chooses optimal initial trips based on initial flight prices and predicted hotel prices. In 2002, the optimization assumed linear prices for hotels (i.e., each additional unit costs the same amount). In 2003, Walverine employed a "priceline" model of nonlinear hotel prices. Specifically, if the predicted price for a hotel is p, the initial trip optimization assumed that successive units would cost [p, p+1, p+10, p+1500, p+2500, p+3500, p+4500, p+5500]. Note that with linear prices and no holdings, the optimization can be solved client-by-client. With nonlinear prices, the agent's optimization is not separable by client, because the cost of a room for one client depends on how many units the other clients' trips already use. We employed an iterative optimization algorithm based on fictitious play to calculate approximately optimal trips with respect to the pricelines.
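
The priceline in item 7 can be made concrete with a small sketch. The offsets below are copied from the text; the function names (priceline, total_cost, marginal_cost) are hypothetical and are not taken from the Walverine code. The sketch covers only the cost model, not the fictitious-play optimization built on top of it.

```python
# Per-unit offsets from the predicted price p, as listed in item 7.
PRICELINE_OFFSETS = [0, 1, 10, 1500, 2500, 3500, 4500, 5500]


def priceline(p):
    """Assumed cost of the 1st, 2nd, ... 8th unit of a hotel with predicted price p."""
    return [p + offset for offset in PRICELINE_OFFSETS]


def total_cost(p, units):
    """Total assumed cost of buying `units` units of the hotel."""
    if not 0 <= units <= len(PRICELINE_OFFSETS):
        raise ValueError("units must be between 0 and 8")
    return sum(priceline(p)[:units])


def marginal_cost(p, units_already_planned):
    """Assumed cost of one more unit, given how many units are already planned."""
    return priceline(p)[units_already_planned]


if __name__ == "__main__":
    p = 100
    print(total_cost(p, 3))      # 100 + 101 + 110
    print(marginal_cost(p, 3))   # the fourth unit costs p + 1500
```

Under this model the fourth unit of a hotel is far more expensive than the first three, so whether a given client's trip is worthwhile depends on how many units the other clients' trips already claim. This is why the optimization no longer decomposes client-by-client.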

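The flight-delay rule in item 3 can likewise be sketched in a few lines. The two thresholds (75 and 20) come from the text; the function name keep_delaying and its arguments are hypothetical, and the sketch assumes that price increases are measured against the flight's initial price in both phases.

```python
ABORT_INCREASE = 75     # before the first hotel closing: abort the delay at this increase
KEEP_DELAY_BELOW = 20   # after the first hotel closing: delay only below this increase


def keep_delaying(initial_price, current_price, first_hotel_closed):
    """Return True if a delayed flight should stay unbought at this check."""
    increase = current_price - initial_price
    if not first_hotel_closed:
        # Tolerate moderate increases while hotel prices are still very uncertain.
        return increase < ABORT_INCREASE
    # With better hotel price estimates, little flexibility is left to pay for.
    return increase < KEEP_DELAY_BELOW


if __name__ == "__main__":
    print(keep_delaying(300, 340, first_hotel_closed=False))  # True: still below 75
    print(keep_delaying(300, 340, first_hotel_closed=True))   # False: 40 is not below 20
```

In a running agent this check would be evaluated once per minute for each delayed flight, with the flight bought as soon as the function returns False.
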
    Walverine Team, representing the University of Michigan AI Laboratory