In this paper we propose a distributed adaptive opportunistic routing algorithm (d-AdaptOR) that minimizes the expected average per-packet cost of routing a packet from a source node to a destination. It achieves this by sufficiently exploring the network with data packets while exploiting the best known routing opportunities. The underlying reinforcement learning framework admits a low-complexity, low-overhead, distributed, and asynchronous implementation, and yields a stochastic routing scheme that optimally trades off "exploration" and "exploitation" of the opportunities in the network. Notably, d-AdaptOR is oblivious to any initial knowledge about the network.
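To make the exploration/exploitation trade-off concrete, the following is a minimal illustrative sketch of epsilon-greedy, Q-learning-style next-hop selection. It is not the d-AdaptOR update rule itself; the class name, neighbor identifiers, and parameters (`epsilon`, `alpha`) are hypothetical, and the update is a generic Bellman-style cost estimate.

```python
import random

class RelaySelector:
    """Per-node next-hop selector: a generic epsilon-greedy sketch,
    not the exact d-AdaptOR algorithm."""

    def __init__(self, neighbors, epsilon=0.1, alpha=0.5):
        # Estimated cost-to-go to the destination via each neighbor.
        self.q = {n: 0.0 for n in neighbors}
        self.epsilon = epsilon  # probability of exploring a random relay
        self.alpha = alpha      # learning rate for the estimate update

    def choose(self):
        # Explore with probability epsilon; otherwise exploit the
        # neighbor with the lowest estimated cost-to-go.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return min(self.q, key=self.q.get)

    def update(self, neighbor, hop_cost, neighbor_cost_to_go):
        # Move the estimate toward the observed one-hop cost plus the
        # neighbor's reported cost-to-go (a Bellman-style target).
        target = hop_cost + neighbor_cost_to_go
        self.q[neighbor] += self.alpha * (target - self.q[neighbor])
```

In a distributed setting, each node would maintain such estimates locally and the `neighbor_cost_to_go` term would arrive via per-packet acknowledgments, which keeps the overhead low.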