One of the great unsolved challenges in the cognitive and neural sciences is understanding how human listeners achieve phonetic constancy (seemingly effortless perception of a speaker's intended consonants and vowels under typical conditions) despite a lack of invariant cues to speech sounds. Models (mathematical, neural network, or Bayesian) of human speech recognition have been essential tools in the development of theories over the last forty years. Yet they have been of little help in understanding phonetic constancy, because most do not operate on real speech (they instead focus on mapping from a sequence of consonants and vowels to words in memory). The few models that work on real speech borrow elements from automatic speech recognition (ASR), but do not achieve high accuracy and are arguably too complex to provide much theoretical insight. Over the last two decades, however, advances in deep learning have revolutionized ASR, using neural networks that emerged from the same framework as those used in cognitive models. These deep ASR systems, too, offer little guidance for theories of human speech recognition because they are so complex. Our team asked whether we could borrow minimal elements from deep learning to construct a simple cognitive neural network that works on real speech. The result is DeepListener, a neural network model trained on 1000 words produced by 10 talkers. It learns to map spectral slice inputs to sparse "pseudo-semantic" vectors via recurrent hidden units. The element we borrowed from deep learning is the use of "long short-term memory" (LSTM) nodes in the hidden layer. LSTM nodes have internal "gates" that allow individual nodes to become sensitive to different time scales. DeepListener achieves high accuracy and moderate generalization, and exhibits human-like phonological competition over time.
Analyses of hidden units – based on approaches used in human electrocorticography – reveal that the model learns a distributed phonological code to map speech to semantics. I will discuss the implications for cognitive and neural theories of human speech learning and processing.
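
As a rough illustration of the architecture described above (spectral slices feeding LSTM hidden units that drive a "pseudo-semantic" output layer), the following is a minimal, untrained sketch in plain Python. The class name `TinyLSTMListener`, the layer sizes, the random weights, and the sigmoid output layer are all assumptions made for illustration; it is not the DeepListener implementation, and the input frames are random numbers standing in for real spectral slices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    """Return weights @ vec + bias, with weights stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyLSTMListener:
    """Hypothetical toy model: spectral frames -> LSTM layer -> output vector."""

    def __init__(self, n_input, n_hidden, n_output, seed=0):
        rng = random.Random(seed)
        n_cat = n_input + n_hidden  # each gate sees the frame and the previous h
        self.n_hidden = n_hidden
        self.gates = {name: init_layer(n_hidden, n_cat, rng)
                      for name in ("input", "forget", "output", "candidate")}
        self.readout = init_layer(n_output, n_hidden, rng)

    def step(self, frame, h, c):
        """One LSTM time step: the gates decide what to write, keep, and expose."""
        x = list(frame) + h
        i = [sigmoid(z) for z in affine(*self.gates["input"], x)]
        f = [sigmoid(z) for z in affine(*self.gates["forget"], x)]
        o = [sigmoid(z) for z in affine(*self.gates["output"], x)]
        g = [math.tanh(z) for z in affine(*self.gates["candidate"], x)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def forward(self, frames):
        """Return one output vector per input frame (assumed sigmoid readout)."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for frame in frames:
            h, c = self.step(frame, h, c)
            outputs.append([sigmoid(z) for z in affine(*self.readout, h)])
        return outputs


# Usage: 5 stand-in "spectral slices" of 8 channels each (sizes are arbitrary).
rng = random.Random(1)
frames = [[rng.random() for _ in range(8)] for _ in range(5)]
model = TinyLSTMListener(n_input=8, n_hidden=4, n_output=6)
outputs = model.forward(frames)
```

Because the forget gate scales the previous cell state at every step, a node whose forget gate stays near 1 integrates over long stretches of input while one near 0 tracks only recent frames, which is one way to picture nodes becoming sensitive to different time scales. Emitting an output at every frame is also what would make it possible to inspect how candidate words compete as the input unfolds.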