This paper tries to address “the cocktail party problem”, focusing on multi-talker speech separation and recognition, tasks where current deep learning methods still lag behind human performance.