TheTransformernetworkconsistsofmultiplelayers,eachwithseveralAttentionHeads(andadditionallayers),usedtolearndifferentrelationshipsbetweentokens.AsinmanyNLPmodels,theinputtokensarefirstembeddedintovectorsThisslideThangLuongfromgooglebraintotalkaboutdetailtransfomernetwork