Reported by Synced (机器之心)
Editor: Panda
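OpenAI is publishing papers less and less often.
If you come across a new PDF from OpenAI, it is most likely a system card for a new model, a related supplementary document, or a benchmark; new research papers are rare.
As for why, let the company's own ChatGPT explain: "So far, OpenAI has publicly released relatively few papers on arXiv in 2025, which may reflect a cautious attitude toward its strategy for disclosing research results, possibly due to commercial confidentiality or safety considerations."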
Recently, however, OpenAI did release a genuine research paper written entirely by its own people. It proposes Linear Layouts, a unified algebraic framework for efficient tensor mapping. It is a general algebraic formalism for tensor layouts that uses binary linear algebra rather than bit representations, and it resolves long-standing difficulties in deep learning compilers such as Triton.
Paper title: Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using 𝔽₂
Paper link: https://arxiv.org/pdf/2505.23819.pdf
To appreciate why this work matters, we first need to understand what tensor layouts are.
Put simply: a tensor layout is the mapping between a logical tensor and hardware resources (such as memory, threads, and vector units). The figure below shows two example layouts.
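To make the idea of a layout as a mapping concrete, here is a minimal sketch. It is not taken from the paper: the 4 × 4 tensor, the 4 threads with 4 registers each, and the function names `row_per_thread`, `tile_per_thread`, and `covers_tensor` are all assumptions chosen for illustration.

```python
# Illustrative only: two hypothetical layouts for a 4x4 logical tensor
# distributed over 4 threads, each holding 4 registers.

def row_per_thread(thread, reg):
    """Thread t owns row t; register r selects the column."""
    return (thread, reg)

def tile_per_thread(thread, reg):
    """Thread t owns a 2x2 tile; register r selects the element in the tile."""
    tile_row, tile_col = divmod(thread, 2)
    in_row, in_col = divmod(reg, 2)
    return (2 * tile_row + in_row, 2 * tile_col + in_col)

def covers_tensor(layout):
    """Check that every tensor element is owned by exactly one (thread, reg)."""
    coords = [layout(t, r) for t in range(4) for r in range(4)]
    all_elements = {(i, j) for i in range(4) for j in range(4)}
    return len(coords) == len(set(coords)) and set(coords) == all_elements

print(covers_tensor(row_per_thread), covers_tensor(tile_per_thread))
```

Both functions describe the same logical tensor, yet they place its elements on different hardware slots, which is exactly the kind of difference a layout captures.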
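For modern deep learning workloads, a tensor layout has to meet several requirements:
- Efficient (for performance).
- Flexible (to support a wide range of operators).
- Composable (to enable transformations and optimizations).
However, current layout systems struggle to meet these needs. Instead, they tend to be:
- Designed case by case and often hard-coded (rules must be written by hand).
- Not scalable (every pair of layouts needs its own conversion, so the work grows quadratically).
- Error-prone, especially in low-level backends such as Triton: to date, 12% of the bugs filed in Triton's GitHub repository are layout-related.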
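In addition, the growing complexity of deep learning hardware (such as GPUs) makes tensor layouts more complex as well.
For example, to implement efficient matrix multiplication, NVIDIA uses different Tensor Core layouts on different GPU generations such as Ampere, Hopper, and Blackwell, and each layout has different variants for different data types. Other GPU vendors, including AMD and Intel, also use their own layouts when accelerating with their Tensor Core-like technologies. The rapid evolution of hardware architectures and the diversity of deep learning models therefore call for a new way of modeling tensor layouts.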
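To that end, several technical challenges must be addressed:
- Mapping tensors to hardware resources requires a representation that is both general and composable.
- Layout conversions should be expressed in a unified form, one that covers even complex transformations such as data swizzling.
- The representation must integrate seamlessly with low-level hardware optimizations to ensure efficient data access and computation.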
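One reason a single algebraic form can plausibly cover swizzling is that a typical XOR swizzle is itself linear over bits. The sketch below assumes a simple 8 × 8 XOR pattern chosen purely for illustration (it is not the specific swizzle used by Triton or described in the paper), and the helper names are hypothetical.

```python
# Illustrative XOR swizzle for an 8x8 tile. The pattern is an assumption
# made for this example, not a swizzle taken from Triton or the paper.
ROWS = COLS = 8

def swizzle(row, col):
    """Keep the row and XOR the column with the row index."""
    return row, col ^ row

def pack(row, col):
    """Pack (row, col) into one 6-bit vector: high 3 bits row, low 3 bits col."""
    return (row << 3) | col

def swizzle_packed(x):
    """The same swizzle, acting on the packed 6-bit vector."""
    return pack(*swizzle(x >> 3, x & 7))

def is_permutation():
    """A swizzle must not map two elements to the same slot."""
    images = {swizzle(r, c) for r in range(ROWS) for c in range(COLS)}
    return len(images) == ROWS * COLS

def is_linear_over_f2():
    """Linearity over F2 means f(a XOR b) == f(a) XOR f(b) for all a, b."""
    return all(
        swizzle_packed(a ^ b) == swizzle_packed(a) ^ swizzle_packed(b)
        for a in range(64)
        for b in range(64)
    )

print(is_permutation(), is_linear_over_f2())
```

Because the map is linear and invertible, it can be written as a binary matrix, which is what lets a swizzle be composed with other layouts in the same formalism.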
Before turning to the contributions of the OpenAI paper, though, we need to cover some basic concepts.
Ïà¹ØÅ侰֪ʶ
GPU ¼Ü¹¹
ÔÚÉè¼ÆÉÏ£¬ÏÖ´ú GPU µÄÄ¿±êÊÇͨ¹ý°üÀ¨¶à²ãÓ²¼þ×ÊÔ´µÄ·Ö²ãÖ´ÐÐÄ£ÐÍÀ´³ä·ÖÀûÓò¢ÐÐÐÔ¡£
ÆäÒªº¦Ö´Ðе¥Î»°üÀ¨Ð×÷Ïß³ÌÕóÁÐ (CTA)¡¢Warp ºÍÏ̡߳£Ã¿¸ö GPU Ï̶߳¼¿ÉÒÔ»á¼û˽ÓмĴæÆ÷ ¡ª¡ª ÕâЩ¼Ä´æÆ÷Ìṩ×îµÍÑӳٵĴ洢¿Õ¼ä£¬µ«ÈÝÁ¿ÓÐÏÞ¡£Í¨ÀýÖ¸Áî¿ÉÒÔÓɸ÷¸öÏ̶߳ÀÁ¢Ö´ÐС£È»¶ø£¬Ä³Ð©ÌØÊ⹦Чµ¥Î»±ØÐëÔÚ¸ü¸ßµÄÁ£¶È¼¶±ðÉÏÖ´ÐС£
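For example, NVIDIA's mma (matrix multiply-accumulate) instruction uses Tensor Cores by executing in parallel multiple multiply-add operations issued by individual warps. More advanced variants such as wgmma (warp-group matrix multiply-accumulate) extend this capability by performing matrix multiplication across several warps at once. AMD has introduced similar primitives, such as the mfma (matrix fused multiply-add) instruction.
Note that these instructions produce correct results only if the data is distributed across threads and warps, or resides in shared memory or special memory units (such as Tensor Memory on Blackwell), in a specific layout.
These layouts, however, usually do not give the best performance for other operations such as loads and stores, and it is not always possible to copy data directly from global memory into the special memory units with a dedicated instruction.
As a result, data usually has to be rearranged so that the layout used for memory access is converted into the layout preferred by the compute units.
In short, reaching peak performance requires not only using these specialized units but also carefully designing tensor layouts and the conversions between them.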
Triton ÓïÑԺͱàÒëÆ÷
Triton ÊÇÒ»ÖÖÀàËÆÓÚ Python µÄÓÃÓÚÌØ¶¨ÁìÓòµÄÓïÑÔ£¬ÆäÉè¼ÆÄ¿±êÊÇÌṩÓÃÓÚ±àд¸ßÐÔÄÜÉî¶ÈѧϰÔÓïµÄÁé»î½Ó¿Ú¡£Triton µÄ±àÒëÆ÷ºó¶ËʹÓÃÁË MLIR£¬Ö§³Ö¶àÌõÀíÁýͳ±í´ï¡£
¾¿Æä½¹µã£¬Triton ÄÚºË×ñѵ¥³ÌÐò´ó¶¼¾Ý (SPMD) Ä£ÐÍ£¬ÆäÖÐÅÌËã±»»®·ÖΪ¶à¸öÁýͳµÄ Triton ³ÌÐòʵÀý¡£ÕâÖÖÉè¼ÆÔÊÐí¿ª·¢ÕßÖ÷Òª¹Ø×¢ CTA ¼¶±ðµÄ²¢ÐÐÐÔ¼´¿É¡£ÔÚ Triton ÖУ¬¡¸ÕÅÁ¿¡¹Ò»´ÊÖ¸µÄÊÇ´ÓÔʼ PyTorch ÕÅÁ¿ÖÐÌáÈ¡µÄ¿é£¬ËüÃÇÓÃ×÷ GPU ºËµÄÊäÈëºÍÊä³ö¡£
ÔÚ±àÒëÀú³ÌÖУ¬Triton µÄ Python ´úÂëÊ×Ïȱ»·Òë³É Triton ·½ÑÔ (tt)£¬È»ºó½øÒ»²½·Òë³É TritonGPU ·½ÑÔ (ttg)¡£ÔÚ´ËÀú³ÌÖУ¬Ã¿¸öÕÅÁ¿¶¼ÓëÌØ¶¨µÄ½á¹¹Ïà¹ØÁª£¬ÒÔ³ä·ÖÀûÓÃÏÖ´ú GPU ÉÏ¿ÉÓõÄÓ²¼þ¹¦Ð§µ¥Î»¡£ÀýÈ磬µ±Óöµ½ dot ÀàËã×Ó£¨ÀýÈç tt.dot ºÍ tt.dot_scaled£©Ê±£¬»á½ÓÄÉ mma ½á¹¹²¢Ê¹Óà Tensor Core ºÍÀàËÆµÄµ¥Î»¡£
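Legacy Layouts
Figure 2 lists all the layouts available in Triton.
At the top level, layouts fall into Distributed layouts and Memory layouts. In the former, tensor elements are distributed across different execution units; in the latter, tensor elements are stored in a specific special memory.
Distributed layouts are further divided into types such as Blocked, Sliced, MMA, and MMA Input layouts, while Memory layouts are further divided into Unswizzled and Swizzled layouts.
Blocked layouts are typically used for contiguous memory access. MMA and MMA Input layouts are used for the outputs and inputs of matrix multiplication operations (such as tt.dot). MMA layouts can be classified further by the hardware instruction they map to, such as mma and wgmma on NVIDIA GPUs or mfma on AMD GPUs. A Sliced layout extracts one dimension from its parent layout and is used for a broadcast or as the output of a reduction.
The legacy Triton layout system requires each layout to define its own interface methods, such as the number of elements per thread and the number of contiguous elements. In addition, indexing into tensor elements and conversions between layouts must be implemented explicitly for every layout. This approach frequently leads to bugs in layout construction and conversion.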
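Linear Layouts
The following briefly covers the definition of linear layouts, some basic linear layout operators, the construction of the various Triton layouts as instances of linear layouts, and the general layout engine applied to Triton.
An Example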
In GPU programming, most parameters are powers of two: a warp consists of 32 or 64 threads, a warp group contains 4 warps, and matrix multiplication intrinsics (such as mma and wgmma) require a tile size of 16 × n, where n ≥ 1.
In addition, in Triton's programming model, the dimensions of a tensor and the sub-parts of the layout associated with each tensor (such as the number of registers per thread and the number of threads) are restricted to powers of two. In Figure 1, layout A covers a 16 × 16 tensor using 2 × 2 registers, 4 × 8 threads, and 2 × 1 warps.
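Because all of these quantities are powers of two, the distribution of elements in layout A can be visualized intuitively through the bit representation of their coordinates (see Figure 1). Register 0 (r_0) of every thread sits at coordinates (i, j) where the last bits of both i and j are 0. For example, r_0 of thread t_1 is at (0, 2) = (000, 010) in binary. By contrast, in the coordinates of r_1 elements, the last bit of i is always 0 and the last bit of j is always 1. For example, r_1 of t_9 is at (2, 3) = (010, 011).
Furthermore, for any even-numbered thread t_k, the last bit of k matches the second-to-last bit of j for r_0, and the second-to-last bit of k matches the third-to-last bit of j for r_0. For example, r_0 of t_10 = t_01010 is at (2, 4) = (010, 0100). This systematic alignment holds throughout, showing that a power-of-two layout is enough to determine unambiguously how each thread's elements are distributed.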
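In summary, suppose a vector v of size 8 represents one element held by a thread in a warp, where the first 2 bits identify the register (Reg), the next 5 bits the thread (Thr), and the last bit the warp (Wrp). Layout A can then be defined as a binary matrix that maps v to the coordinates of that element in the logical tensor.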
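The worked examples in this section pin down only some columns of such a matrix. The minimal sketch below fills in the remaining columns under assumptions that are consistent with the 2 × 2 register, 4 × 8 thread, 2 × 1 warp shape (the second register bit, the third and fifth thread bits, and the warp bit are assumed), so it should be read as an illustration rather than the paper's exact definition. The names `BASES` and `apply_layout` are hypothetical.

```python
# Each entry is the (i, j) image of one input bit. Applying the layout is a
# matrix-vector product over F2: XOR together the columns selected by v.
# Entries marked "assumed" are not fixed by the examples in the text.
BASES = {
    "reg": [(0, 1),   # register bit 0 sets the last bit of j (from the r_1 example)
            (1, 0)],  # assumed: register bit 1 sets the last bit of i
    "thr": [(0, 2),   # thread bit 0 (from t_1: r_0 at (0, 2))
            (0, 4),   # thread bit 1 (from t_10: r_0 at (2, 4))
            (0, 8),   # assumed
            (2, 0),   # thread bit 3 (from t_9 and t_10, where i = 2)
            (4, 0)],  # assumed
    "wrp": [(8, 0)],  # assumed: the two warps split the rows in half
}

def apply_layout(reg, thr, wrp):
    """Map a hardware index (register, thread, warp) to tensor coordinates."""
    i = j = 0
    for name, value in (("reg", reg), ("thr", thr), ("wrp", wrp)):
        for bit, (basis_i, basis_j) in enumerate(BASES[name]):
            if (value >> bit) & 1:
                i ^= basis_i
                j ^= basis_j
    return i, j

# The three examples from the text.
print(apply_layout(0, 1, 0), apply_layout(1, 9, 0), apply_layout(0, 10, 0))
```

Under these assumptions, the three printed coordinates reproduce the positions given above for r_0 of t_1, r_1 of t_9, and r_0 of t_10.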
When a hardware index has to be recovered from the coordinates of the logical tensor, an inversion is needed.
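As a rough illustration of that inversion, the sketch below inverts a square binary matrix with Gauss-Jordan elimination over F2. The matrix used here is an assumed 8 × 8 instance that agrees with the examples in this section, not the paper's exact matrix; coordinates are packed as (i << 4) | j, and the names `COLUMNS`, `apply`, and `invert` are hypothetical.

```python
# Columns of an assumed layout matrix, packed as (i << 4) | j.
# Input bit order: reg0, reg1, thr0..thr4, wrp0.
COLUMNS = [0x01, 0x10, 0x02, 0x04, 0x08, 0x20, 0x40, 0x80]

def apply(columns, v):
    """Matrix-vector product over F2: XOR the columns selected by the bits of v."""
    out = 0
    for bit, col in enumerate(columns):
        if (v >> bit) & 1:
            out ^= col
    return out

def invert(columns):
    """Invert a square F2 matrix via Gauss-Jordan elimination on [A | I].

    Raises StopIteration if the matrix is singular (no pivot is found).
    """
    n = len(columns)
    rows = []
    for r in range(n):
        # Low n bits: row r of A. High n bits: row r of the identity.
        a_row = sum(((columns[c] >> r) & 1) << c for c in range(n))
        rows.append(a_row | (1 << (n + r)))
    for c in range(n):
        pivot = next(r for r in range(c, n) if (rows[r] >> c) & 1)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(n):
            if r != c and (rows[r] >> c) & 1:
                rows[r] ^= rows[c]  # row addition over F2 is XOR
    inv_rows = [row >> n for row in rows]  # right half now holds A^-1 by rows
    return [sum(((inv_rows[r] >> c) & 1) << r for r in range(n)) for c in range(n)]

INVERSE = invert(COLUMNS)
coords = (2 << 4) | 3              # tensor coordinates (2, 3)
hw = apply(INVERSE, coords)        # packed hardware index
print(hw & 3, (hw >> 2) & 31, hw >> 7)  # register, thread, warp
```

With the assumed matrix, coordinates (2, 3) map back to register 1 of thread 9 in warp 0, matching the forward example in the text.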
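For a more detailed and complete treatment of linear layouts, see the original paper, which shows that blocked layouts, the input and output layouts of mma and wgmma, slices of linear layouts, every distributed layout, MMA swizzled layouts, and memory layouts are all linear layouts. OpenAI also explains how layout conversions and shape operations are implemented in Triton.
Beyond that, OpenAI says linear layouts provide a structured foundation for developing algorithms in both the language frontend and the compiler backend. The paper gives several key examples, which we will not go into here. Next, a brief look at how the newly proposed linear layouts perform in practice.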
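Evaluation
OpenAI compared an optimized Triton (with the linear-layout-based optimizations integrated, called Triton-Linear) against a baseline Triton without them. The main differences between Triton and Triton-Linear are:
- Triton uses legacy data layouts and does not support utilities for arbitrary distributed layouts or conversions between them, which makes it prone to bugs.
- Triton does not use the optimized code generation described in the paper. For example, layout conversions always go through shared memory, and efficient hardware primitives are used only to a limited extent.
The hardware platforms used in the evaluation are listed in Table 1.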
To compare the performance of Triton and Triton-Linear, the team built a set of synthetic micro-benchmarks; see the original paper for those results. Here we look only at how the two perform on real benchmarks.
OpenAI ran 18 benchmarks from TritonBench on three different platforms. Figures 7, 8, and 9 show the performance gains of Triton-Linear on the three platforms.
Because each benchmark includes multiple inputs, for a total of 420 cases, they use error bars to show the minimum and maximum speedup of each benchmark.
Note that, because of hardware limitations, not every benchmark applies to every platform. For example, some benchmarks need the large shared memory available only on GH200, and some kernels use tensor descriptors that depend on the TMA engine, which neither RTX4090 nor MI250 supports.
On GH200, they achieved speedups ranging from 0.92× to 1.57×, and the average speedup of every benchmark exceeded 1.0×. The benchmarks with the largest speedups were int4_gemm, ops_gemm, and streamk_gemm.
In these kernels, efficient hardware primitives (such as ldmatrix and stmatrix) are used extensively for layout conversions and for shared memory loads and stores. Notably, layer_norm saw speedups from 0.99× to 1.57×, varying significantly across shapes. For some input shapes, Triton-Linear can detect a conversion between "equivalent" layouts and reduce it to a no-op. This optimization is not possible in the legacy layout system, because it cannot directly compare layouts of different types (for example, a Blocked layout and a Sliced layout).
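On RTX4090, the new approach achieved speedups from 1.00× to 1.51×. Because of the differences between the mma (RTX4090) and wgmma (GH200) instructions, they achieved a higher speedup on template_attention. In this case, the left operand of the tt.dot operation is defined outside the loop and repeatedly loads data from the same address, so both ldmatrix and regular shared memory instructions achieve high throughput. Although the right operand is updated on every iteration, wgmma accesses it directly in shared memory; only on RTX4090 is it lowered, after optimization, to ldmatrix. The speedup on GH200 is therefore comparatively lower.
On MI250, the new approach achieved speedups from 0.98× to 1.18×. Overall, because AMD GPUs lack efficient hardware primitives such as ldmatrix, Triton-Linear's speedups on AMD GPUs are lower than those on NVIDIA GPUs.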
What do you think of this piece of research that OpenAI has actually made open?