{"id":5203,"date":"2026-02-16T13:32:47","date_gmt":"2026-02-16T10:32:47","guid":{"rendered":"https:\/\/baum.ru\/blog\/?p=5203"},"modified":"2026-02-16T13:36:06","modified_gmt":"2026-02-16T10:36:06","slug":"nvidia-inference-context-memory-platform-platforma-kontekstnoj-pamyati-dlya-inferensa","status":"publish","type":"post","link":"https:\/\/baum.ru\/blog\/nvidia-inference-context-memory-platform-platforma-kontekstnoj-pamyati-dlya-inferensa\/","title":{"rendered":"NVIDIA Inference Context Memory Platform (a context-memory platform for inference)"},"content":{"rendered":"<h2><strong>Architecture and purpose of the Context Memory Storage platform<\/strong><\/h2>\n<p><b>NVIDIA Inference Context Memory Storage Platform<\/b><span style=\"font-weight: 400;\"> is a new storage architecture designed specifically to accelerate inference of large models through efficient handling of <\/span><i><span style=\"font-weight: 400;\">context memory<\/span><\/i><span style=\"font-weight: 400;\"> (inference context). This refers to <\/span><i><span style=\"font-weight: 400;\">Key-Value (KV) cache<\/span><\/i><span style=\"font-weight: 400;\"> data, which holds the model\u2019s history and state during inference (for example, long context windows in dialogues, sequences of requests, and intermediate results of agentic systems)<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=In%20transformer,be%20shared%20across%20inference%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[1]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Jensen%20Huang%20explicitly%20framed%20this,be%20designed%20for%20AI%20systems\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[2]<\/span><\/a><span style=\"font-weight: 400;\">. Increasing the context length (the number of tokens) grows the memory required for the KV cache linearly; it easily reaches terabytes and exceeds the memory capacity of a single GPU<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,also%20the%20context%20window%20length\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[3]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,a%20single%20GPU%E2%80%99s%20local%20memory\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[4]<\/span><\/a><span style=\"font-weight: 400;\">. 
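<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To make this linear growth concrete, the sketch below estimates per-sequence KV-cache size; the model dimensions are illustrative assumptions for a hypothetical large decoder, not the parameters of any specific system.<\/span><\/p>

```python
# Per-sequence KV-cache size for a transformer decoder.
# All model dimensions below are illustrative assumptions.

def kv_cache_bytes(n_tokens: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Two tensors (K and V) per layer, each of shape
    [n_kv_heads, n_tokens, head_dim], at the given precision."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

for tokens in (8_192, 131_072, 1_048_576):
    print(f"{tokens:>9} tokens -> {kv_cache_bytes(tokens) / 2**30:7.1f} GiB")
```

<p><span style=\"font-weight: 400;\">With these assumed dimensions a single million-token sequence already needs about 320 GiB of KV cache, so a handful of concurrent sessions reaches terabytes.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">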
Without a dedicated solution, one has to choose between placing this data in scarce, ultra-fast GPU memory (HBM) or on conventional storage that was never designed for ephemeral, transient data; both options bring problems: either the context gets capped, or latency, load, and cost grow<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20increases%20pressure%20on%20existing,and%20leaving%20expensive%20GPUs%20underutilized\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[5]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=AI%20factories\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[6]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><b>The purpose of the platform<\/b><span style=\"font-weight: 400;\"> is to create a new context-memory tier that extends effective GPU memory to the scale of the entire cluster and removes the inference \u201cmemory wall\u201d. 
This platform introduces an additional layer (the <\/span><i><span style=\"font-weight: 400;\">context memory tier<\/span><\/i><span style=\"font-weight: 400;\">, provisionally called level G3.5) between the local resources of a node (HBM, DRAM, SSD) and traditional networked storage<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20establishes%20what%20NVIDIA%20terms,memory%20at%20the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[7]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=ImageNvidia%20diagram%20showing%20KV%20cache,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[8]<\/span><\/a><span style=\"font-weight: 400;\">. As a result, model context is treated as a <\/span><i><span style=\"font-weight: 400;\">first-class resource<\/span><\/i><span style=\"font-weight: 400;\"> of the infrastructure, shared among the GPUs of the nodes, rather than as a byproduct of computation<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=NVIDIA%E2%80%99s%20Inference%20Context%20Memory%20Storage,a%20transient%20byproduct%20of%20computation\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[9]<\/span><\/a><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=Inference%20has%20become%20stateful,and%20becomes%20a%20platform%20requirement\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[10]<\/span><\/a><span style=\"font-weight: 400;\">. NVIDIA emphasizes that the new architecture delivers up to a <\/span><b>5x increase in throughput (tokens per second)<\/b><span style=\"font-weight: 400;\"> and <\/span><b>5x better power efficiency<\/b><span style=\"font-weight: 400;\"> at long context lengths compared with traditional storage<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Powered%20by%20the%20NVIDIA%20BlueField,power%20efficient%20than%20traditional%20storage\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[11]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=manage%20flash,the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[12]<\/span><\/a><span style=\"font-weight: 400;\">. This is achieved because \u201chot\u201d context is no longer evicted to slow storage, stalling GPUs, but is instead served by a dedicated high-speed context-memory tier<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20reliable%20prestaging%2C%20backed%20by,oF%20and%20object%2FRDMA%20protocols\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[13]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[14]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><b>Multi-tier context memory.<\/b><span style=\"font-weight: 400;\"> The platform 
relies on a hierarchy of four main memory tiers (G1\u2013G4) for storing the KV cache, supplemented by the new intermediate G3.5 layer<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=The%20technology%20targets%20a%20specific,penalties%20that%20degrade%20inference%20efficiency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[15]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20establishes%20what%20NVIDIA%20terms,memory%20at%20the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[7]<\/span><\/a><span style=\"font-weight: 400;\">. The top tiers are ultra-fast GPU memory (<\/span><i><span style=\"font-weight: 400;\">G1: HBM<\/span><\/i><span style=\"font-weight: 400;\">, nanosecond latency) for the active context, then the node\u2019s <\/span><i><span style=\"font-weight: 400;\">G2: DRAM<\/span><\/i><span style=\"font-weight: 400;\"> (tens of nanoseconds) for buffering and spill-over from HBM. Below that sit <\/span><i><span style=\"font-weight: 400;\">G3: local SSD\/NVMe<\/span><\/i><span style=\"font-weight: 400;\"> drives on the node (microsecond latencies) for the \u201cwarm\u201d cache reused on short horizons, and finally <\/span><i><span style=\"font-weight: 400;\">G4: shared networked storage<\/span><\/i><span style=\"font-weight: 400;\"> (millisecond latencies) for \u201ccold\u201d data: long-term history, logs, and results that must be durable<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=orchestration%20frameworks%2C%20such%20as%20NVIDIA,context%20across%20these%20storage%20tiers\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[16]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=storage%20hierarchy\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[17]<\/span><\/a><span style=\"font-weight: 400;\">. The problem is that as context grows, the capacities of G1\u2013G3 are quickly exhausted, while moving the active cache down to G4 incurs heavy overhead and delays<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=G1%20is%20optimized%20for%20access,both%20cost%20and%20power%20consumption\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[18]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=the%20highest%20efficiency%2C%20making%20it,overhead%20that%20reduces%20overall%20efficiency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[19]<\/span><\/a><span style=\"font-weight: 400;\">. 
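<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The G1\u2013G4 ordering can be sketched as a lookup that searches the fastest tier first and promotes a block one level toward the GPU on a hit in a slower tier; the class, its methods, and the promotion policy below are purely hypothetical illustrations, not an actual API.<\/span><\/p>

```python
# Hypothetical sketch of a tiered KV-cache lookup (G1-G4 plus
# the intermediate G3.5 context-memory tier). Fastest tier first.

TIER_NAMES = ["G1 (GPU HBM)", "G2 (host DRAM)", "G3 (local NVMe)",
              "G3.5 (context memory)", "G4 (shared storage)"]

class TieredKVCache:
    def __init__(self):
        self.tiers = [dict() for _ in TIER_NAMES]  # one map per tier

    def put(self, key, block, tier: int):
        self.tiers[tier][key] = block

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            if key in tier:
                block = tier[key]
                if i > 0:                    # hit in a slower tier:
                    del tier[key]            # promote one level
                    self.tiers[i - 1][key] = block
                return block
        return None  # miss everywhere: the caller must recompute the blocks
```

<p><span style=\"font-weight: 400;\">Under this toy policy a repeatedly used block migrates step by step toward HBM, while cold blocks remain in the cheaper tiers.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">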
The new <\/span><i><span style=\"font-weight: 400;\">Context Memory<\/span><\/i><span style=\"font-weight: 400;\"> layer (G3.5) slots in between G3 and G4 and closes this gap: it provides a <\/span><b>pod-level context-memory tier<\/b><span style=\"font-weight: 400;\"> capacious enough for multi-terabyte KV caches yet far faster than traditional networked storage<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20establishes%20what%20NVIDIA%20terms,memory%20at%20the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[7]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=ImageNvidia%20diagram%20showing%20KV%20cache,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[8]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-5206\" src=\"https:\/\/baum.ru\/blog\/wp-content\/uploads\/2026\/02\/2026-02-16-15.20.56.jpeg\" alt=\"\" width=\"1000\" height=\"387\" title=\"\" srcset=\"https:\/\/baum.ru\/blog\/wp-content\/uploads\/2026\/02\/2026-02-16-15.20.56.jpeg 1000w, https:\/\/baum.ru\/blog\/wp-content\/uploads\/2026\/02\/2026-02-16-15.20.56-300x116.jpeg 300w, https:\/\/baum.ru\/blog\/wp-content\/uploads\/2026\/02\/2026-02-16-15.20.56-768x297.jpeg 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/p>\n<p><i><span style=\"font-weight: 400;\">Fig. 1: Memory hierarchy for the inference-time KV cache (tiers G1\u2013G4). The upper tiers (GPU HBM, DRAM) provide nanosecond access for the active context but are limited in capacity. The lower tiers (local SSDs, shared storage) offer more space, but with growing latency (microseconds to milliseconds) and lower power efficiency<\/span><\/i><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=58%20Image%3A%20A%20four,token%20overhead%20increase\" target=\"_blank\" rel=\"noopener\"><i><span style=\"font-weight: 400;\">[20]<\/span><\/i><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=the%20highest%20efficiency%2C%20making%20it,overhead%20that%20reduces%20overall%20efficiency\" target=\"_blank\" rel=\"noopener\"><i><span style=\"font-weight: 400;\">[19]<\/span><\/i><\/a><i><span style=\"font-weight: 400;\">.<\/span><\/i><\/p>\n<p><b>Platform implementation.<\/b><span style=\"font-weight: 400;\"> The G3.5 context memory is organized as a collection of <\/span><i><span style=\"font-weight: 400;\">Ethernet-attached flash nodes<\/span><\/i><span style=\"font-weight: 400;\"> managed by specialized NVIDIA BlueField-4 DPUs<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20establishes%20what%20NVIDIA%20terms,memory%20at%20the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[7]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20introduces%20an%20intermediate%20%E2%80%9CG3,for%20KV%20cache%20data%20characteristics\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[21]<\/span><\/a><span style=\"font-weight: 400;\">. These nodes are called <\/span><b>Inference Context Memory Storage (ICMS) targets<\/b><span style=\"font-weight: 400;\"> and are, in essence, high-performance flash stores built into the AI-pod infrastructure. 
They operate on a single low-latency network with the GPU nodes and are dedicated exclusively to storing and serving the KV cache (no other data is kept there)<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=In%20other%20words%2C%20this%20infrastructure,It%20doesn%E2%80%99t%20do%20anything%20else\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[22]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=A%20KV%20cache,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[23]<\/span><\/a><span style=\"font-weight: 400;\">. Such a flash-based \u201ccontext buffer\u201d, sitting close enough to the GPUs in latency terms, makes it possible to <\/span><i><span style=\"font-weight: 400;\">prestage<\/span><\/i><span style=\"font-weight: 400;\"> the required context blocks back into GPU\/DRAM (G1\/G2) before token generation, thereby avoiding stalls in the model\u2019s decode phase<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Inference%20frameworks%20like%20NVIDIA%20Dynamo,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[24]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20reliable%20prestaging%2C%20backed%20by,oF%20and%20object%2FRDMA%20protocols\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[13]<\/span><\/a><span style=\"font-weight: 400;\">. 
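<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The prestaging pattern amounts to overlapping the fetch of the next context block with the current decode step; in the sketch below, fetch_block and decode_step are hypothetical placeholders for real framework operations, not actual API calls.<\/span><\/p>

```python
# Sketch of overlapping context prestaging with decoding.
# fetch_block() and decode_step() are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

def fetch_block(block_id: int) -> bytes:
    # Placeholder: read one KV-cache block from the context-memory tier.
    return b"kv-block-%d" % block_id

def decode_step(block: bytes) -> str:
    # Placeholder: one decode step consuming a prestaged KV block.
    return "token<%s>" % block.decode()

def decode_with_prestaging(block_ids):
    tokens = []
    if not block_ids:
        return tokens
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_block, block_ids[0])
        for next_id in block_ids[1:]:
            block = future.result()                     # prestaged block is ready
            future = pool.submit(fetch_block, next_id)  # start fetching the next one
            tokens.append(decode_step(block))           # decode overlaps the fetch
        tokens.append(decode_step(future.result()))
    return tokens
```

<p><span style=\"font-weight: 400;\">Because the fetch of block N+1 is issued before block N is decoded, the decoder never waits on storage as long as each fetch completes within one decode step.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">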
NVIDIA \u0437\u0430\u044f\u0432\u043b\u044f\u0435\u0442, \u0447\u0442\u043e \u044d\u0442\u043e \u043d\u0430\u0434\u0435\u0436\u043d\u043e\u0435 \u0443\u043f\u0440\u0435\u0436\u0434\u0430\u044e\u0449\u0435\u0435 \u043f\u043e\u0434\u043a\u0430\u0447\u0438\u0432\u0430\u043d\u0438\u0435 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430 \u0441\u043d\u0438\u0436\u0430\u0435\u0442 \u043f\u0440\u043e\u0441\u0442\u043e\u0438 \u0438 \u0443\u0432\u0435\u043b\u0438\u0447\u0438\u0432\u0430\u0435\u0442 \u0443\u0441\u0442\u043e\u0439\u0447\u0438\u0432\u0443\u044e \u0441\u043a\u043e\u0440\u043e\u0441\u0442\u044c \u0433\u0435\u043d\u0435\u0440\u0430\u0446\u0438\u0438 \u0442\u043e\u043a\u0435\u043d\u043e\u0432 \u0434\u043e <\/span><b>5\u00d7 \u043d\u0430 \u0434\u043b\u0438\u043d\u043d\u044b\u0445 \u0434\u0438\u0430\u043b\u043e\u0433\u0430\u0445 \u0438 \u0430\u0433\u0435\u043d\u0442\u043d\u044b\u0445 \u043d\u0430\u0433\u0440\u0443\u0437\u043a\u0430\u0445<\/b><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20reliable%20prestaging%2C%20backed%20by,oF%20and%20object%2FRDMA%20protocols\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[13]<\/span><\/a><span style=\"font-weight: 400;\">. 
Context here is treated as <\/span><i><span style=\"font-weight: 400;\">ephemeral and reconstructable<\/span><\/i><span style=\"font-weight: 400;\"> (if lost, it can simply be recomputed), so the platform needs none of the \u201cheavyweight\u201d storage services (replication, complex metadata, redundancy); this sheds the overhead, power consumption, and latency inherent in enterprise storage<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Power%20availability%20is%20the%20primary,for%20ephemeral%2C%20reconstructable%20KV%20data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[25]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=KV%20cache%20fundamentally%20differs%20from,purpose%20storage%20approaches\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[26]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Overall, the <\/span><b>Inference Context Memory Platform<\/b><span style=\"font-weight: 400;\"> acts as datacenter-level <\/span><i><span style=\"font-weight: 400;\">\u201clong-term memory\u201d<\/span><\/i><span style=\"font-weight: 400;\"> for AI agents<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=chatbots%20to%20complex%2C%20multiturn%20agentic,services%20and%20revisited%20over%20time\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[27]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=sessions%20and%20be%20shared%20across,inference%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[28]<\/span><\/a><span style=\"font-weight: 400;\">. 
It lets a large volume of inference context (conversation history, agent state, results of intermediate steps) be stored and shared across many GPUs and nodes, and reused again and again, instead of being recomputed from scratch each time or discarded after a single request<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=In%20transformer,be%20shared%20across%20inference%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[1]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=longer%20a%20one,be%20designed%20for%20AI%20systems\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[29]<\/span><\/a><span style=\"font-weight: 400;\">. The result is higher <\/span><i><span style=\"font-weight: 400;\">end-to-end throughput<\/span><\/i><span style=\"font-weight: 400;\"> (more tokens and requests per second per GPU) and better <\/span><i><span style=\"font-weight: 400;\">energy efficiency<\/span><\/i><span style=\"font-weight: 400;\">, especially in long multi-turn interactions and when scaling the number of concurrent users<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[30]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20BlueField%E2%80%914%E2%80%93powered%20ICMS%20provides%20AI%E2%80%91native,shorter%20tail%20latencies%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[31]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h2><strong>NIXL: a high-speed context transfer library<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">One of the key technologies in the platform is <\/span><b>NIXL (NVIDIA Inference Transfer Library)<\/b><span style=\"font-weight: 400;\">, a library that optimizes context-data transfers among GPUs, DPUs, and storage. 
NIXL provides <\/span><i><span style=\"font-weight: 400;\">asynchronous, point-to-point communication<\/span><\/i><span style=\"font-weight: 400;\"> within inference frameworks (for example, NVIDIA Dynamo) and abstracts the different memory and storage types behind modular plugins<\/span><a href=\"https:\/\/github.com\/ai-dynamo\/nixl#:~:text=NVIDIA%20Inference%20Xfer%20Library%20,in%20architecture\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[32]<\/span><\/a><span style=\"font-weight: 400;\">. 
Put simply, NIXL hides from developers the differences between moving data out of CPU memory or GPU memory and, say, reading KV blocks from NVMe media: the library moves the required pieces of context to wherever they are needed at the moment, uniformly and with maximum efficiency.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Within the <\/span><b>Inference Context Memory Platform<\/b><span style=\"font-weight: 400;\">, NIXL acts as the orchestrator of KV-cache movement across the G1\u2013G4 memory tiers. 
Inference frameworks (such as NVIDIA <\/span><b>Dynamo<\/b><span style=\"font-weight: 400;\">) use NIXL together with their cache managers to decide <\/span><i><span style=\"font-weight: 400;\">when and what to offload or load<\/span><\/i><span style=\"font-weight: 400;\"> between HBM, DRAM, and the ICMS store<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Inference%20frameworks%20like%20NVIDIA%20Dynamo,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[33]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=The%20ICMSP%20is%20a%20G3,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[34]<\/span><\/a><span style=\"font-weight: 400;\">. 
For example, while a response is being generated, NIXL prestages the required KV blocks from ICMS (tier G3.5) back into GPU memory (HBM, G1) or host memory (G2) ahead of the token decode phase<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Inference%20frameworks%20like%20NVIDIA%20Dynamo,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[24]<\/span><\/a><span style=\"font-weight: 400;\">. After new tokens are generated, NIXL can offload less recently used data back to the flash tier. 
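As a rough illustration of this offload decision, here is a minimal LRU-style sketch under our own assumptions (this is not Dynamo's or NIXL's actual policy): when a tier fills up, its coldest KV blocks spill one level down the HBM (G1) \u2192 host DRAM (G2) \u2192 flash (G3.5) hierarchy.

```python
# Hypothetical tiered KV-cache manager: newest blocks live in "HBM"; when a
# tier exceeds its capacity, its least recently inserted block is demoted to
# the next, larger and slower tier. Flash is modeled as unbounded.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_cap, dram_cap):
        self.tiers = {"HBM": OrderedDict(), "DRAM": OrderedDict(), "FLASH": OrderedDict()}
        self.caps = {"HBM": hbm_cap, "DRAM": dram_cap, "FLASH": float("inf")}

    def _spill(self, tier, nxt):
        # Demote the coldest entries until the tier fits its capacity.
        while len(self.tiers[tier]) > self.caps[tier]:
            key, val = self.tiers[tier].popitem(last=False)
            self.tiers[nxt][key] = val

    def put(self, key, val):
        self.tiers["HBM"][key] = val
        self._spill("HBM", "DRAM")
        self._spill("DRAM", "FLASH")

    def tier_of(self, key):
        return next(t for t, d in self.tiers.items() if key in d)

cache = TieredKVCache(hbm_cap=2, dram_cap=2)
for i in range(5):
    cache.put(f"blk{i}", i)
print(cache.tier_of("blk0"), cache.tier_of("blk2"), cache.tier_of("blk4"))
```

A production policy would also weigh reuse probability and block size, not just insertion order, but the spill direction down the tiers is the same.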
All of this happens transparently, in the background, letting the GPU avoid I\/O stalls and stay focused on computation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Importantly, NIXL supports a variety of storage backends through plugins: file systems, NVMe-oF block devices, object stores, and more<\/span><a href=\"https:\/\/github.com\/ai-dynamo\/nixl#:~:text=NVIDIA%20Inference%20Xfer%20Library%20,in%20architecture\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[32]<\/span><\/a><span style=\"font-weight: 400;\">. 
This fits NVIDIA\u2019s overall philosophy: rather than locking the solution to a single storage type, it relies on standard transports (NVMe, NVMe-over-Fabrics, RDMA, including the <\/span><b>NVMe key-value (KV) extensions<\/b><span style=\"font-weight: 400;\">)<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=These%20crypto%20and%20integrity%20accelerators,performance%20required%20for%20KV%20cache\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[35]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=Nvidia%20notes%3A%20%E2%80%9CBy%20leveraging%20standard,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[36]<\/span><\/a><span style=\"font-weight: 400;\">. 
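The modular-plugin pattern described above can be sketched as follows. This is our own illustration of the idea, not NIXL's real API: every name here (`Backend`, `transfer`, the `tier://key` URI scheme) is hypothetical. The caller issues one uniform call, and a registry of backend plugins hides whether the bytes come from DRAM, an NVMe-oF target, or an object store.

```python
# Hypothetical plugin registry: one transfer() entry point, many backends.
class Backend:
    def read(self, key):
        raise NotImplementedError

class DramBackend(Backend):
    """Stand-in for reading a KV block out of host DRAM."""
    def __init__(self):
        self.store = {"blk1": b"hot"}
    def read(self, key):
        return self.store[key]

class NvmeOfBackend(Backend):
    """Stand-in for a remote NVMe-over-Fabrics read."""
    def __init__(self):
        self.store = {"blk2": b"warm"}
    def read(self, key):
        return self.store[key]

REGISTRY = {"dram": DramBackend(), "nvmeof": NvmeOfBackend()}

def transfer(uri):
    """Uniform entry point: 'tier://key' is routed to the matching plugin."""
    tier, key = uri.split("://")
    return REGISTRY[tier].read(key)

print(transfer("dram://blk1"), transfer("nvmeof://blk2"))
```

Adding a new storage type then means registering one more `Backend` subclass, without touching any caller.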
This keeps the ICMS platform compatible with conventional storage infrastructure while still reaching the performance the KV cache requires. A practical example: NIXL is already integrated into the open <\/span><b>LMCache<\/b><span style=\"font-weight: 400;\"> stack, which builds a multi-tier KV cache (CPU, SSD, S3 storage, and so on) outside NVIDIA platforms, in frameworks such as vLLM<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Open\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[37]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Unlike%20NVIDIA%E2%80%99s%20ICMS%2C%20which%20requires,4\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[38]<\/span><\/a><span style=\"font-weight: 400;\">. Within the NVIDIA stack, NIXL works closely with the BlueField-4 DPU and the DOCA software (covered below) to move context data with maximum efficiency at cluster scale<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=with%20NVIDIA%20Inference%20Transfer%20Library,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[39]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=At%20the%20inference%20layer%2C%20NVIDIA,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[40]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h2><strong>Hardware components: GPUs, the BlueField-4 DPU, NVLink and NVSwitch, and the Spectrum-X network<\/strong><\/h2>\n<p><b>GPUs and memory.<\/b><span style=\"font-weight: 400;\"> The context memory platform is designed for NVIDIA\u2019s newest GPUs in the <\/span><b>Rubin platform<\/b><span style=\"font-weight: 400;\"> (the generation after H100\/Blackwell). Each such GPU carries a huge pool of fast memory (<\/span><i><span style=\"font-weight: 400;\">HBM4<\/span><\/i><span style=\"font-weight: 400;\">, up to ~288 GB per GPU) with enormous bandwidth of ~22 TB\/s<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=L1%20%E2%80%94%20GPU,Context\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[41]<\/span><\/a><span style=\"font-weight: 400;\">. 
This memory serves as tier G1 (hot context), and the new generation of NVLink pools it across GPUs for inter-GPU data exchange. A Rubin GPU offers up to <\/span><b>3.6 TB\/s of NVLink bandwidth<\/b><span style=\"font-weight: 400;\"> per chip, and inside SuperPod servers the GPUs are linked through <\/span><b>NVSwitch<\/b><span style=\"font-weight: 400;\"> switches into a single high-speed fabric spanning dozens of GPUs<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer\/#:~:text=next%20generation%20of%20AI%2C%20delivering,The%20Rubin%20GPU%20is\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[42]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer\/#:~:text=highlights%20a%20336,Rubin%20GPU\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[43]<\/span><\/a><span style=\"font-weight: 400;\">. For inference, NVLink\/NVSwitch matter when a model is sharded across several GPUs or when several GPUs jointly serve one large request: they exchange activations and portions of the context with minimal latency and almost no PCIe involvement. 
NVLink is also used for cache cooperation between GPU and CPU: Grace\/Vera processors connect to the GPUs over NVLink-C2C (Cache Coherent NVLink) at ~1.8 TB\/s, letting a CPU with large DDR memory serve as a <\/span><i><span style=\"font-weight: 400;\">memory extension<\/span><\/i><span style=\"font-weight: 400;\"> for the GPU (tier G2)<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=L2%20%E2%80%94%20Near,Vera%20CPU\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[44]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=%2A%20Interconnect%3A%20NVLink\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[45]<\/span><\/a><span style=\"font-weight: 400;\">. 
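To make the cited bandwidth figures concrete, here is a back-of-the-envelope comparison (our own arithmetic, assuming the link runs at its peak rate and that 800 Gb\/s \u2248 100 GB\/s) of how long moving a 10 GB KV-cache slice would take over each link.

```python
# Rough transfer-time arithmetic for a 10 GB KV-cache slice over the links
# mentioned in the text. Bandwidths are peak figures in GB/s; real transfers
# would be slower due to protocol overhead.
size_gb = 10
links_gb_per_s = {
    "NVLink (per Rubin GPU)": 3600,  # ~3.6 TB/s
    "NVLink-C2C (CPU-GPU)": 1800,    # ~1.8 TB/s
    "800 GbE (BlueField-4)": 100,    # 800 Gb/s ~= 100 GB/s
}
times_ms = {name: size_gb / bw * 1000 for name, bw in links_gb_per_s.items()}
for name, t in times_ms.items():
    print(f"{name}: {t:.2f} ms")
```

The two-orders-of-magnitude gap between NVLink and the network link is exactly why hot context belongs in G1\/G2 and why prestaging from the flash tier has to start well ahead of decode.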
Taken together, the \u201cGPU + NVLink CPU + NVSwitch\u201d combination provides hybrid memory (HBM + DDR) of up to several terabytes inside a node, with nanosecond-scale access latencies for the GPU. Only when this local resource is exhausted does the system turn to the external G3.5 tier (ICMS).<\/span><\/p>\n<p><b>BlueField-4 DPU (the ICMS controller).<\/b><span style=\"font-weight: 400;\"> At the heart of the context memory platform is <\/span><b>NVIDIA BlueField-4<\/b><span style=\"font-weight: 400;\">, a fourth-generation programmable DPU deployed both in Rubin compute nodes and in the rack-scale ICMS flash enclosures<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20establishes%20what%20NVIDIA%20terms,memory%20at%20the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[7]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20introduces%20an%20intermediate%20%E2%80%9CG3,for%20KV%20cache%20data%20characteristics\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[21]<\/span><\/a><span style=\"font-weight: 400;\">. BlueField-4 combines on a single chip an <\/span><b>800 Gb\/s Ethernet<\/b><span style=\"font-weight: 400;\"> network interface and a 64-core programmable CPU complex (Arm-based NVIDIA Grace) with its own fast memory (LPDDR)<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=NVIDIA%20BlueField,at%20up%20to%20800%20Gb%2Fs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[46]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=BlueField,accelerator%20in%20Rubin%20compute%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[47]<\/span><\/a><span style=\"font-weight: 400;\">. BF4\u2019s key advantage is its <\/span><i><span style=\"font-weight: 400;\">hardware I\/O accelerators<\/span><\/i><span style=\"font-weight: 400;\">, particularly useful for KV-cache work: engines that perform <\/span><b>encryption and integrity checking (CRC)<\/b><span style=\"font-weight: 400;\"> at the 800 Gbit\/s line rate with no CPU load<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=NVIDIA%20BlueField,at%20up%20to%20800%20Gb%2Fs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[46]<\/span><\/a><span style=\"font-weight: 400;\">, plus RDMA and NVMe-oF acceleration. In essence, BlueField-4 offloads from the host CPU every operation tied to moving context data: it terminates NVMe-over-Fabrics sessions, processes NVMe commands (including the KV extensions) and RDMA traffic, and verifies data on its own, <\/span><b>\u201con the fly\u201d with minimal latency<\/b><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20architecture%20uses%20BlueField%E2%80%914%20to,with%20predictable%2C%20low%E2%80%91latency%2C%20high%E2%80%91bandwidth%20connectivity\" target=\"_blank\" rel=\"noopener\"><b>[48]<\/b><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=%2A%20%C2%A0BlueFied,CPU%20resources%20for%20inference%20serving\" target=\"_blank\" rel=\"noopener\"><b>[49]<\/b><\/a><b>.<\/b><span 
style=\"font-weight: 400;\"> \u0411\u043b\u0430\u0433\u043e\u0434\u0430\u0440\u044f \u044d\u0442\u043e\u043c\u0443 NVMe SSD \u0432 ICMS-\u044d\u043d\u043a\u043b\u043e\u0436\u0430\u0445 \u0432\u043e\u0441\u043f\u0440\u0438\u043d\u0438\u043c\u0430\u044e\u0442\u0441\u044f GPU-\u0443\u0437\u043b\u0430\u043c\u0438 \u043f\u043e\u0447\u0442\u0438 \u043a\u0430\u043a <\/span><i><span style=\"font-weight: 400;\">\u043f\u0440\u043e\u0434\u043b\u0435\u043d\u0438\u0435 \u043f\u0430\u043c\u044f\u0442\u0438<\/span><\/i><span style=\"font-weight: 400;\">, \u0430 \u043d\u0435 \u201c\u043c\u0435\u0434\u043b\u0435\u043d\u043d\u043e\u0435 \u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0435\u201d: \u043d\u0435\u0442 \u043a\u043b\u0430\u0441\u0441\u0438\u0447\u0435\u0441\u043a\u043e\u0439 \u0446\u0435\u043f\u043e\u0447\u043a\u0438 \u043a\u043e\u043f\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0434\u0430\u043d\u043d\u044b\u0445 \u0447\u0435\u0440\u0435\u0437 CPU (SSD -&gt; \u043a\u043e\u043d\u0442\u0440\u043e\u043b\u043b\u0435\u0440 -&gt; \u043f\u043e \u0441\u0435\u0442\u0438 -&gt; CPU DRAM -&gt; GPU), \u0432\u043c\u0435\u0441\u0442\u043e \u044d\u0442\u043e\u0433\u043e BlueField-4 \u043d\u0430\u043f\u0440\u044f\u043c\u0443\u044e \u0434\u0432\u0438\u0433\u0430\u0435\u0442 \u0434\u0430\u043d\u043d\u044b\u0435 \u043c\u0435\u0436\u0434\u0443 \u0444\u043b\u044d\u0448 \u0438 GPU-\u043d\u043e\u0434\u0430\u043c\u0438 \u043f\u043e RDMA, \u043c\u0438\u043d\u0443\u044f \u0443\u0437\u043a\u043e\u0435 \u043c\u0435\u0441\u0442\u043e x86 CPU<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Traditional%20storage%20architectures%20introduce%20multiple,metrics%20that%20determine%20inference%20responsiveness\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[50]<\/span><\/a><a 
href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=storage%20controller%2C%20controller%20to%20file,metrics%20that%20determine%20inference%20responsiveness\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[51]<\/span><\/a><span style=\"font-weight: 400;\">. \u0422\u0430\u043a\u043e\u0439 \u043f\u043e\u0434\u0445\u043e\u0434 \u0441\u043d\u0438\u043c\u0430\u0435\u0442 \u043a\u043e\u043d\u043a\u0443\u0440\u0435\u043d\u0446\u0438\u044e \u0437\u0430 CPU-\u0440\u0435\u0441\u0443\u0440\u0441\u044b (\u0432\u0430\u0436\u043d\u043e, \u043a\u043e\u0433\u0434\u0430 \u0434\u0435\u0441\u044f\u0442\u043a\u0438 GPU \u043e\u0434\u043d\u043e\u0432\u0440\u0435\u043c\u0435\u043d\u043d\u043e \u0437\u0430\u043f\u0440\u0430\u0448\u0438\u0432\u0430\u044e\u0442 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442) \u0438 \u0443\u0441\u0442\u0440\u0430\u043d\u044f\u0435\u0442 \u0438\u0437\u0431\u044b\u0442\u043e\u0447\u043d\u044b\u0435 \u043a\u043e\u043f\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f, \u0440\u0430\u0434\u0438\u043a\u0430\u043b\u044c\u043d\u043e \u0441\u043d\u0438\u0436\u0430\u044f \u0441\u0443\u043c\u043c\u0430\u0440\u043d\u0443\u044e \u0437\u0430\u0434\u0435\u0440\u0436\u043a\u0443 \u0434\u043e\u0441\u0442\u0430\u0432\u043a\u0438 \u0434\u0430\u043d\u043d\u044b\u0445<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=the%20memory%20capacity%20of%20individual,GPU%20and%20CPU%20resources\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[52]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Concurrency%20presents%20an%20additional%20challenge,wide%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[53]<\/span><\/a><span style=\"font-weight: 
.">
400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In each rack of the AI pod, NVIDIA proposes placing several <\/span><b>ICMS enclosures<\/b><span style=\"font-weight: 400;\"> (for example, 16 chassis per rack), each containing 4 BlueField-4 controllers. Behind each DPU there can be on the order of <\/span><b>150 TB of NVMe flash<\/b><span style=\"font-weight: 400;\"> capacity<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=rack%20contains%2016%20storage%20enclosures,%3D%209%2C600%20TB\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[54]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=In%20other%20words%2C%20this%20infrastructure,It%20doesn%E2%80%99t%20do%20anything%20else\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[22]<\/span><\/a><span style=\"font-weight: 400;\">. 
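.">
The difference between the two data paths is easy to see in a toy model. The sketch below is purely illustrative: the hop names mirror the copy chains described above, and every latency number is a made-up placeholder, not an NVIDIA measurement.

```python
# Toy model of the two data paths. All per-hop latencies are illustrative
# placeholders chosen only to show the structural difference.

TRADITIONAL_PATH = [
    ("SSD -> storage controller", 40),   # microseconds, illustrative
    ("controller -> network", 20),
    ("network -> host CPU DRAM", 25),
    ("CPU DRAM -> GPU memory", 15),
]

DIRECT_PATH = [
    ("SSD -> BlueField-4 DPU", 40),      # same media latency, illustrative
    ("DPU -> GPU memory via RDMA", 20),  # one RDMA transfer, no CPU copy
]

def total_latency_us(path):
    """Sum the illustrative per-hop latencies for a path."""
    return sum(cost for _hop, cost in path)

print(f"traditional: {len(TRADITIONAL_PATH)} hops, {total_latency_us(TRADITIONAL_PATH)} us")
print(f"direct RDMA: {len(DIRECT_PATH)} hops, {total_latency_us(DIRECT_PATH)} us")
```

Whatever numbers are plugged in, the structural point survives: the direct path has two fewer hops because the CPU-side copies are simply gone.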
In total, then, one pod can hold several petabytes of context data (for example, a Vera Rubin SuperPod configuration: 1152 Rubin GPUs and 64 BF4 DPUs with ~9.6 PB of flash – equivalent to <\/span><b>up to 16 TB of context per GPU<\/b><span style=\"font-weight: 400;\">)<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=Ozery%20says%20there%20are%2016,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[55]<\/span><\/a><span style=\"font-weight: 400;\">. 
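The headline capacity can be re-derived from the per-rack figures quoted above; the constants below simply restate those numbers (16 enclosures per rack, 4 DPUs per enclosure, ~150 TB of NVMe flash behind each DPU).

```python
# Back-of-the-envelope capacity math for one ICMS rack, using the figures
# quoted in the text. The constants restate the article's numbers.

ENCLOSURES_PER_RACK = 16
DPUS_PER_ENCLOSURE = 4
FLASH_TB_PER_DPU = 150

dpus_per_rack = ENCLOSURES_PER_RACK * DPUS_PER_ENCLOSURE
flash_tb_per_rack = dpus_per_rack * FLASH_TB_PER_DPU

print(f"DPUs per rack:  {dpus_per_rack}")
print(f"flash per rack: {flash_tb_per_rack} TB (~{flash_tb_per_rack / 1000} PB)")
```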
Note that these flash enclosures are <\/span><i><span style=\"font-weight: 400;\">not general-purpose storage systems<\/span><\/i><span style=\"font-weight: 400;\"> but purpose-built <\/span><b>JBOF<\/b><span style=\"font-weight: 400;\"> (Just a Bunch of Flash) units for the KV cache<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[56]<\/span><\/a><span style=\"font-weight: 400;\">. They do not perform the typical storage functions (RAID, deduplication, replication, and so on); instead they are tuned for one job – holding and serving, as fast as possible, <\/span><i><span style=\"font-weight: 400;\">several terabytes of “hot” key-value blocks<\/span><\/i><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=A%20KV%20cache,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[23]<\/span><\/a><span style=\"font-weight: 400;\">. Long-term durability is the responsibility of the upper G4 tier, while ICMS itself is treated as <\/span><i><span style=\"font-weight: 400;\">cache memory with no long-term persistence<\/span><\/i><span style=\"font-weight: 400;\"> (stateless, ephemeral storage)<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=On%20the%20NVIDIA%20Rubin%20platform%2C,efficient%20scaling%20of%20agentic%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[57]<\/span><\/a><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=KV%20cache%20capacity%20at%20the,efficient%20scaling%20of%20agentic%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[58]<\/span><\/a><span style=\"font-weight: 400;\">. 
This philosophy let NVIDIA strip superfluous services out of the storage path and achieve far higher energy efficiency: <\/span><b>up to 5× more tokens per watt<\/b><span style=\"font-weight: 400;\">, since not a single watt is wasted on background storage tasks and everything goes into useful GPU work<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=KV%20cache%20fundamentally%20differs%20from,purpose%20storage%20approaches\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[26]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[30]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><b>The Spectrum-X AI network.<\/b><span style=\"font-weight: 400;\"> Communication between GPU nodes and ICMS nodes runs over the high-performance <\/span><b>NVIDIA Spectrum-X Ethernet<\/b><span style=\"font-weight: 400;\"> fabric with RDMA\/RoCEv2 support. Spectrum-6 switches (the <\/span><b>Spectrum-X<\/b><span style=\"font-weight: 400;\"> line builds on the networking technology NVIDIA acquired with Mellanox) deliver up to 102.4 Tbit\/s of aggregate throughput and run dedicated algorithms for <\/span><i><span style=\"font-weight: 400;\">predictably low latency<\/span><\/i><span style=\"font-weight: 400;\"> at the scale of the entire cluster<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=,fabrics%2C%20orchestration%20frameworks%2C%20and%20storage\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[59]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=platforms\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[60]<\/span><\/a><span style=\"font-weight: 400;\">. They implement <\/span><i><span style=\"font-weight: 400;\">congestion control<\/span><\/i><span style=\"font-weight: 400;\">, adaptive routing, and guaranteed lossless delivery, minimizing jitter and <\/span><i><span style=\"font-weight: 400;\">tail latency<\/span><\/i><span style=\"font-weight: 400;\"> even under full load<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Spectrum,packet%20loss%20under%20heavy%20load\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[61]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=latency%2C%20and%20packet%20loss%20under,heavy%20load\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[62]<\/span><\/a><span style=\"font-weight: 400;\">. An important feature of Spectrum-X is hardware support for <\/span><i><span style=\"font-weight: 400;\">performance isolation<\/span><\/i><span style=\"font-weight: 400;\">: several concurrent inference streams (from different applications or tenants) can coexist without affecting one another's traffic<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=latency%2C%20and%20packet%20loss%20under,heavy%20load\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[62]<\/span><\/a><span style=\"font-weight: 400;\">. This is critical in <\/span><i><span style=\"font-weight: 400;\">multi-tenant<\/span><\/i><span style=\"font-weight: 400;\"> scenarios where one pod serves different customers: the network guarantees stable latency and bandwidth for all of them. As a result, NVIDIA's Ethernet fabric acts as a memory extension: GPUs access remote context over RDMA almost as if it were local memory, and scaling stays close to linear, without the traditional NAS\/server bottlenecks<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Spectrum,Access\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[63]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Memory%20disaggregation%20only%20works%20if,millisecond%20latency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[64]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h2><strong>Software stack and integration with NVIDIA AI Enterprise<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Hardware solutions need software-level support. 
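To see why software coordination matters at all, consider the smallest version of the problem: tracking which node holds which KV block, and routing a request to the node with the best context locality. Everything below is a hypothetical toy, not Dynamo or Grove code.

```python
# Toy illustration of distributed context coordination: a directory that
# records which node holds which KV block, plus a placement helper that
# routes a request to the node with the most matching blocks.
from collections import defaultdict

class KVDirectory:
    def __init__(self):
        self._placement = defaultdict(set)  # block_id -> {node_id, ...}

    def register(self, node_id: str, block_id: str) -> None:
        """Record that node_id currently holds block_id."""
        self._placement[block_id].add(node_id)

    def best_node(self, needed_blocks, nodes):
        """Pick the node that already holds the most of the needed blocks."""
        def hits(node):
            return sum(1 for b in needed_blocks if node in self._placement[b])
        return max(nodes, key=hits)

d = KVDirectory()
d.register("node-a", "chat-42/block-0")
d.register("node-a", "chat-42/block-1")
d.register("node-b", "chat-42/block-1")
# node-a holds 2 of the 2 needed blocks, node-b only 1
print(d.best_node(["chat-42/block-0", "chat-42/block-1"], ["node-a", "node-b"]))
```

A real system must additionally handle freshness, replica counts, and eviction, which is exactly the work the components below take on.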
NVIDIA \u0441\u043e\u0437\u0434\u0430\u043b\u0430 \u0446\u0435\u043b\u043e\u0441\u0442\u043d\u044b\u0439 <\/span><b>\u043f\u0440\u043e\u0433\u0440\u0430\u043c\u043c\u043d\u044b\u0439 \u0441\u0442\u0435\u043a<\/b><span style=\"font-weight: 400;\"> \u0434\u043b\u044f \u043a\u043e\u043e\u0440\u0434\u0438\u043d\u0430\u0446\u0438\u0438 \u0440\u0430\u0441\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u043d\u043e\u0439 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u043d\u043e\u0439 \u043f\u0430\u043c\u044f\u0442\u0438 \u0438 \u0438\u043d\u0442\u0435\u0433\u0440\u0430\u0446\u0438\u0438 \u0441 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u043c\u0438 AI-\u0444\u0440\u0435\u0439\u043c\u0432\u043e\u0440\u043a\u0430\u043c\u0438:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>NVIDIA Dynamo<\/b><span style=\"font-weight: 400;\"> \u2013 \u043a\u043e\u043c\u043f\u043e\u043d\u0435\u043d\u0442 (\u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e, \u0447\u0430\u0441\u0442\u044c NVIDIA AI Enterprise) \u0434\u043b\u044f \u043e\u0440\u043a\u0435\u0441\u0442\u0440\u0430\u0446\u0438\u0438 \u0438\u043d\u0444\u0435\u0440\u0435\u043d\u0441\u0430 \u043d\u0430 \u0443\u0440\u043e\u0432\u043d\u0435 \u0434\u0430\u043d\u043d\u044b\u0445. 
\u041e\u043d \u043e\u0442\u0432\u0435\u0447\u0430\u0435\u0442 \u0437\u0430 \u0440\u0430\u0441\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u0438\u0435 \u0438 \u0441\u043e\u0433\u043b\u0430\u0441\u043e\u0432\u0430\u043d\u043d\u043e\u0441\u0442\u044c \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430 \u043c\u0435\u0436\u0434\u0443 \u0443\u0437\u043b\u0430\u043c\u0438: \u0433\u0434\u0435 \u0445\u0440\u0430\u043d\u0438\u0442\u0441\u044f \u043a\u0430\u043a\u043e\u0439 KV-\u0431\u043b\u043e\u043a, \u0430\u043a\u0442\u0443\u0430\u043b\u0435\u043d \u043b\u0438 \u043e\u043d, \u0441\u043a\u043e\u043b\u044c\u043a\u043e \u043a\u043e\u043f\u0438\u0439 \u043d\u0443\u0436\u043d\u043e \u0438 \u0442.\u0434.<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%E2%80%99s%20ICMS%20platform%20highlights%20broader,infrastructure%20requirements%20are%20fundamentally%20changing\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[65]<\/span><\/a><span style=\"font-weight: 400;\">. 
Dynamo \u0432\u043c\u0435\u0441\u0442\u0435 \u0441 NIXL \u0440\u0435\u0430\u043b\u0438\u0437\u0443\u0435\u0442 \u043b\u043e\u0433\u0438\u043a\u0443 <\/span><i><span style=\"font-weight: 400;\">prefill\/decode<\/span><\/i><span style=\"font-weight: 400;\">, \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u044f \u043e\u0431\u0449\u0438\u0439 KV-\u043a\u044d\u0448 \u0434\u043b\u044f \u043c\u043d\u043e\u0436\u0435\u0441\u0442\u0432\u0430 GPU, \u0438 \u043e\u0431\u0435\u0441\u043f\u0435\u0447\u0438\u0432\u0430\u0435\u0442, \u0447\u0442\u043e\u0431\u044b \u043f\u0440\u0438 \u043f\u0435\u0440\u0435\u043d\u043e\u0441\u0435 \u0437\u0430\u0434\u0430\u0447 \u043c\u0435\u0436\u0434\u0443 \u0443\u0437\u043b\u0430\u043c\u0438 \u0441\u043e\u0445\u0440\u0430\u043d\u044f\u043b\u0430\u0441\u044c \u043b\u043e\u043a\u0430\u043b\u044c\u043d\u043e\u0441\u0442\u044c \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430 (\u0442.\u0435. \u0430\u0433\u0435\u043d\u0442 \u201c\u043f\u0435\u0440\u0435\u0435\u0437\u0436\u0430\u0435\u0442\u201d \u0432\u043c\u0435\u0441\u0442\u0435 \u0441\u043e \u0441\u0432\u043e\u0438\u043c \u0441\u043e\u0441\u0442\u043e\u044f\u043d\u0438\u0435\u043c)<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=NVIDIA%20Dynamo%20and%20NIXL%20coordinate,generation%20AI%20factories\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[66]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=At%20the%20inference%20layer%2C%20NVIDIA,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[40]<\/span><\/a><span style=\"font-weight: 400;\">. 
\u041f\u043e \u0441\u0443\u0442\u0438, \u044d\u0442\u043e \u0440\u0430\u0441\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u043d\u044b\u0439 \u043c\u0435\u043d\u0435\u0434\u0436\u0435\u0440 KV-\u043a\u044d\u0448\u0430 (KV Cache Manager), \u0432\u0437\u0430\u0438\u043c\u043e\u0434\u0435\u0439\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u0439 \u0441 \u043e\u0440\u043a\u0435\u0441\u0442\u0440\u0430\u0442\u043e\u0440\u0430\u043c\u0438 (\u0442\u0438\u043f\u0430 Kubernetes).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>NVIDIA Grove<\/b><span style=\"font-weight: 400;\"> \u2013 \u0442\u0430\u043a NVIDIA \u043d\u0430\u0437\u044b\u0432\u0430\u0435\u0442 \u0442\u043e\u043f\u043e\u043b\u043e\u0433\u0438\u044f-\u043e\u0440\u0438\u0435\u043d\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0439 \u043f\u043b\u0430\u043d\u0438\u0440\u043e\u0432\u0449\u0438\u043a (\u0432\u0435\u0440\u043e\u044f\u0442\u043d\u043e, \u043d\u0430\u0434\u0441\u0442\u0440\u043e\u0439\u043a\u0430 \u043d\u0430\u0434 Kubernetes), \u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u0440\u0430\u0437\u043c\u0435\u0449\u0430\u0435\u0442 \u0437\u0430\u0434\u0430\u0447\u0438 \u0438\u043d\u0444\u0435\u0440\u0435\u043d\u0441\u0430 \u0441 \u0443\u0447\u0435\u0442\u043e\u043c \u043b\u043e\u043a\u0430\u043b\u0438\u0437\u0430\u0446\u0438\u0438 \u0438\u0445 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=At%20the%20inference%20layer%2C%20NVIDIA,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[40]<\/span><\/a><span style=\"font-weight: 400;\">. 
\u041d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, Grove \u043f\u043e\u0441\u0442\u0430\u0440\u0430\u0435\u0442\u0441\u044f \u0437\u0430\u043f\u0443\u0441\u0442\u0438\u0442\u044c \u043f\u043e\u0432\u0442\u043e\u0440\u043d\u044b\u0439 \u0437\u0430\u043f\u0440\u043e\u0441 \u0442\u043e\u0433\u043e \u0436\u0435 \u0434\u0438\u0430\u043b\u043e\u0433\u0430 \u043d\u0430 \u0442\u043e\u043c \u0436\u0435 \u0441\u0435\u0440\u0432\u0435\u0440\u0435 \u0438\u043b\u0438 \u0441\u0442\u043e\u0439\u043a\u0435, \u0433\u0434\u0435 \u043e\u0441\u0442\u0430\u043b\u0441\u044f \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u0439 KV-\u043a\u044d\u0448, \u0447\u0442\u043e\u0431\u044b \u043c\u0438\u043d\u0438\u043c\u0438\u0437\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u0441\u0435\u0442\u0435\u0432\u044b\u0435 \u043e\u0431\u0440\u0430\u0449\u0435\u043d\u0438\u044f<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=At%20the%20inference%20layer%2C%20NVIDIA,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[40]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=orchestration%20layer%20using%20NVIDIA%20Grove,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[67]<\/span><\/a><span style=\"font-weight: 400;\">. 
\u042d\u0442\u043e \u043f\u043e\u0432\u044b\u0448\u0430\u0435\u0442 cache-hit rate \u0438 \u0441\u043a\u0432\u043e\u0437\u043d\u0443\u044e \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u0434\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0441\u0442\u044c \u043f\u0440\u0438 \u043c\u0430\u0441\u0448\u0442\u0430\u0431\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0438 \u043d\u0430 \u0434\u0435\u0441\u044f\u0442\u043a\u0438\/\u0441\u043e\u0442\u043d\u0438 \u0443\u0437\u043b\u043e\u0432.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>NVIDIA DOCA (Data Center Infrastructure-on-a-Chip Architecture)<\/b><span style=\"font-weight: 400;\"> \u2013 \u043f\u0440\u043e\u0433\u0440\u0430\u043c\u043c\u043d\u044b\u0439 \u0444\u0440\u0435\u0439\u043c\u0432\u043e\u0440\u043a \u0434\u043b\u044f BlueField DPU. \u0412 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0435 ICMS, NVIDIA \u0434\u043e\u0431\u0430\u0432\u0438\u043b\u0430 \u0432 DOCA \u0441\u043f\u0435\u0446\u0438\u0430\u043b\u044c\u043d\u044b\u0439 \u0441\u043b\u043e\u0439 <\/span><i><span style=\"font-weight: 400;\">KV-communication\/storage<\/span><\/i><span style=\"font-weight: 400;\">, \u0442\u043e \u0435\u0441\u0442\u044c API \u0438 \u0441\u0435\u0440\u0432\u0438\u0441\u044b \u0434\u043b\u044f \u0440\u0430\u0431\u043e\u0442\u044b \u0441 KV-\u043a\u044d\u0448\u0435\u043c \u043a\u0430\u043a \u0441 \u043e\u0442\u0434\u0435\u043b\u044c\u043d\u044b\u043c \u0442\u0438\u043f\u043e\u043c \u0434\u0430\u043d\u043d\u044b\u0445<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Additionally%2C%20the%20NVIDIA%20DOCA%20framework,from%20the%20underlying%20flash%20media\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[68]<\/span><\/a><span style=\"font-weight: 400;\">. 
DOCA \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u0435\u0442 \u0441\u0442\u0430\u043d\u0434\u0430\u0440\u0442\u0438\u0437\u043e\u0432\u0430\u043d\u043d\u044b\u0435 \u0438\u043d\u0442\u0435\u0440\u0444\u0435\u0439\u0441\u044b, \u0447\u0435\u0440\u0435\u0437 \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u0432\u043d\u0435\u0448\u043d\u0438\u0435 \u0441\u0438\u0441\u0442\u0435\u043c\u044b \u0445\u0440\u0430\u043d\u0435\u043d\u0438\u044f \u043c\u043e\u0433\u0443\u0442 \u0432\u0437\u0430\u0438\u043c\u043e\u0434\u0435\u0439\u0441\u0442\u0432\u043e\u0432\u0430\u0442\u044c \u0441 ICMS-\u0441\u043b\u043e\u0435\u043c (\u043d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, \u0443\u043f\u0440\u0430\u0432\u043b\u044f\u0442\u044c \u0440\u0430\u0437\u043c\u0435\u0449\u0435\u043d\u0438\u0435\u043c \u0434\u0430\u043d\u043d\u044b\u0445 \u0443\u0440\u043e\u0432\u043d\u044f G3.5)<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=flash%20media\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[69]<\/span><\/a><span style=\"font-weight: 400;\">. 
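Логику размещения по уровням можно схематично показать так (пороги и сигнатура place_kv придуманы для иллюстрации; обозначения G3/G3.5/G4 — из описанной в тексте иерархии хранения):

```python
# Условная иллюстрация выбора уровня хранения для блока KV-кэша.
# Пороги и имена уровней — для примера, это не реальная политика ICMS.

def place_kv(age_s, shared):
    """Выбрать уровень по «возрасту» блока и необходимости общего доступа."""
    if shared:
        return "G3.5"  # контекст нужен нескольким узлам — общий ICMS-слой
    if age_s < 60.0:
        return "G3"    # свежая сессия — локальный NVMe узла
    return "G4"        # холодный контекст — внешняя СХД партнёра

print(place_kv(5.0, shared=False))    # G3
print(place_kv(5.0, shared=True))     # G3.5
print(place_kv(7200.0, shared=False)) # G4
```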
BlueField-4, \u0440\u0430\u0431\u043e\u0442\u0430\u044e\u0449\u0438\u0439 \u043f\u043e\u0434 \u0443\u043f\u0440\u0430\u0432\u043b\u0435\u043d\u0438\u0435\u043c DOCA, \u0432\u044b\u043f\u043e\u043b\u043d\u044f\u0435\u0442, \u0433\u0440\u0443\u0431\u043e \u0433\u043e\u0432\u043e\u0440\u044f, \u0440\u043e\u043b\u044c <\/span><i><span style=\"font-weight: 400;\">runtime<\/span><\/i><span style=\"font-weight: 400;\"> \u0434\u043b\u044f KV-\u043a\u044d\u0448\u0430 \u2013 \u043e\u043d \u043f\u0440\u0438\u043d\u0438\u043c\u0430\u0435\u0442 \u043a\u043e\u043c\u0430\u043d\u0434\u044b \u043e\u0442 Dynamo\/NIXL (\u0447\u0435\u0440\u0435\u0437 DOCA API) \u043f\u043e \u0442\u043e\u043c\u0443, \u043a\u0430\u043a\u0438\u0435 \u0431\u043b\u043e\u043a\u0438 \u043a\u044d\u0448\u0430 \u0447\u0438\u0442\u0430\u0442\u044c\/\u043f\u0438\u0441\u0430\u0442\u044c \u043d\u0430 SSD, \u0448\u0438\u0444\u0440\u043e\u0432\u0430\u0442\u044c, \u043f\u0435\u0440\u0435\u0434\u0430\u0432\u0430\u0442\u044c \u043f\u043e \u0441\u0435\u0442\u0438 \u0438 \u0442.\u0434., \u0440\u0435\u0430\u043b\u0438\u0437\u0443\u044f \u0438\u0445 \u043c\u0430\u043a\u0441\u0438\u043c\u0430\u043b\u044c\u043d\u043e \u044d\u0444\u0444\u0435\u043a\u0442\u0438\u0432\u043d\u043e \u043d\u0430 \u0430\u043f\u043f\u0430\u0440\u0430\u0442\u043d\u043e\u043c \u0443\u0440\u043e\u0432\u043d\u0435<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Additionally%2C%20the%20NVIDIA%20DOCA%20framework,from%20the%20underlying%20flash%20media\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[68]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=align%20with%20the%20block,as%20a%20distinct%20data%20class\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[70]<\/span><\/a><span 
style=\"font-weight: 400;\">. DOCA \u043f\u0440\u0438 \u044d\u0442\u043e\u043c \u043e\u0442\u043a\u0440\u044b\u0442\u0430 \u0434\u043b\u044f \u043f\u0430\u0440\u0442\u043d\u0435\u0440\u043e\u0432: NVIDIA \u0437\u0430\u044f\u0432\u043b\u044f\u0435\u0442 \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0443 open-\u0438\u043d\u0442\u0435\u0440\u0444\u0435\u0439\u0441\u043e\u0432, \u0447\u0442\u043e\u0431\u044b \u0441\u0442\u043e\u0440\u043e\u043d\u043d\u0438\u0435 \u0432\u0435\u043d\u0434\u043e\u0440\u044b \u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449 \u043c\u043e\u0433\u043b\u0438 \u043f\u043e\u0434\u043a\u043b\u044e\u0447\u0430\u0442\u044c \u0441\u0432\u043e\u0438 \u0440\u0435\u0448\u0435\u043d\u0438\u044f \u043a \u043d\u043e\u0432\u043e\u043c\u0443 <\/span><i><span style=\"font-weight: 400;\">tier<\/span><\/i><span style=\"font-weight: 400;\"> \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=flash%20media\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[69]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>NVIDIA AI Enterprise \u0438 NIM.<\/b><span style=\"font-weight: 400;\"> \u0412\u0435\u0441\u044c \u043a\u043e\u043c\u043f\u043b\u0435\u043a\u0441 \u0438\u043d\u0442\u0435\u0433\u0440\u0438\u0440\u0443\u0435\u0442\u0441\u044f \u0432 \u043f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u0443 AI Enterprise \u2013 \u043a\u043e\u043c\u043c\u0435\u0440\u0447\u0435\u0441\u043a\u0443\u044e \u043e\u0431\u043e\u043b\u043e\u0447\u043a\u0443 NVIDIA \u0434\u043b\u044f \u0440\u0430\u0437\u0432\u0435\u0440\u0442\u044b\u0432\u0430\u043d\u0438\u044f AI \u043d\u0430 \u0443\u0440\u043e\u0432\u043d\u0435 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u044f\u0442\u0438\u044f. 
\u041e\u0442\u0434\u0435\u043b\u044c\u043d\u043e \u0443\u043f\u043e\u043c\u0438\u043d\u0430\u0435\u0442\u0441\u044f \u043a\u043e\u043c\u043f\u043e\u043d\u0435\u043d\u0442 <\/span><b>NIM<\/b><span style=\"font-weight: 400;\"> (\u0432 \u043c\u0430\u0442\u0435\u0440\u0438\u0430\u043b\u0430\u0445 \u043f\u0430\u0440\u0442\u043d\u0451\u0440\u043e\u0432 \u0435\u0433\u043e \u0440\u0430\u0441\u0448\u0438\u0444\u0440\u043e\u0432\u044b\u0432\u0430\u044e\u0442 \u043a\u0430\u043a NVIDIA Inference Microservice \u0438\u043b\u0438 Manager) \u2013 \u043d\u0430\u0431\u043e\u0440 \u043c\u0438\u043a\u0440\u043e\u0441\u0435\u0440\u0432\u0438\u0441\u043e\u0432 \u0438 API \u0434\u043b\u044f \u0440\u0430\u0437\u0440\u0430\u0431\u043e\u0442\u0447\u0438\u043a\u043e\u0432, \u0443\u043f\u0440\u043e\u0449\u0430\u044e\u0449\u0438\u0439 \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u0438\u0435 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u043d\u043e\u0439 \u043f\u0430\u043c\u044f\u0442\u0438<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Hardware%20needs%20software%20to%20manage,software%20stack%20provides%20that%20through\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[71]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,across%20nodes%20and%20maintains%20consistency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[72]<\/span><\/a><span style=\"font-weight: 400;\">. 
\u041f\u043e \u0441\u0443\u0442\u0438, NIM \u043f\u043e\u0437\u0432\u043e\u043b\u044f\u0435\u0442 \u0432\u044b\u0437\u0432\u0430\u0442\u044c \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u044b\u0435 \u0444\u0443\u043d\u043a\u0446\u0438\u0438 (\u043d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, \u0441\u043e\u0445\u0440\u0430\u043d\u0438\u0442\u044c \u0438\u043b\u0438 \u0437\u0430\u0433\u0440\u0443\u0437\u0438\u0442\u044c \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442 \u0441 \u0438\u0434\u0435\u043d\u0442\u0438\u0444\u0438\u043a\u0430\u0442\u043e\u0440\u043e\u043c \u0441\u0435\u0441\u0441\u0438\u0438) \u0447\u0435\u0440\u0435\u0437 \u0432\u044b\u0441\u043e\u043a\u043e\u0443\u0440\u043e\u0432\u043d\u0435\u0432\u044b\u0435 \u0441\u0435\u0440\u0432\u0438\u0441\u044b, \u0431\u0435\u0437 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e\u0441\u0442\u0438 \u0432\u0440\u0443\u0447\u043d\u0443\u044e \u043e\u043f\u0435\u0440\u0438\u0440\u043e\u0432\u0430\u0442\u044c RDMA \u0438\u043b\u0438 NIXL. 
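Работу такого высокоуровневого API можно представить упрощённо (класс ContextClient и его методы — гипотетические, это не реальный интерфейс NIM): приложение сохраняет и восстанавливает контекст по идентификатору сессии, а транспорт скрыт внутри клиента.

```python
# Гипотетический высокоуровневый клиент контекстной памяти (имена условные):
# разработчик не оперирует RDMA/NIXL напрямую.

class ContextClient:
    def __init__(self, transport):
        self._transport = transport  # в реальности — NIXL/RDMA; здесь заглушка-словарь

    def save_context(self, session_id, kv_state):
        self._transport[session_id] = kv_state

    def load_context(self, session_id):
        # None означает промах: контекста нет, нужен полный префилл
        return self._transport.get(session_id)

client = ContextClient(transport={})
client.save_context("chat-7", b"serialized-kv")
print(client.load_context("chat-7"))  # b'serialized-kv'
print(client.load_context("chat-8"))  # None
```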
\u0422\u0430\u043a\u0438\u043c \u043e\u0431\u0440\u0430\u0437\u043e\u043c, \u0440\u0430\u0437\u0440\u0430\u0431\u043e\u0442\u0447\u0438\u043a\u0438 \u043c\u043e\u0433\u0443\u0442 \u0432\u043e\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u044c\u0441\u044f \u043d\u043e\u0432\u044b\u043c \u0443\u0440\u043e\u0432\u043d\u0435\u043c \u043f\u0430\u043c\u044f\u0442\u0438 \u0432 \u0441\u0432\u043e\u0438\u0445 \u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u044f\u0445 (\u0447\u0430\u0442-\u0431\u043e\u0442\u0430\u0445, \u043f\u043e\u0438\u0441\u043a\u043e\u0432\u044b\u0445 \u0434\u0432\u0438\u0436\u043a\u0430\u0445 \u0438 \u043f\u0440.)\u00a0<\/span><\/li>\n<li aria-level=\"1\"><b>LMCache, Triton \u0438 \u0434\u0440.:<\/b> \u041f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u0430 \u043d\u0435 \u0438\u0437\u043e\u043b\u0438\u0440\u043e\u0432\u0430\u043d\u0430 \u043e\u0442 \u043e\u0441\u0442\u0430\u043b\u044c\u043d\u043e\u0439 \u044d\u043a\u043e\u0441\u0438\u0441\u0442\u0435\u043c\u044b. 
\u041a\u0430\u043a \u0443\u0436\u0435 \u043e\u0442\u043c\u0435\u0447\u0430\u043b\u043e\u0441\u044c, \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u0435\u0442 \u043e\u0442\u043a\u0440\u044b\u0442\u044b\u0439 \u043f\u0440\u043e\u0435\u043a\u0442 <b>LMCache<\/b> (Long Memory Cache) \u043e\u0442 \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0438\u0442\u0435\u0442\u0430 \u0427\u0438\u043a\u0430\u0433\u043e, \u043d\u0430\u0446\u0435\u043b\u0435\u043d\u043d\u044b\u0439 \u043d\u0430 \u0443\u043d\u0438\u0432\u0435\u0440\u0441\u0430\u043b\u044c\u043d\u043e\u0435 \u0440\u0435\u0448\u0435\u043d\u0438\u0435 KV-\u043a\u044d\u0448\u0430 \u0434\u043b\u044f LLM \u0438 \u0441\u043e\u0432\u043c\u0435\u0441\u0442\u0438\u043c\u044b\u0439 \u0441 \u0440\u0430\u0437\u043d\u044b\u043c\u0438 \u0436\u0435\u043b\u0435\u0437\u043d\u044b\u043c\u0438 \u043f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u0430\u043c\u0438<a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Open\" target=\"_blank\" rel=\"noopener\">[37]<\/a>. 
NVIDIA \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u0435\u0442 \u0438\u043d\u0442\u0435\u0433\u0440\u0430\u0446\u0438\u044e LMCache (\u0432 \u0447\u0430\u0441\u0442\u043d\u043e\u0441\u0442\u0438, Dell \u0438 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u0434\u0440\u0443\u0433\u0438\u0435 \u043f\u0430\u0440\u0442\u043d\u0435\u0440\u044b \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u0443\u044e\u0442 \u0441\u0432\u044f\u0437\u043a\u0443 LMCache+NIXL \u043d\u0430 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0443\u044e\u0449\u0438\u0445 \u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0430\u0445, \u043e \u0447\u0435\u043c \u043d\u0438\u0436\u0435)<a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=For%20environments%20without%20NVIDIA%20BlueField,extension%20of%20your%20GPU%20memory\" target=\"_blank\" rel=\"noopener\">[73]<\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=,across%20these%20three%20storage%20engines\" target=\"_blank\" rel=\"noopener\">[74]<\/a>. 
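Ключевую идею переиспользования KV-кэша, лежащую в основе LMCache и prefix caching, можно показать упрощённо (размер чанка и формат ключей условные; реальные форматы в LMCache сложнее): кэш режется на чанки фиксированной длины, а ключом чанка служит хэш всего префикса токенов, поэтому запросы с одинаковым началом получают cache hit.

```python
# Упрощённая схема ключей чанков KV-кэша в духе prefix caching.
import hashlib

CHUNK = 4  # токенов в чанке (условно; в реальных системах — сотни)

def chunk_keys(token_ids):
    """Ключ каждого чанка — хэш всего префикса до него включительно."""
    keys, prefix = [], b""
    for i in range(0, len(token_ids) - len(token_ids) % CHUNK, CHUNK):
        prefix += str(token_ids[i:i + CHUNK]).encode()
        keys.append(hashlib.sha256(prefix).hexdigest()[:12])
    return keys

a = chunk_keys([1, 2, 3, 4, 5, 6, 7, 8])
b = chunk_keys([1, 2, 3, 4, 9, 9, 9, 9])
print(a[0] == b[0])  # True — общий префикс, первый чанк берётся из кэша
print(a[1] == b[1])  # False — дальше последовательности расходятся
```

Хэш по всему префиксу (а не по отдельному чанку) важен: KV-блок зависит от всех предыдущих токенов, поэтому переиспользовать его можно только при полном совпадении начала последовательности.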
\u0418\u043d\u0444\u0435\u0440\u0435\u043d\u0441-\u0441\u0435\u0440\u0432\u0435\u0440 <b>Triton<\/b> \u0442\u0430\u043a\u0436\u0435 \u0431\u0443\u0434\u0435\u0442 \u0441\u043f\u043e\u0441\u043e\u0431\u0435\u043d \u0440\u0430\u0431\u043e\u0442\u0430\u0442\u044c \u0441 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u043d\u043e\u0439 \u043f\u0430\u043c\u044f\u0442\u044c\u044e \u2013 \u043e\u0436\u0438\u0434\u0430\u0435\u0442\u0441\u044f, \u0447\u0442\u043e \u0431\u0443\u0434\u0443\u0449\u0438\u0435 \u0432\u0435\u0440\u0441\u0438\u0438 Triton \u043d\u0430\u0443\u0447\u0430\u0442\u0441\u044f \u0432\u0437\u0430\u0438\u043c\u043e\u0434\u0435\u0439\u0441\u0442\u0432\u043e\u0432\u0430\u0442\u044c \u0441 Dynamo\/NIXL \u0434\u043b\u044f \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0438 \u0434\u043b\u0438\u043d\u043d\u044b\u0445 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u043e\u0432. \u0423\u0436\u0435 \u0441\u0435\u0439\u0447\u0430\u0441 \u0442\u0430\u043a\u0438\u0435 \u043e\u043f\u0442\u0438\u043c\u0438\u0437\u0430\u0446\u0438\u0438 \u0432\u043d\u0435\u0434\u0440\u044f\u044e\u0442\u0441\u044f \u0432 open-source inference-\u0434\u0432\u0438\u0436\u043a\u0438: \u043d\u0430\u043f\u0440\u0438\u043c\u0435\u0440, <i>vLLM \u0441 PagedAttention<\/i> \u0443 AMD \u043e\u043f\u0442\u0438\u043c\u0438\u0437\u0438\u0440\u0443\u0435\u0442 \u0440\u0430\u0437\u043c\u0435\u0449\u0435\u043d\u0438\u0435 KV \u0432\u043d\u0443\u0442\u0440\u0438 \u043f\u0430\u043c\u044f\u0442\u0438 GPU<a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=cache%20across%20multiple%20inference%20instances\" target=\"_blank\" rel=\"noopener\">[75]<\/a>, \u0438 \u043f\u0440\u0438 \u043f\u043e\u043c\u043e\u0449\u0438 LMCache \u043c\u043e\u0436\u0435\u0442 \u0432\u044b\u0433\u0440\u0443\u0436\u0430\u0442\u044c \u0434\u0430\u043d\u043d\u044b\u0435 \u0432\u043e \u0432\u043d\u0435\u0448\u043d\u0435\u0435 
\u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0435 \u043d\u0430 \u0431\u043e\u043b\u0435\u0435 \u0441\u0442\u0430\u043d\u0434\u0430\u0440\u0442\u043d\u043e\u043c \u043e\u0431\u043e\u0440\u0443\u0434\u043e\u0432\u0430\u043d\u0438\u0438. \u0412\u0441\u0435 \u044d\u0442\u043e \u0433\u043e\u0432\u043e\u0440\u0438\u0442 \u043e \u0442\u043e\u043c, \u0447\u0442\u043e \u043f\u0440\u043e\u0433\u0440\u0430\u043c\u043c\u043d\u044b\u0439 \u0441\u0442\u0435\u043a \u0432\u043e\u043a\u0440\u0443\u0433 KV-\u043a\u044d\u0448\u0430 \u0430\u043a\u0442\u0438\u0432\u043d\u043e \u0444\u043e\u0440\u043c\u0438\u0440\u0443\u0435\u0442\u0441\u044f \u0438 \u0440\u0430\u0441\u0448\u0438\u0440\u044f\u0435\u0442\u0441\u044f, \u043f\u0440\u0438\u0447\u0435\u043c NVIDIA \u0437\u0430\u0434\u0430\u0435\u0442 \u0441\u0442\u0430\u043d\u0434\u0430\u0440\u0442, \u043d\u043e \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u0435\u0442 \u0438 open-source \u0440\u0435\u0448\u0435\u043d\u0438\u044f \u0434\u043b\u044f \u0448\u0438\u0440\u043e\u043a\u043e\u0439 \u044d\u043a\u043e\u0441\u0438\u0441\u0442\u0435\u043c\u044b.<\/li>\n<\/ul>\n<p><b>\u0418\u0442\u043e\u0433:<\/b><span style=\"font-weight: 400;\"> \u0441\u043e\u0447\u0435\u0442\u0430\u043d\u0438\u0435 \u0430\u043f\u043f\u0430\u0440\u0430\u0442\u043d\u044b\u0445 \u0438 \u043f\u0440\u043e\u0433\u0440\u0430\u043c\u043c\u043d\u044b\u0445 \u043a\u043e\u043c\u043f\u043e\u043d\u0435\u043d\u0442\u043e\u0432 NVIDIA \u043e\u0431\u0435\u0441\u043f\u0435\u0447\u0438\u0432\u0430\u0435\u0442 <\/span><i><span style=\"font-weight: 400;\">\u0441\u043a\u0432\u043e\u0437\u043d\u0443\u044e \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0443 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u043d\u043e\u0439 \u043f\u0430\u043c\u044f\u0442\u0438<\/span><\/i><span style=\"font-weight: 400;\"> \u2013 \u043e\u0442 \u0443\u0440\u043e\u0432\u043d\u044f \u0441\u0435\u0442\u0435\u0432\u043e\u0433\u043e \u043f\u0440\u043e\u0442\u043e\u043a\u043e\u043b\u0430 
(RDMA, NVMeoF), \u0447\u0435\u0440\u0435\u0437 DPU-\u0443\u0441\u043a\u043e\u0440\u0435\u043d\u0438\u0435, \u0434\u043e \u0443\u0440\u043e\u0432\u043d\u044f \u0444\u0440\u0435\u0439\u043c\u0432\u043e\u0440\u043a\u043e\u0432 \u0438\u043d\u0444\u0435\u0440\u0435\u043d\u0441\u0430 \u0438 \u043f\u0440\u0438\u043b\u043e\u0436\u0435\u043d\u0438\u0439. \u042d\u0442\u043e \u043f\u0440\u0435\u0432\u0440\u0430\u0449\u0430\u0435\u0442 \u0434\u043e\u043b\u0433\u043e\u0441\u0440\u043e\u0447\u043d\u044b\u0439 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442 \u0438\u043d\u0444\u0435\u0440\u0435\u043d\u0441\u0430 \u0432 \u0443\u043f\u0440\u0430\u0432\u043b\u044f\u0435\u043c\u044b\u0439 \u0440\u0435\u0441\u0443\u0440\u0441, \u0434\u043e\u0441\u0442\u0443\u043f\u043d\u044b\u0439 \u201c\u0438\u0437 \u043a\u043e\u0440\u043e\u0431\u043a\u0438\u201d \u0434\u043b\u044f \u0440\u0430\u0437\u0440\u0430\u0431\u043e\u0442\u0447\u0438\u043a\u043e\u0432 AI-\u0441\u0438\u0441\u0442\u0435\u043c, \u0431\u0435\u0437 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u043e\u0441\u0442\u0438 \u0441\u0430\u043c\u0438\u043c \u0441\u0442\u0440\u043e\u0438\u0442\u044c \u0441\u043b\u043e\u0436\u043d\u044b\u0435 \u043f\u0430\u0439\u043f\u043b\u0430\u0439\u043d\u044b \u0434\u043b\u044f \u0441\u043e\u0445\u0440\u0430\u043d\u0435\u043d\u0438\u044f\/\u0437\u0430\u0433\u0440\u0443\u0437\u043a\u0438 \u0441\u043e\u0441\u0442\u043e\u044f\u043d\u0438\u044f \u043c\u043e\u0434\u0435\u043b\u0435\u0439.<\/span><\/p>\n<h2><strong>\u0418\u043d\u0442\u0435\u0433\u0440\u0430\u0446\u0438\u044f \u0441\u043e storage-\u0440\u0435\u0448\u0435\u043d\u0438\u044f\u043c\u0438: VAST Data, WEKA, Hammerspace \u0438 \u0434\u0440.<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">NVIDIA \u043d\u0435 \u043f\u043b\u0430\u043d\u0438\u0440\u0443\u0435\u0442 \u0435\u0434\u0438\u043d\u043e\u043b\u0438\u0447\u043d\u043e \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u0434\u0438\u0442\u044c \u0432\u0441\u0435 
\u043a\u043e\u043c\u043f\u043e\u043d\u0435\u043d\u0442\u044b \u0438\u043d\u0444\u0440\u0430\u0441\u0442\u0440\u0443\u043a\u0442\u0443\u0440\u044b \u2013 \u0432\u043c\u0435\u0441\u0442\u043e \u044d\u0442\u043e\u0433\u043e \u043e\u0431\u044a\u044f\u0432\u043b\u0435\u043d \u0448\u0438\u0440\u043e\u043a\u0438\u0439 \u043f\u0435\u0440\u0435\u0447\u0435\u043d\u044c <\/span><i><span style=\"font-weight: 400;\">\u043f\u0430\u0440\u0442\u043d\u0435\u0440\u043e\u0432 \u043f\u043e \u0445\u0440\u0430\u043d\u0435\u043d\u0438\u044e \u0434\u0430\u043d\u043d\u044b\u0445<\/span><\/i><span style=\"font-weight: 400;\">, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u0430\u0442 \u043d\u043e\u0432\u0443\u044e \u043f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u0443. \u0412\u043e \u0432\u0440\u0435\u043c\u044f \u0430\u043d\u043e\u043d\u0441\u0430 Jensen Huang \u0434\u0435\u043c\u043e\u043d\u0441\u0442\u0440\u0438\u0440\u043e\u0432\u0430\u043b \u0441\u043b\u0430\u0439\u0434 \u0441 \u043b\u043e\u0433\u043e\u0442\u0438\u043f\u0430\u043c\u0438 \u043c\u043d\u043e\u0433\u0438\u0445 \u043f\u043e\u0441\u0442\u0430\u0432\u0449\u0438\u043a\u043e\u0432 \u0421\u0425\u0414: <\/span><b>VAST Data, DDN, Dell, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, WEKA<\/b><span style=\"font-weight: 400;\"> \u0438 \u0434\u0440. 
(NetApp \u0438 Lenovo \u0442\u0430\u043a\u0436\u0435 \u0443\u043f\u043e\u043c\u044f\u043d\u0443\u0442\u044b \u043a\u0430\u043a \u043e\u0436\u0438\u0434\u0430\u044e\u0449\u0438\u0435 \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0443 \u0431\u043b\u0438\u0436\u0435 \u043a \u0437\u0430\u043f\u0443\u0441\u043a\u0443)<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%20highlighted%20during%20its%20launch,called%20out%20NetApp%20specifically%20during\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[76]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=supporting%20the%20effort%2C%20including%20AIC%2C%C2%A0Cloudian%2C%C2%A0DDN%2C%C2%A0Dell,NetApp%20specifically%20during%20his%20keynote\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[77]<\/span><\/a><span style=\"font-weight: 400;\">. 
\u0412 \u0446\u0435\u043b\u043e\u043c \u043f\u0430\u0440\u0442\u043d\u0435\u0440\u044b \u0434\u0435\u043b\u044f\u0442\u0441\u044f \u043d\u0430 \u0434\u0432\u0435 \u043a\u0430\u0442\u0435\u0433\u043e\u0440\u0438\u0438: <\/span><b>(1)<\/b><span style=\"font-weight: 400;\"> \u0442\u0435, \u043a\u0442\u043e \u0438\u043d\u0442\u0435\u0433\u0440\u0438\u0440\u0443\u0435\u0442 \u0441\u0432\u043e\u0435 \u041f\u041e \u043d\u0435\u043f\u043e\u0441\u0440\u0435\u0434\u0441\u0442\u0432\u0435\u043d\u043d\u043e \u043d\u0430 \u0443\u0440\u043e\u0432\u043d\u0435 ICMS (\u0443\u0440\u043e\u0432\u0435\u043d\u044c G3.5), \u0442\u043e \u0435\u0441\u0442\u044c \u0437\u0430\u043f\u0443\u0441\u043a\u0430\u044e\u0442 \u0441\u0432\u043e\u0438 \u0441\u0435\u0440\u0432\u0438\u0441\u044b \u0432\u043d\u0443\u0442\u0440\u0438 BlueField-4 \u043a\u043e\u043d\u0442\u0440\u043e\u043b\u043b\u0435\u0440\u043e\u0432 \u0438\u043b\u0438 \u043e\u043f\u0442\u0438\u043c\u0438\u0437\u0438\u0440\u0443\u044e\u0442 \u0438\u0445 \u043f\u043e\u0434 KV-\u043a\u044d\u0448; <\/span><b>(2)<\/b><span style=\"font-weight: 400;\"> \u0442\u0435, \u043a\u0442\u043e \u043e\u0431\u0435\u0441\u043f\u0435\u0447\u0438\u0432\u0430\u044e\u0442 \u0441\u0432\u044f\u0437\u043a\u0443 \u0441 \u0432\u043d\u0435\u0448\u043d\u0438\u043c\u0438 \u0443\u0440\u043e\u0432\u043d\u044f\u043c\u0438 \u0445\u0440\u0430\u043d\u0435\u043d\u0438\u044f (G4), \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u044f \u0434\u043e\u043b\u0433\u043e\u0432\u0440\u0435\u043c\u0435\u043d\u043d\u0443\u044e \u0441\u043e\u0445\u0440\u0430\u043d\u043d\u043e\u0441\u0442\u044c \u0438\u043b\u0438 \u0433\u043b\u043e\u0431\u0430\u043b\u044c\u043d\u0443\u044e \u0434\u043e\u0441\u0442\u0443\u043f\u043d\u043e\u0441\u0442\u044c \u0434\u0430\u043d\u043d\u044b\u0445 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u041d\u0438\u0436\u0435 \u043f\u0440\u0438\u0432\u0435\u0434\u0435\u043d\u0430 
\u0442\u0430\u0431\u043b\u0438\u0446\u0430 \u0441 \u043f\u0440\u0438\u043c\u0435\u0440\u0430\u043c\u0438 \u0440\u0435\u0448\u0435\u043d\u0438\u0439 \u043e\u0442 \u0440\u0430\u0437\u043d\u044b\u0445 \u0432\u0435\u043d\u0434\u043e\u0440\u043e\u0432 \u0438 \u0438\u0445 \u0440\u043e\u043b\u044c\u044e \u0432 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0435 \u043d\u043e\u0432\u043e\u0439 \u043f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u044b:<\/span><\/p>\n<div class=\"_table-holder\"><table>\n<thead>\n<tr>\n<th><b>\u0412\u0435\u043d\u0434\u043e\u0440 \/ \u0440\u0435\u0448\u0435\u043d\u0438\u0435<\/b><\/th>\n<th><b>\u041f\u043e\u0434\u0445\u043e\u0434 \u043a \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0435 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u043d\u043e\u0439 \u043f\u0430\u043c\u044f\u0442\u0438<\/b><\/th>\n<th><b>\u041e\u0441\u043e\u0431\u0435\u043d\u043d\u043e\u0441\u0442\u0438 \u0438 \u0438\u043d\u0442\u0435\u0433\u0440\u0430\u0446\u0438\u044f<\/b><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><b>VAST Data<\/b><span style=\"font-weight: 400;\"> (\u043f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u0430 <\/span><b>AI OS<\/b><span style=\"font-weight: 400;\">, \u043f\u0440\u043e\u0435\u043a\u0442 <\/span><i><span style=\"font-weight: 400;\">Ceres<\/span><\/i><span style=\"font-weight: 400;\">, <\/span><b>VUA<\/b><span style=\"font-weight: 400;\">)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u041f\u043b\u043e\u0442\u043d\u043e \u0441\u043e\u0442\u0440\u0443\u0434\u043d\u0438\u0447\u0430\u0435\u0442 \u0441 NVIDIA \u0434\u043b\u044f \u0437\u0430\u043f\u0443\u0441\u043a\u0430 \u0441\u0432\u043e\u0435\u0433\u043e \u041f\u041e \u043d\u0430 DPU. 
\u0423\u0436\u0435 \u0440\u0435\u0430\u043b\u0438\u0437\u043e\u0432\u0430\u043d\u0430 \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0430 BlueField-3 (\u043a\u043e\u043d\u0442\u0440\u043e\u043b\u043b\u0435\u0440\u044b BF3 \u0432\u0441\u0442\u0440\u043e\u0435\u043d\u044b \u0432 \u0444\u043b\u044d\u0448-\u044d\u043d\u043a\u043b\u043e\u0436\u0438 VAST Ceres), \u043e\u0436\u0438\u0434\u0430\u0435\u0442\u0441\u044f \u0432\u0435\u0440\u0441\u0438\u044f \u043d\u0430 BF4. \u0422\u0430\u043a\u0436\u0435 VAST \u043e\u0442\u043a\u0440\u044b\u043b\u0430 \u0441\u043e\u0431\u0441\u0442\u0432\u0435\u043d\u043d\u044b\u0439 \u043f\u0440\u043e\u0435\u043a\u0442 <\/span><b>VAST Undivided Attention (VUA)<\/b><span style=\"font-weight: 400;\"> \u2013 open-source \u0440\u0435\u0430\u043b\u0438\u0437\u0430\u0446\u0438\u044f \u043c\u0435\u0445\u0430\u043d\u0438\u0437\u043c\u0430 KV-\u043a\u044d\u0448\u0430 \u043d\u0430 \u0444\u043b\u0435\u0448-\u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0430\u0445<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=,across%20these%20three%20storage%20engines\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[74]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=many%20storage%20suppliers%20are%20partnering,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[78]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/td>\n<td><span style=\"font-weight: 400;\">VAST \u0444\u0430\u043a\u0442\u0438\u0447\u0435\u0441\u043a\u0438 \u043f\u0440\u0435\u0432\u0440\u0430\u0449\u0430\u0435\u0442 \u0441\u0432\u043e\u0438 \u0444\u043b\u0435\u0448-\u043c\u0430\u0441\u0441\u0438\u0432\u044b \u0432 <\/span><i><span style=\"font-weight: 400;\">\u0440\u0430\u0441\u0448\u0438\u0440\u0435\u043d\u0438\u0435 
GPU-\u043f\u0430\u043c\u044f\u0442\u0438<\/span><\/i><span style=\"font-weight: 400;\">. \u041f\u041e VAST CNode \u043c\u043e\u0436\u0435\u0442 \u0432\u044b\u043f\u043e\u043b\u043d\u044f\u0442\u044c\u0441\u044f \u043f\u0440\u044f\u043c\u043e \u043d\u0430 BlueField, \u043e\u0431\u0435\u0441\u043f\u0435\u0447\u0438\u0432\u0430\u044f \u043b\u043e\u043a\u0430\u043b\u044c\u043d\u044b\u0439 \u0434\u043b\u044f GPU \u0434\u043e\u0441\u0442\u0443\u043f \u043a \u0434\u0430\u043d\u043d\u044b\u043c. \u0412 \u0441\u0432\u044f\u0437\u043a\u0435 \u0441 ICMS \u044d\u0442\u043e \u043f\u043e\u0437\u0432\u043e\u043b\u0438\u0442 \u043c\u0430\u0441\u0448\u0442\u0430\u0431\u0438\u0440\u043e\u0432\u0430\u0442\u044c \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442 \u0434\u043e \u043f\u0435\u0442\u0430\u0431\u0430\u0439\u0442 \u0441 \u043c\u0438\u043d\u0438\u043c\u0430\u043b\u044c\u043d\u043e\u0439 \u0437\u0430\u0434\u0435\u0440\u0436\u043a\u043e\u0439. VUA \u0434\u0430\u0435\u0442 \u0441\u043e\u043e\u0431\u0449\u0435\u0441\u0442\u0432\u0443 \u043e\u0442\u043a\u0440\u044b\u0442\u044b\u0439 \u0438\u043d\u0441\u0442\u0440\u0443\u043c\u0435\u043d\u0442, \u0441\u043e\u0432\u043c\u0435\u0441\u0442\u0438\u043c\u044b\u0439 \u0441 NVIDIA NIXL, \u0434\u043b\u044f \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0438 <\/span><i><span style=\"font-weight: 400;\">KV-\u043a\u044d\u0448-\u043e\u0444\u0444\u043b\u043e\u0430\u0434\u0430<\/span><\/i><span style=\"font-weight: 400;\">.<\/span><\/td>\n<\/tr>\n<tr>\n<td><b>WEKA<\/b><span style=\"font-weight: 400;\"> (<\/span><i><span style=\"font-weight: 400;\">Augmented Memory Grid<\/span><\/i><span style=\"font-weight: 400;\">)<\/span><\/td>\n<td><span style=\"font-weight: 400;\">\u041a\u043e\u043c\u043f\u0430\u043d\u0438\u044f WEKA (\u0438\u0437\u0432\u0435\u0441\u0442\u043d\u0430 \u0441\u0432\u043e\u0435\u0439 \u0444\u0430\u0439\u043b\u043e\u0432\u043e\u0439 \u0441\u0438\u0441\u0442\u0435\u043c\u043e\u0439 WekaFS \u0434\u043b\u044f HPC) 
even before BF4 appeared, offered a solution for <\/span><i><span style=\"font-weight: 400;\">GPU memory extension<\/span><\/i><span style=\"font-weight: 400;\"> built on NVMe. <\/span><b>Augmented Memory Grid<\/b><span style=\"font-weight: 400;\"> from WEKA is software that uses <\/span><b>GPUDirect Storage<\/b><span style=\"font-weight: 400;\"> to give GPUs direct RDMA access to an NVMe pool, as an extension of their memory<\/span><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=infrastructure\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[79]<\/span><\/a><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=Augmented%20Memory%20Grid%20extends%20GPU,protocols%20or%20heavyweight%20storage%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[80]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/td>
<td><span style=\"font-weight: 400;\">In essence, WEKA built a G3.5-tier equivalent on its own: a number of customers already use its solution, storing the KV cache on SSDs with microsecond access latency. The arrival of BlueField-4 <\/span><b>standardizes<\/b><span style=\"font-weight: 400;\"> this approach and adds hardware acceleration. WEKA has announced full support for the new platform: the BlueField-4 + DOCA + NIXL stack will be used to optimize <\/span><i><span style=\"font-weight: 400;\">Augmented Memory Grid<\/span><\/i><span style=\"font-weight: 400;\">, cutting CPU load and improving scalability<\/span><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=through%20slow%20protocols%20or%20heavyweight,storage%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[81]<\/span><\/a><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=As%20the%20ecosystem%20evolves%2C%20NVIDIA,approach%20allows%20inference%20platforms%20to\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[82]<\/span><\/a><span style=\"font-weight: 400;\">. In other words, WEKA is integrating its system into the G3.5 tier so that customers can transparently use the joint NVIDIA+WEKA solution.<\/span><\/td>
<\/tr>
<tr>
<td><b>Hammerspace<\/b><span style=\"font-weight: 400;\"> (<\/span><i><span style=\"font-weight: 400;\">Tier Zero<\/span><\/i><span style=\"font-weight: 400;\">)<\/span><\/td>
<td><span style=\"font-weight: 400;\">Hammerspace is known for its software for globally distributed file environments. The company promotes the concept of <\/span><b>Tier Zero<\/b><span style=\"font-weight: 400;\">, a dedicated fast data layer for applications. 
In AI terms, Tier 0 corresponds to NVIDIA's G3 tier (local NVMe), but Hammerspace sees BlueField-powered <\/span><b>Inference Context Memory<\/b><span style=\"font-weight: 400;\"> as carrying its approach to the next level.<\/span><\/td>
<td><span style=\"font-weight: 400;\">According to Hammerspace, its platform was designed from the start to \u201cdeliver the right data to the right GPU at the right moment\u201d (quoting its CMO)<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=3,4%20and%20we%20have%20a\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[83]<\/span><\/a><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=program%20in%20plan%20to%20support,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[84]<\/span><\/a><span style=\"font-weight: 400;\">. The company is now <\/span><b>working closely with NVIDIA<\/b><span style=\"font-weight: 400;\">, planning BlueField-4 and ICMS support in its product. That will let Hammerspace manage the KV cache as part of its global namespace: its software will track where in the distributed environment context should live, move it closer to the GPUs that need it, and use BF4's capabilities for fast access. Tier Zero thus evolves into a full G3.5 tier for inference, adding global data visibility.<\/span><\/td>
<\/tr>
<tr>
<td><b>IBM (Storage Scale)<\/b><\/td>
<td><span style=\"font-weight: 400;\">IBM offers a solution based on <\/span><b>IBM Storage Scale<\/b><span style=\"font-weight: 400;\"> (formerly GPFS), a distributed file system with a single global namespace and data-locality awareness. 
Together with NVIDIA, IBM is integrating Storage Scale with the context memory platform.<\/span><\/td>
<td><span style=\"font-weight: 400;\">IBM Storage Scale takes on bridging the G3 and G4 tiers: KV data is written to each node's local NVMe yet is <\/span><b>immediately accessible through the shared namespace<\/b><span style=\"font-weight: 400;\"> to other nodes without explicit copying<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=IBM%20Storage%20Scale%20provides%20a,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[85]<\/span><\/a><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=awareness,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[86]<\/span><\/a><span style=\"font-weight: 400;\">. This is achieved through the file system's replication and sharding mechanisms. BlueField-4 plugs into Storage Scale and accelerates network access to this data while minimizing CPU involvement<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=NVIDIA%20BlueField,RDMA%20access%20to%20KV%20data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[87]<\/span><\/a><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=enables%20high%20bandwidth%2C%20low%20latency,RDMA%20access%20to%20KV%20data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[88]<\/span><\/a><span style=\"font-weight: 400;\">. As a result, separate NVIDIA Dynamo instances on multiple servers can address the same KV records as if they were local<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=IBM%20Storage%20Scale%20provides%20a,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[85]<\/span><\/a><span style=\"font-weight: 400;\">. IBM states that the solution scales transparently from a couple of nodes to thousands, including hybrid cloud\/edge scenarios: the single namespace simplifies handling context across sites<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=reduces%20recomputation%20and%20improves%20cache,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[89]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/td>
<\/tr>
<tr>
<td><b>Dell Technologies<\/b><span style=\"font-weight: 400;\"> (PowerScale, ObjectScale, Lightning)<\/span><\/td>
<td><span style=\"font-weight: 400;\">Dell, a major storage player, has also joined the race. 
Together with NVIDIA, it announced support for <\/span><b>Context Memory Storage (CMS)<\/b><span style=\"font-weight: 400;\"> across its systems: PowerScale (NAS), ObjectScale (object storage), and the new Lightning (a parallel file system for AI).<\/span><\/td>
<td><span style=\"font-weight: 400;\">Dell has already shown impressive inference-acceleration results on existing hardware: by integrating <\/span><b>LMCache + NIXL<\/b><span style=\"font-weight: 400;\"> with RDMA into its storage, it achieved up to a <\/span><i><span style=\"font-weight: 400;\">19\u00d7 reduction in time to first token (TTFT)<\/span><\/i><span style=\"font-weight: 400;\"> and roughly a 5.3\u00d7 increase in LLM queries per second, even without BF4<\/span><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=At%20the%20same%20time%2C%20for,number%20of%20queries%20per%20second\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[90]<\/span><\/a><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=accelerate%20inference%2C%20delivering%20a%2019x,number%20of%20queries%20per%20second\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[91]<\/span><\/a><span style=\"font-weight: 400;\">. With BlueField-4 shipping, Dell is completing the picture: hardware-integrated systems are in development in which BF4 drives the CMS tier in combination with Dell flash arrays<\/span><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=Introducing%20Context%20Memory%20Storage%20Platform,4\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[92]<\/span><\/a><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=the%20concept%20of%20CMS%20to,CMS%20to%20further%20accelerate%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[93]<\/span><\/a><span style=\"font-weight: 400;\">. In the interim, Dell promotes a hybrid approach: PowerScale can already serve as fast KV-cache storage over <\/span><b>NFS-over-RDMA<\/b><span style=\"font-weight: 400;\"> (low-latency NAS), ObjectScale over S3-over-RDMA (object), and Lightning (still in preview) can deliver data from disks straight to GPUs over NVMe-oF<\/span><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=We%20support%20this%20offloading%20capability,storage%20for%20your%20specific%20needs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[94]<\/span><\/a><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=RDMA%20technology%2C%20you%20get%20the,minimizing%20latency%20and%20maximizing%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[95]<\/span><\/a><span style=\"font-weight: 400;\">. These options give customers a taste of long-context benefits today, and in the future they can be combined with BlueField for even greater optimization.<\/span><\/td>
<\/tr>
<\/tbody>
<\/table><\/div>
<p><span style=\"font-weight: 400;\">As the table shows, practically every major storage vendor has declared support for the concept of <\/span><i><span style=\"font-weight: 400;\">KV-cache offload<\/span><\/i><span style=\"font-weight: 400;\">. 
<\/span><b>Hammerspace, VAST, WEKA<\/b><span style=\"font-weight: 400;\"> focus on the software layer and DPU integration; <\/span><b>Dell, IBM, Pure<\/b><span style=\"font-weight: 400;\"> and others are adapting their systems to NVIDIA's standards (NVMe KV, RDMA) and\/or running their software on BlueField. In effect, NVIDIA <\/span><b>is setting the industry standard<\/b><span style=\"font-weight: 400;\"> for this \u201ccontext tier\u201d, and the ecosystem is working to meet it<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=,across%20these%20three%20storage%20engines\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[74]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%2C%20with%20this%20announcement%2C%20validates,think%20carefully%20about%20their%20choices\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[96]<\/span><\/a><span style=\"font-weight: 400;\">. Customers are not locked into a single supplier, either: the platform remains fairly open (NVMe-oF standards, compatibility with third-party flash enclosures), so <\/span><b>the G3.5 layer can be implemented with different products<\/b><span style=\"font-weight: 400;\">. The announcement itself confirmed this: NVIDIA showed a reference design for ICMS enclosures, but the devices themselves are supplied by partners on the basis of their proven technologies<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=serve%20inference%20context%20memory,doesn%E2%80%99t%20do%20anything%20else\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[97]<\/span><\/a><span style=\"font-weight: 400;\">. For end users this means greater choice: a context storage tier can be built on whichever solution best fits the rest of the infrastructure (be it VAST all-flash arrays, software such as WekaFS, or cloud storage that supports RDMA).<\/span><\/p>
<h2><strong>Real-World Usage Scenarios<\/strong><\/h2>
<ol>
<li><b> Long-context inference for large language models (LLMs).<\/b><span style=\"font-weight: 400;\"> The platform's main development driver is the growing need for long-running dialogues and for processing huge text contexts in LLMs. Modern applications go beyond simple question-and-answer: <\/span><i><span style=\"font-weight: 400;\">multi-turn agentic systems<\/span><\/i><span style=\"font-weight: 400;\"> are emerging in which a model can hold a dialogue, call tools, plan a chain of actions, and keep intermediate results, sometimes over minutes or hours<\/span><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=Agentic%20systems%20don%E2%80%99t%20answer%20once,across%20minutes%2C%20hours%2C%20or%20longer\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[98]<\/span><\/a><span style=\"font-weight: 400;\">. Under these conditions the <\/span><i><span style=\"font-weight: 400;\">KV cache turns into the model's long-term memory<\/span><\/i><span style=\"font-weight: 400;\">, which cannot simply be discarded after every response<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=In%20transformer,be%20shared%20across%20inference%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[1]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=longer%20a%20one,be%20designed%20for%20AI%20systems\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[29]<\/span><\/a><span style=\"font-weight: 400;\">. Imagine, for example, a virtual assistant that remembers previous conversations with its user: its context window can run to millions of tokens, and its KV state can occupy hundreds of gigabytes. 
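<\/span>

To make the hundreds-of-gigabytes claim concrete, here is a back-of-the-envelope KV-cache sizing sketch. The model dimensions (80 layers, 8 grouped-query KV heads, head dimension 128, FP16 cache entries) are illustrative assumptions for a large dense LLM, not figures from the announcement:

```python
# Rough KV-cache sizing for a hypothetical large dense LLM.
# All dimensions below are illustrative assumptions.
NUM_LAYERS = 80       # transformer layers
NUM_KV_HEADS = 8      # grouped-query attention KV heads
HEAD_DIM = 128        # per-head dimension
BYTES_PER_ELEM = 2    # FP16

def kv_bytes_per_token() -> int:
    # Each token stores one K and one V vector in every layer.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM

per_token = kv_bytes_per_token()
context_tokens = 1_000_000
total_gb = per_token * context_tokens / 1e9

print(f"{per_token} bytes/token")          # 327680 bytes/token (320 KiB)
print(f"{total_gb:.2f} GB for 1M tokens")  # 327.68 GB for 1M tokens
```

At roughly 320 KiB per token under these assumptions, a million-token session lands well into the hundreds of gigabytes, far beyond a single GPU's HBM, which is exactly the gap the context memory tier is meant to fill.

<span style=\"font-weight: 400;\">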
\u042f\u0432\u043d\u043e \u0434\u0435\u0440\u0436\u0430\u0442\u044c \u0441\u0442\u043e\u043b\u044c\u043a\u043e \u0432 HBM \u043d\u0435\u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e \u2013 \u0440\u0430\u043d\u0435\u0435 \u0441\u0438\u0441\u0442\u0435\u043c\u0430 \u0431\u044b\u043b\u0430 \u0432\u044b\u043d\u0443\u0436\u0434\u0435\u043d\u0430 \u043b\u0438\u0431\u043e \u043e\u0431\u0440\u0435\u0437\u0430\u0442\u044c \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442 (\u0437\u0430\u0431\u044b\u0432\u0430\u0442\u044c \u0438\u0441\u0442\u043e\u0440\u0438\u044e), \u043b\u0438\u0431\u043e \u043f\u043e\u0441\u0442\u043e\u044f\u043d\u043d\u043e \u043f\u0435\u0440\u0435\u0441\u0447\u0438\u0442\u044b\u0432\u0430\u0442\u044c \u0437\u0430\u043d\u043e\u0432\u043e \u0432\u0435\u0441\u044c \u043f\u0440\u043e\u0439\u0434\u0435\u043d\u043d\u044b\u0439 \u043f\u0443\u0442\u044c, \u0442\u0440\u0430\u0442\u044f \u043c\u043d\u043e\u0433\u043e \u0432\u0440\u0435\u043c\u0435\u043d\u0438 \u0438 \u044d\u043d\u0435\u0440\u0433\u0438\u0438. <\/span><b>Inference Context Memory Platform<\/b><span style=\"font-weight: 400;\"> \u0443\u0441\u0442\u0440\u0430\u043d\u044f\u0435\u0442 \u044d\u0442\u0443 \u043f\u0440\u043e\u0431\u043b\u0435\u043c\u0443, \u043f\u043e\u0437\u0432\u043e\u043b\u044f\u044f <\/span><i><span style=\"font-weight: 400;\">\u043f\u0440\u043e\u0437\u0440\u0430\u0447\u043d\u043e \u0445\u0440\u0430\u043d\u0438\u0442\u044c \u0440\u0430\u0441\u0448\u0438\u0440\u0435\u043d\u043d\u044b\u0439 \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442 \u0432\u043d\u0435 GPU, \u043d\u043e \u0441 \u043f\u043e\u0447\u0442\u0438-GPU \u0441\u043a\u043e\u0440\u043e\u0441\u0442\u044c\u044e \u0434\u043e\u0441\u0442\u0443\u043f\u0430<\/span><\/i><span style=\"font-weight: 400;\">. 
This means that <\/span><b>an LLM can \u201cremember\u201d long histories, documents, and the results of previous queries<\/b><span style=\"font-weight: 400;\"> and use them instantly, with no latency penalty. The practical benefits: longer responses with no loss of speed, the model\u2019s ability to give <\/span><i><span style=\"font-weight: 400;\">personalized answers<\/span><\/i><span style=\"font-weight: 400;\"> that account for the long-term context of a conversation, or to analyze large documents (summarization, Q&amp;A) in full, keeping their representation in the KV cache. All of this with minimal pauses between tokens. According to tests by NVIDIA and its partners, a system with an offload cache can <\/span><b>multiply per-user throughput<\/b><span style=\"font-weight: 400;\"> and reduce average response time even with a 10\u2013100\u00d7 increase in context length<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[30]<\/span><\/a><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=The%20ability%20to%20offload%20KV,engine%E2%80%94transforms%20the%20economics%20of%20AI\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[99]<\/span><\/a><span style=\"font-weight: 400;\">. Energy consumed per task also drops \u2013 GPUs no longer burn cycles recomputing attention and instead do useful work<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[30]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/li>\n<li><b> Multi-tenant serving of AI services.<\/b><span style=\"font-weight: 400;\"> In large cloud services (e.g., AI platforms such as ChatGPT, enterprise GPT hubs, etc.) a single cluster serves thousands of concurrent sessions and many different models. This puts heavy pressure on the memory and I\/O subsystems: if 1,000 requests try to save or load context simultaneously, a traditional architecture (CPU + SAN\/NAS) \u201cchokes\u201d \u2013 CPU contention, disk queues, and rising tail latency<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Concurrency%20presents%20an%20additional%20challenge,wide%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[53]<\/span><\/a><span style=\"font-weight: 400;\">. 
NVIDIA ICMS addresses this problem: offloading these operations to the DPU and an RDMA fabric removes the bottlenecks, so <\/span><b>hundreds of GPUs can read and write context in parallel<\/b><span style=\"font-weight: 400;\"> without noticeably interfering with one another<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Concurrency%20presents%20an%20additional%20challenge,wide%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[53]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=storage%20simultaneously,wide%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[100]<\/span><\/a><span style=\"font-weight: 400;\">. 
In addition, the shared context layer makes it possible to <\/span><i><span style=\"font-weight: 400;\">share memory across instances<\/span><\/i><span style=\"font-weight: 400;\">: for example, if different users query the same large model, their common tokens (the opening part of a prompt, or the model\u2019s static knowledge) can be stored once and reused. 
<\/span><b>Multi-tenant operation also benefits<\/b><span style=\"font-weight: 400;\"> from reduced latency variability \u2013 as noted earlier, the Spectrum-X network and BF4\u2019s hardware queues guarantee isolation, so even \u201cnoisy neighbors\u201d on the storage tier will not add latency to critical requests<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=latency%2C%20and%20packet%20loss%20under,heavy%20load\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[62]<\/span><\/a><span style=\"font-weight: 400;\">. 
As a result, the platform enables denser workload consolidation (<\/span><i><span style=\"font-weight: 400;\">more agents per GPU at once<\/span><\/i><span style=\"font-weight: 400;\">), which matters for the economics (TCO) of an AI service<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20BlueField%E2%80%914%E2%80%93powered%20ICMS%20provides%20AI%E2%80%91native,shorter%20tail%20latencies%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[31]<\/span><\/a><span style=\"font-weight: 400;\">. 
As a concrete case, imagine a cloud API where dozens of models (of different sizes) are served from a shared GPU pool: with the context platform, each model quickly loads its KV data on demand and unloads it again, without permanently reserving GPU memory. 
This makes it easier to implement a <\/span><i><span style=\"font-weight: 400;\">dynamic pool of models<\/span><\/i><span style=\"font-weight: 400;\"> that are swapped in and out with load.<\/span><\/li>\n<li><b> Multimodal and complex pipelines.<\/b><span style=\"font-weight: 400;\"> Beyond purely text-based LLMs, modern applications increasingly chain models together \u2013 for example, an agent may call a vision model (image analysis) or issue intermediate queries to a search engine (retrieval). 
The context memory platform is useful here as well: it can persist <\/span><i><span style=\"font-weight: 400;\">intermediate state between stages<\/span><\/i><span style=\"font-weight: 400;\">, accessible to different modules. For example, the output of image recognition (vector embeddings) can be stored as part of the session context so that the text model can consult it while composing its answer. 
Similarly, in <\/span><i><span style=\"font-weight: 400;\">Retrieval-Augmented Generation<\/span><\/i><span style=\"font-weight: 400;\"> (RAG) tasks, retrieved facts and documents can be cached at the KV level so that the model takes them into account instantly during generation, without waiting for a re-read from disk. <\/span><b>Inference Context Memory Platform<\/b><span style=\"font-weight: 400;\"> supports such pipelines by providing a shared low-latency cache for the various components of an AI system. 
IBM, for example, notes that its solution with NVIDIA accelerates <\/span><i><span style=\"font-weight: 400;\">multimodal and RAG<\/span><\/i><span style=\"font-weight: 400;\"> workloads within a single data store<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=The%20Storage%20Scale%20single%20namespace,systems%2C%20and%20RAG%20style%20pipelines\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[101]<\/span><\/a><span style=\"font-weight: 400;\">. This opens the door to more sophisticated agents that can interact with multiple data sources while keeping everything in shared context memory.<\/span><\/li>\n<li><b> Post-training and RLHF.<\/b><span style=\"font-weight: 400;\"> Another potential scenario is the post-training stages of a model\u2019s lifecycle, such as <\/span><i><span style=\"font-weight: 400;\">post-training tuning<\/span><\/i><span style=\"font-weight: 400;\"> or <\/span><i><span style=\"font-weight: 400;\">reinforcement learning with human feedback (RLHF)<\/span><\/i><span style=\"font-weight: 400;\">. These stages often require running the model through very long interaction episodes (for example, the model holds a dialogue with a simulated user or walks through large volumes of data). 
Context memory is valuable here as well: it allows interaction trajectories and per-step results to be stored and reused later for analysis or model updates. Moreover, with on-the-fly adaptation during inference, new knowledge obtained from a user can be kept in the agent\u2019s temporary memory. 
All of this makes the model more flexible in production without a full redeployment.\u00a0<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">In real-world deployments over 2025\u20132026, the platform is expected to appear first in large data centers (the so-called <\/span><b>\u201cAI factories\u201d<\/b><span style=\"font-weight: 400;\">) that run large language models with support for long-lived sessions and many agents<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20NVIDIA%20Rubin%20platform%20enables,building%20block%20for%20AI%20factories\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[102]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Within%20each%20pod%2C%20NVIDIA%20Inference,shared%20KV%20cache%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[103]<\/span><\/a><span style=\"font-weight: 400;\">. That is where the 5\u00d7 gains in throughput and energy efficiency yield the greatest return on budget and service quality. 
Gradually, as more affordable partner solutions appear, the technology may find use at smaller scales as well \u2013 for example, in enterprise AI systems that need to retain a history of user interaction (a personal assistant on a website, analysis of sequences of actions in documents, and so on).<\/span><\/p>\n<h2><strong>Open questions and outlook<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Despite its clear advantages, the new platform raises a number of <\/span><b>technical and strategic questions<\/b><span style=\"font-weight: 400;\"> for the industry:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Scalability and deployment architecture.<\/b><span style=\"font-weight: 400;\"> How can ICMS be scaled efficiently beyond a single pod (a rack or a supercomputer)? 
The platform has been announced for availability in the second half of 2026, with reference designs sized for a SuperPod of ~1152 GPUs and ~9.6 PB of context<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=Ozery%20says%20there%20are%2016,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[55]<\/span><\/a><span style=\"font-weight: 400;\">. The question is how several such pods will be combined, and whether there will be a single context space spanning data centers. 
IBM notes that its Storage Scale can stretch a namespace across cloud and edge<\/span><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=reduces%20recomputation%20and%20improves%20cache,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[89]<\/span><\/a><span style=\"font-weight: 400;\">, but whether low latency survives such stretching remains open. Multi-level hierarchies may emerge (for example, a global G4 tier with a local G3.5 tier in each region). 
Partners also note that customers with heterogeneous environments (mixed GPU generations, or AMD\/Intel accelerators) should weigh their choice carefully: NVIDIA ICMS works superbly <\/span><b>inside the NVIDIA stack<\/b><span style=\"font-weight: 400;\">, but what if some workloads run on other platforms?<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%2C%20with%20this%20announcement%2C%20validates,think%20carefully%20about%20their%20choices\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[104]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=BlueField,comprehensive%20KV%20cache%20management%20strategy\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[105]<\/span><\/a><span style=\"font-weight: 400;\"> They may 
need to run solutions such as LMCache in parallel, which adds complexity.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Latency vs. data consistency.<\/b><span style=\"font-weight: 400;\"> The platform is positioned as low-latency, but real access latency to the G3.5 tier is still higher than HBM. In latency-critical workloads (for example, real-time applications) the question arises: is even a hundred microseconds per context access acceptable? 
NVIDIA claims an average RDMA access latency of &lt;1 ms<\/span><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[106]<\/span><\/a><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,efficiency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[107]<\/span><\/a><span style=\"font-weight: 400;\">, which is very low indeed, but <\/span><i><span style=\"font-weight: 400;\">tail latency<\/span><\/i><span style=\"font-weight: 400;\"> (for example, under network congestion) can still grow. Minimizing it requires well-tuned <\/span><b>caching and prefetching<\/b><span style=\"font-weight: 400;\"> mechanisms, i.e. deciding which KV blocks to keep permanently in GPU\/CPU memory and which can be offloaded. 
These algorithms are closed for now (deep inside Dynamo), and much will depend on how smart they are: mistakes can cause sudden stalls (a cache miss that forces an urgent read of a large block from flash). Closely related is consistency: although the KV cache is described as <\/span><i><span style=\"font-weight: 400;\">stateless<\/span><\/i><span style=\"font-weight: 400;\">, in practice it represents <\/span><b>session state<\/b><span style=\"font-weight: 400;\">, and its integrity matters<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%20describes%20BlueField,though%20this%20characterization%20requires%20qualification\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[108]<\/span><\/a><a 
href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=overall%20inference%20context%20maintained%20across,layers%20must%20track%20and%20coordinate\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[109]<\/span><\/a><span style=\"font-weight: 400;\">. Orchestrators must track data validity: if part of the context changes (a new turn in a dialogue), stale copies must be invalidated. Is this handled by Dynamo\/NIXL? Probably, but the details have not been disclosed. 
This is a direction for further development: coherence protocols for a distributed KV cache may appear, analogous to cache coherence in supercomputers but running over RDMA.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Cost and economic efficiency.<\/b><span style=\"font-weight: 400;\"> Introducing an extra hardware layer brings costs: BlueField-4 DPUs, high-speed SSDs, new network switches. 
Add to that the growth in overall system complexity (rack space, power, and administration of the new layer). NVIDIA argues the benefits outweigh the costs, thanks to reclaimed GPU capacity and not having to buy extra GPUs just for their memory<\/span><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=The%20ability%20to%20offload%20KV,engine%E2%80%94transforms%20the%20economics%20of%20AI\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[99]<\/span><\/a><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=,conversations%20across%20multiple%20user%20sessions\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[110]<\/span><\/a><span style=\"font-weight: 400;\">, but a thorough TCO analysis is still ahead. 
AMD, incidentally, has a different strategy: its <\/span><b>MI300X<\/b><span style=\"font-weight: 400;\"> simply carries <\/span><i><span style=\"font-weight: 400;\">192 GB of HBM3<\/span><\/i><span style=\"font-weight: 400;\"> on board, which in many cases makes offload unnecessary, albeit at the price of a far more expensive accelerator<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=AMD%20MI300X\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[111]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=The%20MI300X%20provides%20192GB%20of,preventing%20model%20splitting%20across%20GPUs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[112]<\/span><\/a><span style=\"font-weight: 400;\">. AMD is in effect saying: \u201cwe already have more memory, just fit the whole context there\u201d. 
That path, however, does not scale beyond a single GPU and does not solve cache sharing across nodes<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=AMD%E2%80%99s%20memory,cache%20across%20multiple%20inference%20instances\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[113]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=as%20eliminating%20the%20need%20for,cache%20across%20multiple%20inference%20instances\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[114]<\/span><\/a><span style=\"font-weight: 400;\">. NVIDIA is betting on <\/span><b>cheap flash versus expensive HBM<\/b><span style=\"font-weight: 400;\">. Which wins out will depend on the specific workloads. 
It is quite likely that <\/span><b>hybrid efficiency metrics<\/b><span style=\"font-weight: 400;\"> will emerge, e.g. \u201ccost per token\/sec\u201d or \u201ctokens per watt\u201d<\/span><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Unlike%20immutable%20enterprise%20records%2C%20inference,useful%20tokens%20it%20can%20deliver\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[115]<\/span><\/a><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[30]<\/span><\/a><span style=\"font-weight: 400;\">, which will show whether the new layer is justified. 
If it is, even the extra investment will pay for itself through improved throughput.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Support for different model formats and new algorithms.<\/b><span style=\"font-weight: 400;\"> So far the concept is clearly tailored to <\/span><i><span style=\"font-weight: 400;\">transformers<\/span><\/i><span style=\"font-weight: 400;\"> and their KV caches. The question is whether it applies to other model types. 
Consider, for example, recurrent neural networks (RNNs) or entirely different approaches to long-term memory (some researchers explore external differentiable memories, knowledge bases, and the like). The architecture is probably flexible enough: since it operates on abstract \u201ckey-value\u201d blocks, it is not strictly tied to Transformers. Its efficiency for other approaches, however, still has to be studied. 
Moreover, the data formats themselves may change: today KV entries are tensors with a particular structure (attention heads, and so on), while future models may store other kinds of state. The platform will need to adapt; NVMe KV extensions for other record types may appear, or DOCA may be reworked. 
Strategically, the question of <\/span><b>standards<\/b><span style=\"font-weight: 400;\"> also remains open: will NVMe Key-Value become the primary interface for such workloads, or will new specifications arise? NVIDIA chose NVMe KV to stay compatible with an existing (if little-adopted) standard<\/span><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=logos%20were%20displayed,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[116]<\/span><\/a><span style=\"font-weight: 400;\">. 
If that standard gains traction (it is already supported by RocksDB and some KV-SSDs, for example), the ecosystem may gain momentum independently of NVIDIA. If not, open bodies (such as OpenCompute) may need to get involved to produce a cross-vendor \u201cMemory Tier for AI\u201d standard.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Availability and ecosystem.<\/b><span style=\"font-weight: 400;\"> The platform has been announced but is still at the rollout stage. 
NVIDIA points to availability in 2026, and partner support looks broad<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%20highlighted%20during%20its%20launch,called%20out%20NetApp%20specifically%20during\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[76]<\/span><\/a><span style=\"font-weight: 400;\">. No real deployments have been publicly demonstrated yet, however (at least no public case studies exist as of early 2026). 
The first projects will most likely be pilots run with the largest cloud providers (<\/span><i><span style=\"font-weight: 400;\">neoclouds<\/span><\/i><span style=\"font-weight: 400;\">, large corporations)<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Enterprises%20operating%20at%20sufficient%20scale,model%20parameters%20continue%20to%20grow\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[117]<\/span><\/a><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=efficiency%20constraints%20,model%20parameters%20continue%20to%20grow\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[118]<\/span><\/a><span style=\"font-weight: 400;\">. 
The question is how smoothly integration will go: will popular frameworks (PyTorch, JAX) support transparent KV offload, and will monitoring and debugging tools appear for this layer (note that NIXL already includes telemetry hooks<\/span><a href=\"https:\/\/github.com\/ai-dynamo\/nixl#:~:text=Documentation%20and%20Resources\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[119]<\/span><\/a><span style=\"font-weight: 400;\">). 
It also remains to be seen <\/span><b>how competitors respond<\/b><span style=\"font-weight: 400;\">: AMD and Intel have no direct analog so far, but they may strengthen other aspects (more HBM, faster interconnects). For users, the immediate task is to assess their workloads carefully: if you plan to deploy large LLMs with multi-million-token contexts, it is worth designing a similar context tier into your architecture. 
NVIDIA \u0443\u0436\u0435 \u201c\u0443\u0437\u0430\u043a\u043e\u043d\u0438\u043b\u0430\u201d \u0435\u0433\u043e \u043f\u043e\u044f\u0432\u043b\u0435\u043d\u0438\u0435 \u0432 AI-\u0438\u043d\u0444\u0440\u0430\u0441\u0442\u0440\u0443\u043a\u0442\u0443\u0440\u0435<\/span><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%E2%80%99s%20approach%20is%20sound%20in,context%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[120]<\/span><\/a><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=platform%20targets%20the%20metrics%20that,now%20define%20inference%20success\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[121]<\/span><\/a><span style=\"font-weight: 400;\">, \u0438 \u0438\u043d\u0434\u0443\u0441\u0442\u0440\u0438\u044f, \u0432\u0435\u0440\u043e\u044f\u0442\u043d\u043e, \u043f\u043e\u0441\u043b\u0435\u0434\u0443\u0435\u0442 \u0432 \u044d\u0442\u043e\u043c \u043d\u0430\u043f\u0440\u0430\u0432\u043b\u0435\u043d\u0438\u0438, \u0434\u0435\u043b\u0430\u044f \u0434\u043e\u043b\u0433\u043e\u0441\u0440\u043e\u0447\u043d\u0443\u044e \u043f\u0430\u043c\u044f\u0442\u044c \u043d\u0435\u043e\u0442\u044a\u0435\u043c\u043b\u0435\u043c\u043e\u0439 \u0447\u0430\u0441\u0442\u044c\u044e \u043c\u0430\u0441\u0448\u0442\u0430\u0431\u0438\u0440\u0443\u0435\u043c\u043e\u0433\u043e \u0438\u043d\u0444\u0435\u0440\u0435\u043d\u0441\u0430.<\/span><\/li>\n<\/ul>\n<p><b>\u0412\u044b\u0432\u043e\u0434:<\/b><span style=\"font-weight: 400;\"> NVIDIA Inference Context Memory Platform \u043f\u0440\u0435\u0434\u0441\u0442\u0430\u0432\u043b\u044f\u0435\u0442 \u0441\u043e\u0431\u043e\u0439 \u0432\u0430\u0436\u043d\u044b\u0439 \u0448\u0430\u0433 \u0432 \u0440\u0430\u0437\u0432\u0438\u0442\u0438\u0438 \u0438\u043d\u0444\u0440\u0430\u0441\u0442\u0440\u0443\u043a\u0442\u0443\u0440\u044b \u0434\u043b\u044f \u0418\u0418. 
\u041e\u043d\u0430 \u043d\u0430\u0446\u0435\u043b\u0435\u043d\u0430 \u043d\u0430 \u0440\u0435\u0448\u0435\u043d\u0438\u0435 \u043a\u043e\u043d\u043a\u0440\u0435\u0442\u043d\u043e\u0433\u043e \u0443\u0437\u043a\u043e\u0433\u043e \u043c\u0435\u0441\u0442\u0430 \u2013 \u043e\u0433\u0440\u0430\u043d\u0438\u0447\u0435\u043d\u043d\u043e\u0441\u0442\u0438 \u043f\u0430\u043c\u044f\u0442\u0438 \u0434\u043b\u044f \u0434\u043b\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0433\u043e \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430 \u2013 \u0438 \u0434\u0435\u043b\u0430\u0435\u0442 \u044d\u0442\u043e \u043f\u043e\u0441\u0440\u0435\u0434\u0441\u0442\u0432\u043e\u043c \u0438\u043d\u043d\u043e\u0432\u0430\u0446\u0438\u043e\u043d\u043d\u043e\u0439 \u0438\u043d\u0442\u0435\u0433\u0440\u0430\u0446\u0438\u0438 GPU, \u0441\u0435\u0442\u0435\u0439 \u0438 \u0445\u0440\u0430\u043d\u0435\u043d\u0438\u044f. \u041f\u0435\u0440\u0432\u044b\u0435 \u043e\u0446\u0435\u043d\u043a\u0438 \u043e\u0431\u0435\u0449\u0430\u044e\u0442 \u0437\u043d\u0430\u0447\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0439 \u043f\u0440\u0438\u0440\u043e\u0441\u0442 \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u0434\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0441\u0442\u0438 \u0438 \u044d\u0444\u0444\u0435\u043a\u0442\u0438\u0432\u043d\u043e\u0441\u0442\u0438 \u0434\u043b\u044f \u0434\u043e\u043b\u0433\u0438\u0445 \u0434\u0438\u0430\u043b\u043e\u0433\u043e\u0432 \u0438 <\/span><i><span style=\"font-weight: 400;\">agentic<\/span><\/i><span style=\"font-weight: 400;\"> AI. 
\u0412 \u0442\u043e \u0436\u0435 \u0432\u0440\u0435\u043c\u044f, \u043f\u0435\u0440\u0435\u0434 \u0432\u043d\u0435\u0434\u0440\u0435\u043d\u0438\u0435\u043c \u0442\u0430\u043a\u043e\u0433\u043e \u0440\u0435\u0448\u0435\u043d\u0438\u044f \u0441\u0442\u043e\u0438\u0442 \u0443\u0447\u0438\u0442\u044b\u0432\u0430\u0442\u044c \u0430\u0440\u0445\u0438\u0442\u0435\u043a\u0442\u0443\u0440\u043d\u044b\u0435 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f \u0438 \u0437\u0440\u0435\u043b\u043e\u0441\u0442\u044c \u044d\u043a\u043e\u0441\u0438\u0441\u0442\u0435\u043c\u044b. \u041f\u043e \u043c\u0435\u0440\u0435 \u0442\u043e\u0433\u043e \u043a\u0430\u043a \u0442\u0435\u0445\u043d\u043e\u043b\u043e\u0433\u0438\u0438 \u0432\u043e\u043a\u0440\u0443\u0433 \u0434\u043b\u0438\u043d\u043d\u043e\u0433\u043e \u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442\u0430 (LMCache, \u043d\u043e\u0432\u044b\u0435 GPU \u0441 \u0431\u043e\u043b\u044c\u0448\u0435\u0439 \u043f\u0430\u043c\u044f\u0442\u044c\u044e, \u041f\u041e \u0434\u043b\u044f \u0443\u043f\u0440\u0430\u0432\u043b\u0435\u043d\u0438\u044f \u043f\u0430\u043c\u044f\u0442\u044c\u044e) \u0431\u0443\u0434\u0443\u0442 \u0440\u0430\u0437\u0432\u0438\u0432\u0430\u0442\u044c\u0441\u044f, \u043c\u043e\u0436\u043d\u043e \u043e\u0436\u0438\u0434\u0430\u0442\u044c \u043f\u043e\u044f\u0432\u043b\u0435\u043d\u0438\u0435 \u0432\u0441\u0435 \u0431\u043e\u043b\u0435\u0435 \u0443\u043d\u0438\u0444\u0438\u0446\u0438\u0440\u043e\u0432\u0430\u043d\u043d\u044b\u0445 \u0438 \u043e\u043f\u0442\u0438\u043c\u0430\u043b\u044c\u043d\u044b\u0445 \u0440\u0435\u0448\u0435\u043d\u0438\u0439. 
\u041d\u0430 \u0434\u0430\u043d\u043d\u044b\u0439 \u043c\u043e\u043c\u0435\u043d\u0442 NVIDIA \u0444\u0430\u043a\u0442\u0438\u0447\u0435\u0441\u043a\u0438 \u0437\u0430\u0434\u0430\u0435\u0442 \u0442\u0440\u0435\u043d\u0434: <\/span><i><span style=\"font-weight: 400;\">\u043a\u043e\u043d\u0442\u0435\u043a\u0441\u0442 \u0438\u043d\u0444\u0435\u0440\u0435\u043d\u0441\u0430 \u0441\u0442\u0430\u043b \u0442\u0430\u043a\u043e\u0439 \u0436\u0435 \u0432\u0430\u0436\u043d\u043e\u0439 \u0447\u0430\u0441\u0442\u044c\u044e \u0441\u0438\u0441\u0442\u0435\u043c\u044b, \u043a\u0430\u043a \u0438 \u0441\u0430\u043c\u0438 \u043c\u043e\u0434\u0435\u043b\u0438<\/span><\/i><span style=\"font-weight: 400;\">, \u0438 \u043e\u043d \u0442\u0440\u0435\u0431\u0443\u0435\u0442 \u0441\u043e\u0431\u0441\u0442\u0432\u0435\u043d\u043d\u043e\u0433\u043e \u043c\u0430\u0441\u0448\u0442\u0430\u0431\u0438\u0440\u0443\u0435\u043c\u043e\u0433\u043e \u201c\u0445\u0440\u0430\u043d\u0438\u043b\u0438\u0449\u0430 \u043f\u0430\u043c\u044f\u0442\u0438\u201d. 
\u041f\u043b\u0430\u0442\u0444\u043e\u0440\u043c\u0430 Context Memory Storage \u2013 \u043f\u0435\u0440\u0432\u0430\u044f \u0440\u0435\u0430\u043b\u0438\u0437\u0430\u0446\u0438\u044f \u044d\u0442\u043e\u0439 \u0438\u0434\u0435\u0438 \u043d\u0430 \u0443\u0440\u043e\u0432\u043d\u0435 \u0434\u0430\u0442\u0430\u0446\u0435\u043d\u0442\u0440\u0430, \u0438, \u0441\u0443\u0434\u044f \u043f\u043e \u043f\u043e\u0434\u0434\u0435\u0440\u0436\u043a\u0435 \u0438\u043d\u0434\u0443\u0441\u0442\u0440\u0438\u0438, \u0437\u0430 \u043d\u0435\u0439 \u0441\u043b\u0435\u0434\u0443\u0435\u0442 \u043d\u043e\u0432\u0430\u044f \u044d\u0440\u0430 \u0430\u0440\u0445\u0438\u0442\u0435\u043a\u0442\u0443\u0440 AI, \u0433\u0434\u0435 \u0432\u044b\u0447\u0438\u0441\u043b\u0435\u043d\u0438\u044f \u0438 \u0434\u0430\u043d\u043d\u044b\u0435 \u043d\u0435\u0440\u0430\u0437\u0440\u044b\u0432\u043d\u043e \u0441\u043b\u0438\u0442\u044b \u0432\u043e\u0435\u0434\u0438\u043d\u043e \u0434\u043b\u044f \u0434\u043e\u0441\u0442\u0438\u0436\u0435\u043d\u0438\u044f \u043c\u0430\u043a\u0441\u0438\u043c\u0430\u043b\u044c\u043d\u043e\u0439 \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u0434\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0441\u0442\u0438<\/span><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=NVIDIA%20Creates%20a%20New%20Infrastructure,Inference%20Context%20Memory%20Storage%20Platform\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[122]<\/span><\/a><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=platform%20targets%20the%20metrics%20that,now%20define%20inference%20success\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[121]<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><a 
href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=In%20transformer,be%20shared%20across%20inference%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[1]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20increases%20pressure%20on%20existing,and%20leaving%20expensive%20GPUs%20underutilized\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[5]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=AI%20factories\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[6]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Powered%20by%20the%20NVIDIA%20BlueField,power%20efficient%20than%20traditional%20storage\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[11]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20reliable%20prestaging%2C%20backed%20by,oF%20and%20object%2FRDMA%20protocols\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[13]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[14]<\/span><\/a> <a 
href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=orchestration%20frameworks%2C%20such%20as%20NVIDIA,context%20across%20these%20storage%20tiers\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[16]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=storage%20hierarchy\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[17]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=G1%20is%20optimized%20for%20access,both%20cost%20and%20power%20consumption\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[18]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=the%20highest%20efficiency%2C%20making%20it,overhead%20that%20reduces%20overall%20efficiency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[19]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=58%20Image%3A%20A%20four,token%20overhead%20increase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[20]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Inference%20frameworks%20like%20NVIDIA%20Dynamo,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[24]<\/span><\/a> <a 
href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Power%20availability%20is%20the%20primary,for%20ephemeral%2C%20reconstructable%20KV%20data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[25]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=KV%20cache%20fundamentally%20differs%20from,purpose%20storage%20approaches\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[26]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=chatbots%20to%20complex%2C%20multiturn%20agentic,services%20and%20revisited%20over%20time\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[27]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=sessions%20and%20be%20shared%20across,inference%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[28]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=This%20efficiency%20extends%20beyond%20the,for%20the%20entire%20AI%20pod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[30]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20BlueField%E2%80%914%E2%80%93powered%20ICMS%20provides%20AI%E2%80%91native,shorter%20tail%20latencies%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span 
style=\"font-weight: 400;\">[31]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Inference%20frameworks%20like%20NVIDIA%20Dynamo,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[33]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=These%20crypto%20and%20integrity%20accelerators,performance%20required%20for%20KV%20cache\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[35]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=with%20NVIDIA%20Inference%20Transfer%20Library,ahead%20of%20the%20decode%20phase\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[39]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=At%20the%20inference%20layer%2C%20NVIDIA,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[40]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=NVIDIA%20BlueField,at%20up%20to%20800%20Gb%2Fs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[46]<\/span><\/a> <a 
href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20architecture%20uses%20BlueField%E2%80%914%20to,with%20predictable%2C%20low%E2%80%91latency%2C%20high%E2%80%91bandwidth%20connectivity\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[48]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Spectrum,packet%20loss%20under%20heavy%20load\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[61]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=latency%2C%20and%20packet%20loss%20under,heavy%20load\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[62]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=NVIDIA%20Dynamo%20and%20NIXL%20coordinate,generation%20AI%20factories\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[66]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=orchestration%20layer%20using%20NVIDIA%20Grove,as%20they%20move%20between%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[67]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Additionally%2C%20the%20NVIDIA%20DOCA%20framework,from%20the%20underlying%20flash%20media\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 
400;\">[68]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=flash%20media\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[69]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=The%20NVIDIA%20Rubin%20platform%20enables,building%20block%20for%20AI%20factories\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[102]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Within%20each%20pod%2C%20NVIDIA%20Inference,shared%20KV%20cache%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[103]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/#:~:text=Unlike%20immutable%20enterprise%20records%2C%20inference,useful%20tokens%20it%20can%20deliver\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[115]<\/span><\/a><span style=\"font-weight: 400;\"> Introducing NVIDIA BlueField-4-Powered Inference Context Memory Storage Platform for the Next Frontier of AI | NVIDIA Technical Blog<\/span><\/p>\n<p><a href=\"https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/developer.nvidia.com\/blog\/introducing-nvidia-bluefield-4-powered-inference-context-memory-storage-platform-for-the-next-frontier-of-ai\/<\/span><\/a><\/p>\n<p><a 
href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Jensen%20Huang%20explicitly%20framed%20this,be%20designed%20for%20AI%20systems\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[2]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,also%20the%20context%20window%20length\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[3]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,a%20single%20GPU%E2%80%99s%20local%20memory\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[4]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=NVIDIA%E2%80%99s%20Inference%20Context%20Memory%20Storage,a%20transient%20byproduct%20of%20computation\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[9]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=longer%20a%20one,be%20designed%20for%20AI%20systems\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[29]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=L1%20%E2%80%94%20GPU,Context\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[41]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=L2%20%E2%80%94%20Near,Vera%20CPU\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[44]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=%2A%20Interconnect%3A%20NVLink\" target=\"_blank\" 
rel=\"noopener\"><span style=\"font-weight: 400;\">[45]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Spectrum,Access\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[63]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Memory%20disaggregation%20only%20works%20if,millisecond%20latency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[64]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=Hardware%20needs%20software%20to%20manage,software%20stack%20provides%20that%20through\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[71]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,across%20nodes%20and%20maintains%20consistency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[72]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[106]<\/span><\/a> <a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/#:~:text=,efficiency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[107]<\/span><\/a><span style=\"font-weight: 400;\"> NVIDIA Unveils the Inference Context Memory Storage Platform \u2014 A New Era for Long-Context AI &#8212; BuySellRam<\/span><\/p>\n<p><a href=\"https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 
400;\">https:\/\/www.buysellram.com\/blog\/nvidia-unveils-the-inference-context-memory-storage-platform\/<\/span><\/a><\/p>\n<p><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20establishes%20what%20NVIDIA%20terms,memory%20at%20the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[7]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=manage%20flash,the%20pod%20level\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[12]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=The%20technology%20targets%20a%20specific,penalties%20that%20degrade%20inference%20efficiency\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[15]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=ICMS%20introduces%20an%20intermediate%20%E2%80%9CG3,for%20KV%20cache%20data%20characteristics\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[21]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Open\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[37]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Unlike%20NVIDIA%E2%80%99s%20ICMS%2C%20which%20requires,4\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[38]<\/span><\/a> <a 
href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=BlueField,accelerator%20in%20Rubin%20compute%20nodes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[47]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=%2A%20%C2%A0BlueFied,CPU%20resources%20for%20inference%20serving\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[49]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Traditional%20storage%20architectures%20introduce%20multiple,metrics%20that%20determine%20inference%20responsiveness\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[50]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=storage%20controller%2C%20controller%20to%20file,metrics%20that%20determine%20inference%20responsiveness\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[51]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=the%20memory%20capacity%20of%20individual,GPU%20and%20CPU%20resources\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[52]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Concurrency%20presents%20an%20additional%20challenge,wide%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[53]<\/span><\/a> <a 
href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=,fabrics%2C%20orchestration%20frameworks%2C%20and%20storage\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[59]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=platforms\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[60]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%E2%80%99s%20ICMS%20platform%20highlights%20broader,infrastructure%20requirements%20are%20fundamentally%20changing\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[65]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=align%20with%20the%20block,as%20a%20distinct%20data%20class\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[70]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=,across%20these%20three%20storage%20engines\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[74]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=cache%20across%20multiple%20inference%20instances\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[75]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%20highlighted%20during%20its%20launch,called%20out%20NetApp%20specifically%20during\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[76]<\/span><\/a> <a 
href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=supporting%20the%20effort%2C%20including%20AIC%2C%C2%A0Cloudian%2C%C2%A0DDN%2C%C2%A0Dell,NetApp%20specifically%20during%20his%20keynote\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[77]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%2C%20with%20this%20announcement%2C%20validates,think%20carefully%20about%20their%20choices\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[96]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=storage%20simultaneously,wide%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[100]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%2C%20with%20this%20announcement%2C%20validates,think%20carefully%20about%20their%20choices\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[104]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=BlueField,comprehensive%20KV%20cache%20management%20strategy\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[105]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%20describes%20BlueField,though%20this%20characterization%20requires%20qualification\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[108]<\/span><\/a> <a 
href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=overall%20inference%20context%20maintained%20across,layers%20must%20track%20and%20coordinate\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[109]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=AMD%20MI300X\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[111]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=The%20MI300X%20provides%20192GB%20of,preventing%20model%20splitting%20across%20GPUs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[112]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=AMD%E2%80%99s%20memory,cache%20across%20multiple%20inference%20instances\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[113]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=as%20eliminating%20the%20need%20for,cache%20across%20multiple%20inference%20instances\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[114]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=Enterprises%20operating%20at%20sufficient%20scale,model%20parameters%20continue%20to%20grow\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[117]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=efficiency%20constraints%20,model%20parameters%20continue%20to%20grow\" target=\"_blank\" 
rel=\"noopener\"><span style=\"font-weight: 400;\">[118]<\/span><\/a> <a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/#:~:text=NVIDIA%E2%80%99s%20approach%20is%20sound%20in,context%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[120]<\/span><\/a><span style=\"font-weight: 400;\"> Research Note: Improving Inference with NVIDIA\u2019s Inference Context Memory Storage Platform &#8212; NAND Research<\/span><\/p>\n<p><a href=\"https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/nand-research.com\/research-note-improving-inference-nvidias-inference-context-memory-storage-platform\/<\/span><\/a><\/p>\n<p><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=ImageNvidia%20diagram%20showing%20KV%20cache,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[8]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=In%20other%20words%2C%20this%20infrastructure,It%20doesn%E2%80%99t%20do%20anything%20else\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[22]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=A%20KV%20cache,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[23]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=The%20ICMSP%20is%20a%20G3,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[34]<\/span><\/a> <a 
href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=Nvidia%20notes%3A%20%E2%80%9CBy%20leveraging%20standard,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[36]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=rack%20contains%2016%20storage%20enclosures,%3D%209%2C600%20TB\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[54]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=Ozery%20says%20there%20are%2016,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[55]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[56]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=many%20storage%20suppliers%20are%20partnering,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[78]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=3,4%20and%20we%20have%20a\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[83]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=program%20in%20plan%20to%20support,%E2%80%9D\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[84]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=serve%20inference%20context%20memory,doesn%E2%80%99t%20do%20anything%20else\" 
target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[97]<\/span><\/a> <a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/#:~:text=logos%20were%20displayed,presentation%20during%20his%20ICMSP%20pitch\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[116]<\/span><\/a><span style=\"font-weight: 400;\"> Nvidia&#8217;s basic context memory extension infrastructure<\/span><\/p>\n<p><a href=\"https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/blocksandfiles.com\/2026\/01\/12\/nvidias-basic-context-memory-extension-infrastructure\/<\/span><\/a><\/p>\n<p><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=Inference%20has%20become%20stateful,and%20becomes%20a%20platform%20requirement\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[10]<\/span><\/a> <a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=infrastructure\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[79]<\/span><\/a> <a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=Augmented%20Memory%20Grid%20extends%20GPU,protocols%20or%20heavyweight%20storage%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[80]<\/span><\/a> <a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=through%20slow%20protocols%20or%20heavyweight,storage%20services\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[81]<\/span><\/a> <a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=As%20the%20ecosystem%20evolves%2C%20NVIDIA,approach%20allows%20inference%20platforms%20to\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[82]<\/span><\/a> <a 
href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=Agentic%20systems%20don%E2%80%99t%20answer%20once,across%20minutes%2C%20hours%2C%20or%20longer\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[98]<\/span><\/a> <a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=platform%20targets%20the%20metrics%20that,now%20define%20inference%20success\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[121]<\/span><\/a> <a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/#:~:text=NVIDIA%20Creates%20a%20New%20Infrastructure,Inference%20Context%20Memory%20Storage%20Platform\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[122]<\/span><\/a><span style=\"font-weight: 400;\"> The Context Era: AI Inference &amp; Augmented Memory Grid &#8212; WEKA<\/span><\/p>\n<p><a href=\"https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/www.weka.io\/blog\/ai-ml\/the-context-era-has-begun\/<\/span><\/a><\/p>\n<p><a href=\"https:\/\/github.com\/ai-dynamo\/nixl#:~:text=NVIDIA%20Inference%20Xfer%20Library%20,in%20architecture\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[32]<\/span><\/a> <a href=\"https:\/\/github.com\/ai-dynamo\/nixl#:~:text=Documentation%20and%20Resources\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[119]<\/span><\/a><span style=\"font-weight: 400;\"> GitHub &#8212; ai-dynamo\/nixl: NVIDIA Inference Xfer Library (NIXL)<\/span><\/p>\n<p><a href=\"https:\/\/github.com\/ai-dynamo\/nixl\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/github.com\/ai-dynamo\/nixl<\/span><\/a><\/p>\n<p><a 
href=\"https:\/\/developer.nvidia.com\/blog\/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer\/#:~:text=next%20generation%20of%20AI%2C%20delivering,The%20Rubin%20GPU%20is\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[42]<\/span><\/a> <a href=\"https:\/\/developer.nvidia.com\/blog\/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer\/#:~:text=highlights%20a%20336,Rubin%20GPU\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[43]<\/span><\/a><span style=\"font-weight: 400;\"> Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer | NVIDIA Technical Blog<\/span><\/p>\n<p><a href=\"https:\/\/developer.nvidia.com\/blog\/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/developer.nvidia.com\/blog\/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer\/<\/span><\/a><\/p>\n<p><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=On%20the%20NVIDIA%20Rubin%20platform%2C,efficient%20scaling%20of%20agentic%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[57]<\/span><\/a> <a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=KV%20cache%20capacity%20at%20the,efficient%20scaling%20of%20agentic%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[58]<\/span><\/a> <a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=IBM%20Storage%20Scale%20provides%20a,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[85]<\/span><\/a> <a 
href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=awareness,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[86]<\/span><\/a> <a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=NVIDIA%20BlueField,RDMA%20access%20to%20KV%20data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[87]<\/span><\/a> <a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=enables%20high%20bandwidth%2C%20low%20latency,RDMA%20access%20to%20KV%20data\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[88]<\/span><\/a> <a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=reduces%20recomputation%20and%20improves%20cache,throughput%20and%20efficiency%20at%20scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[89]<\/span><\/a> <a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale#:~:text=The%20Storage%20Scale%20single%20namespace,systems%2C%20and%20RAG%20style%20pipelines\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[101]<\/span><\/a><span style=\"font-weight: 400;\">\u00a0 Accelerating NVIDIA Dynamo with IBM Storage Scale and NVIDIA BlueField\u20114\u2011Powered Inference Context Memory Storage Platform\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 
400;\">https:\/\/community.ibm.com\/community\/user\/blogs\/vincent-hsu\/2026\/01\/05\/accelerating-nvidia-dynamo-with-ibm-storage-scale<\/span><\/a><\/p>\n<p><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=For%20environments%20without%20NVIDIA%20BlueField,extension%20of%20your%20GPU%20memory\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[73]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=At%20the%20same%20time%2C%20for,number%20of%20queries%20per%20second\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[90]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=accelerate%20inference%2C%20delivering%20a%2019x,number%20of%20queries%20per%20second\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[91]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=Introducing%20Context%20Memory%20Storage%20Platform,4\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[92]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=the%20concept%20of%20CMS%20to,CMS%20to%20further%20accelerate%20inference\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[93]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=We%20support%20this%20offloading%20capability,storage%20for%20your%20specific%20needs\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[94]<\/span><\/a> <a 
href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=RDMA%20technology%2C%20you%20get%20the,minimizing%20latency%20and%20maximizing%20throughput\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[95]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=The%20ability%20to%20offload%20KV,engine%E2%80%94transforms%20the%20economics%20of%20AI\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[99]<\/span><\/a> <a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/#:~:text=,conversations%20across%20multiple%20user%20sessions\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">[110]<\/span><\/a><span style=\"font-weight: 400;\"> Dell and NVIDIA Expand the Horizons of AI Inference | Dell<\/span><\/p>\n<p><a href=\"https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/www.dell.com\/en-us\/blog\/dell-and-nvidia-expand-the-horizons-of-ai-inference\/<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Architecture and purpose of the Context Memory Storage platform. The NVIDIA Inference Context Memory Storage Platform is a new data storage architecture designed specifically to accelerate inference of large models through efficient handling of context memory (inference context). This refers to Key-Value (KV) cache data, which stores the history and state of the model during inference (for example, long context [&hellip;]<\/p>\n","protected":false},"author":330,"featured_media":5207,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[260],"tags":[209,210,211,126,130,136,164,168,207,208],"class_list":["post-5203","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-industriya","tag-kv_cache","tag-long-context_llm","tag-ai_datacenter","tag-ai","tag-shd","tag-nvme","tag-gpu","tag-rdma","tag-nvidia","tag-inference"],"_links":{"self":[{"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/posts\/5203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/users\/330"}],"replies":[{"embeddable":true,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/comments?post=5203"}],"version-history":[{"count":2,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/posts\/5203\/revisions"}],"predecessor-version":[{"id":5209,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/posts\/5203\/revisions\/5209"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/media\/5207"}],"wp:attachment":[{"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/media?parent=5203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/categories?post=5203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/baum.ru\/blog\/wp-json\/wp\/v2\/tags?post=5203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}