{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":612354784,"defaultBranch":"master","name":"llama.cpp","ownerLogin":"ggerganov","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2023-03-10T18:58:00.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1991296?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1717288421.0","currentOid":""},"activityList":{"items":[{"before":"2c3d0b42f378824f0555f3ec1062f1c6a97e5b62","after":"fe3f6958bd64ecce4ac6548f69af7594fb8913db","ref":"refs/heads/0cc4m/vulkan-moe","pushedAt":"2024-06-02T07:50:10.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"0cc4m","name":null,"path":"/0cc4m","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/11707594?s=80&v=4"},"commit":{"message":"Fix crash when using split mode none and setting a main GPU","shortMessageHtmlLink":"Fix crash when using split mode none and setting a main GPU"}},{"before":"61200ef29fc0e76f264ada583b77e9228120779f","after":"eb589d5e3664b784aef5326aa14dd21889eb1948","ref":"refs/heads/compilade/refactor-kv-cache","pushedAt":"2024-06-02T04:20:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"llama : avoid copies for simple batch splits","shortMessageHtmlLink":"llama : avoid copies for simple batch splits"}},{"before":"79fd76c5a322d685c485ecb8f27d7527f0ad9106","after":"3af93718117d4c185bef78cae05898f9881c9c77","ref":"refs/heads/compilade/convert-hf-model-part-prefix","pushedAt":"2024-06-02T00:55:17.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"convert-hf : match model part name prefix and suffix","shortMessageHtmlLink":"convert-hf : match model part name prefix and suffix"}},{"before":"e09f7b41d382b6da7e0ee2c5156166e83844bc45","after":"79fd76c5a322d685c485ecb8f27d7527f0ad9106","ref":"refs/heads/compilade/convert-hf-model-part-prefix","pushedAt":"2024-06-02T00:51:43.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"convert-hf : match model part prefix and suffix","shortMessageHtmlLink":"convert-hf : match model part prefix and suffix"}},{"before":null,"after":"2db033fb785dad9d420b55064ce28d69f72dbc97","ref":"refs/heads/update_flake_lock_action","pushedAt":"2024-06-02T00:33:41.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"flake.lock: Update\n\nFlake lock file updates:\n\n• Updated input 'flake-parts':\n 'github:hercules-ci/flake-parts/8dc45382d5206bd292f9c2768b8058a8fd8311d9?narHash=sha256-/GJvTdTpuDjNn84j82cU6bXztE0MSkdnTWClUCRub78%3D' (2024-05-16)\n → 'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01)\n• Updated input 'flake-parts/nixpkgs-lib':\n 'https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D' (2024-05-02)\n → 'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01)\n• Updated input 'nixpkgs':\n 'github:NixOS/nixpkgs/bfb7a882678e518398ce9a31a881538679f6f092?narHash=sha256-4zSIhSRRIoEBwjbPm3YiGtbd8HDWzFxJjw5DYSDy1n8%3D' (2024-05-24)\n → 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)","shortMessageHtmlLink":"flake.lock: Update"}},{"before":"c322bf6a1f5cf3f6ea359ba02741ecda73d1c762","after":"e09f7b41d382b6da7e0ee2c5156166e83844bc45","ref":"refs/heads/compilade/convert-hf-model-part-prefix","pushedAt":"2024-06-02T00:33:01.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"convert-hf : match model part prefix, not only suffix","shortMessageHtmlLink":"convert-hf : match model part prefix, not only suffix"}},{"before":null,"after":"c322bf6a1f5cf3f6ea359ba02741ecda73d1c762","ref":"refs/heads/compilade/convert-hf-model-part-prefix","pushedAt":"2024-06-02T00:29:29.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"convert-hf : match model part prefix, not only suffix","shortMessageHtmlLink":"convert-hf : match model part prefix, not only suffix"}},{"before":"2e666832e6ac78194edf030bd1c295e21bdb022c","after":"e141ce624af57bdffbaf57014a044eb1d9689230","ref":"refs/heads/master","pushedAt":"2024-06-01T21:26:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"JohannesGaessler","name":"Johannes Gäßler","path":"/JohannesGaessler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/18492268?s=80&v=4"},"commit":{"message":"Fix FlashAttention debug test, FP32 assert (#7684)","shortMessageHtmlLink":"Fix FlashAttention debug test, FP32 assert (#7684)"}},{"before":"18d1c140471da9443db9e0b67f61ccf540e113c0","after":"61200ef29fc0e76f264ada583b77e9228120779f","ref":"refs/heads/compilade/refactor-kv-cache","pushedAt":"2024-06-01T20:46:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"llama : fix edge case finding batch seq_id of split recurrent cell\n\nThis otherwise was a problem when running the HellaSwag benchmark\nwith small batch sizes, making it crash.","shortMessageHtmlLink":"llama : fix edge case finding batch seq_id of split recurrent cell"}},{"before":"2ac95c9d5678d05e253691fb1f26471675bff5ad","after":"2e666832e6ac78194edf030bd1c295e21bdb022c","ref":"refs/heads/master","pushedAt":"2024-06-01T19:31:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"server : new UI (#7633)\n\n* ic\r\n\r\n* migrate my eary work\r\n\r\n* add the belonging stuff: css,favicon etc\r\n\r\n* de prompts\r\n\r\n* chore: Update HTML meta tags in index.html file\r\n\r\n* add api-key css classes\r\n\r\n* some necessary fixes\r\n\r\n* Add API key CSS classes and update styling in style.css\r\n\r\n* clean the code\r\n\r\n* move API to the top, rearrange param sliders. update css\r\n\r\n* add tooltips to the parameters with comprehensible explanations\r\n\r\n* fix FloatField and BoolField tooltips\r\n\r\n* fix grammar field width\r\n\r\n* use template literales for promptFormats.js\r\n\r\n* update const ModelGenerationInfo\r\n\r\n* remove ms per token, since not relevant for most webui users and use cases\r\n\r\n* add phi-3 prompt template\r\n\r\n* add phi3 to dropdown\r\n\r\n* add css class\r\n\r\n* update forgotten css theme\r\n\r\n* add user message suffix\r\n\r\n* fix chatml & add llama3 format\r\n\r\n* fix llama3 prompt template\r\n\r\n* more prompt format fixes\r\n\r\n* add more comon stop tokens\r\n\r\n* add missing char\r\n\r\n* do not separate with new line or comma\r\n\r\n* move prompt style\r\n\r\n* add hacky llama2 prompt solution, reduce redundancy in promptFormats.js\r\n\r\n* fix toggle state localstorage\r\n\r\n* add cmd-r prompt et reduce redundancy\r\n\r\n* set default prompt to empty\r\n\r\n* move files, clean code\r\n\r\n* fix css path\r\n\r\n* add a button to the new ui\r\n\r\n* move new ui to \"/public\" due to otherwise problematic CORS behaviour\r\n\r\n* include new ui in cpp\r\n\r\n* fix wrong link to old ui\r\n\r\n* renaming to ensure consistency\r\n\r\n* fix typos \"prompt-format\" -> \"prompt-formats\"\r\n\r\n* use correct indent\r\n\r\n* add new ui files to makefile\r\n\r\n* fix typo","shortMessageHtmlLink":"server : new UI (#7633)"}},{"before":"72eea49224e5b90263de08f8cddc6010353841eb","after":"18d1c140471da9443db9e0b67f61ccf540e113c0","ref":"refs/heads/compilade/refactor-kv-cache","pushedAt":"2024-06-01T19:11:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"llama : minimize swaps when reordering logits\n\nThis reduces overhead when running hellaswag\non thousands of sequences with very small 100k params Mamba models.","shortMessageHtmlLink":"llama : minimize swaps when reordering logits"}},{"before":"c8f93774c925552fd84b49e7af21fef2af9f330f","after":"2c3d0b42f378824f0555f3ec1062f1c6a97e5b62","ref":"refs/heads/0cc4m/vulkan-moe","pushedAt":"2024-06-01T17:36:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"0cc4m","name":null,"path":"/0cc4m","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/11707594?s=80&v=4"},"commit":{"message":"Fix MUL_MAT_ID matrix vector shader and dispatch code","shortMessageHtmlLink":"Fix MUL_MAT_ID matrix vector shader and dispatch code"}},{"before":"181dadf294d9495b54a86a23299fc15b282dac1d","after":"72eea49224e5b90263de08f8cddc6010353841eb","ref":"refs/heads/compilade/refactor-kv-cache","pushedAt":"2024-06-01T16:48:57.000Z","pushType":"push","commitsCount":78,"pusher":{"login":"compilade","name":null,"path":"/compilade","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/113953597?s=80&v=4"},"commit":{"message":"llama : fix batch split output count for embeddings","shortMessageHtmlLink":"llama : fix batch split output count for embeddings"}},{"before":"750f60c03e4d3f53fa51910551ce87a3d508d2d7","after":"2ac95c9d5678d05e253691fb1f26471675bff5ad","ref":"refs/heads/master","pushedAt":"2024-06-01T16:20:18.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"mofosyne","name":"Brian","path":"/mofosyne","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/827793?s=80&v=4"},"commit":{"message":"SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548)\n\n* SimpleChat:DU:BringIn local helper js modules using importmap\r\n\r\nUse it to bring in a simple trim garbage at end logic, which is\r\nused to trim received response.\r\n\r\nAlso given that importmap assumes esm / standard js modules, so\r\nalso global variables arent implicitly available outside the\r\nmodules. So add it has a member of document for now\r\n\r\n* SimpleChat:DU: Add trim garbage at end in loop helper\r\n\r\n* SimpleChat:DU:TrimGarbage if unable try skip char and retry\r\n\r\n* SimpleChat:DU: Try trim using histogram based info\r\n\r\nTODO: May have to add max number of uniq chars in histogram at\r\nend of learning phase.\r\n\r\n* SimpleChat:DU: Switch trim garbage hist based to maxUniq simple\r\n\r\nInstead of blindly building histogram for specified substring\r\nlength, and then checking if any new char within specified min\r\ngarbage length limit, NOW exit learn state when specified maxUniq\r\nchars are found. Inturn there should be no new chars with in\r\nthe specified min garbage length required limit.\r\n\r\nTODO: Need to track char classes like alphabets, numerals and\r\nspecial/other chars.\r\n\r\n* SimpleChat:DU: Bring in maxType to the mix along with maxUniq\r\n\r\nAllow for more uniq chars, but then ensure that a given type of\r\nchar ie numerals or alphabets or other types dont cross the\r\nspecified maxType limit. This allows intermixed text garbage\r\nto be identified and trimmed.\r\n\r\n* SimpleChat:DU: Cleanup debug log messages\r\n\r\n* SimpleChat:UI: Move html ui base helpers into its own module\r\n\r\n* SimpleChat:DU:Avoid setting frequence/Presence penalty\r\n\r\nSome models like llama3 found to try to be over intelligent by\r\nrepeating garbage still, but by tweaking the garbage a bit so that\r\nit is not exactly same. So avoid setting these penalties and let\r\nthe model's default behaviour work out, as is.\r\n\r\nAlso the simple minded histogram based garbage trimming from end,\r\nworks to an extent, when the garbage is more predictable and\r\nrepeatative.\r\n\r\n* SimpleChat:UI: Add and use a para-create-append helper\r\n\r\nAlso update the config params dump to indicate that now one needs\r\nto use document to get hold of gMe global object, this is bcas of\r\nmoving to module type js.\r\n\r\nAlso add ui.mjs to importmap\r\n\r\n* SimpleChat:UI: Helper to create bool button and use it wrt settings\r\n\r\n* SimpleChat:UI: Add Select helper and use it wrt ChatHistoryInCtxt\r\n\r\n* SimpleChat:UI:Select: dict-name-value, value wrt default, change\r\n\r\nTake a dict/object of name-value pairs instead of just names.\r\nInturn specify the actual value wrt default, rather than the\r\nstring representing that value.\r\n\r\nTrap the needed change event rather than click wrt select.\r\n\r\n* SimpleChat:UI: Add Div wrapped label+element helpers\r\n\r\nMove settings related elements to use the new div wrapped ones.\r\n\r\n* SimpleChat:UI:Add settings button and bring in settings ui\r\n\r\n* SimpleChat:UI:Settings make boolean button text show meaning\r\n\r\n* SimpleChat: Update a bit wrt readme and notes in du\r\n\r\n* SimpleChat: GarbageTrim enable/disable, show trimmed part ifany\r\n\r\n* SimpleChat: highlight trim, garbage trimming bitmore aggressive\r\n\r\nMake it easy for end user to identified the trimmed text.\r\n\r\nMake garbage trimming logic, consider a longer repeat garbage\r\nsubstring.\r\n\r\n* SimpleChat: Cleanup a bit wrt Api end point related flow\r\n\r\nConsolidate many of the Api end point related basic meta data into\r\nApiEP class.\r\n\r\nRemove the hardcoded ApiEP/Mode settings from html+js, instead use\r\nthe generic select helper logic, inturn in the settings block.\r\n\r\nMove helper to generate the appropriate request json string based\r\non ApiEP into SimpleChat class itself.\r\n\r\n* SimpleChat:Move extracting assistant response to SimpleChat class\r\n\r\nso also the trimming of garbage.\r\n\r\n* SimpleChat:DU: Bring in both trim garbage logics to try trim\r\n\r\n* SimpleChat: Cleanup readme a bit, add one more chathistory length\r\n\r\n* SimpleChat:Stream:Initial handshake skeleton\r\n\r\nParse the got stream responses and try extract the data from it.\r\n\r\nIt allows for a part read to get a single data line or multiple\r\ndata line. Inturn extract the json body and inturn the delta\r\ncontent/message in it.\r\n\r\n* SimpleChat: Move handling oneshot mode server response\r\n\r\nMove handling of the oneshot mode server response into SimpleChat.\r\n\r\nAlso add plumbing for moving multipart server response into same.\r\n\r\n* SimpleChat: Move multi part server response handling in\r\n\r\n* SimpleChat: Add MultiPart Response handling, common trimming\r\n\r\nAdd logic to call into multipart/stream server response handling.\r\n\r\nMove trimming of garbage at the end into the common handle_response\r\nhelper.\r\n\r\nAdd new global flag to control between oneshot and multipart/stream\r\nmode of fetching response. Allow same to be controlled by user.\r\n\r\nIf in multipart/stream mode, send the stream flag to the server.\r\n\r\n* SimpleChat: show streamed generative text as it becomes available\r\n\r\nNow that the extracting of streamed generated text is implemented,\r\nadd logic to show the same on the screen.\r\n\r\n* SimpleChat:DU: Add NewLines helper class\r\n\r\nTo work with an array of new lines. Allow adding, appending,\r\nshifting, ...\r\n\r\n* SimpleChat:DU: Make NewLines shift more robust and flexible\r\n\r\n* SimpleChat:HandleResponseMultiPart using NewLines helper\r\n\r\nMake handle_response_multipart logic better and cleaner. Now it\r\nallows for working with the situation, where the delta data line\r\ngot from server in stream mode, could be split up when recving,\r\nbut still the logic will handle it appropriately.\r\n\r\nALERT: Rather except (for now) for last data line wrt a request's\r\nresponse.\r\n\r\n* SimpleChat: Disable console debug by default by making it dummy\r\n\r\nParallely save a reference to the original func.\r\n\r\n* SimpleChat:MultiPart/Stream flow cleanup\r\n\r\nDont try utf8-decode and newlines-add_append if no data to work on.\r\n\r\nIf there is no more data to get (ie done is set), then let NewLines\r\ninstance return line without newline at end, So that we dont miss\r\nout on any last-data-line without newline kind of scenario.\r\n\r\nPass stream flag wrt utf-8 decode, so that if any multi-byte char\r\nis only partly present in the passed buffer, it can be accounted\r\nfor along with subsequent buffer. At sametime, bcas of utf-8's\r\ncharacteristics there shouldnt be any unaccounted bytes at end,\r\nfor valid block of utf8 data split across chunks, so not bothering\r\ncalling with stream set to false at end. LATER: Look at TextDecoder's\r\nimplementation, for any over intelligence, it may be doing..\r\nIf needed, one can use done flag to account wrt both cases.\r\n\r\n* SimpleChat: Move baseUrl to Me and inturn gMe\r\n\r\nThis should allow easy updating of the base url at runtime by the\r\nend user.\r\n\r\n* SimpleChat:UI: Add input element helper\r\n\r\n* SimpleChat: Add support for changing the base url\r\n\r\nThis ensures that if the user is running the server with a\r\ndifferent port or wants to try connect to server on a different\r\nmachine, then this can be used.\r\n\r\n* SimpleChat: Move request headers into Me and gMe\r\n\r\nInturn allow Authorization to be sent, if not empty.\r\n\r\n* SimpleChat: Rather need to use append to insert headers\r\n\r\n* SimpleChat: Allow Authorization header to be set by end user\r\n\r\n* SimpleChat:UI+: Return div and element wrt creatediv helpers\r\n\r\nuse it to set placeholder wrt Authorization header.\r\n\r\nAlso fix copy-paste oversight.\r\n\r\n* SimpleChat: readme wrt authorization, maybe minimal openai testing\r\n\r\n* SimpleChat: model request field for openai/equivalent compat\r\n\r\nMay help testing with openai/equivalent web services, if they\r\nrequire this field.\r\n\r\n* SimpleChat: readme stream-utf-8 trim-english deps, exception2error\r\n\r\n* Readme: Add a entry for simplechat in the http server section\r\n\r\n* SimpleChat:WIP:Collate internally, Stream mode Trap exceptions\r\n\r\nThis can help ensure that data fetched till that point, can be\r\nmade use of, rather than losing it.\r\n\r\nOn some platforms, the time taken wrt generating a long response,\r\nmay lead to the network connection being broken when it enters\r\nsome user-no-interaction related power saving mode.\r\n\r\n* SimpleChat:theResp-origMsg: Undo a prev change to fix non trim\r\n\r\nWhen the response handling was moved into SimpleChat, I had changed\r\na flow bit unnecessarily and carelessly, which resulted in the non\r\ntrim flow, missing out on retaining the ai assistant response.\r\n\r\nThis has been fixed now.\r\n\r\n* SimpleChat: Save message internally in handle_response itself\r\n\r\nThis ensures that throwing the caught exception again for higher\r\nup logic, doesnt lose the response collated till that time.\r\n\r\nGo through theResp.assistant in catch block, just to keep simple\r\nconsistency wrt backtracing just in case.\r\n\r\nUpdate the readme file.\r\n\r\n* SimpleChat:Cleanup: Add spacing wrt shown req-options\r\n\r\n* SimpleChat:UI: CreateDiv Divs map to GridX2 class\r\n\r\nThis allows the settings ui to be cleaner structured.\r\n\r\n* SimpleChat: Show Non SettingsUI config field by default\r\n\r\n* SimpleChat: Allow for multiline system prompt\r\n\r\nConvert SystemPrompt into a textarea with 2 rows. Reduce\r\nuser-input-textarea to 2 rows from 3, so that overall\r\nvertical space usage remains same.\r\n\r\nShorten usage messages a bit, cleanup to sync with settings ui.\r\n\r\n* SimpleChat: Add basic skeleton for saving and loading chat\r\n\r\nInturn when ever a chat message (system/user/model) is added,\r\nthe chat will be saved into browser's localStorage.\r\n\r\n* SimpleChat:ODS: Add a prefix to chatid wrt ondiskstorage key\r\n\r\n* SimpleChat:ODS:WIP:TMP: Add UI to load previously saved chat\r\n\r\nThis is a temporary flow\r\n\r\n* SimpleChat:ODS:Move restore/load saved chat btn setup to Me\r\n\r\nThis also allows being able to set the common system prompt\r\nui element to loaded chat's system prompt.\r\n\r\n* SimpleChat:Readme updated wrt save and restore chat session info\r\n\r\n* SimpleChat:Show chat session restore button, only if saved session\r\n\r\n* SimpleChat: AutoCreate ChatRequestOptions settings to an extent\r\n\r\n* SimpleChat: Update main README wrt usage with server","shortMessageHtmlLink":"SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, S…"}},{"before":"9b596417af11c9ac44fcae0fcfbc6f3665089083","after":"750f60c03e4d3f53fa51910551ce87a3d508d2d7","ref":"refs/heads/master","pushedAt":"2024-06-01T13:47:04.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"JohannesGaessler","name":"Johannes Gäßler","path":"/JohannesGaessler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/18492268?s=80&v=4"},"commit":{"message":"CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)","shortMessageHtmlLink":"CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)"}},{"before":"a323ec60af14a33d560df98f2cc41b4112cb4f80","after":"9b596417af11c9ac44fcae0fcfbc6f3665089083","ref":"refs/heads/master","pushedAt":"2024-06-01T06:44:14.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"JohannesGaessler","name":"Johannes Gäßler","path":"/JohannesGaessler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/18492268?s=80&v=4"},"commit":{"message":"CUDA: quantized KV support for FA vec (#7527)\n\n* CUDA: quantized KV support for FA vec\r\n\r\n* try CI fix\r\n\r\n* fix commented-out kernel variants\r\n\r\n* add q8_0 q4_0 tests\r\n\r\n* fix nwarps > batch size\r\n\r\n* split fattn compile via extern templates\r\n\r\n* fix flake8\r\n\r\n* fix metal tests\r\n\r\n* fix cmake\r\n\r\n* make generate_cu_files.py executable\r\n\r\n* add autogenerated .cu files\r\n\r\n* fix AMD\r\n\r\n* error if type_v != FP16 and not flash_attn\r\n\r\n* remove obsolete code","shortMessageHtmlLink":"CUDA: quantized KV support for FA vec (#7527)"}},{"before":"0515ad93f48df63bbff204eddb0cac75e8585c65","after":"a323ec60af14a33d560df98f2cc41b4112cb4f80","ref":"refs/heads/master","pushedAt":"2024-05-31T19:23:04.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"server : update js (#7670)","shortMessageHtmlLink":"server : update js (#7670)"}},{"before":"036813c181f2beb24102a367891bb92f578e3df4","after":"f3256085f470b2069776ef337211f26fc8031c03","ref":"refs/heads/gg/gpt-params-refactor","pushedAt":"2024-05-31T15:54:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"common : rework usage print (wip)","shortMessageHtmlLink":"common : rework usage print (wip)"}},{"before":"c8047d538f3addab40e3112be60bb92e70ce1a50","after":"0515ad93f48df63bbff204eddb0cac75e8585c65","ref":"refs/heads/master","pushedAt":"2024-05-31T15:42:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"Galunid","name":null,"path":"/Galunid","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/10298730?s=80&v=4"},"commit":{"message":"convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660)","shortMessageHtmlLink":"convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660)"}},{"before":"a7060dffdd405f40a5a55a9363109269285c73f9","after":"5f8720fb7b019f8591fa805265052fef433dbd52","ref":"refs/heads/sl/rpc-backend-cpy","pushedAt":"2024-05-31T15:22:11.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"slaren","name":null,"path":"/slaren","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2141330?s=80&v=4"},"commit":{"message":"add rpc-server to Makefile","shortMessageHtmlLink":"add rpc-server to Makefile"}},{"before":null,"after":"a7060dffdd405f40a5a55a9363109269285c73f9","ref":"refs/heads/sl/rpc-backend-cpy","pushedAt":"2024-05-31T15:06:57.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"slaren","name":null,"path":"/slaren","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/2141330?s=80&v=4"},"commit":{"message":"- fix copy_tensor being called on the src buffer instead of the dst buffer\n\n- always initialize views in the view_src buffer\n\n- add RPC backend to Makefile build\n\n- add endpoint to all RPC object names","shortMessageHtmlLink":"- fix copy_tensor being called on the src buffer instead of the dst b…"}},{"before":null,"after":"036813c181f2beb24102a367891bb92f578e3df4","ref":"refs/heads/gg/gpt-params-refactor","pushedAt":"2024-05-31T14:27:51.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"common : gpt_params_parse do not print usage","shortMessageHtmlLink":"common : gpt_params_parse do not print usage"}},{"before":"30e238b246f8002cc6eb7cb79afe242243f1f66d","after":"c8047d538f3addab40e3112be60bb92e70ce1a50","ref":"refs/heads/master","pushedAt":"2024-05-31T14:26:21.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"JohannesGaessler","name":"Johannes Gäßler","path":"/JohannesGaessler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/18492268?s=80&v=4"},"commit":{"message":"scripts: update compare_llama_bench.py [no ci] (#7673)","shortMessageHtmlLink":"scripts: update compare_llama_bench.py [no ci] (#7673)"}},{"before":"16926dff92d6d0efa8cbc0f44d30d63349532b38","after":"30e238b246f8002cc6eb7cb79afe242243f1f66d","ref":"refs/heads/master","pushedAt":"2024-05-31T14:00:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"JohannesGaessler","name":"Johannes Gäßler","path":"/JohannesGaessler","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/18492268?s=80&v=4"},"commit":{"message":"Improve HIP compatibility (#7672)","shortMessageHtmlLink":"Improve HIP compatibility (#7672)"}},{"before":null,"after":"956af1552adf8e96c5bbf7e05d5cd628e523b07b","ref":"refs/heads/gg/server-update-js","pushedAt":"2024-05-31T12:47:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"server : update js","shortMessageHtmlLink":"server : update js"}},{"before":"0c27e6f62eea80140daf152d7b6c154466614e5c","after":"16926dff92d6d0efa8cbc0f44d30d63349532b38","ref":"refs/heads/master","pushedAt":"2024-05-31T12:04:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"readme : link homebrew discussion","shortMessageHtmlLink":"readme : link homebrew discussion"}},{"before":"2e32f874e675f7bc5307cb7b4470ddbe090bab8f","after":"0c27e6f62eea80140daf152d7b6c154466614e5c","ref":"refs/heads/master","pushedAt":"2024-05-31T11:17:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"ggml : fix loongson compile warnings (#7537)\n\n* ggml : fix loongson compile warnings\r\n\r\nggml-ci\r\n\r\n* Fix loongarch quantize test fail.\r\n\r\nFix unexpected error introduced during rebase code.\r\n\r\n* tests : disable json test due to lack of python on the CI node\r\n\r\nggml-ci\r\n\r\n---------\r\n\r\nCo-authored-by: junchao-loongson ","shortMessageHtmlLink":"ggml : fix loongson compile warnings (#7537)"}},{"before":"98f4c12dd8ab8f5c5193cc0fa6c0a55b198a82c5","after":"77c16ee0d4ab8ec96c02005155c6a9e98280f0a8","ref":"refs/heads/gg/ci-loongson","pushedAt":"2024-05-31T11:17:03.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"tests : disable json test due to lack of python on the CI node\n\nggml-ci","shortMessageHtmlLink":"tests : disable json test due to lack of python on the CI node"}},{"before":"50fb3d347f88ebf42ff810c60b6749bcd20eb3a8","after":"98f4c12dd8ab8f5c5193cc0fa6c0a55b198a82c5","ref":"refs/heads/gg/ci-loongson","pushedAt":"2024-05-31T11:04:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ggerganov","name":"Georgi Gerganov","path":"/ggerganov","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1991296?s=80&v=4"},"commit":{"message":"tests : disable json test due to lack of python on the CI node\n\nggml-ci","shortMessageHtmlLink":"tests : disable json test due to lack of python on the CI node"}},{"before":null,"after":"d32a8f61421f2f27e0c1f5b12215ecb47e1fbddd","ref":"refs/heads/sycl-global-variables","pushedAt":"2024-05-31T08:53:12.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"airMeng","name":"Meng, Hengyu","path":"/airMeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/39229107?s=80&v=4"},"commit":{"message":"backup","shortMessageHtmlLink":"backup"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEWhE7ogA","startCursor":null,"endCursor":null}},"title":"Activity · ggerganov/llama.cpp"}