I’d like to self host a large language model, LLM.

I don’t mind if I need a GPU and all that, at least it will be running on my own hardware, and probably even cheaper than the $20 everyone is charging per month.

What LLMs are you self hosting? And what are you using to do it?

  • Avid Amoeba
    link
    fedilink
    English
    12 months ago

    If you need to serve only one user at the time, ollama +Webui works great. If you need multiple users at the same time, check out vLLM.

    Why can’t it serve multiple users? Open Web UI seems to support multiple users.

    • The Hobbyist
      link
      fedilink
      English
      32 months ago

      I didn’t say it can’t. But I’m not sure how well it is optimized for it. From my initial testing it queues queries and submits them one after another to the model, I have not seen it batch compute the queries, but maybe it’s a setup thing on my side. vLLM on the other hand is designed specifically for the multi co current user use case and has multiple optimizations for it.