• @raldone01@lemmy.world
    4 months ago

    I regularly run llama3 70b unquantized on two P40s and the CPU at about 7 tokens/s. It’s usable but not very fast.

    • sunzu
      4 months ago

      so there is no way a 24GB card and 64GB of RAM can run this?

      • @raldone01@lemmy.world
        4 months ago

        My specs because you asked:

        CPU: Intel(R) Xeon(R) E5-2699 v3 (72) @ 3.60 GHz
        GPU 1: NVIDIA Tesla P40 [Discrete]
        GPU 2: NVIDIA Tesla P40 [Discrete]
        GPU 3: Matrox Electronics Systems Ltd. MGA G200EH
        Memory: 66.75 GiB / 251.75 GiB (27%)
        Swap: 75.50 MiB / 40.00 GiB (0%)
        
        • sunzu
          4 months ago

          ok this is a server. 48GB of VRAM across the cards and 67GB of RAM? for the model alone?

          • @raldone01@lemmy.world
            4 months ago

            Each card has 24GB, so 48GB of VRAM total. I use ollama: it fills whatever VRAM is available on both cards and runs the rest on the CPU cores.
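
            A rough back-of-the-envelope sketch (my own estimate, not stated in the thread) of why this setup needs far more than 48GB of VRAM for an unquantized 70b model, and roughly how much spills over to host RAM:

            ```python
            # Assumption: "unquantized" here means fp16 weights (2 bytes per parameter).
            # This ignores KV cache and runtime overhead, so real usage is higher.
            params = 70e9           # llama3 70b parameter count
            bytes_per_param = 2     # fp16
            total_gb = params * bytes_per_param / 1e9   # weight memory in GB

            vram_gb = 2 * 24        # two Tesla P40s, 24GB each
            cpu_gb = total_gb - vram_gb                 # remainder offloaded to host RAM

            print(round(total_gb), round(cpu_gb))       # 140 92
            ```

            So roughly 140GB of weights: about 48GB sits in VRAM and the remaining ~92GB lives in system RAM, which is why a single 24GB GPU with 64GB of host RAM falls short for the unquantized model.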

      • @raldone01@lemmy.world
        4 months ago

        What are you asking exactly?

        What do you want to run? I assume you have a 24GB GPU and 64GB of host RAM?