初试Ollama本地大模型

准备工作

机器配置：

CPU i5-10400

内存 16GB

硬盘 SSD 480GB

显卡 GTX 1660
系统：Ubuntu 18.04 Server

CPU	i5-10400
内存	16GB
硬盘	SSD 480GB
显卡	GTX 1660

NVIDIA驱动安装

- 下载

驱动下载地址：https://www.nvidia.cn/geforce/drivers/

在这里插入图片描述

- 获取下载链接

GTX 1660驱动下载链接：https://cn.download.nvidia.cn/XFree86/Linux-x86_64/550.100/NVIDIA-Linux-x86_64-550.100.run

- 安装驱动所需依赖

1	sudo apt install gcc g++ make

- 安装驱动

1 2	chmod +x NVIDIA-Linux-x86_64-550.100.run ./NVIDIA-Linux-x86_64-550.100.run

- 验证驱动

在这里插入图片描述

安装 Ollama

- 安装方式：

1.通过官网bash脚本安装 curl -fsSL https://ollama.com/install.sh | sh
2.通过github仓库步骤手动安装仓库地址

在这里插入图片描述

- 修改ollama.service环境变量

由于ollama.service默认启动地址是127.0.0.1，仅限本机访问，所以要改成0.0.0.0,使其能被其他主机连接，后边docker部署open-webui也会用到这个地址。

chain@ubuntu:~$ cat /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/data/envs/go/bin"
Environment="OLLAMA_HOST=0.0.0.0"  # 增加这一行

[Install]
WantedBy=default.target

- 重载并重启服务

1 2	systemctl daemon-reload systemctl restart ollama.service

- 验证是否正常运行

1 2	chain@ubuntu:~$ curl localhost:11434 Ollama is running

- ollama常用命令

ollama list：显示模型列表。
ollama show：显示模型的信息
ollama pull：拉取模型
ollama push：推送模型
ollama cp：  拷贝一个模型
ollama rm：  删除一个模型
ollama run： 运行一个模型

大模型库下载

下载地址：https://ollama.com/library 模型版本数值B数越低，最低要求的资源就少，B数越高，最低要求的资源就多，可根据自己主机配置，酌情选择。

- 例子：llama2模型使用

模型地址文档：https://ollama.com/library/llama2，由文档中可知模型有三个版本

7b通常最低需要8GB内存
13b通常最低需要16GB内存
70b通常最低需要64GB内存

在这里插入图片描述

我的主机配置是16GB，我选择7b这个版本

- 开始下载

# 直接运行下载命令默认为7 ，模型较大3.8GB，耐心等待下载完毕即可
ollama run llama2	

# 下载完毕查看列表
chain@ubuntu:~$ ollama list
NAME            ID              SIZE    MODIFIED     
llama2:latest   78e26419b446    3.8 GB  4 hours ago

- 开始运行

问一个AI模型中最近比较热的问题，看起来答的还可以，只不过中文回答支持不太好。

1
2
3

chain@ubuntu:~$ ollama run llama2
>>> 9.9和9.11哪个大
 Both 9.9 and 9.11 are less than 10. Therefore, the smallest of the two is 9.11.

- 资源使用情况

chain@ubuntu:~$ nvidia-smi
Tue Jul 23 06:54:50 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.100                Driver Version: 550.100        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660        Off |   00000000:01:00.0 Off |                  N/A |
| 46%   38C    P0             22W /  120W |    5071MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     23520      C   ...unners/cuda_v11/ollama_llama_server       5068MiB |
+-----------------------------------------------------------------------------------------+
chain@ubuntu:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        1.7G        553M         27M         13G         13G
Swap:            0B          0B          0B
chain@ubuntu:~$ w
 06:58:22 up 1 day, 18 min,  2 users,  load average: 0.03, 0.02, 0.00
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
chain    pts/0    192.168.1.110    Mon09    3:41   0.92s  0.70s ollama run llama2

由此看到显卡资源：5G/6G，内存：1.7G/15G ,CPU: 0.02。由于我们装了显卡驱动，优先计算使用资源的是显卡。如果无显卡，则使用CPU和内存资源。

open-webui部署

官方仓库地址：https://github.com/open-webui/open-webui

docker部署

docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui --restart always \
ghcr.io/open-webui/open-webui:main

–add-host：将host.docker.internal=host-gateway添加到容器内/etc/hosts中，即容器内的hosts文件会增加一条记录172.17.0.1 host.docker.internal。host.docker.internal=host-gateway是固定用法。
-v open-webui:/app/backend/data : 将宿主机的open-webui目录映射到容器/app/backend/data目录.

【宿主机的open-webui目录存放位置：/var/lib/docker/volumes/open-webui，这里我尝试过改到其他目录，即 -v /data/open-webui:/app/backend/data ,但启动后查看日志会报OSError: We couldn’t connect to ‘https://huggingface.co' to load this file, couldn’t find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.此问题暂未解决。】
其他参数为docker常用参数，不再细嗦。

web访问

地址：http:// ip 或者 localhost: 3000，首次访问需要离线注册，没有验证码。

界面展示：

在这里插入图片描述