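Get started with NVIDIA DALI in 10 minutes!
Read image data from MinIO object storage and build a high-performance data pipeline for deep learning training.
# Check the environment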
python -c "import nvidia.dali as dali; print(f'DALI: {dali.__version__}')"
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
nvidia-smi
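The import checks above stop at the first missing package. As a minimal sketch, assuming only the standard library, the same check can be done in one pass without importing the heavy packages. The helper name `is_available` is illustrative and not part of DALI or PyTorch.

```python
import importlib.util


def is_available(name):
    """Return True if the module can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names such as "nvidia.dali" when the parent package is missing
        return False


for module in ("nvidia.dali", "torch", "minio"):
    print(f"{module}: {'found' if is_available(module) else 'missing'}")
```

This only reports whether a package is installed. Version numbers and GPU visibility still need the commands above.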
# Run the basic examples
python basic/01_hello_dali.py # 15 min - understand the Pipeline concept
python basic/02_basic_image_processing.py # 15 min - image processing operations
Key concepts:
python basic/03_augmentation.py # 30 min - data augmentation techniques
Key concepts:
python basic/04_pytorch_integration.py # 30 min - PyTorch integration
Key concepts:
python basic/05_external_source.py # 15 min - custom data sources
python basic/06_to_08_advanced_features.py # 15 min - parallel processing, multi-GPU
Key concepts:
# Start MinIO with Docker
docker run -d -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=minioadmin \
-e MINIO_ROOT_PASSWORD=minioadmin \
minio/minio server /data --console-address ":9001"
# Open the web UI
# http://localhost:9001
# Username: minioadmin
# Password: minioadmin
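Before running the MinIO examples it can help to confirm that the container is accepting requests. The sketch below polls MinIO's liveness endpoint with the standard library. It assumes the API port 9000 mapped by the Docker command above, and the function name `minio_is_live` is illustrative.

```python
import http.client
import urllib.error
import urllib.request


def minio_is_live(base_url="http://localhost:9000", timeout=2.0):
    """Return True if the MinIO liveness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an error status
        return False


if __name__ == "__main__":
    print("MinIO reachable:", minio_is_live())
```

A `False` result right after `docker run` may only mean the server is still starting, so retrying after a second or two is reasonable.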
# Install the MinIO Python client
pip install minio
# Run the basic MinIO examples
python basic/09_minio_basic.py # 20 min - MinIO basics
Key concepts:
python basic/10_minio_production_pipeline.py # 30 min - production-grade implementation
Key concepts:
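from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def simple_pipeline(data_dir):
    # Read encoded files; labels come from the sub-directory each file is in
    images, labels = fn.readers.file(file_root=data_dir)
    # "mixed" takes CPU input and decodes on the GPU
    images = fn.decoders.image(images, device="mixed")
    images = fn.resize(images, size=[224, 224])
    return images, labels

# batch_size, num_threads and device_id are pipeline arguments added by @pipeline_def
pipe = simple_pipeline(data_dir="/path/to/data", batch_size=32, num_threads=4, device_id=0)
pipe.build()
outputs = pipe.run()  # tuple of (images, labels) batches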
@pipeline_def
def training_pipeline(data_dir):
    images, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    images = fn.decoders.image(images, device="mixed")
    # Data augmentation
    images = fn.random_resized_crop(images, size=[224, 224], random_area=[0.08, 1.0])
    images = fn.flip(images, horizontal=fn.random.coin_flip(probability=0.5))
    images = fn.brightness_contrast(
        images,
        brightness=fn.random.uniform(range=[0.8, 1.2]),
        contrast=fn.random.uniform(range=[0.8, 1.2]),
    )
    # Normalization: per-channel ImageNet statistics scaled to the 0-255 input range;
    # the same operator converts to float and changes the layout from HWC to CHW
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout="CHW",
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
    )
    return images, labels
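The normalization step in the training pipeline is a per-channel affine transform. As a plain-Python check with no DALI involved, the sketch below shows that scaling a 0-255 pixel to [0, 1] and then applying the ImageNet mean and standard deviation gives the same result as applying those statistics multiplied by 255 to the raw pixel. The function names are illustrative.

```python
import math

# ImageNet per-channel statistics for inputs scaled to [0, 1]
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_scaled(pixel):
    """Divide an RGB pixel (0-255) by 255, then subtract the mean and divide by std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(pixel, MEAN, STD))


def normalize_raw(pixel):
    """Normalize the raw 0-255 values with the statistics multiplied by 255."""
    return tuple((v - 255.0 * m) / (255.0 * s) for v, m, s in zip(pixel, MEAN, STD))


pixel = (128, 64, 255)
a = normalize_scaled(pixel)
b = normalize_raw(pixel)
# Both forms agree up to floating-point rounding
assert all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b))
```

Which form to use depends on whether the operator doing the normalization receives 0-255 values or values already scaled to [0, 1].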
from nvidia.dali.plugin.pytorch import DALIGenericIterator

pipe = training_pipeline(data_dir="/data", batch_size=64, num_threads=8, device_id=0)
pipe.build()
# num_samples must be set by the caller to the number of samples in one epoch
train_loader = DALIGenericIterator(
    pipelines=[pipe],
    output_map=["images", "labels"],
    size=num_samples,
    auto_reset=True,
)
for batch in train_loader:
    data = batch[0]  # one dict per pipeline; a single pipeline is used here
    images = data["images"]  # PyTorch tensor on the GPU
    labels = data["labels"]
    # Training code...
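The `size` argument determines how many iterations the loop above runs per epoch. As a rough sketch for a single pipeline, assuming the iterator's default behavior of padding an incomplete last batch, the count can be estimated as follows. The helper is illustrative and not part of DALI; its policy names mirror DALI's `LastBatchPolicy` values.

```python
import math


def batches_per_epoch(num_samples, batch_size, last_batch_policy="FILL"):
    """Estimate iterations per epoch for a single pipeline."""
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size must be > 0")
    if last_batch_policy in ("FILL", "PARTIAL"):
        # The incomplete last batch is still returned (padded or shortened)
        return math.ceil(num_samples / batch_size)
    if last_batch_policy == "DROP":
        # The incomplete last batch is discarded
        return num_samples // batch_size
    raise ValueError(f"unknown policy: {last_batch_policy}")


print(batches_per_epoch(1000, 64))          # 16
print(batches_per_epoch(1000, 64, "DROP"))  # 15
```

This is useful for sizing learning-rate schedules or progress bars that need the step count up front.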
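import io

import numpy as np
from minio import Minio
from PIL import Image

# Connect to MinIO
client = Minio("localhost:9000", access_key="minioadmin", secret_key="minioadmin", secure=False)

# Custom data source
class MinIOSource:
    def __init__(self, client, bucket, objects, labels):
        self.client = client
        self.bucket = bucket
        self.objects = objects  # object names
        self.labels = labels    # one integer label per object (how labels are stored is an assumption)

    def __call__(self, sample_info):
        idx = sample_info.idx_in_epoch
        if idx >= len(self.objects):
            raise StopIteration  # tells DALI the epoch is over
        # Read from MinIO
        response = self.client.get_object(self.bucket, self.objects[idx])
        try:
            data = response.read()
        finally:
            response.close()
            response.release_conn()
        # Decode the image
        img = Image.open(io.BytesIO(data)).convert("RGB")
        return np.array(img), np.array([self.labels[idx]], dtype=np.int64)

@pipeline_def
def minio_pipeline(source):
    images, labels = fn.external_source(source=source, num_outputs=2, batch=False)
    # Process...
    return images, labels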
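The data source above needs a label for every object it serves. One option, assuming the bucket follows a class-per-prefix layout such as `train/cat/001.jpg` (an assumption, not something the tutorial specifies), is to derive labels from the parent prefix. The helper `build_index` is hypothetical.

```python
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_index(object_names):
    """Return (objects, labels, class_names) for names like 'train/cat/001.jpg'.

    The label of an object is the index of its parent prefix in the sorted
    list of class names. Non-image objects are skipped.
    """
    names = sorted(
        n for n in object_names
        if PurePosixPath(n).suffix.lower() in IMAGE_SUFFIXES
    )
    class_names = sorted({PurePosixPath(n).parent.name for n in names})
    class_to_label = {c: i for i, c in enumerate(class_names)}
    labels = [class_to_label[PurePosixPath(n).parent.name] for n in names]
    return names, labels, class_names


# With a real server the names could come from, for example:
#   (o.object_name for o in client.list_objects("my-bucket", recursive=True))
objects, labels, class_names = build_index(
    ["train/dog/b.jpg", "train/cat/a.jpg", "train/README.txt", "train/dog/a.PNG"]
)
print(objects)      # ['train/cat/a.jpg', 'train/dog/a.PNG', 'train/dog/b.jpg']
print(labels)       # [0, 1, 1]
print(class_names)  # ['cat', 'dog']
```

Sorting both the names and the class names keeps the label mapping stable between runs, which matters when a checkpoint is resumed.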
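Q: How much faster is DALI than the PyTorch DataLoader? A: Typically 2-5x, depending on the complexity of the data augmentation and the hardware configuration.
Q: When should I use device="cpu" vs. "mixed"? A: "mixed" decodes on the GPU with hardware acceleration and is usually faster. CPU decoding suits special formats or debugging.
Q: How do I debug a Pipeline? A: Use a small batch_size, check the output shapes, and save intermediate results with fn.dump_image.
Q: What if I run out of memory? A: Reduce batch_size or prefetch_queue_depth, or use smaller image sizes.
Q: How do I optimize MinIO performance? A: Use caching, increase the number of threads, enable prefetching, and consider data locality.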
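The caching suggestion in the MinIO answer can be sketched as a small in-memory LRU cache with a byte budget, so objects read in one epoch are not downloaded again in the next. The class `LRUByteCache` and the in-memory `store` standing in for MinIO are illustrative assumptions.

```python
from collections import OrderedDict


class LRUByteCache:
    """Keep recently used objects in memory, bounded by total size in bytes."""

    def __init__(self, fetch, max_bytes):
        self._fetch = fetch          # callable: object name -> bytes
        self._max_bytes = max_bytes
        self._items = OrderedDict()  # name -> bytes, least recently used first
        self._size = 0
        self.hits = 0
        self.misses = 0

    def get(self, name):
        if name in self._items:
            self.hits += 1
            self._items.move_to_end(name)  # mark as most recently used
            return self._items[name]
        self.misses += 1
        data = self._fetch(name)
        if len(data) <= self._max_bytes:
            self._items[name] = data
            self._size += len(data)
            # Evict least recently used entries until the budget is met
            while self._size > self._max_bytes:
                _, evicted = self._items.popitem(last=False)
                self._size -= len(evicted)
        return data


# Stand-in for a MinIO download; a real fetch might wrap client.get_object(...).read()
store = {"a.jpg": b"x" * 60, "b.jpg": b"y" * 60}
cache = LRUByteCache(fetch=store.__getitem__, max_bytes=100)
cache.get("a.jpg")  # miss
cache.get("a.jpg")  # hit
cache.get("b.jpg")  # miss, evicts a.jpg because 120 bytes exceeds the budget
print(cache.hits, cache.misses)  # 1 2
```

The sketch is not thread-safe. If the data source is called from several threads or worker processes, it would need a lock or a per-worker cache. With shuffled access and a dataset larger than the budget, the hit rate will be low, so a local disk cache may be a better fit.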
After completing the basic tutorial:
Running into problems?
Happy learning! 🚀