tensorflow-model-deployment
TensorFlow Model Deployment
Deploy TensorFlow models to production environments using SavedModel format, TensorFlow Lite for mobile and edge devices, quantization techniques, and serving infrastructure. This skill covers model export, optimization, conversion, and deployment strategies.
SavedModel Export
Basic SavedModel Export
```python
# Save model to TensorFlow SavedModel format
model.save('path/to/saved_model')

# Load SavedModel
loaded_model = tf.keras.models.load_model('path/to/saved_model')

# Make predictions with loaded model
predictions = loaded_model.predict(test_data)
```
Create Serving Model
```python
# Create serving model from classifier
serving_model = classifier.create_serving_model()

# Inspect model inputs and outputs
print(f"Model input shape and type: {serving_model.inputs}")
print(f"Model output shape and type: {serving_model.outputs}")

# Save serving model
serving_model.save('model_path')
```
Export with Signatures
```python
# Define serving signature
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 224, 224, 3], dtype=tf.float32)])
def serve(images):
    return model(images, training=False)

# Save with signature
tf.saved_model.save(
    model,
    'saved_model_dir',
    signatures={'serving_default': serve}
)
```
TensorFlow Lite Conversion
Basic TFLite Conversion
```python
# Convert SavedModel to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
tflite_model = converter.convert()

# Save TFLite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```
From Keras Model
```python
# Convert Keras model directly to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save to file
import pathlib
tflite_models_dir = pathlib.Path("tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)
tflite_model_file = tflite_models_dir / "mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)
```
From Concrete Functions
```python
# Convert from concrete function
concrete_function = model.signatures['serving_default']
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [concrete_function]
)
tflite_model = converter.convert()
```
Export with Model Maker
```python
# Export trained model to TFLite with metadata
model.export(
    export_dir='output/',
    tflite_filename='model.tflite',
    label_filename='labels.txt',
    vocab_filename='vocab.txt'
)

# Export multiple formats
model.export(
    export_dir='output/',
    export_format=[
        mm.ExportFormat.TFLITE,
        mm.ExportFormat.SAVED_MODEL,
        mm.ExportFormat.LABEL
    ]
)
```
Model Quantization
Post-Training Float16 Quantization
```python
from tflite_model_maker.config import QuantizationConfig

# Create float16 quantization config
config = QuantizationConfig.for_float16()

# Export with quantization
model.export(
    export_dir='.',
    tflite_filename='model_fp16.tflite',
    quantization_config=config
)
```
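Float16 quantization halves weight storage at a small cost in precision. The size arithmetic can be verified independently of TensorFlow with plain NumPy; this sketch only illustrates the storage trade-off, not the TFLite conversion itself:

```python
import numpy as np

# Simulate a weight tensor stored in float32 vs float16
weights = np.random.rand(1000).astype(np.float32)
fp16_weights = weights.astype(np.float16)

print(f"float32 bytes: {weights.nbytes}")       # 4 bytes per value
print(f"float16 bytes: {fp16_weights.nbytes}")  # 2 bytes per value

# Round-trip error introduced by the narrower format
max_error = float(np.max(np.abs(weights - fp16_weights.astype(np.float32))))
```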
Dynamic Range Quantization
```python
# Convert with dynamic range quantization
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save quantized model
with open('model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
```
Full Integer Quantization
```python
import numpy as np

def representative_dataset():
    """Generate representative dataset for calibration."""
    for i in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Convert with full integer quantization
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```
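Under the hood, full integer quantization maps each float value to int8 via an affine transform, q = round(x / scale) + zero_point, with scale and zero_point chosen from the calibration data. A minimal sketch of that arithmetic, independent of the converter:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine-quantize float32 values to int8 (the scheme TFLite uses)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=0.5, zero_point=0)
x_restored = dequantize(q, scale=0.5, zero_point=0)
```

The restored values match the originals exactly here because 0.5 divides them; in general the round-trip error is bounded by half the scale, which is why a representative dataset matters for picking good scales.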
Debug Quantization
```python
from tensorflow.lite.python import convert

# Create debug model with numeric verification
# (assumes `converter` was created earlier and `calibration_gen` is a
# representative dataset generator)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = calibration_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Calibrate and quantize with verification
converter._experimental_calibrate_only = True
calibrated = converter.convert()
debug_model = convert.mlir_quantize(calibrated, enable_numeric_verify=True)
```
Get Quantization Converter
```python
from tflite_model_maker.config import QuantizationConfig

# Apply quantization settings to converter
def get_converter_with_quantization(converter, **kwargs):
    """Apply quantization configuration to converter."""
    config = QuantizationConfig(**kwargs)
    return config.get_converter_with_quantization(converter)

# Use with custom settings
converter = tf.lite.TFLiteConverter.from_keras_model(model)
quantized_converter = get_converter_with_quantization(
    converter,
    optimizations=[tf.lite.Optimize.DEFAULT],
    representative_dataset=representative_dataset
)
tflite_model = quantized_converter.convert()
```
JAX to TFLite Conversion
Basic JAX Conversion
```python
from orbax.export import ExportManager
from orbax.export import JaxModule
from orbax.export import ServingConfig
import tensorflow as tf
import jax.numpy as jnp

def model_fn(_, x):
    return jnp.sin(jnp.cos(x))

jax_module = JaxModule({}, model_fn, input_polymorphic_shape='b, ...')

# Option 1: Direct SavedModel conversion
tf.saved_model.save(
    jax_module,
    '/some/directory',
    signatures=jax_module.methods[JaxModule.DEFAULT_METHOD_KEY].get_concrete_function(
        tf.TensorSpec(shape=(None,), dtype=tf.float32, name="input")
    ),
    options=tf.saved_model.SaveOptions(experimental_custom_gradients=True),
)
converter = tf.lite.TFLiteConverter.from_saved_model('/some/directory')
tflite_model = converter.convert()
```
JAX with Pre/Post Processing
```python
# Option 2: With preprocessing and postprocessing
serving_config = ServingConfig(
    'Serving_default',
    input_signature=[tf.TensorSpec(shape=(None,), dtype=tf.float32, name='input')],
    tf_preprocessor=lambda x: x,
    tf_postprocessor=lambda out: {'output': out}
)
export_mgr = ExportManager(jax_module, [serving_config])
export_mgr.save('/some/directory')
converter = tf.lite.TFLiteConverter.from_saved_model('/some/directory')
tflite_model = converter.convert()
```
undefinedJAX ResNet50 Example
```python
from orbax.export import ExportManager, JaxModule, ServingConfig

# Wrap the model params and function into a JaxModule
# (assumes `jax_model` and `resnet_image_processor` are defined elsewhere)
jax_module = JaxModule({}, jax_model.apply, trainable=False)

# Specify the serving configuration and export the model
serving_config = ServingConfig(
    "serving_default",
    input_signature=[tf.TensorSpec([480, 640, 3], tf.float32, name="inputs")],
    tf_preprocessor=resnet_image_processor,
    tf_postprocessor=lambda x: tf.argmax(x, axis=-1),
)
export_manager = ExportManager(jax_module, [serving_config])
saved_model_dir = "resnet50_saved_model"
export_manager.save(saved_model_dir)

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
```
Model Optimization
Graph Transformation
```bash
# Build graph transformation tool
bazel build tensorflow/tools/graph_transforms:transform_graph

# Optimize for deployment
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=tensorflow_inception_graph.pb \
  --out_graph=optimized_inception_graph.pb \
  --inputs='Mul' \
  --outputs='softmax' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms'
```
Fix Mobile Kernel Errors
```bash
# Optimize for mobile deployment
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=tensorflow_inception_graph.pb \
  --out_graph=optimized_inception_graph.pb \
  --inputs='Mul' \
  --outputs='softmax' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,299,299,3")
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms'
```
EfficientDet Deployment
Export SavedModel
```python
from typing import Optional

def export_saved_model(
    model: tf.keras.Model,
    saved_model_dir: str,
    batch_size: Optional[int] = None,
    pre_mode: Optional[str] = 'infer',
    post_mode: Optional[str] = 'global'
) -> None:
    """Export EfficientDet model to SavedModel format.

    Args:
        model: The EfficientDetNet model used for training.
        saved_model_dir: Folder path for the saved model.
        batch_size: Batch size to be saved in the saved model.
        pre_mode: Pre-processing mode ('infer' or None).
        post_mode: Post-processing mode ('global', 'per_class', 'tflite', or None).
    """
    # Implementation exports model with specified configuration
    tf.saved_model.save(model, saved_model_dir)
```

Complete Export Pipeline
```python
# Export model with all formats
export_saved_model(
    model=my_keras_model,
    saved_model_dir="./saved_model_export",
    batch_size=1,
    pre_mode='infer',
    post_mode='global'
)

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model('./saved_model_export')
tflite_model = converter.convert()

# Save TFLite model
with open('efficientdet.tflite', 'wb') as f:
    f.write(tflite_model)
```
Mobile Deployment
Deploy to Android
```bash
# Push TFLite model to Android device
adb push mobilenet_quant_v1_224.tflite /data/local/tmp

# Run benchmark on device
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_quant_v1_224.tflite \
  --num_threads=4
```
TFLite Interpreter Usage
```python
import numpy as np

# Load TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare input data
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

# Run inference
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

# Get predictions
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```
Distributed Training and Serving
MirroredStrategy for Multi-GPU
```python
# Create the strategy instance. It will automatically detect all the GPUs.
mirrored_strategy = tf.distribute.MirroredStrategy()

# Create and compile the Keras model under strategy.scope()
with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(loss='mse', optimizer='sgd')

# Call model.fit and model.evaluate as before.
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
model.fit(dataset, epochs=2)
model.evaluate(dataset)

# Save distributed model
model.save('distributed_model')
```
TPU Variable Optimization
```
# Optimized TPU variable reformatting in MLIR (pseudocode)

# Before optimization:
var0 = ...
var1 = ...
tf.while_loop(..., var0, var1) {
  tf_device.replicate([var0, var1] as rvar) {
    compile = tf._TPUCompileMlir()
    tf.TPUExecuteAndUpdateVariablesOp(rvar, compile)
  }
}

# After optimization with state variables:
var0 = ...
var1 = ...
state_var0 = ...
state_var1 = ...
tf.while_loop(..., var0, var1, state_var0, state_var1) {
  tf_device.replicate(
    [var0, var1] as rvar,
    [state_var0, state_var1] as rstate
  ) {
    compile = tf._TPUCompileMlir()
    tf.TPUReshardVariablesOp(rvar, compile, rstate)
    tf.TPUExecuteAndUpdateVariablesOp(rvar, compile)
  }
}
```
Model Serving with TensorFlow Serving
Export for TensorFlow Serving
```python
import os

# Export model with version number
export_path = os.path.join('serving_models', 'my_model', '1')
tf.saved_model.save(model, export_path)

# Export multiple versions
for version in [1, 2, 3]:
    export_path = os.path.join('serving_models', 'my_model', str(version))
    tf.saved_model.save(model, export_path)
```
Docker Deployment
```bash
# Pull TensorFlow Serving image
docker pull tensorflow/serving

# Run TensorFlow Serving container
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving

# Test REST API
curl -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}' \
  -X POST http://localhost:8501/v1/models/my_model:predict
```
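The same REST call can be made from application code. This sketch only builds and parses the TF Serving `instances`/`predictions` JSON payloads; the endpoint URL, model name, and the sample response value are illustrative, and the actual HTTP POST (via `urllib.request` or similar) is left as a comment:

```python
import json

def build_predict_request(instances):
    """Serialize a batch of inputs into the TF Serving REST 'instances' format."""
    return json.dumps({"instances": instances})

def parse_predict_response(body):
    """Extract predictions from a TF Serving REST response body."""
    return json.loads(body)["predictions"]

payload = build_predict_request([[1.0, 2.0, 3.0, 4.0]])
# POST `payload` to http://localhost:8501/v1/models/my_model:predict,
# e.g. with urllib.request; a response body looks like '{"predictions": [...]}'

# Parsing a hypothetical response:
preds = parse_predict_response('{"predictions": [[0.9]]}')
```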
Model Validation and Testing
Validate TFLite Model
```python
import numpy as np

# Compare TFLite predictions with original model
def validate_tflite_model(model, tflite_model_path, test_data):
    """Validate TFLite model against original."""
    # Original model predictions
    original_predictions = model.predict(test_data)

    # TFLite model predictions
    interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    tflite_predictions = []
    for sample in test_data:
        interpreter.set_tensor(input_details[0]['index'], sample[np.newaxis, ...])
        interpreter.invoke()
        output = interpreter.get_tensor(output_details[0]['index'])
        tflite_predictions.append(output[0])
    tflite_predictions = np.array(tflite_predictions)

    # Compare predictions
    difference = np.abs(original_predictions - tflite_predictions)
    print(f"Mean absolute difference: {np.mean(difference):.6f}")
    print(f"Max absolute difference: {np.max(difference):.6f}")
```

Model Size Comparison
```python
import os

def compare_model_sizes(saved_model_path, tflite_model_path):
    """Compare sizes of SavedModel and TFLite."""
    # SavedModel size (sum of all files)
    saved_model_size = sum(
        os.path.getsize(os.path.join(dirpath, filename))
        for dirpath, _, filenames in os.walk(saved_model_path)
        for filename in filenames
    )

    # TFLite model size
    tflite_size = os.path.getsize(tflite_model_path)

    print(f"SavedModel size: {saved_model_size / 1e6:.2f} MB")
    print(f"TFLite model size: {tflite_size / 1e6:.2f} MB")
    print(f"Size reduction: {(1 - tflite_size / saved_model_size) * 100:.1f}%")
```

When to Use This Skill
Use the tensorflow-model-deployment skill when you need to:
- Export trained models for production serving
- Deploy models to mobile devices (iOS, Android)
- Optimize models for edge devices and IoT
- Convert models to TensorFlow Lite format
- Apply post-training quantization for model compression
- Set up TensorFlow Serving infrastructure
- Deploy models with Docker containers
- Create model serving APIs with REST or gRPC
- Optimize inference latency and throughput
- Reduce model size for bandwidth-constrained environments
- Convert JAX or PyTorch models to TensorFlow format
- Implement A/B testing with multiple model versions
- Deploy models to cloud platforms (GCP, AWS, Azure)
- Create on-device ML applications
- Optimize models for specific hardware accelerators
Best Practices
- Always validate converted models - Compare TFLite predictions with original model to ensure accuracy
- Use SavedModel format - Standard format for production deployment and serving
- Apply appropriate quantization - Float16 for balanced speed/accuracy, INT8 for maximum compression
- Include preprocessing in model - Embed preprocessing in SavedModel for consistent inference
- Version your models - Use version numbers in export paths for model management
- Test on target devices - Validate performance on actual deployment hardware
- Monitor model size - Track model size before and after optimization
- Use representative datasets - Provide calibration data for accurate quantization
- Enable GPU delegation - Use GPU/TPU acceleration on supported devices
- Optimize batch sizes - Tune batch size for throughput vs latency tradeoffs
- Cache frequently used models - Load models once and reuse for multiple predictions
- Use TensorFlow Serving - Leverage built-in serving infrastructure for scalability
- Implement model warmup - Run dummy predictions to initialize serving systems
- Monitor inference metrics - Track latency, throughput, and error rates in production
- Use metadata in TFLite - Include labels and preprocessing info in model metadata
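The model-warmup practice above can be sketched framework-agnostically. Here `predict_fn` is a stand-in for whatever inference callable the serving process exposes (a Keras `model.predict`, a wrapper around TFLite `invoke()`, etc.); the name and structure are illustrative:

```python
def warm_up(predict_fn, dummy_input, num_runs=3):
    """Run a few dummy predictions so lazy initialization (graph tracing,
    memory allocation, kernel compilation) happens before real traffic."""
    result = None
    for _ in range(num_runs):
        result = predict_fn(dummy_input)
    return result
```

Calling this once at process startup, before the serving endpoint is marked healthy, avoids a latency spike on the first real request.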
Common Pitfalls
- Not validating converted models - TFLite conversion can introduce accuracy degradation
- Over-aggressive quantization - INT8 quantization without calibration causes accuracy loss
- Missing representative dataset - Quantization without calibration produces poor results
- Ignoring model size - Large models fail to deploy on memory-constrained devices
- Not testing on target hardware - Performance varies significantly across devices
- Hardcoded preprocessing - Client-side preprocessing causes inconsistencies
- Wrong input/output types - Type mismatches between model and inference code
- Not using batch inference - Single-sample inference is inefficient for high throughput
- Missing error handling - Production systems need robust error handling
- Not monitoring model drift - Model performance degrades over time without monitoring
- Incorrect tensor shapes - Shape mismatches cause runtime errors
- Not optimizing for target device - Generic optimization doesn't leverage device-specific features
- Forgetting model versioning - Difficult to rollback or A/B test without versions
- Not using GPU acceleration - CPU-only inference is much slower on capable devices
- Deploying untested models - Always validate models before production deployment
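Several of these pitfalls (wrong input/output types, incorrect tensor shapes) can be caught with an explicit guard before invoking the interpreter. A minimal sketch, assuming `input_details` has the list-of-dicts structure returned by `tf.lite.Interpreter.get_input_details()`:

```python
import numpy as np

def check_input(input_details, data):
    """Fail early with a clear message instead of a cryptic runtime error."""
    expected_dtype = input_details[0]['dtype']
    expected_shape = tuple(input_details[0]['shape'])
    if data.dtype != expected_dtype:
        raise TypeError(f"expected dtype {expected_dtype}, got {data.dtype}")
    if tuple(data.shape) != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {data.shape}")
```

Call it right before `interpreter.set_tensor(...)`; the cost is negligible compared to inference.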