tensorflow-model-deployment
TensorFlow Model Deployment
Deploy TensorFlow models to production environments using SavedModel format, TensorFlow Lite for mobile and edge devices, quantization techniques, and serving infrastructure. This skill covers model export, optimization, conversion, and deployment strategies.
SavedModel Export
Basic SavedModel Export
```python
# Save model to TensorFlow SavedModel format
model.save('path/to/saved_model')

# Load SavedModel
loaded_model = tf.keras.models.load_model('path/to/saved_model')

# Make predictions with loaded model
predictions = loaded_model.predict(test_data)
```
Create Serving Model
```python
# Create serving model from classifier
serving_model = classifier.create_serving_model()

# Inspect model inputs and outputs
print(f"Model input shape and type: {serving_model.inputs}")
print(f"Model output shape and type: {serving_model.outputs}")

# Save serving model
serving_model.save('model_path')
```
Export with Signatures
```python
# Define serving signature
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 224, 224, 3], dtype=tf.float32)])
def serve(images):
    return model(images, training=False)

# Save with signature
tf.saved_model.save(
    model,
    'saved_model_dir',
    signatures={'serving_default': serve}
)
```
TensorFlow Lite Conversion
Basic TFLite Conversion
```python
# Convert SavedModel to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
tflite_model = converter.convert()

# Save TFLite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```
From Keras Model
```python
# Convert Keras model directly to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save to file
import pathlib
tflite_models_dir = pathlib.Path("tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)
tflite_model_file = tflite_models_dir / "mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)
```
From Concrete Functions
```python
# Convert from concrete function
concrete_function = model.signatures['serving_default']
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [concrete_function]
)
tflite_model = converter.convert()
```
Export with Model Maker
```python
# Export trained model to TFLite with metadata
model.export(
    export_dir='output/',
    tflite_filename='model.tflite',
    label_filename='labels.txt',
    vocab_filename='vocab.txt'
)

# Export multiple formats
model.export(
    export_dir='output/',
    export_format=[
        mm.ExportFormat.TFLITE,
        mm.ExportFormat.SAVED_MODEL,
        mm.ExportFormat.LABEL
    ]
)
```
Model Quantization
Post-Training Float16 Quantization
```python
from tflite_model_maker.config import QuantizationConfig

# Create float16 quantization config
config = QuantizationConfig.for_float16()

# Export with quantization
model.export(
    export_dir='.',
    tflite_filename='model_fp16.tflite',
    quantization_config=config
)
```
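Float16 quantization halves weight storage at a small cost in precision. The size arithmetic can be verified independently of TensorFlow with plain NumPy; this sketch only illustrates the storage trade-off, not the TFLite conversion itself:

```python
import numpy as np

# Simulate a weight tensor stored in float32 vs float16
weights = np.random.rand(1000).astype(np.float32)
fp16_weights = weights.astype(np.float16)

print(f"float32 bytes: {weights.nbytes}")       # 4 bytes per value
print(f"float16 bytes: {fp16_weights.nbytes}")  # 2 bytes per value

# Round-trip error introduced by the narrower format
max_error = float(np.max(np.abs(weights - fp16_weights.astype(np.float32))))
```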
Dynamic Range Quantization
```python
# Convert with dynamic range quantization
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save quantized model
with open('model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
```
Full Integer Quantization
```python
import numpy as np

def representative_dataset():
    """Generate representative dataset for calibration."""
    for i in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Convert with full integer quantization
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
```
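Under the hood, full integer quantization maps each float value to int8 via an affine transform, q = round(x / scale) + zero_point, with scale and zero_point chosen from the calibration data. A minimal sketch of that arithmetic, independent of the converter:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine-quantize float32 values to int8 (the scheme TFLite uses)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values from int8."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=0.5, zero_point=0)
x_restored = dequantize(q, scale=0.5, zero_point=0)
```

The restored values match the originals exactly here because 0.5 divides them; in general the round-trip error is bounded by half the scale, which is why a representative dataset matters for picking good scales.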
Debug Quantization
```python
from tensorflow.lite.python import convert

# Create debug model with numeric verification
# (assumes `converter` was created earlier and `calibration_gen` is a
# representative dataset generator)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = calibration_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Calibrate and quantize with verification
converter._experimental_calibrate_only = True
calibrated = converter.convert()
debug_model = convert.mlir_quantize(calibrated, enable_numeric_verify=True)
```
Get Quantization Converter
```python
from tflite_model_maker.config import QuantizationConfig

# Apply quantization settings to converter
def get_converter_with_quantization(converter, **kwargs):
    """Apply quantization configuration to converter."""
    config = QuantizationConfig(**kwargs)
    return config.get_converter_with_quantization(converter)

# Use with custom settings
converter = tf.lite.TFLiteConverter.from_keras_model(model)
quantized_converter = get_converter_with_quantization(
    converter,
    optimizations=[tf.lite.Optimize.DEFAULT],
    representative_dataset=representative_dataset
)
tflite_model = quantized_converter.convert()
```
JAX to TFLite Conversion
Basic JAX Conversion
```python
from orbax.export import ExportManager
from orbax.export import JaxModule
from orbax.export import ServingConfig
import tensorflow as tf
import jax.numpy as jnp

def model_fn(_, x):
    return jnp.sin(jnp.cos(x))

jax_module = JaxModule({}, model_fn, input_polymorphic_shape='b, ...')

# Option 1: Direct SavedModel conversion
tf.saved_model.save(
    jax_module,
    '/some/directory',
    signatures=jax_module.methods[JaxModule.DEFAULT_METHOD_KEY].get_concrete_function(
        tf.TensorSpec(shape=(None,), dtype=tf.float32, name="input")
    ),
    options=tf.saved_model.SaveOptions(experimental_custom_gradients=True),
)
converter = tf.lite.TFLiteConverter.from_saved_model('/some/directory')
tflite_model = converter.convert()
```
JAX with Pre/Post Processing
```python
# Option 2: With preprocessing and postprocessing
serving_config = ServingConfig(
    'Serving_default',
    input_signature=[tf.TensorSpec(shape=(None,), dtype=tf.float32, name='input')],
    tf_preprocessor=lambda x: x,
    tf_postprocessor=lambda out: {'output': out}
)
export_mgr = ExportManager(jax_module, [serving_config])
export_mgr.save('/some/directory')
converter = tf.lite.TFLiteConverter.from_saved_model('/some/directory')
tflite_model = converter.convert()
```
undefinedJAX ResNet50 Example
```python
from orbax.export import ExportManager, JaxModule, ServingConfig

# Wrap the model params and function into a JaxModule
# (assumes `jax_model` and `resnet_image_processor` are defined elsewhere)
jax_module = JaxModule({}, jax_model.apply, trainable=False)

# Specify the serving configuration and export the model
serving_config = ServingConfig(
    "serving_default",
    input_signature=[tf.TensorSpec([480, 640, 3], tf.float32, name="inputs")],
    tf_preprocessor=resnet_image_processor,
    tf_postprocessor=lambda x: tf.argmax(x, axis=-1),
)
export_manager = ExportManager(jax_module, [serving_config])
saved_model_dir = "resnet50_saved_model"
export_manager.save(saved_model_dir)

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()
```
Model Optimization
Graph Transformation
```bash
# Build graph transformation tool
bazel build tensorflow/tools/graph_transforms:transform_graph

# Optimize for deployment
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=tensorflow_inception_graph.pb \
  --out_graph=optimized_inception_graph.pb \
  --inputs='Mul' \
  --outputs='softmax' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,299,299,3")
    remove_nodes(op=Identity, op=CheckNumerics)
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms'
```
Fix Mobile Kernel Errors
```bash
# Optimize for mobile deployment
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=tensorflow_inception_graph.pb \
  --out_graph=optimized_inception_graph.pb \
  --inputs='Mul' \
  --outputs='softmax' \
  --transforms='
    strip_unused_nodes(type=float, shape="1,299,299,3")
    fold_constants(ignore_errors=true)
    fold_batch_norms
    fold_old_batch_norms'
```
EfficientDet Deployment
Export SavedModel
```python
from typing import Optional

def export_saved_model(
    model: tf.keras.Model,
    saved_model_dir: str,
    batch_size: Optional[int] = None,
    pre_mode: Optional[str] = 'infer',
    post_mode: Optional[str] = 'global'
) -> None:
    """Export EfficientDet model to SavedModel format.

    Args:
        model: The EfficientDetNet model used for training.
        saved_model_dir: Folder path for the saved model.
        batch_size: Batch size to be saved in the saved model.
        pre_mode: Pre-processing mode ('infer' or None).
        post_mode: Post-processing mode ('global', 'per_class', 'tflite', or None).
    """
    # Implementation exports model with specified configuration
    tf.saved_model.save(model, saved_model_dir)
```

Complete Export Pipeline
```python
# Export model with all formats
export_saved_model(
    model=my_keras_model,
    saved_model_dir="./saved_model_export",
    batch_size=1,
    pre_mode='infer',
    post_mode='global'
)

# Convert to TFLite
converter = tf.lite.TFLiteConverter.from_saved_model('./saved_model_export')
tflite_model = converter.convert()

# Save TFLite model
with open('efficientdet.tflite', 'wb') as f:
    f.write(tflite_model)
```
Mobile Deployment
Deploy to Android
```bash
# Push TFLite model to Android device
adb push mobilenet_quant_v1_224.tflite /data/local/tmp

# Run benchmark on device
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/mobilenet_quant_v1_224.tflite \
  --num_threads=4
```
TFLite Interpreter Usage
```python
import numpy as np

# Load TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare input data
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)

# Run inference
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

# Get predictions
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```
Distributed Training and Serving
MirroredStrategy for Multi-GPU
```python
# Create the strategy instance. It will automatically detect all the GPUs.
mirrored_strategy = tf.distribute.MirroredStrategy()

# Create and compile the Keras model under strategy.scope()
with mirrored_strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(loss='mse', optimizer='sgd')

# Call model.fit and model.evaluate as before.
dataset = tf.data.Dataset.from_tensors(([1.], [1.])).repeat(100).batch(10)
model.fit(dataset, epochs=2)
model.evaluate(dataset)

# Save distributed model
model.save('distributed_model')
```
TPU Variable Optimization
```
# Optimized TPU variable reformatting in MLIR (pseudocode)

# Before optimization:
var0 = ...
var1 = ...
tf.while_loop(..., var0, var1) {
  tf_device.replicate([var0, var1] as rvar) {
    compile = tf._TPUCompileMlir()
    tf.TPUExecuteAndUpdateVariablesOp(rvar, compile)
  }
}

# After optimization with state variables:
var0 = ...
var1 = ...
state_var0 = ...
state_var1 = ...
tf.while_loop(..., var0, var1, state_var0, state_var1) {
  tf_device.replicate(
    [var0, var1] as rvar,
    [state_var0, state_var1] as rstate
  ) {
    compile = tf._TPUCompileMlir()
    tf.TPUReshardVariablesOp(rvar, compile, rstate)
    tf.TPUExecuteAndUpdateVariablesOp(rvar, compile)
  }
}
```
Model Serving with TensorFlow Serving
Export for TensorFlow Serving
```python
import os

# Export model with version number
export_path = os.path.join('serving_models', 'my_model', '1')
tf.saved_model.save(model, export_path)

# Export multiple versions
for version in [1, 2, 3]:
    export_path = os.path.join('serving_models', 'my_model', str(version))
    tf.saved_model.save(model, export_path)
```
Docker Deployment
```bash
# Pull TensorFlow Serving image
docker pull tensorflow/serving

# Run TensorFlow Serving container
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model \
  -t tensorflow/serving

# Test REST API
curl -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}' \
  -X POST http://localhost:8501/v1/models/my_model:predict
```
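The same REST call can be made from application code. This sketch only builds and parses the TF Serving `instances`/`predictions` JSON payloads; the endpoint URL, model name, and the sample response value are illustrative, and the actual HTTP POST (via `urllib.request` or similar) is left as a comment:

```python
import json

def build_predict_request(instances):
    """Serialize a batch of inputs into the TF Serving REST 'instances' format."""
    return json.dumps({"instances": instances})

def parse_predict_response(body):
    """Extract predictions from a TF Serving REST response body."""
    return json.loads(body)["predictions"]

payload = build_predict_request([[1.0, 2.0, 3.0, 4.0]])
# POST `payload` to http://localhost:8501/v1/models/my_model:predict,
# e.g. with urllib.request; a response body looks like '{"predictions": [...]}'

# Parsing a hypothetical response:
preds = parse_predict_response('{"predictions": [[0.9]]}')
```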
Model Validation and Testing
Validate TFLite Model
```python
import numpy as np

# Compare TFLite predictions with original model
def validate_tflite_model(model, tflite_model_path, test_data):
    """Validate TFLite model against original."""
    # Original model predictions
    original_predictions = model.predict(test_data)

    # TFLite model predictions
    interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    tflite_predictions = []
    for sample in test_data:
        interpreter.set_tensor(input_details[0]['index'], sample[np.newaxis, ...])
        interpreter.invoke()
        output = interpreter.get_tensor(output_details[0]['index'])
        tflite_predictions.append(output[0])
    tflite_predictions = np.array(tflite_predictions)

    # Compare predictions
    difference = np.abs(original_predictions - tflite_predictions)
    print(f"Mean absolute difference: {np.mean(difference):.6f}")
    print(f"Max absolute difference: {np.max(difference):.6f}")
```

Model Size Comparison
```python
import os

def compare_model_sizes(saved_model_path, tflite_model_path):
    """Compare sizes of SavedModel and TFLite."""
    # SavedModel size (sum of all files)
    saved_model_size = sum(
        os.path.getsize(os.path.join(dirpath, filename))
        for dirpath, _, filenames in os.walk(saved_model_path)
        for filename in filenames
    )

    # TFLite model size
    tflite_size = os.path.getsize(tflite_model_path)

    print(f"SavedModel size: {saved_model_size / 1e6:.2f} MB")
    print(f"TFLite model size: {tflite_size / 1e6:.2f} MB")
    print(f"Size reduction: {(1 - tflite_size / saved_model_size) * 100:.1f}%")
```

When to Use This Skill
Use the tensorflow-model-deployment skill when you need to:
- Export trained models for production serving
- Deploy models to mobile devices (iOS, Android)
- Optimize models for edge devices and IoT
- Convert models to TensorFlow Lite format
- Apply post-training quantization for model compression
- Set up TensorFlow Serving infrastructure
- Deploy models with Docker containers
- Create model serving APIs with REST or gRPC
- Optimize inference latency and throughput
- Reduce model size for bandwidth-constrained environments
- Convert JAX or PyTorch models to TensorFlow format
- Implement A/B testing with multiple model versions
- Deploy models to cloud platforms (GCP, AWS, Azure)
- Create on-device ML applications
- Optimize models for specific hardware accelerators
Best Practices
- Always validate converted models - Compare TFLite predictions with original model to ensure accuracy
- Use SavedModel format - Standard format for production deployment and serving
- Apply appropriate quantization - Float16 for balanced speed/accuracy, INT8 for maximum compression
- Include preprocessing in model - Embed preprocessing in SavedModel for consistent inference
- Version your models - Use version numbers in export paths for model management
- Test on target devices - Validate performance on actual deployment hardware
- Monitor model size - Track model size before and after optimization
- Use representative datasets - Provide calibration data for accurate quantization
- Enable GPU delegation - Use GPU/TPU acceleration on supported devices
- Optimize batch sizes - Tune batch size for throughput vs latency tradeoffs
- Cache frequently used models - Load models once and reuse for multiple predictions
- Use TensorFlow Serving - Leverage built-in serving infrastructure for scalability
- Implement model warmup - Run dummy predictions to initialize serving systems
- Monitor inference metrics - Track latency, throughput, and error rates in production
- Use metadata in TFLite - Include labels and preprocessing info in model metadata
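The model-warmup practice above can be sketched framework-agnostically. Here `predict_fn` is a stand-in for whatever inference callable the serving process exposes (a Keras `model.predict`, a wrapper around TFLite `invoke()`, etc.); the name and structure are illustrative:

```python
def warm_up(predict_fn, dummy_input, num_runs=3):
    """Run a few dummy predictions so lazy initialization (graph tracing,
    memory allocation, kernel compilation) happens before real traffic."""
    result = None
    for _ in range(num_runs):
        result = predict_fn(dummy_input)
    return result
```

Calling this once at process startup, before the serving endpoint is marked healthy, avoids a latency spike on the first real request.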
Common Pitfalls
- Not validating converted models - TFLite conversion can introduce accuracy degradation
- Over-aggressive quantization - INT8 quantization without calibration causes accuracy loss
- Missing representative dataset - Quantization without calibration produces poor results
- Ignoring model size - Large models fail to deploy on memory-constrained devices
- Not testing on target hardware - Performance varies significantly across devices
- Hardcoded preprocessing - Client-side preprocessing causes inconsistencies
- Wrong input/output types - Type mismatches between model and inference code
- Not using batch inference - Single-sample inference is inefficient for high throughput
- Missing error handling - Production systems need robust error handling
- Not monitoring model drift - Model performance degrades over time without monitoring
- Incorrect tensor shapes - Shape mismatches cause runtime errors
- Not optimizing for target device - Generic optimization doesn't leverage device-specific features
- Forgetting model versioning - Difficult to rollback or A/B test without versions
- Not using GPU acceleration - CPU-only inference is much slower on capable devices
- Deploying untested models - Always validate models before production deployment
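Several of these pitfalls (wrong input/output types, incorrect tensor shapes) can be caught with an explicit guard before invoking the interpreter. A minimal sketch, assuming `input_details` has the list-of-dicts structure returned by `tf.lite.Interpreter.get_input_details()`:

```python
import numpy as np

def check_input(input_details, data):
    """Fail early with a clear message instead of a cryptic runtime error."""
    expected_dtype = input_details[0]['dtype']
    expected_shape = tuple(input_details[0]['shape'])
    if data.dtype != expected_dtype:
        raise TypeError(f"expected dtype {expected_dtype}, got {data.dtype}")
    if tuple(data.shape) != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {data.shape}")
```

Call it right before `interpreter.set_tensor(...)`; the cost is negligible compared to inference.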