博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
聊聊flink的MemoryStateBackend
阅读量:6367 次
发布时间:2019-06-23

本文共 21991 字,大约阅读时间需要 73 分钟。

本文主要研究一下flink的MemoryStateBackend

StateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/StateBackend.java

@PublicEvolvingpublic interface StateBackend extends java.io.Serializable {	// ------------------------------------------------------------------------	//  Checkpoint storage - the durable persistence of checkpoint data	// ------------------------------------------------------------------------	/**	 * Resolves the given pointer to a checkpoint/savepoint into a checkpoint location. The location	 * supports reading the checkpoint metadata, or disposing the checkpoint storage location.	 *	 * 

If the state backend cannot understand the format of the pointer (for example because it * was created by a different state backend) this method should throw an {@code IOException}. * * @param externalPointer The external checkpoint pointer to resolve. * @return The checkpoint location handle. * * @throws IOException Thrown, if the state backend does not understand the pointer, or if * the pointer could not be resolved due to an I/O error. */ CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException; /** * Creates a storage for checkpoints for the given job. The checkpoint storage is * used to write checkpoint data and metadata. * * @param jobId The job to store checkpoint data for. * @return A checkpoint storage for the given job. * * @throws IOException Thrown if the checkpoint storage cannot be initialized. */ CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException; // ------------------------------------------------------------------------ // Structure Backends // ------------------------------------------------------------------------ /** * Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding keyed state * and checkpointing it. Uses default TTL time provider. * *

Keyed State is state where each value is bound to a key. * * @param

The type of the keys by which the state is organized. * * @return The Keyed State Backend for the given job, operator, and key group range. * * @throws Exception This method may forward all exceptions that occur while instantiating the backend. */ default
AbstractKeyedStateBackend
createKeyedStateBackend( Environment env, JobID jobID, String operatorIdentifier, TypeSerializer
keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry) throws Exception { return createKeyedStateBackend( env, jobID, operatorIdentifier, keySerializer, numberOfKeyGroups, keyGroupRange, kvStateRegistry, TtlTimeProvider.DEFAULT ); } /** * Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding
keyed state * and checkpointing it. * *

Keyed State is state where each value is bound to a key. * * @param

The type of the keys by which the state is organized. * * @return The Keyed State Backend for the given job, operator, and key group range. * * @throws Exception This method may forward all exceptions that occur while instantiating the backend. */ default
AbstractKeyedStateBackend
createKeyedStateBackend( Environment env, JobID jobID, String operatorIdentifier, TypeSerializer
keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider ) throws Exception { return createKeyedStateBackend( env, jobID, operatorIdentifier, keySerializer, numberOfKeyGroups, keyGroupRange, kvStateRegistry, ttlTimeProvider, new UnregisteredMetricsGroup()); } /** * Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding
keyed state * and checkpointing it. * *

Keyed State is state where each value is bound to a key. * * @param

The type of the keys by which the state is organized. * * @return The Keyed State Backend for the given job, operator, and key group range. * * @throws Exception This method may forward all exceptions that occur while instantiating the backend. */
AbstractKeyedStateBackend
createKeyedStateBackend( Environment env, JobID jobID, String operatorIdentifier, TypeSerializer
keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, MetricGroup metricGroup) throws Exception; /** * Creates a new {@link OperatorStateBackend} that can be used for storing operator state. * *

Operator state is state that is associated with parallel operator (or function) instances, * rather than with keys. * * @param env The runtime environment of the executing task. * @param operatorIdentifier The identifier of the operator whose state should be stored. * * @return The OperatorStateBackend for operator identified by the job and operator identifier. * * @throws Exception This method may forward all exceptions that occur while instantiating the backend. */ OperatorStateBackend createOperatorStateBackend(Environment env, String operatorIdentifier) throws Exception;}复制代码

  • StateBackend接口定义了有状态的streaming应用的state是如何stored以及checkpointed
  • flink目前内置支持MemoryStateBackend、FsStateBackend、RocksDBStateBackend三种,如果没有配置默认为MemoryStateBackend;在flink-conf.yaml里头可以进行全局的默认配置,不过具体每个job还可以通过StreamExecutionEnvironment.setStateBackend来覆盖全局的配置
  • MemoryStateBackend可以在构造器中指定大小,默认是5MB,可以增大但是不能超过akka frame size;FsStateBackend模式把TaskManager的state存储在内存,但是可以把checkpoint的state存储到filesystem中(比如HDFS);RocksDBStateBackend把working state存储在RocksDB中,checkpoint的state存储在filesystem
  • StateBackend接口定义了createCheckpointStorage、createKeyedStateBackend、createOperatorStateBackend方法;同时继承了Serializable接口;StateBackend接口的实现要求是线程安全的
  • StateBackend有个直接实现的抽象类AbstractStateBackend,而AbstractFileStateBackend及RocksDBStateBackend继承了AbstractStateBackend,之后MemoryStateBackend、FsStateBackend都继承了AbstractFileStateBackend

AbstractStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/AbstractStateBackend.java

/** * An abstract base implementation of the {@link StateBackend} interface. * * 

This class has currently no contents and only kept to not break the prior class hierarchy for users. */@PublicEvolvingpublic abstract class AbstractStateBackend implements StateBackend, java.io.Serializable { private static final long serialVersionUID = 4620415814639230247L; // ------------------------------------------------------------------------ // State Backend - State-Holding Backends // ------------------------------------------------------------------------ @Override public abstract

AbstractKeyedStateBackend
createKeyedStateBackend( Environment env, JobID jobID, String operatorIdentifier, TypeSerializer
keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, MetricGroup metricGroup) throws IOException; @Override public abstract OperatorStateBackend createOperatorStateBackend( Environment env, String operatorIdentifier) throws Exception;}复制代码

  • AbstractStateBackend声明实现StateBackend及Serializable接口,这里没有新增其他内容

AbstractFileStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/filesystem/AbstractFileStateBackend.java

@PublicEvolvingpublic abstract class AbstractFileStateBackend extends AbstractStateBackend {	private static final long serialVersionUID = 1L;	// ------------------------------------------------------------------------	//  State Backend Properties	// ------------------------------------------------------------------------	/** The path where checkpoints will be stored, or null, if none has been configured. */	@Nullable	private final Path baseCheckpointPath;	/** The path where savepoints will be stored, or null, if none has been configured. */	@Nullable	private final Path baseSavepointPath;	//......	// ------------------------------------------------------------------------	//  Initialization and metadata storage	// ------------------------------------------------------------------------	@Override	public CompletedCheckpointStorageLocation resolveCheckpoint(String pointer) throws IOException {		return AbstractFsCheckpointStorage.resolveCheckpointPointer(pointer);	}	// ------------------------------------------------------------------------	//  Utilities	// ------------------------------------------------------------------------	/**	 * Checks the validity of the path's scheme and path.	 *	 * @param path The path to check.	 * @return The URI as a Path.	 *	 * @throws IllegalArgumentException Thrown, if the URI misses scheme or path.	 */	private static Path validatePath(Path path) {		final URI uri = path.toUri();		final String scheme = uri.getScheme();		final String pathPart = uri.getPath();		// some validity checks		if (scheme == null) {			throw new IllegalArgumentException("The scheme (hdfs://, file://, etc) is null. " +					"Please specify the file system scheme explicitly in the URI.");		}		if (pathPart == null) {			throw new IllegalArgumentException("The path to store the checkpoint data in is null. " +					"Please specify a directory path for the checkpoint data.");		}		if (pathPart.length() == 0 || pathPart.equals("/")) {			throw new IllegalArgumentException("Cannot use the root directory for checkpoints.");		}		return path;	}	@Nullable	private static Path parameterOrConfigured(@Nullable Path path, Configuration config, ConfigOption
option) { if (path != null) { return path; } else { String configValue = config.getString(option); try { return configValue == null ? null : new Path(configValue); } catch (IllegalArgumentException e) { throw new IllegalConfigurationException("Cannot parse value for " + option.key() + " : " + configValue + " . Not a valid path."); } } }}复制代码
  • AbstractFileStateBackend继承了AbstractStateBackend,它有baseCheckpointPath、baseSavepointPath两个属性,允许为null,路径的格式为hdfs://或者file://开头;resolveCheckpoint方法用于解析checkpoint或savepoint的location,这里使用AbstractFsCheckpointStorage.resolveCheckpointPointer(pointer)来完成

MemoryStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/memory/MemoryStateBackend.java

@PublicEvolvingpublic class MemoryStateBackend extends AbstractFileStateBackend implements ConfigurableStateBackend {	private static final long serialVersionUID = 4109305377809414635L;	/** The default maximal size that the snapshotted memory state may have (5 MiBytes). */	public static final int DEFAULT_MAX_STATE_SIZE = 5 * 1024 * 1024;	/** The maximal size that the snapshotted memory state may have. */	private final int maxStateSize;	/** Switch to chose between synchronous and asynchronous snapshots.	 * A value of 'UNDEFINED' means not yet configured, in which case the default will be used. */	private final TernaryBoolean asynchronousSnapshots;	// ------------------------------------------------------------------------	/**	 * Creates a new memory state backend that accepts states whose serialized forms are	 * up to the default state size (5 MB).	 *	 * 

Checkpoint and default savepoint locations are used as specified in the * runtime configuration. */ public MemoryStateBackend() { this(null, null, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.UNDEFINED); } /** * Creates a new memory state backend that accepts states whose serialized forms are * up to the default state size (5 MB). The state backend uses asynchronous snapshots * or synchronous snapshots as configured. * *

Checkpoint and default savepoint locations are used as specified in the * runtime configuration. * * @param asynchronousSnapshots Switch to enable asynchronous snapshots. */ public MemoryStateBackend(boolean asynchronousSnapshots) { this(null, null, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.fromBoolean(asynchronousSnapshots)); } /** * Creates a new memory state backend that accepts states whose serialized forms are * up to the given number of bytes. * *

Checkpoint and default savepoint locations are used as specified in the * runtime configuration. * *

WARNING: Increasing the size of this value beyond the default value * ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care. * The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there * and the JobManager needs to be able to hold all aggregated state in its memory. * * @param maxStateSize The maximal size of the serialized state */ public MemoryStateBackend(int maxStateSize) { this(null, null, maxStateSize, TernaryBoolean.UNDEFINED); } /** * Creates a new memory state backend that accepts states whose serialized forms are * up to the given number of bytes and that uses asynchronous snashots as configured. * *

Checkpoint and default savepoint locations are used as specified in the * runtime configuration. * *

WARNING: Increasing the size of this value beyond the default value * ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care. * The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there * and the JobManager needs to be able to hold all aggregated state in its memory. * * @param maxStateSize The maximal size of the serialized state * @param asynchronousSnapshots Switch to enable asynchronous snapshots. */ public MemoryStateBackend(int maxStateSize, boolean asynchronousSnapshots) { this(null, null, maxStateSize, TernaryBoolean.fromBoolean(asynchronousSnapshots)); } /** * Creates a new MemoryStateBackend, setting optionally the path to persist checkpoint metadata * to, and to persist savepoints to. * * @param checkpointPath The path to write checkpoint metadata to. If null, the value from * the runtime configuration will be used. * @param savepointPath The path to write savepoints to. If null, the value from * the runtime configuration will be used. */ public MemoryStateBackend(@Nullable String checkpointPath, @Nullable String savepointPath) { this(checkpointPath, savepointPath, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.UNDEFINED); } /** * Creates a new MemoryStateBackend, setting optionally the paths to persist checkpoint metadata * and savepoints to, as well as configuring state thresholds and asynchronous operations. * *

WARNING: Increasing the size of this value beyond the default value * ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care. * The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there * and the JobManager needs to be able to hold all aggregated state in its memory. * * @param checkpointPath The path to write checkpoint metadata to. If null, the value from * the runtime configuration will be used. * @param savepointPath The path to write savepoints to. If null, the value from * the runtime configuration will be used. * @param maxStateSize The maximal size of the serialized state. * @param asynchronousSnapshots Flag to switch between synchronous and asynchronous * snapshot mode. If null, the value configured in the * runtime configuration will be used. */ public MemoryStateBackend( @Nullable String checkpointPath, @Nullable String savepointPath, int maxStateSize, TernaryBoolean asynchronousSnapshots) { super(checkpointPath == null ? null : new Path(checkpointPath), savepointPath == null ? null : new Path(savepointPath)); checkArgument(maxStateSize > 0, "maxStateSize must be > 0"); this.maxStateSize = maxStateSize; this.asynchronousSnapshots = asynchronousSnapshots; } /** * Private constructor that creates a re-configured copy of the state backend. * * @param original The state backend to re-configure * @param configuration The configuration */ private MemoryStateBackend(MemoryStateBackend original, Configuration configuration) { super(original.getCheckpointPath(), original.getSavepointPath(), configuration); this.maxStateSize = original.maxStateSize; // if asynchronous snapshots were configured, use that setting, // else check the configuration this.asynchronousSnapshots = original.asynchronousSnapshots.resolveUndefined( configuration.getBoolean(CheckpointingOptions.ASYNC_SNAPSHOTS)); } // ------------------------------------------------------------------------ // Properties // ------------------------------------------------------------------------ /** * Gets the maximum size that an individual state can have, as configured in the * constructor (by default {@value #DEFAULT_MAX_STATE_SIZE}). * * @return The maximum size that an individual state can have */ public int getMaxStateSize() { return maxStateSize; } /** * Gets whether the key/value data structures are asynchronously snapshotted. * *

If not explicitly configured, this is the default value of * {@link CheckpointingOptions#ASYNC_SNAPSHOTS}. */ public boolean isUsingAsynchronousSnapshots() { return asynchronousSnapshots.getOrDefault(CheckpointingOptions.ASYNC_SNAPSHOTS.defaultValue()); } // ------------------------------------------------------------------------ // Reconfiguration // ------------------------------------------------------------------------ /** * Creates a copy of this state backend that uses the values defined in the configuration * for fields where that were not specified in this state backend. * * @param config the configuration * @return The re-configured variant of the state backend */ @Override public MemoryStateBackend configure(Configuration config) { return new MemoryStateBackend(this, config); } // ------------------------------------------------------------------------ // checkpoint state persistence // ------------------------------------------------------------------------ @Override public CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException { return new MemoryBackendCheckpointStorage(jobId, getCheckpointPath(), getSavepointPath(), maxStateSize); } // ------------------------------------------------------------------------ // state holding structures // ------------------------------------------------------------------------ @Override public OperatorStateBackend createOperatorStateBackend( Environment env, String operatorIdentifier) throws Exception { return new DefaultOperatorStateBackend( env.getUserClassLoader(), env.getExecutionConfig(), isUsingAsynchronousSnapshots()); } @Override public

AbstractKeyedStateBackend
createKeyedStateBackend( Environment env, JobID jobID, String operatorIdentifier, TypeSerializer
keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange, TaskKvStateRegistry kvStateRegistry, TtlTimeProvider ttlTimeProvider, MetricGroup metricGroup) { TaskStateManager taskStateManager = env.getTaskStateManager(); HeapPriorityQueueSetFactory priorityQueueSetFactory = new HeapPriorityQueueSetFactory(keyGroupRange, numberOfKeyGroups, 128); return new HeapKeyedStateBackend<>( kvStateRegistry, keySerializer, env.getUserClassLoader(), numberOfKeyGroups, keyGroupRange, isUsingAsynchronousSnapshots(), env.getExecutionConfig(), taskStateManager.createLocalRecoveryConfig(), priorityQueueSetFactory, ttlTimeProvider); } // ------------------------------------------------------------------------ // utilities // ------------------------------------------------------------------------ @Override public String toString() { return "MemoryStateBackend (data in heap memory / checkpoints to JobManager) " + "(checkpoints: '" + getCheckpointPath() + "', savepoints: '" + getSavepointPath() + "', asynchronous: " + asynchronousSnapshots + ", maxStateSize: " + maxStateSize + ")"; }}复制代码

  • MemoryStateBackend继承了AbstractFileStateBackend,实现ConfigurableStateBackend接口(configure方法);它将TaskManager的working state及JobManager的checkpoint state存储在JVM heap中(但是为了高可用,也可以设置checkpoint state存储到filesystem);MemoryStateBackend仅仅用来做实验用途,比如本地启动或者所需的state非常小,对于生产需要改为使用FsStateBackend(将TaskManager的working state存储在内存,但是将JobManager的checkpoint state存储到文件系统以支持更大的state存储)
  • MemoryStateBackend有个maxStateSize属性(默认DEFAULT_MAX_STATE_SIZE为5MB),每个state的大小不能超过maxStateSize,一个task的所有state不能超过RPC系统的限制(默认是10MB,可以修改但不建议),所有retained checkpoints的state大小总和不能超过JobManager的JVM heap大小;另外如果创建MemoryStateBackend时未指定checkpointPath及savepointPath,则会从flink-conf.yaml中读取全局默认值;MemoryStateBackend里头还有一个asynchronousSnapshots属性,是TernaryBoolean类型(TRUE、FALSE、UNDEFINED),其中UNDEFINED表示没有配置,将会使用默认值
  • MemoryStateBackend的createCheckpointStorage创建的是MemoryBackendCheckpointStorage;createOperatorStateBackend方法创建的是OperatorStateBackend;createKeyedStateBackend方法创建的是HeapKeyedStateBackend

小结

  • StateBackend接口定义了有状态的streaming应用的state是如何stored以及checkpointed;目前内置支持MemoryStateBackend、FsStateBackend、RocksDBStateBackend三种,如果没有配置默认为MemoryStateBackend;在flink-conf.yaml里头可以进行全局的默认配置,不过具体每个job还可以通过StreamExecutionEnvironment.setStateBackend来覆盖全局的配置
  • StateBackend接口定义了createCheckpointStorage、createKeyedStateBackend、createOperatorStateBackend方法;同时继承了Serializable接口;StateBackend接口的实现要求是线程安全的;StateBackend有个直接实现的抽象类AbstractStateBackend,而AbstractFileStateBackend及RocksDBStateBackend继承了AbstractStateBackend,之后MemoryStateBackend、FsStateBackend都继承了AbstractFileStateBackend
  • MemoryStateBackend继承了AbstractFileStateBackend,实现ConfigurableStateBackend接口(configure方法);它将TaskManager的working state及JobManager的checkpoint state存储在JVM heap中;MemoryStateBackend的createCheckpointStorage创建的是MemoryBackendCheckpointStorage;createOperatorStateBackend方法创建的是OperatorStateBackend;createKeyedStateBackend方法创建的是HeapKeyedStateBackend

doc

转载地址:http://zjrma.baihongyu.com/

你可能感兴趣的文章
CentOS下如何从vi编辑器插入模式退出到命令模式
查看>>
Mysql索引的类型
查看>>
Eclipse debug模式 总是进入processWorkerExit
查看>>
Nginx的https配置记录以及http强制跳转到https的方法梳理
查看>>
[每天五分钟,备战架构师-1]操作系统的类型和结构
查看>>
springcloud(十三):Eureka 2.X 停止开发,但注册中心还有更多选择:Consul 使用详解...
查看>>
关于Boolean类型做为同步锁异常问题
查看>>
TestLink运行环境:Redhat5+Apache2.2.17+php-5.3.5+MySQL5.5.9-1
查看>>
Get File Name from File Path in Python | Code Comments
查看>>
显示本月每一天日期
查看>>
[转]java 自动装箱与拆箱
查看>>
NET的堆和栈04,对托管和非托管资源的垃圾回收以及内存分配
查看>>
think in coding
查看>>
IdHttpServer实现webservice
查看>>
HTML的音频和视频
查看>>
Unsupported major.minor version 52.0
查看>>
面对对象之差异化的网络数据交互方式--单机游戏开发之无缝切换到C/S模式
查看>>
优酷网架构学习笔记
查看>>
把HDFS里的json数据转换成csv格式
查看>>
WEEX-EROS | 集成并使用 bindingx
查看>>