Collaborating institution: Shibaura Institute of Technology (Academic) · Joint research count: 2

Conference Paper · 2015-01-20 · IEEE: Institute of Electrical and Electronics Engineers

Parallelization of tree-to-TLV serialization (Last author)

Makoto Nakayama, Kenichi Yamazaki, Satoshi Tanaka, Hironori Kasahara
Abstract: A serializer/deserializer (SerDe) is needed to serialize a data object into a byte array and to deserialize it in the reverse direction. Protocol Buffers (ProtoBuf), a fast SerDe used worldwide, serializes a tree-structured data object into the Type-Length-Value (TLV) format. Because SerDes are used in many fields, accelerating SerDe processing is beneficial. This paper proposes a new method that accelerates tree-to-TLV serialization through two forms of parallel processing, called "parallelized serialization" and "parallelization with streaming". Experimental results show that parallelized serialization with 4 worker threads shortens serialization time by a factor of 1.97 compared with a single worker thread, and that combining both forms of parallel processing shortens output time by a factor of 2.11 compared with ProtoBuf when 4 worker threads, a file output stream, and trees of 10,080 container nodes are used. © 2014 IEEE.
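To make the TLV idea concrete, here is a minimal Python sketch of tree-to-TLV serialization with subtree-level parallelism. It is not the paper's algorithm and not ProtoBuf's wire format: the 1-byte type tag, the 4-byte big-endian length, and the names `Node`, `tlv`, `serialize`, and `serialize_parallel` are hypothetical. Because a container's Length field equals the total size of its children's encodings, the root's children can be encoded independently on worker threads and then concatenated in their original order.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical type tags for this sketch (not ProtoBuf wire types).
TYPE_INT = 0x01
TYPE_STR = 0x02
TYPE_CONTAINER = 0x10


@dataclass
class Node:
    """Leaf if `value` is set; otherwise a container node holding `children`."""
    value: Optional[Union[int, str]] = None
    children: List["Node"] = field(default_factory=list)


def tlv(type_tag: int, payload: bytes) -> bytes:
    """Encode one element: 1-byte type, 4-byte big-endian length, then the value."""
    return struct.pack(">BI", type_tag, len(payload)) + payload


def serialize(node: Node) -> bytes:
    """Sequential, depth-first tree-to-TLV serialization."""
    if node.value is None:
        # A container's Length is the total size of its encoded children.
        return tlv(TYPE_CONTAINER, b"".join(serialize(c) for c in node.children))
    if isinstance(node.value, int):
        return tlv(TYPE_INT, struct.pack(">q", node.value))
    return tlv(TYPE_STR, node.value.encode("utf-8"))


def serialize_parallel(root: Node, workers: int = 4) -> bytes:
    """Encode the root's subtrees on a thread pool, then join them in order."""
    if root.value is not None:
        return serialize(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so output matches serialize().
        parts = list(pool.map(serialize, root.children))
    return tlv(TYPE_CONTAINER, b"".join(parts))


if __name__ == "__main__":
    tree = Node(children=[Node(1), Node("a"), Node(children=[Node(2)])])
    encoded = serialize_parallel(tree)
    assert encoded == serialize(tree)
    print(len(encoded), encoded.hex())
```

CPython's global interpreter lock limits speedups for this pure-Python workload, so the sketch shows the decomposition rather than reproducing the timings reported above.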

Conference Paper · 2013 · IEEE: Institute of Electrical and Electronics Engineers

Dynamic profiling and feedback framework for reduce-side join (Last author)

Makoto Nakayama, Kenichi Yamazaki, Satoshi Tanaka, Hironori Kasahara
Abstract: MapReduce has become popular, and Reduce-side join is one of its most important applications. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that runs on a MapReduce cluster. The framework allows programmers to build their own algorithms for addressing data skew in Reduce-side join, based on their specific knowledge and/or requirements. This paper also proposes an estimation method that lets the framework adapt to a wide range of MapReduce cluster sizes. It presents two example algorithms that use the estimation method to address data skew, and experimental results show a speedup of up to 2.59 times in join completion time on a cluster of 50 servers with highly skewed input data. © 2013 IEEE.
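As an illustration of the skew problem only, the following Python sketch compares hash partitioning with a simple profile-then-assign policy: key frequencies counted from a sample (a profiling step) drive a greedy assignment of the heaviest keys to the least-loaded reducer (a feedback step). This is the generic longest-processing-time heuristic, not either of the paper's algorithms or its estimation method; all function names are hypothetical, and `zlib.crc32` stands in for the cluster's hash partitioner.

```python
import heapq
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def profile(sampled_keys: Iterable[str]) -> Counter:
    """Profiling step: count how often each join key appears in a sample."""
    return Counter(sampled_keys)


def hash_partition_loads(key_counts: Dict[str, int], reducers: int) -> List[int]:
    """Per-reducer load under hash partitioning (crc32(key) mod R)."""
    loads = [0] * reducers
    for key, count in key_counts.items():
        loads[zlib.crc32(key.encode("utf-8")) % reducers] += count
    return loads


def greedy_assignment(key_counts: Dict[str, int], reducers: int) -> Dict[str, int]:
    """Feedback step: heaviest key first, always to the least-loaded reducer."""
    heap = [(0, r) for r in range(reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        load, r = heapq.heappop(heap)
        assignment[key] = r
        heapq.heappush(heap, (load + count, r))
    return assignment


def assigned_loads(assignment: Dict[str, int], key_counts: Dict[str, int],
                   reducers: int) -> List[int]:
    """Per-reducer load implied by an explicit key-to-reducer assignment."""
    loads = [0] * reducers
    for key, r in assignment.items():
        loads[r] += key_counts[key]
    return loads


if __name__ == "__main__":
    # Skewed sample: key "a" accounts for half of all records.
    counts = profile(["a"] * 50 + ["b"] * 20 + ["c"] * 20 + ["d"] * 5 + ["e"] * 5)
    print("hash:  ", hash_partition_loads(counts, 2))
    print("greedy:", assigned_loads(greedy_assignment(counts, 2), counts, 2))  # [50, 50]
```

Every record of a join key still goes to a single reducer here, so one very hot key bounds the achievable balance; splitting such a key across reducers would require replicating the matching records from the other input.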