A functional Journey

by Rewati Raman

Hadoop Filesystem RemoteIterator To Scala Iterator

16 Mar 2017

While working with the Hadoop filesystem Java client, we come across org.apache.hadoop.fs.RemoteIterator. It is not compatible with the Scala Iterator, which prevents us from using the Scala collection APIs on top of it: for example, converting the iterator to a List, or applying filter or map. Here is a wrapper case class that converts an org.apache.hadoop.fs.RemoteIterator to a Scala Iterator.

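import org.apache.hadoop.fs.RemoteIterator
import scala.collection.AbstractIterator

// Wraps a Hadoop RemoteIterator so that it can be used as a Scala Iterator.
case class RemoteIteratorWrapper[T](underlying: RemoteIterator[T]) extends RemoteIteratorConvertor[T]

sealed trait RemoteIteratorConvertor[T] extends AbstractIterator[T] {
  val underlying: RemoteIterator[T]
  // Both calls delegate to the RemoteIterator, which may throw java.io.IOException.
  def hasNext: Boolean = underlying.hasNext
  def next(): T = underlying.next()
}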
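
To see the wrapper at work without a cluster, here is a minimal sketch that feeds it an in-memory RemoteIterator. The names WrapperDemo and remoteIteratorOf are hypothetical helpers made up for this illustration, and the wrapper is repeated in a compact form (extending AbstractIterator directly) only so the snippet compiles on its own. It assumes the Hadoop client library is on the classpath.

```scala
import org.apache.hadoop.fs.RemoteIterator
import scala.collection.AbstractIterator

object WrapperDemo {
  // Compact form of the wrapper, repeated so this sketch is self-contained.
  case class RemoteIteratorWrapper[T](underlying: RemoteIterator[T]) extends AbstractIterator[T] {
    def hasNext: Boolean = underlying.hasNext
    def next(): T = underlying.next()
  }

  // Hypothetical helper (not part of Hadoop): a RemoteIterator over in-memory items.
  def remoteIteratorOf[T](items: T*): RemoteIterator[T] = new RemoteIterator[T] {
    private val it = items.iterator
    override def hasNext: Boolean = it.hasNext
    override def next(): T = it.next()
  }

  // Once wrapped, the usual Scala collection operations are available.
  val names: List[String] =
    RemoteIteratorWrapper(remoteIteratorOf("a.txt", "b.csv", "c.txt"))
      .filter(_.endsWith(".txt"))
      .map(_.toUpperCase)
      .toList // List("A.TXT", "C.TXT")
}
```

Like any Iterator, the wrapper can be traversed only once, so convert it to a List (or another strict collection) if the results are needed more than once.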

Now suppose we have hdfsPath (an org.apache.hadoop.fs.Path) pointing to an HDFS location, and we want the paths of all files under it, recursively, excluding directories.

val paths = RemoteIteratorWrapper(fileSystem.listFiles(hdfsPath, true))
  .toList
  .filter(_.isFile)
  .map(_.getPath)
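
For a large directory tree it may be preferable to filter and map on the iterator first and call toList last, so that no intermediate list of file statuses is built. The following is a rough sketch of that variant, not a definitive implementation. The names ListFilesDemo and filePaths are hypothetical, the wrapper is repeated in a compact form so the snippet stands alone, and the local filesystem stands in for HDFS so it can be tried without a cluster.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, RemoteIterator}

object ListFilesDemo {
  // Compact form of the wrapper, repeated so this sketch is self-contained.
  final class RemoteIteratorWrapper[T](underlying: RemoteIterator[T]) extends Iterator[T] {
    def hasNext: Boolean = underlying.hasNext
    def next(): T = underlying.next()
  }

  // Hypothetical helper: the paths of all files under root, recursively.
  // filter and map run lazily on the iterator; toList is the only step that materializes.
  def filePaths(fileSystem: FileSystem, root: Path): List[Path] =
    new RemoteIteratorWrapper(fileSystem.listFiles(root, true))
      .filter(_.isFile)
      .map(_.getPath)
      .toList

  def main(args: Array[String]): Unit = {
    // Assumption: the local filesystem is used here in place of a real HDFS cluster.
    val fileSystem = FileSystem.getLocal(new Configuration())
    val hdfsPath = new Path(args.headOption.getOrElse("."))
    filePaths(fileSystem, hdfsPath).foreach(println)
  }
}
```

As far as we can tell from its Javadoc, listFiles already returns only files, so the isFile filter acts mainly as a safety check and documents the intent.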