- mapred.InputPathProcessor: Executor shut down
On Friday, May 13, 2016 at 10:07:38 PM UTC-7, Xiao Zhou wrote:
I am using hdfs as output.
On Friday, May 13, 2016 at 5:48:51 PM UTC-7, Félix GV wrote:
Did you end up needing the other change I made for the mv operation? Or
are you using just HDFS for the output of the build now?
On Fri, May 13, 2016 at 16:23 Xiao Zhou <xiao...@gmail.com> wrote:
Yes, I have opened a pull request.
There are no more issues for the S3 migration right now. Thanks for your help!
On Friday, May 13, 2016 at 3:44:33 PM UTC-7, Félix GV wrote:
All right, great!
Do you think you could squash those three commits and create a formal
Pull Request for your change?
Also, are you in a state where things work end-to-end for you with
S3? Or are there any other issues?
On Fri, May 13, 2016 at 3:09 PM, Xiao Zhou <xiaoz...@gmail.com>
here is the pull request for my change for the input:
On Friday, May 13, 2016 at 1:20:44 PM UTC-7, Félix GV wrote:
These files are only kept on HDFS for the duration of the fetch.
After the Voldemort servers are finished fetching the data, the files are
deleted from HDFS.
I believe it is probably transient enough for you to leave them on HDFS.
Once you're done, you should definitely write a blog post about any
and everything you needed to do to get Voldemort / BnP up and running on
Amazon. Very interesting stuff!
On Fri, May 13, 2016 at 1:06 PM, Xiao Zhou <xiaoz...@gmail.com>
Amazon recommended using HDFS as temporary file storage and S3 as
final file storage.
The reason is that HDFS can go away if the cluster is powered down.
We could write the output to HDFS and then distcp the files to S3,
which is not optimal but acceptable.
I was trying to find out if there is an easy way to write to S3
directly so we can remove the extra step. It seems it will be too much trouble,
so we will just take the extra copy step. It can be done outside the normal
workflow, so it should not have too much impact.
Thanks for helping out.
On Friday, May 13, 2016 at 12:53:56 PM UTC-7, Félix GV wrote:
That sounds like a correct assessment. Although I'm not sure if
the right solution is to build the files in /tmp/ on HDFS, and then to copy
those files over to S3 afterwards. Why not build the files on S3 and copy
them to some other place on S3, or build them on HDFS and move them on HDFS?
The API I used does allow two filesystems to be specified, but I
think it may be complicated to get a hold of both of these FS from within
the HadoopStoreWriter class (not saying it's impossible, just that it may
be tricky).
Since you are apparently using HDFS anyway, would there be any
downside to setting your output path to be on HDFS as well? In which case,
you could use the regular mv operation which is already well-supported?
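The trade-off being discussed can be sketched as a scheme comparison on the two paths. Below is a minimal, self-contained illustration using plain java.net.URI (no Hadoop dependency; the class and method names are hypothetical, not actual Voldemort/BnP code). In real Hadoop code, the "rename" branch would be fs.rename(src, dest) and the "copy" branch something like FileUtil.copy(srcFs, src, dstFs, dst, true, conf):

```java
import java.net.URI;

// Hypothetical helper (not Voldemort code): decide whether the build output
// can be moved with a plain same-filesystem rename, or needs a
// cross-filesystem copy. Note this is a simplification: equal schemes do
// not strictly guarantee the same filesystem (the authorities could differ),
// but it captures the hdfs-vs-s3 distinction in this thread.
public class MvOrCopy {
    public static String decide(String src, String dest) {
        String srcScheme = URI.create(src).getScheme();
        String destScheme = URI.create(dest).getScheme();
        // Same scheme (e.g. hdfs -> hdfs): a regular mv/rename is enough.
        // Different schemes (e.g. hdfs -> s3a): bytes must be copied across.
        return srcScheme.equalsIgnoreCase(destScheme) ? "rename" : "copy";
    }

    public static void main(String[] args) {
        System.out.println(decide("hdfs://nn/tmp/build", "hdfs://nn/stores/build"));   // rename
        System.out.println(decide("hdfs://nn/tmp/build", "s3a://bucket/stores/build")); // copy
    }
}
```

This is why keeping both the temp directory and the output path on HDFS lets the already-supported mv path do all the work.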
Thanks for your debugging effort. Let me know what you think.
On Fri, May 13, 2016 at 12:13 PM, Xiao Zhou <xiaoz...@gmail.com
Got this error on the else branch:
logger.info("Moving " + src + " to " + dest);
I think the issue is that fs is created from the temp directory, which
is on hdfs?
-tmp /tmp/rosb \
We probably don't want the temp directory to be on S3.
The condition if (fs.getScheme().toLowerCase().contains("s3"))
was never satisfied,
and we need to do a cross-filesystem copy from HDFS to S3 in
that code, so the two FileSystem arguments to FileUtil.copy probably need to be different?
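The bug described above can be illustrated with plain java.net.URI (no Hadoop dependency; the paths other than /tmp/rosb are made up for the example). If the scheme check runs against the filesystem derived from the temp directory, it sees "hdfs" and the S3 branch can never fire, even when the destination really is on S3. In Hadoop, the fix would be to obtain a separate handle per path, e.g. src.getFileSystem(conf) and dest.getFileSystem(conf), and to check the destination's scheme:

```java
import java.net.URI;

// Illustration of the suspected bug: checking the scheme of the *temp*
// path (hdfs) instead of the *destination* path (s3a). This mirrors the
// quoted condition fs.getScheme().toLowerCase().contains("s3").
public class SchemeCheck {
    public static boolean looksLikeS3(String path) {
        return URI.create(path).getScheme().toLowerCase().contains("s3");
    }

    public static void main(String[] args) {
        String tmp = "hdfs://nn/tmp/rosb";          // where fs was derived from
        String dest = "s3a://bucket/voldemort/out"; // hypothetical S3 output path
        System.out.println(looksLikeS3(tmp));  // false -> the s3 branch is skipped
        System.out.println(looksLikeS3(dest)); // true  -> check the dest path instead
    }
}
```

That also matches the point about FileUtil.copy: with one FileSystem handle per path, the source handle would be HDFS and the destination handle S3, making the cross-filesystem copy possible.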