I've successfully completed the Mahout vectorizing job on Amazon EMR (using "Mahout on Elastic MapReduce" as a reference). Now I want to copy the results from HDFS to S3 (to use them in future clustering).

For that I've used hadoop distcp:

den@aws:~$ elastic-mapreduce --jar s3://elasticmapreduce/samples/distcp/distcp.jar \
> --arg hdfs://my.bucket/prj1/seqfiles \
> --arg s3n://ACCESS_KEY:SECRET_KEY@my.bucket/prj1/seqfiles \
> -j $JOBID

It failed. Then I found a suggestion to use s3distcp instead and tried that as well:

elastic-mapreduce --jobflow $JOBID \
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
> --arg --src --arg 'hdfs://my.bucket/prj1/seqfiles' \
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'

In both cases I got the same error: java.net.UnknownHostException: unknown host: my.bucket
Below is the full error output for the second case.

2012-09-06 13:25:08,209 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system
java.net.UnknownHostException: unknown host: my.bucket
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1193)
    at org.apache.hadoop.ipc.Client.call(Client.java:1047)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:127)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:249)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:214)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1413)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:68)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1431)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:431)
    at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
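
As a side check, the same host lookup should be reproducible outside of S3DistCp with a plain file system listing (a hypothetical sketch, assuming shell access to the EMR master node):

# "my.bucket" fills the authority (host) slot of the URI, so the HDFS client
# tries to open an RPC connection to a NameNode named "my.bucket" and fails
# with the same UnknownHostException:
hadoop fs -ls hdfs://my.bucket/prj1/seqfiles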

1 Answer

I've found the bug.

The main problem is not the

    java.net.UnknownHostException: unknown host: my.bucket

itself, but the line right before it:

    2012-09-06 13:27:33,909 FATAL com.amazon.external.elasticmapreduce.s3distcp.S3DistCp (main): Failed to get source file system

S3DistCp cannot build the source file system: in hdfs://my.bucket/prj1/seqfiles, the segment my.bucket sits in the host position of the URI, so it is taken for a NameNode address. After adding one more slash to the source path, the job started without problems. The correct command is:

elastic-mapreduce --jobflow $JOBID \
> --jar --arg s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar \
> --arg --s3Endpoint --arg 's3-eu-west-1.amazonaws.com' \
> --arg --src --arg 'hdfs:///my.bucket/prj1/seqfiles' \
> --arg --dest --arg 's3://my.bucket/prj1/seqfiles'
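
Why the extra slash helps, sketched as a quick check (assuming shell access to the master node; plain hadoop fs should resolve the URI the same way S3DistCp does):

# hdfs://my.bucket/prj1/seqfiles   -> "my.bucket" is read as the NameNode host
# hdfs:///my.bucket/prj1/seqfiles  -> empty authority, so the cluster's default
#                                     NameNode (fs.default.name) is used and
#                                     /my.bucket/prj1/seqfiles is an ordinary
#                                     HDFS path
hadoop fs -ls hdfs:///my.bucket/prj1/seqfiles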

P.S. It works: the job finished correctly, and I successfully copied a directory containing a 30 GB file.

