Table of Contents
Client environment setup
Common errors
Common client APIs
1. Common file API operations
2. Stream operations: chunked download
Client Environment Setup
1. Environment preparation:
When a Java client on Windows calls an HDFS cluster, it initializes a local file system object, and that initialization needs some files from the Hadoop distribution. So unpack a Hadoop distribution to some directory, set the HADOOP_HOME environment variable, and add %HADOOP_HOME%\bin to PATH.
Note that you need a Windows build that matches your Hadoop version. If you unpack the Linux package (the one installed on the cluster), you have to add winutils.exe and hadoop.dll to its bin directory manually.
Per-version winutils builds can be downloaded from the link below. I use Hadoop 3.1.3, which is not in the list; the 3.1.2 build works as a substitute.
https://github.com/cdarlint/winutils
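Before writing any client code it can save time to check that the local setup is complete. The class below is only an illustrative sketch (the class name is made up); it verifies that HADOOP_HOME is set and that the two Windows helper files mentioned above are present in its bin directory.

import java.io.File;

public class CheckHadoopHome {
    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + home);
        if (home != null) {
            // The Windows client needs these two files under %HADOOP_HOME%\bin
            System.out.println("winutils.exe present: " + new File(home, "bin/winutils.exe").exists());
            System.out.println("hadoop.dll present:   " + new File(home, "bin/hadoop.dll").exists());
        }
    }
}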
2. Create a Maven project:
Add the dependencies below. The dependency versions do not have to match the cluster's Hadoop version.
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
    </dependency>
    <!--
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs-client</artifactId>
        <version>3.2.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.2.0</version>
    </dependency>
</dependencies>
Add a log4j.properties configuration file to the resources directory:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
3. Create a test class:
package com.zd.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class HadfsClient {

    @Test
    public void testHdfs() {
        Configuration conf = new Configuration();
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        conf.set("fs.defaultFS", "hdfs://192.168.43.11:9000");
        try {
            FileSystem fs = FileSystem.get(conf);
            fs.mkdirs(new Path("/zx/6222"));
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
After running the test, the directory appears on the cluster's HDFS.
Common errors:
1. Missing winutils.exe and hadoop.dll:
Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: Could not locate Hadoop executable:
2. Permission error:
org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="/zx":xiaomao:supergroup:drwxr-xr-x
This happens because the call runs as the Windows Administrator user. Use a user that exists on the cluster instead, by passing it explicitly when obtaining the FileSystem; here the user xiaomao is specified:
FileSystem fs = FileSystem.get(new URI("hdfs://192.168.43.11:9000"), conf, "xiaomao");
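An alternative, assuming the cluster runs with simple (non-Kerberos) authentication, is to set HADOOP_USER_NAME before the first FileSystem call; Hadoop reads it from the environment or from a system property when it resolves the login user. A minimal sketch (the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UserNameDemo {
    public static void main(String[] args) throws Exception {
        // Must run before the first FileSystem.get / UserGroupInformation call,
        // because the login user is resolved only once per JVM.
        System.setProperty("HADOOP_USER_NAME", "xiaomao");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.43.11:9000");

        FileSystem fs = FileSystem.get(conf); // now acts as xiaomao, not Administrator
        fs.mkdirs(new Path("/zx/6222"));
        fs.close();
    }
}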
3. Hadoop environment configuration error:
java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
This error occurs because Hadoop has not been configured locally on Windows; see the environment preparation at the beginning of this article.
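If you cannot change the environment variables, a common workaround is to set the hadoop.home.dir property named in the error message from code, before any HDFS call (for example at the top of testHdfs()). The path below is only a placeholder for your local unpacked Windows Hadoop directory.

// Placeholder path: point it at the directory that contains bin\winutils.exe.
// Must be set before any Hadoop client class is touched in this JVM.
System.setProperty("hadoop.home.dir", "D:/hadoop-3.1.3");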
4. Error when running a main method:
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs"
Because the hadoop-hdfs-client dependency has provided scope, it is not on the runtime classpath when the code is launched through a main method, so this error is thrown.
Another possible cause of this error is the maven-assembly-plugin; the problem is described as follows:
Different JARs (hadoop-commons for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in their META-INFO/services directory. This file lists the canonical classnames of the filesystem implementations they want to declare (this is called a Service Provider Interface implemented via java.util.ServiceLoader, see org.apache.hadoop.FileSystem line 2622).
When we use maven-assembly-plugin, it merges all our JARs into one, and all META-INFO/services/org.apache.hadoop.fs.FileSystem overwrite each other. Only one of these files remains (the last one that was added). In this case, the FileSystem list from hadoop-commons overwrites the list from hadoop-hdfs, so DistributedFileSystem was no longer declared.
In that case it can be fixed like this:
hadoopConfig.set("fs.hdfs.impl",
        org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
hadoopConfig.set("fs.file.impl",
        org.apache.hadoop.fs.LocalFileSystem.class.getName());
Or add the following to core-site.xml:
<property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
    <description>The FileSystem for file: uris.</description>
</property>
<property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
</property>
Common Client APIs
The full code is available on GitHub.
1. Common file API operations
package com.zd.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.permission.FsPermission;
import org.junit.Test;

import java.net.URI;

public class HDFSClientAPI {

    private String url = "hdfs://192.168.43.11:9000";
    private String user = "xiaomao";

    // Build a FileSystem handle for the cluster, acting as the given user
    public FileSystem createCommonFS() {
        try {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(new URI(url), conf, user);
            return fs;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    // Create a directory
    @Test
    public void mkdirs() throws Exception {
        FileSystem fs = createCommonFS();
        fs.mkdirs(new Path("/zx/7117"));
        fs.close();
    }

    // Delete a directory recursively
    @Test
    public void delete() throws Exception {
        FileSystem fs = createCommonFS();
        fs.delete(new Path("/zx/7117"), true);
        fs.close();
    }

    // Upload a local file to HDFS
    @Test
    public void copyFromLocalFile() throws Exception {
        FileSystem fs = createCommonFS();
        fs.copyFromLocalFile(new Path("f:/hdfs.txt"), new Path("/zx/file/"));
    }

    // Download a file from HDFS to the local file system
    @Test
    public void copyToLocalFile() throws Exception {
        FileSystem fs = createCommonFS();
        fs.copyToLocalFile(new Path("/zx/file/"), new Path("f:/"));
    }

    // Rename a file or directory
    @Test
    public void rename() throws Exception {
        FileSystem fs = createCommonFS();
        fs.rename(new Path("/zx/file"), new Path("/zx/files"));
    }

    // List files recursively and print name, permission, length and block locations
    @Test
    public void listFiles() throws Exception {
        FileSystem fs = createCommonFS();
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
        while (listFiles.hasNext()) {
            LocatedFileStatus fileStatus = listFiles.next();
            System.out.println(fileStatus.isFile() ? "file" : "directory");
            System.out.println(fileStatus.getPath().getName());
            System.out.println(fileStatus.getPermission());
            System.out.println(fileStatus.getLen());
            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            for (BlockLocation blockLocation : blockLocations) {
                String[] hosts = blockLocation.getHosts();
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
        }
        fs.close();
    }

    // Distinguish files from directories at the root
    @Test
    public void isFile() throws Exception {
        FileSystem fs = createCommonFS();
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus fileStatus : listStatus) {
            if (fileStatus.isFile()) {
                System.out.println("f:" + fileStatus.getPath().getName());
            } else {
                System.out.println("d:" + fileStatus.getPath().getName());
            }
        }
        fs.close();
    }

    // Sanity-check that HADOOP_HOME is visible to the JVM
    @Test
    public void testEnv() {
        System.out.println(System.getenv("HADOOP_HOME"));
    }
}
2. Stream operations: chunked download
package com.zd.hadoop;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.Test;

public class HDFSIO {

    // Upload a local file to HDFS through raw streams
    @Test
    public void putFileToHDFS() throws IOException, InterruptedException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "xiaomao");
        FileInputStream fis = new FileInputStream(new File("f:/测试文件.docx"));
        FSDataOutputStream fos = fs.create(new Path("/zx/622/测试文件.docx"));
        IOUtils.copyBytes(fis, fos, conf);
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
        fs.close();
    }

    // Download a whole file from HDFS through raw streams
    @Test
    public void getFileFromHDFS() throws IOException, InterruptedException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "xiaomao");
        FSDataInputStream fis = fs.open(new Path("/zx/622/测试文件.docx"));
        FileOutputStream fos = new FileOutputStream(new File("f:/download.docx"));
        IOUtils.copyBytes(fis, fos, conf);
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
        fs.close();
    }

    // Download only the first 20 MB of the file (part 1)
    @Test
    public void readFileSeek1() throws IOException, InterruptedException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "xiaomao");
        FSDataInputStream fis = fs.open(new Path("/zx/622/测试文件.docx"));
        FileOutputStream fos = new FileOutputStream(new File("f:/测试文件.docx.part1"));
        byte[] buf = new byte[1024];
        // 20 * 1024 reads of 1 KB each = 20 MB; only write the bytes actually read
        for (int i = 0; i < 1024 * 20; i++) {
            int len = fis.read(buf);
            if (len == -1) {
                break;
            }
            fos.write(buf, 0, len);
        }
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
        fs.close();
    }

    // Skip the first 20 MB and download the rest (part 2)
    @SuppressWarnings("resource")
    @Test
    public void readFileSeek2() throws IOException, InterruptedException, URISyntaxException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop101:9000"), conf, "xiaomao");
        FSDataInputStream fis = fs.open(new Path("/zx/622/测试文件.docx"));
        fis.seek(1024 * 1024 * 20);
        FileOutputStream fos = new FileOutputStream(new File("f:/测试文件.docx.part2"));
        IOUtils.copyBytes(fis, fos, conf);
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
        fs.close();
    }
}
To merge the downloaded parts on Windows, open a command prompt and run:
type 测试文件.docx.part2 >> 测试文件.docx.part1
After renaming the merged part1 file back to a .docx extension, it opens normally.
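The same merge can also be done from plain Java if you prefer; a minimal sketch (the class name is made up, the file names are the ones used above):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class MergeParts {
    public static void main(String[] args) throws IOException {
        // Append part2 to the end of part1 (same effect as `type part2 >> part1`)
        try (FileInputStream in = new FileInputStream("f:/测试文件.docx.part2");
             FileOutputStream out = new FileOutputStream("f:/测试文件.docx.part1", true)) { // true = append mode
            byte[] buf = new byte[8192];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
        // Afterwards rename part1 back to 测试文件.docx and open it normally.
    }
}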
Appendix:
Maven dependency scopes:
compile (compile scope)
compile is the default scope; if no scope is given, the dependency has compile scope. Compile-scope dependencies are available on every classpath and are also packaged.
provided (provided scope)
provided dependencies are used when the JDK or a container supplies the dependency. For example, when you build a web application you need the Servlet API on the compile classpath to compile a servlet, but you do not want it bundled into the packaged WAR; the Servlet API JAR is supplied by the application server or servlet container. Provided-scope dependencies are available on the compile classpath (not at runtime). They are not transitive and are not packaged.
runtime (runtime scope)
runtime dependencies are needed to run and test the system, but not to compile it. For example, you may only need the JDBC API JAR at compile time, while the JDBC driver implementation is only needed at runtime.
test (test scope)
test-scope dependencies are not needed for normal compilation and execution; they are only available during test compilation and test execution.
system (system scope)
system scope is similar to provided, but you must explicitly supply a path to a JAR file on the local system. This is intended for compiling against local artifacts that are part of the system libraries. Such an artifact is expected to always be available, and Maven will not look for it in a repository. If you set a dependency's scope to system, you must also supply a systemPath element. Note that this scope is discouraged (you should always try to reference dependencies from a public or custom Maven repository).