
Review: Advanced Distributed File System

 

Snahil Indoria

Department of Computer Technology

JK Lakshmipat University

Jaipur, India

[email protected]

Dishan Shukla

Department of Computer and Technology

JK Lakshmipat University

Jaipur, India

[email protected]

 

 

Abstract—Networks of computers are everywhere, and the Internet is one of the most common examples. Likewise, a distributed system is a network of autonomous computers that are connected through distributed middleware. In this paper, four distributed file system architectures, the Google File System, the Microsoft Distributed File System, the Andrew File System, and the Sun Network File System, are reviewed on the basis of performance, scalability, data integrity, security, and heterogeneity, because a comparative study is required for a better understanding of the different file systems.

Keywords— DFS, GFS, SUN, AFS, Google File System, Sun Network File System, Andrew File System.

                                                                                                                                                  
I.      Introduction [1]

A file system, sometimes abbreviated FS and also referred to as file management, is the method and data structure that an operating system uses to keep track of the files on a disk or partition. The term also refers to a partition or disk that is used to store files, or to the type of the file system. A file is a collection of related information recorded on secondary storage; equivalently, a file is a collection of logically related entities. A file system usually consists of files separated into groups called directories. There are many types of file system in common use, and the type determines how data is accessed.

A distributed file system (DFS) is a client/server-based application that allows clients to access and process data stored on the server as if it were on their own machine. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user’s computer while the data is being processed and is then returned to the server. A distributed file system organizes the file and directory services of individual servers into a global directory in such a way that remote data access is not location-specific but is identical from any client. The files requested by a user may be located on different systems at different places around the world; whenever a user requests a service or file, these systems together provide the information or service to the client. Sharing of resources is the main motive of the DFS.

A distributed system runs on multiple independent computers connected through a communication network, but appears to its users as a single virtual machine. Each computer node runs its own OS and has its own memory. The Internet, intranets, and mobile and ubiquitous computing are common examples of distributed systems. Fig__ shows the architecture of a distributed file system.

                                                                                                                                             
II.    Literature Review

 

Aditya B. Patel, Manashvi Birla, Ushma Nair, “Addressing Big Data Problem Using Hadoop and Map Reduce”, NIRMA University International Conference on Engineering (NUiCONE), 06-08 December 2012. [2]

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System”, Google. [3]

Shiva Asadianfam, Mahboubeh Shamsi and Shahrad Kashany, “A Review: Distributed File System”, International Journal of Computer Networks and Communications Security, Vol. 3, No. 5, May 2015, pp. 229–234. [4]

                                                                                                                                 
III.   Distributed File System [5]

A distributed file system is a client/server-based application that allows clients to access and process data stored on the server as if it were on their local node. When a user accesses a file on the server, the server sends the user a copy of the file, which is cached on the user’s computer while the data is being processed and is then returned to the server. Distributed file systems are the bedrock of distributed computing in office and engineering environments.
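The copy, process, and return cycle described above can be illustrated with a minimal sketch. The class names `FileServer` and `Client`, the file name, and the in-memory dictionary that stands in for the server's disk are all assumptions made for illustration; they do not correspond to any of the systems reviewed in this paper.

```python
class FileServer:
    """Hypothetical in-memory server that stores whole files by name."""

    def __init__(self):
        self.files = {}

    def fetch(self, name):
        # Send the client a copy of the file contents.
        return bytes(self.files[name])

    def store(self, name, data):
        # Accept the client's copy as the new version of the file.
        self.files[name] = bytes(data)


class Client:
    """Hypothetical client that works on a cached copy and returns it on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        # First access: download a copy and keep it in the local cache.
        if name not in self.cache:
            self.cache[name] = bytearray(self.server.fetch(name))
        return self.cache[name]

    def close(self, name):
        # Return the (possibly modified) copy to the server and drop it locally.
        self.server.store(name, self.cache.pop(name))


file_server = FileServer()
file_server.store("report.txt", b"draft")

client = Client(file_server)
buf = client.open("report.txt")
buf += b" v2"                      # processed locally on the cached copy
client.close("report.txt")

print(file_server.files["report.txt"])  # b'draft v2'
```

The sketch ignores concurrent clients; keeping several cached copies consistent is the problem that the data integrity feature below addresses.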

 

       

Fig-I Architecture of Distributed File System [6]

 

Features of Distributed File System [7]

 

•  Transparency [8]

Transparency refers to hiding details from the user. There are four types of transparency:

i.  Structure transparency

Multiple file servers are used to provide better performance, scalability, and reliability. The multiplicity of file servers should be transparent to the clients of a distributed file system.

ii.  Access transparency

Local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client’s site.

iii.  Naming transparency

The name of a file should not reveal the location of the file, and the name must not change when the file moves from one node to another.

iv.  Replication transparency

Where files are replicated on multiple nodes, the existence of multiple copies and their locations should be hidden from the clients.
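Naming and replication transparency can be sketched as a directory that maps a location-free name to whichever replicas currently exist. The `Namespace` class, the node names, and the local paths below are hypothetical and serve only to illustrate the idea.

```python
import random


class Namespace:
    """Hypothetical directory that maps location-free names to replicas."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of (node, local_path)

    def add_replica(self, name, node, local_path):
        self._replicas.setdefault(name, []).append((node, local_path))

    def move(self, name, old_node, new_node, new_path):
        # Relocate one replica; the logical name stays the same.
        locations = self._replicas[name]
        locations[:] = [
            (new_node, new_path) if node == old_node else (node, path)
            for node, path in locations
        ]

    def resolve(self, name, rng=random):
        # The client never sees how many replicas exist or where they live.
        return rng.choice(self._replicas[name])


ns = Namespace()
ns.add_replica("/docs/paper.txt", "node-a", "/disk1/f17")
ns.add_replica("/docs/paper.txt", "node-b", "/disk4/f03")

# The replica on node-a migrates to node-c; clients keep using the same name.
ns.move("/docs/paper.txt", "node-a", "node-c", "/disk2/f99")

node, path = ns.resolve("/docs/paper.txt")
print(node in {"node-b", "node-c"})  # True
```

Choosing a replica at random is only a placeholder; a real system would more likely pick by load or network distance.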

 

•  User Mobility

The user is not bound to work on a specific node but should have the flexibility to work on any machine at different times.

 

•  Performance

Performance is measured as the average amount of time needed to satisfy client requests, which includes CPU time plus the time for accessing secondary storage along with network access time. Explicit file placement decisions should not be needed to increase the performance of a distributed file system.

 

•  Data Integrity

Concurrent access requests from multiple users who are competing to access the file must be properly synchronized using some form of concurrency control mechanism. Atomic transactions can also be provided to users by a file system for data integrity.
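As one possible form of concurrency control, the sketch below serializes writers with a mutual-exclusion lock so that each read-then-write step is atomic. The `SharedFile` class and the record layout are assumptions for illustration; the reviewed systems use their own, more elaborate mechanisms.

```python
import threading


class SharedFile:
    """Hypothetical shared file whose updates are serialized with a lock."""

    def __init__(self):
        self.records = []
        self._lock = threading.Lock()

    def append_records(self, writer_id, count):
        for _ in range(count):
            # Read the next record number and write the record as one atomic step.
            with self._lock:
                next_no = len(self.records)
                self.records.append((next_no, writer_id))


shared = SharedFile()
writers = [
    threading.Thread(target=shared.append_records, args=(w, 1000))
    for w in range(4)
]
for t in writers:
    t.start()
for t in writers:
    t.join()

# Every record number is unique and in order, despite four competing writers.
print(len(shared.records))  # 4000
```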

                                                                                                   
IV.   Characteristics of Distributed File System [9]

 

•  Concurrency

Concurrency is the circumstance in which two or more events happen at the same time, so the system must handle the sharing of resources between clients. Concurrently executing programs share resources such as web pages and files.

 

 

•  No Global Clock

In a distributed system, computers are connected through a network and each has its own clock. Programs communicate and share information only through messages, and their coordination depends on a shared notion of time, which no single global clock provides.
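One well-known way to order events without a global clock is Lamport's logical clock, which this paper does not itself discuss; it is named here only as an illustration. In the minimal sketch below, each node keeps a counter, stamps outgoing messages with it, and advances past any timestamp it receives. The class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock: a per-node counter, not wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the local counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


node_a, node_b = LamportClock(), LamportClock()
node_a.tick()                    # node_a: 1
stamp = node_a.send()            # node_a: 2, message carries timestamp 2
node_b.tick()                    # node_b: 1
received = node_b.receive(stamp) # node_b: max(1, 2) + 1 = 3

print(stamp, received)  # 2 3
```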

 

•  Independent Failure

Each component of a distributed system can fail independently, leaving the other components unaffected.

•  Fault Tolerance

Fault tolerance is the property of a system that continues operating properly in the event of failure.

 

•  Scalability

Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth.

 

•  Heterogeneity

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding processors of the same type but also by adding dissimilar co-processors.

 

 

•  Security

Security is one of the most important principles. Because security needs to be pervasive throughout the system, security mechanisms are normally built into a distributed system.

                                                                                                                                        
V.    Google File System [10]

 

The Google File System is a highly scalable distributed file system that runs on inexpensive commodity hardware, provides fault tolerance, and delivers high aggregate performance to many clients.

The design has been driven by observations of Google's application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led to a reexamination of traditional choices and an exploration of radically different design points. The file system has successfully met Google's storage needs as the storage platform for the generation and processing of data. The largest cluster provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. GFS is one of the most successful examples of a real-world application of distributed systems, with a very high degree of fault tolerance.

 

 

 

 

 

Fig-II Architecture of Google File System [11]

A GFS cluster consists of a single master and multiple chunk-servers and is accessed by multiple clients. The basic idea of GFS is that the master maintains the metadata: a client contacts the master to retrieve the metadata about the chunks stored on the chunk-servers, and afterwards contacts the chunk-servers directly, as Fig-II shows. Each of these is typically a commodity Linux machine running a user-level server process. Files are divided into fixed-size chunks. Each chunk is identified by an immutable and globally unique 64-bit chunk handle assigned by the master at the time of chunk creation. Chunk-servers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. For reliability, each chunk is replicated on multiple chunk-servers. By default, three replicas are stored, though users can designate different replication levels for different regions of the file namespace.
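The lookup path described above can be sketched as follows. The 64 MB chunk size is the value reported in the GFS paper; everything else, including the `Master` class, the sequential handle counter, the round-robin replica placement, and the server names, is an assumption made for illustration and not the actual GFS implementation.

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as reported in the GFS paper


class Master:
    """Hypothetical master that holds only metadata, never file data."""

    def __init__(self, chunkservers, replicas=3):
        self.chunkservers = chunkservers
        self.replicas = replicas
        self.file_chunks = {}            # file name -> list of chunk handles
        self.locations = {}              # chunk handle -> chunk-server names
        self._handles = itertools.count(1)

    def create_chunk(self, name):
        # Handles are kept within 64 bits; a plain counter is an assumption.
        handle = next(self._handles) & 0xFFFFFFFFFFFFFFFF
        self.file_chunks.setdefault(name, []).append(handle)
        # Round-robin placement is an assumption, not the GFS policy.
        start = handle % len(self.chunkservers)
        self.locations[handle] = [
            self.chunkservers[(start + i) % len(self.chunkservers)]
            for i in range(self.replicas)
        ]
        return handle

    def lookup(self, name, chunk_index):
        handle = self.file_chunks[name][chunk_index]
        return handle, self.locations[handle]


def locate(master, name, offset):
    # The client turns a byte offset into a chunk index, then asks the master.
    chunk_index, chunk_offset = divmod(offset, CHUNK_SIZE)
    handle, servers = master.lookup(name, chunk_index)
    # From here on the client would talk to one of `servers` directly.
    return handle, servers, chunk_offset


master = Master(["cs1", "cs2", "cs3", "cs4"])
master.create_chunk("/logs/web.log")
master.create_chunk("/logs/web.log")

handle, servers, chunk_offset = locate(master, "/logs/web.log", 70 * 1024 * 1024)
print(handle, servers, chunk_offset)
```

An offset of 70 MB falls 6 MB into the second chunk, so the master returns the second handle together with its three replica locations.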

The master
maintains all file system metadata. This includes the namespace, access control
information, the mapping from files to chunks, and the current locations of
chunks. It also controls system-wide activities such as chunk lease management,
garbage collection of orphaned chunks, and chunk migration between chunk-servers.

The master
periodically communicates with each chunk-server through HeartBeat messages to give
it instructions and collect its state. GFS client code linked into each
application implements the file system API and communicates with the master and
chunk-servers to read or write data on behalf of the application. Clients
interact with the master for metadata operations, but all data-bearing
communication goes directly to the chunk-servers. Neither the client nor the
chunk-server caches file data. Client caches offer little benefit because most
applications stream through huge files or have working sets too large to be
cached.

Not having them
simplifies the client and the overall system by eliminating cache coherence
issues. (Clients do cache metadata, however.) Chunk-servers need not cache file
data because chunks are stored as local files and so Linux’s buffer cache
already keeps frequently accessed data in memory.

 

                                                                                                                             
VI.   Sun Network File System [12]

 

A network file system is a file system that is located remotely, on a different machine or network. It is a file mechanism that provides storage and retrieval of data and services from multiple disks or nodes. NFS was initially developed by Sun Microsystems in the 1980s and is now managed by the Internet Engineering Task Force (IETF). NFS versions 2 and 3 allow the User Datagram Protocol (UDP) running over an IP network to provide a stateless network connection between client and server, but the current version of NFS requires the Transmission Control Protocol (TCP) [13].
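A stateless connection means that the server remembers nothing about a client between requests, so every request must carry all the information needed to serve it. The sketch below illustrates this idea only; the class, the handle format, and the block size are hypothetical and do not reproduce the NFS wire protocol.

```python
class StatelessFileServer:
    """Hypothetical server that keeps no per-client state between requests."""

    def __init__(self, files):
        self.files = files  # file handle -> file contents

    def read(self, file_handle, offset, count):
        # Everything needed to serve the request is carried in the request itself.
        return self.files[file_handle][offset:offset + count]


def read_all(server, file_handle, block=4):
    # The client, not the server, remembers the current position in the file.
    offset, parts = 0, []
    while True:
        data = server.read(file_handle, offset, block)
        if not data:
            break
        parts.append(data)
        offset += len(data)
    return b"".join(parts)


nfs_server = StatelessFileServer({"fh-1": b"hello nfs world"})

first = nfs_server.read("fh-1", 0, 5)
retry = nfs_server.read("fh-1", 0, 5)   # a repeated request gives the same answer

print(first == retry, read_all(nfs_server, "fh-1"))  # True b'hello nfs world'
```

Because a repeated read returns the same result, a client can simply resend a request whose reply was lost, which suits an unreliable transport such as UDP.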

 

 

 

Fig-III Architecture of Sun Network File System (Client Side) [14]

 

 

 

 

Fig-IV Architecture of Sun Network File System (Server Side) [15]

                                                                                                               
VII.  Microsoft Distributed File System [16]

 

Distributed File System (DFS)
Namespaces and DFS Replication offer simplified, highly-available access to
files, load sharing, and WAN-friendly replication. In the Windows Server® 2003
R2 operating system, Microsoft revised and renamed DFS Namespaces (formerly
called DFS), replaced the Distributed File System snap-in with the DFS
Management snap-in, and introduced the new DFS Replication feature. In the
Windows Server® 2008 operating system, Microsoft added the Windows Server 2008
mode of domain-based namespaces.

 

DFS Namespaces

DFS Namespaces enables you to group shared
folders that are located on different servers into one or more logically
structured namespaces. Each namespace appears to users as a single shared
folder with a series of subfolders.

 

DFS Replication

DFS Replication is an efficient,
multiple-master replication engine that you can use to keep folders
synchronized between servers across limited bandwidth network connections. It
replaces the File Replication Service (FRS) as the replication engine for DFS
Namespaces.

 

 

 

Fig-V Elements of a Namespace [16]

 

Namespace Server

 A namespace server hosts a namespace. The
namespace server can be a member server or a domain controller.

 

Folder Targets

A folder target is the UNC path of a shared folder or another namespace that is associated with a folder in a namespace. The folder target is where data and content are stored. In the previous figure, the folder named Tools has two folder targets, one in London and one in New York, and the folder named Training Guides has a single folder target in New York. A user who browses to \\Contoso\Public\Software\Tools is transparently redirected to the shared folder \\LDN-SVR-01\Tools or \\NYC-SVR-01\Tools, depending on which site the user is currently located in.
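The site-dependent redirection in this example can be sketched as a lookup from a namespace folder to its folder targets. The `DfsNamespace` class and its fallback rule are hypothetical and do not represent Microsoft's referral algorithm. The UNC paths are written with backslash separators on the assumption that this is how the Tools example is meant to read.

```python
class DfsNamespace:
    """Hypothetical namespace: folder path -> {site: folder target UNC path}."""

    def __init__(self):
        self.folders = {}

    def add_target(self, folder, site, target):
        self.folders.setdefault(folder, {})[site] = target

    def resolve(self, folder, client_site):
        targets = self.folders[folder]
        # Prefer a target in the client's own site.
        if client_site in targets:
            return targets[client_site]
        # Fallback to the first registered target is an assumption of this sketch.
        return next(iter(targets.values()))


dfs = DfsNamespace()
tools = r"\\Contoso\Public\Software\Tools"
dfs.add_target(tools, "London", r"\\LDN-SVR-01\Tools")
dfs.add_target(tools, "New York", r"\\NYC-SVR-01\Tools")

print(dfs.resolve(tools, "London"))    # \\LDN-SVR-01\Tools
print(dfs.resolve(tools, "New York"))  # \\NYC-SVR-01\Tools
```

In both cases the user browses to the same namespace path and does not need to know which server answers.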

 

                                                                                                                                   
VIII. Andrew File System [17]

 

AFS started as a joint effort of Carnegie Mellon University and IBM. Today it is the basis for DCE/DFS, the distributed file system included in the Open Software Foundation’s Distributed Computing Environment. Its design draws on some observations of UNIX file system usage, as pertaining to caching.

 

Andrew file system (AFS) is a
location-independent file system that uses a local cache to reduce the workload
and increase the performance of a distributed computing environment. A first
request for data to a server from a workstation is satisfied by the server and
placed in a local cache. A second request for the same data is satisfied from
the local cache.
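The caching behaviour described above, together with the callback promise that AFS uses to keep cached copies valid, can be sketched as follows. The classes, method names, and the in-memory store are hypothetical; this is a simplified illustration under those assumptions, not the AFS protocol.

```python
class AfsServer:
    """Hypothetical server that promises to call back clients when a file changes."""

    def __init__(self):
        self.files = {}
        self.callbacks = {}  # file name -> set of clients holding a promise

    def fetch(self, client, name):
        # Hand out the data together with a callback promise.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Break the promise for every other client that caches this file.
        for client in self.callbacks.get(name, set()) - {writer}:
            client.invalidate(name)
        self.callbacks[name] = {writer}


class AfsClient:
    """Hypothetical workstation with a local cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.fetches = 0

    def read(self, name):
        # First request goes to the server; later ones are served locally.
        if name not in self.cache:
            self.fetches += 1
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def invalidate(self, name):
        self.cache.pop(name, None)


afs = AfsServer()
afs.files["notes"] = "v1"
alice, bob = AfsClient(afs), AfsClient(afs)

alice.read("notes")
alice.read("notes")                 # served from alice's local cache
fetches_before = alice.fetches      # 1

bob.write("notes", "v2")            # the server breaks alice's callback
latest = alice.read("notes")        # alice must fetch again

print(fetches_before, latest, alice.fetches)  # 1 v2 2
```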

 

An AFS may be accessed from a
distributed environment or location independent platform. A user accesses an
AFS from a computer running any type of OS with Kerberos authentication and
single namespace features. Users share files and applications after logging
into machines that interact within the Distributed Computing Infrastructure
(DCI) [18].

 

 

 

Fig-VI Architecture of Andrew File System (Server Side) [19]

 

 

 
| File System | Google File System (GFS) | Sun Network File System (SUN-NFS) | Microsoft Distributed System (MSDN) | Andrew File System (AFS) |
| --- | --- | --- | --- | --- |
| Architecture | Cluster-based, asymmetric | Symmetric | | Symmetric |
| Processes | Stateful | Stateful | Stateful | Stateful |
| Communication | RPC/TCP | RPC/TCP, and UDP in versions 2 and 3 | TCP | RPC/TCP |
| Scalability | Highly scalable | Highly scalable | | Highly scalable |
| Synchronization | Write-once-read-many | Read-ahead, delayed-write | | Callback promise |
| Fault Tolerance | Failure as standard | Failure as standard | Failure as standard | Failure as standard |

Table-I Comparison of GFS, SUN-NFS, MSDN and AFS File Systems

 

 

 

Conclusion

 

In this paper, the Google File System, the Sun Network File System, the Microsoft Distributed File System, and the Andrew File System have been reviewed on the basis of the special features of distributed file systems, and they have also been compared on the basis of their implementation.

 

Acknowledgment

 

There are many people associated with the completion of this paper; I would like to thank each one of them.

 

References

 

 

[1] http://searchstorage.techtarget.com/definition/file-system.

[2] Aditya B. Patel, Manashvi Birla, Ushma Nair, “Addressing Big Data Problem Using Hadoop and Map Reduce”, NIRMA University International Conference on Engineering (NUiCONE), 06-08 December 2012.

[3] Sandberg R., Goldberg D., Kleiman S., Walsh D., Lyon B., “Design and Implementation of the Sun Network Filesystem”.

[4] Ghemawat S., Gobioff H., Leung S., “The Google File System”.

[5] Dean J., Ghemawat S., “MapReduce: Simplified Data Processing on Large Clusters”, OSDI 2004.

[6] Keerthivasan M., “Review of Distributed File Systems: Concepts and Case Studies”, ECE 677 Distributed Computing Systems.

[7] Aditya B. Patel, Manashvi Birla, Ushma Nair, “Addressing Big Data Problem Using Hadoop and Map Reduce”, NIRMA University International Conference on Engineering.

[8] Youwei Wang, Jiang Zhou, Can Ma, Weiping Wang, Dan Meng, Jason Kei, “Clover: A distributed file system of expandable metadata service derived from HDFS”, IEEE International Conference on Cluster Computing, DOI 10.1109/CLUSTER.54, pp