分布式系统(Distributed System) 01:Introduction

墨尔本大学 COMP90015 课程笔记


Posted by Yuyao on November 26, 2020

Lecture 01:Introduction

1. Introduction

  • Networks of computers are everywhere!
    1. Mobile phone networks – GSM/3G networks, cell towers connect to backbones
    2. Corporate networks(企业网络)– LAN, consisting of PC’s, laptops, servers, storage, printing
    3. Factory networks – Machines (drills, stamps, robots) communicate over network
    4. Campus networks – Like CSSE Dept / Unimelb. Wired/wireless, clients, servers
    5. In-car networks – Network of sensors (monitoring engine, breaks, tires etc)
    6. Internet of Things (IoT)
    7. On board networks in planes and trains
  • This subject aims:
    1. to cover characteristics of networked/distributed computing systems and applications
    2. to present the main concepts and techniques that have been developed to help in the tasks of designing and implementing systems and applications that are based on networks.
  • Node meaning in DS
    • the nodes are clients, servers or peers. A peer may sometimes serve as client, sometimes server.

2. Defining Distributed Systems


“A system in which hardware or software components(组件) located at networked computers communicate and coordinate their actions only by message passing.” ———— Coulouris

“A distributed system is a collection of independent computers that appear to the users of the system as a single computer.” ———— Tanenbaum

“A distributed system is one on which I cannot get any work done because some machine I have never heard of has crashed.“ ———— Leslie Lamport

  • Example Distributed Systems:
    • Cluster:

      “A type of parallel(平行的) or distributed processing system, which consists of a collection of interconnected stand-alone(独立的) computers cooperatively working together as a single, integrated(集成的) computing resource” ———— Buyya (本课老师 Buyya 自己的定义)

    • Cloud:

      “a type of parallel and distributed system consisting of a collection of interconnected and virtualised computers that are dynamically provisioned(配置) and presented as one or more unified(统一的) computing resources based on service-level agreements established through negotiation between the service provider and consumers” ———— Buyya (本课老师 Buyya 自己的定义)

3. Networks vs. Distributed Systems

  • Networks: A media for interconnecting local and wide area computers and exchange messages based on protocols(协议). Network entities(网络实体) are visible and they are explicitly(明确地) addressed (IP address).
    • These protocols are generally ‘well defined’, such as TCP/UDP over IP, IPX, NETBIOS.
    • In DS, much of the complexity is handled by the network layer, however we cannot assume everything is 100% reliable.
    • In DS, we operate at a higher level, but we must be mindful of the failures that can and will occur on Computer Networks.
  • Distributed System: existence of multiple autonomous computers is transparent
  • Many problems (e.g., openness, reliability) in common, but at different levels.
    1. Networks focuses on packets(包), routing(路由), etc., whereas distributed systems focus on applications.
    2. Every distributed system relies on services provided by a computer network.

4. Reasons for Distributed Systems

  1. Functional Separation(分离)(功能分离):
    • Existence of computers with different capabilities and purposes: - Clients and Servers - Data collection and data processing - Beneficial to separate collection from processing.
  2. Inherent(固有) distribution:
    • Information: - Different information is created and maintained by different people (e.g., Web pages)
    • People: - Computer supported collaborative work (virtual teams, engineering, virtual surgery(手术))
    • Retail store and inventory(库存) systems for supermarket chains (e.g., Coles, Woolworths) - The process/activity is naturally distributed
  3. Power imbalance and load variation(变化):
    • Distribute computational load among different computers.
    • We are forced to use DS as one system cannot handle!
  4. Reliability: - Long term preservation(保存) and data backup(备份) (replication(复制)) at different locations. - Government reg. may force us to retain data for 10 years
  5. Economies: - Sharing a printer by many users and reduce the cost of ownership. - Building a supercomputer out of a network of computers.
    • Share resources, statistical multiplexing

5. Consequences(意义) of Distributed Systems

Computers in distributed systems may be on separate continents(大洲,大陆), in the same building, or the same room. DSs have the following consequences(意义): Consequencies are problems that need solving when we choose DS:

  1. Concurrency(并发) – each system is autonomous.
    • Carry out tasks independently
    • Tasks coordinate their actions by exchanging messages.
  2. Heterogeneity(异质性)
    • Heterogeneity (that is, variety and difference) applies to all of the following: - Hardware devices: computers, tablets, mobile phones, embedded(嵌入式的) devices, etc. - Operating System: Ms Windows, Linux, Mac, Unix, etc. - Network: Local network, the Internet, wireless network, satellite(卫星) links, etc. - Programming languages: Java, C/C++, Python, PHP, etc. - Different roles of software developers, designers, system managers

      For example, In a distributed system, some computers can have linux os while others can have windows. Similarly, some can have intel processor whereas others may have arm processors. They are referred to OS level and processor level heterogeneity.

  3. No global clock

A logical clock is a mechanism(机制) for capturing chronological(时序的) and causal(因果顺序的) relationships in a distributed system. Distributed systems may have no physically synchronous(同步的) global clock, so a logical clock allows global ordering on events from different processes in such systems.

  • In a distributed system there are as many clocks as there are systems. The clocks are coordinated to keep them somewhat consistent but no one clock has the exact time. Even if the clocks were some what in sync, the individual clocks on each component may run at a different rate or granularity(粒度) leading to them being out of sync only after one local clock cycle
  • Time is only known within a given precision(精度). At frequent intervals(间隔), a clock may synchronize with a more trusted clock. However, the clocks are not precisely the same because of time lapses(失误) due to transmission(传输) and execution(执行).
  • Consider a group of people going to a meeting. Each person has a watch. Each watch has a similar, but different time. Even with the error in time, the group is able to meet and conduct business. This is how distributed time works.
  • This is in contrast(对比) to a clock on a single system. Here there is only one clock and it provides a unified time for all sub components on this individual system.
  1. Independent Failures

More detailed and useful informtion to this link

6. Characteristics of Distributed Systems

  1. Parallel(并行的) activities

    Autonomous components executing concurrent tasks

    • Participants can execute their own tasks in parallel, with little or no synchronisation
  2. Communication via message passing

    No shared memory

    • With wide scale DS, there is no shared memory space to use as a ‘blackboard’. Instead, messages must be passed .
  3. Resource sharing

    Printer, database, other services

    • We need to give ‘fair’ access to many users
  4. No global state

    No single process can have knowledge of the current global state of the system

    • See message passing, can’t know all like on single PC
  5. No global clock

    Only limited precision for processes to synchronize their clocks 只有有限的精度用于同步其时钟

    • Can assume millesecond accuracy at best

7. Goals of Distributed Systems

  1. Connecting Users and Resources

    Ultimately we want our system made widely availible.

  2. Transparency

    User should be unaware of complexity behind system, access it almost like local system.

  3. Openness

    Use standards (like http/ftp, w3 standards) so anyone can interface with, understand and extend your system

  4. Scalability

    Having near linear scaling is the ultimate goal, add capacity to deal with growth

  5. Enhanced Availability

    Enhanced availability means the distributed system will always be available for access regardless of your location and situation.

More detailed and useful information to this link

7. Differentiation with parallel systems

  1. Multiprocessor(多处理器) systems
    1. Shared memory
    2. Bus-based interconnection network
    3. E.g. SMPs (symmetric(对称的) multiprocessors) with two or more CPUs
  2. Multicomputer systems / Clusters
    1. No shared memory
    2. Homogeneous(同质的) in hard- and software
      1. Massively Parallel Processors (MPP)
        • Tightly(紧紧地) coupled high-speed network(紧密耦合的高速网络)
      2. PC/Workstation clusters
        • High-speed networks/switches-based connection.
  3. Differentiation with parallel systems is blurring(模糊的)
    1. Extensibility(可扩展性) of clusters leads to heterogeneity(异质性)

      Adding additional nodes as requirements grow

    2. Extending clusters to include user desktops by harnessing(利用) their idle(空闲的) resources

      E.g., SETI@Home, Folding@Home

    3. Leading to the rapid convergence(趋于一致) of various concepts of parallel and distributed systems

8. Examples of Distributed Systems

  1. They (DS) are based on familiar and widely used computer networks:
    1. Internet
    2. Intranets(企业内部网)
    3. Wireless networks
  2. Example DS and its Applications:
    1. Web (and many of its applications like Online bookshop)
    2. Data Centers and Clouds
    3. Wide area storage systems
    4. Banking Systems
    5. User-level communication (Facebook, Zoom)

9. Challenges (Important part)



An example: Online bookstore (e.g. in World Wide Web)

  1. Heterogeneity
    1. Your customer uses a completely different hardware? (PC, MAC, iPad, Mobile…)
    2. … a different operating system? (Windows, Unix,…)
    3. … a different way of representing data? (ASCII, EBCDIC,…)
  2. Distribution transparency
    1. You want to move your business and computers to the Caribbean (because of the weather or low tax)?
    2. Your client moves to the Caribbean (more likely)?
  3. Concurrency

    Two customers want to order the same item at the same time?

  4. Fault tolerance(容错)
    1. The database with your inventory(库存) information crashes(崩溃)?
    2. Your customer’s computer crashes in the middle of an order?
  5. Security
    1. Someone tries to break into your system to steal data?
    2. … sniffs (嗅探)for information?
    3. … your customer orders something and doesn’t accept the delivery saying he didn’t?
  6. Scalability

    You are so successful that millions of people are visiting your online store at the same time?

  7. Reuse and Openness (Standards)
    1. Do you want to write the whole software on your own (network, database,…)?
    2. What about updates, new technologies?


Characteristics of Distributed Systems

  1. Heterogeneity(异质性)

    • Heterogeneous components must be able to interoperate cross different:

      1. Operating systems
        • Eg. Windows, MacOS, Linux, BSD
      2. Hardware architectures
        • Eg. PowerPC, Sparc, ARM
      3. Communication architectures
      4. Programming languages - Eg. C/C++, Java, Perl, Python, Ruby, etc 5. Software interfaces(界面,接口) - Eg. Common interfaces are needed (RPC, WS) 6. Security measures - Eg. Need to talk same language (3DES, Blowfish, etc) 7. Information representation
    • Solution: - Big/Little Endian, other platform differences have mostly been solved thanks to new DS protocols, TCP/IP, WS, Corba etc

  2. Distribution transparency
    • Distribution should be hidden from the user as much as possible
      • To hide from the user and the application programmer the separation/distribution of components, so that the system is perceived(被看做) as a whole rather than a collection of independent components.
      • ISO Reference Model for Open Distributed Processing (ODP) identifies the following forms of transparencies:
        1. Access transparency
        • Access to local or remote resources is identical(相同的)
        • E.g. Network File System / Dropbox
        • Solution
          • Most modern OS provide this for us. Once remote FS is mounted, we work with it like a local drive
            1. Location transparency
        • Access without knowledge of location
        • E.g. separation of domain name from machine address.
        • Solution
          • DNS gives us this. If server moves and changes IP, DNS will always point to current location (with some limits)
            1. Failure transparency - Tasks can be completed despite failures - E.g. message retransmission, failure of a Web server node should not bring down the website. - Solution - Can be solved by good design - message retransmission, hot spares, self healing architectures.
            2. Replication(复现,复制) transparency - Access to replicated resources as if there was just one(访问复制的资源就像只有一个资源一样). And provide enhanced reliability(可靠性) and performance without knowledge of the replicas(复制品) by users or application programmers. - Solution - DNS can achieve this. Eg: google.com goes to {host1.google.com, host2.google.com, …hostn.google.com, etc}. User is unaware they have been redirected.
            3. Migration (mobility(流动性)/relocation) transparency - Allow the movement of resources and clients within a system without affecting the operation of users or applications. - E.g. switching from one name server to another at runtime; migration of an agent/process from one node to another. - Solution - Server migration is well covered by DNS. DS Services should be created to not be confused when client rapidly changes location (e.g. 3g user, or wifi changing hotspots).
            4. Concurrency transparency - A process should not notice that there are other sharing the same resources - Solution - The fact that many users are competing for one resource can be hidden by good scheduling
            5. Performance transparency: - Allows the system to be reconfigured to improve performance as loads vary - E.g., dynamic addition/deletion(删除) of components, switching from linear structures to hierarchical(分层的) structures when the number of users increase - Solution - Inter-related, DS should be agile enough to expand when load increases and contract during quiet times. - This requires careful initial design.
            6. Scaling transparency: - Allows the system and applications to expand in scale without changes in the system structure or the application algorithms.
            7. Application level transparencies:
        • Persistence(持久) transparency
          • Masks the deactivation and reactivation of an object
        • Transaction transparency
          • Hides the coordination required to satisfy the transactional properties of operations
        • Solution
          • Persistance for end users, they should not need to login constantly.
          • Transactions should happen without undue delay
    • Solution
      • We can utilise services like DNS and Hyperlinks to mask the fact that many machines are involved in DS.
  3. Fault tolerance
    • Failure of a component (partial failure) should not result in failure of the whole system
      • Failure: an offered service no longer complies with(遵守) its specification (e.g., no longer available or very slow to be usable)
      • Fault: cause of a failure (e.g. crash of a component)
      • Fault tolerance: no failure despite faults


        i.e., programmed to handle failures and hides them from users.

      • Fault Tolerance Mechanisms (How to acieve Fault tolerance)
        • Fault detection
          • Checksums, heartbeat, …
        • Fault masking
          • Retransmission(重传) of corrupted(损坏的) messages, redundancy(冗余), …
            • TCP handles retransmission, we can add our own application level fault masking for better reliability.
          • Fault toleration
            • Exception handling, timeouts,…
            • Handle faults, delays gracefully (优雅得).
          • Fault recovery
            • Rollback(回滚) mechanisms,
            • If something bad DOES happen, hopefully we know about it (!!) and we can roll back to a known good state.
    • Solution
      • Can be achieved via careful programming and assumptions, replicated (eg fall-over) servers for backup.
  4. Scalability(可伸缩性)
    1. System should work efficiently with an increasing number of users
    2. System performance should increase with inclusion(包含) of additional resources
      • System should work efficiently at many different scales, ranging from a small Intranet to the Internet
      • Remains effective when there is a significant increase in the number of resources and the number of users
      • Challenges of designing scalable distributed systems:
        • Cost of physical resources
          • Cost should linearly increase with system size
        • Performance Loss
          • For example, in hierarchically(分层) structure data, search performance loss due to data growth should not be beyond O(log n), where n is the size of data
        • Preventing software resources running out:
          • Numbers used to represent Internet addresses (32 bit->64bit)
          • Y2K-like problems
        • Avoiding performance bottlenecks(瓶颈):
          • Use of decentralized(去中心化的) algorithms (centralized DNS to decentralized)
    3. Scalability can be difficult to achieve depending on application domain.


    • Services can receive huge increases in traffic (e.g. when featured on slashdot/digg, etc).

      服务可以收到巨大的流量增加(例如,当在斜线/digg 等上提供功能时)。

    • We want to be able to add commodity (cheap) servers to increase cap.
    • Static file serving: Add more servers, adjust DNS
    • Dynamic web page serving: Add more servers, replicate and synchronise (cluster) back-end DB, adjust DNS
    • Solution
      • Smart load distribution / balancing, good scalable design.
  5. Concurrency

    Shared access to resources must be possible

    • Provide and manage concurrent access to shared resources:
      • Fair scheduling (公平调度)
        • Fair scheduling ensures everyone gets a turn on a popular resource
      • Preserve(保存) dependencies(依赖) (e.g. distributed transactions – buy a book using Credit card, make sure user has sufficient funds prior to finalizing(敲定) order )
      • Avoid deadlocks(死锁)
        • Deadlocks can occur when two users hold a resource, and they are waiting for each other to release their respective resource.

        A deadlock is a situation in which two computer programs sharing the same resource are effectively preventing each other from accessing the resource, resulting in both programs ceasing (停止) to function. The earliest computer operating systems ran only one program at a time.

    • Solution
      • Access to a shared resource must be synchronised so that data remains consistent (I.e. via semaphores/locks). Most modern programming languages (like Java) do the hard work for us, so no excuses!!
  6. Openness and Interoperability(互联互通)

    Interfaces should be publicly available to ease(简化) inclusion of new components

    • Open system:”… a system that implements sufficient open specifications for interfaces, services, and supporting formats to enable properly engineered applications software to be ported across a wide range of systems with minimal changes, to interoperate with other applications on local and remote systems, and to interact with users in a style which facilitates(促进) user portability” (POSIX Open Systems Environment, IEEE POSIX 1003.0)

    • Open spec/standard developers - communities:
      • ANSI, IETF, W3C, ISO, IEEE, OMG, Trade associations,..
    • Solution:
      • Open standards, open API’s, open message protocols
  7. Security

Resources are accessible to authorized users and used in the way they are intended

  • Confidentiality(保密性)
    • Protection against disclosure(泄露) to unauthorized individual information
    • Solution
      • E.g. ACLs (access control lists) to provide authorized access to information,ACL ensures that only certain people can access service, and can be restricted to certain locations (e.g. IP’s, subnets)
  • Integrity(完整性)
    • Protection against alteration(修改) or corruption(损坏)
    • E.g. changing the account number or amount value in a money order
    • Solution
      • We can encrypt data transferred, and/or perform checksums/hash to verify integrity of message.
  • Availability
    • Protection against interference(干扰) targeting access to the resources.
    • E.g. denial of service (DoS, DDoS) attacks

      A distributed denial-of-service (DDoS) attack occurs when multiple systems flood the bandwidth(带宽) or resources of a targeted system, usually one or more web servers. Such an attack is often the result of multiple compromised systems (for example, a botnet) flooding the targeted system with traffic.

      拒绝服务攻击(英语:denial-of-service attack,简称DoS攻击)亦称洪水攻击,是一种网络攻击手法,其目的在于使目标电脑的网络或系统资源耗尽,使服务暂时中断或停止,导致其正常用户无法访问。当黑客使用网络上两个或以上被攻陷的电脑作为“僵尸”向特定的目标发动“拒绝服务”式攻击时,称为分布式拒绝服务攻击(distributed denial-of-service attack,简称DDoS攻击)

  • Non-repudiation
    • Non-repudiation is the assurance that someone cannot deny the validity of something. Non-repudiation is a legal concept that is widely used in information security and refers to a service, which provides proof of the origin of data and the integrity of the data.
    • 不否认是某人不能否认某物有效性的保证。不否认是一个在信息安全中广泛使用的法律概念,它是指一种服务,它提供数据来源和数据完整性的证据。
    • Proof of sending / receiving an information
    • Solution
      • E.g. digital signature
    • Security Mechanisms
    • Encryption(加密)
      • E.g. Blowfish, RSA
    • Authentication(认证)
      • E.g. password, public key authentication
      • Authentication can be via standard user/pass or public key auth
    • Authorization(授权)
      • E.g. access control lists
      • Access Control Lists can control who can access a service, which parts of the service they can use and where they access it from.
  • Solution
    • Good secure design, combining encryption, authentication and access control.

10. Summary

  • Distributed Systems are everywhere
  • Internet enables users throughout the world to access its (application) services from anywhere
  • Resource sharing is the main motivating factor for constructing distributed systems
  • Construction of DS produces many challenges:
    • Heterogeneity, Openness, Security, Scalability, Failure handling, Concurrency, and Transparency
  • Distributed systems enable globalization:
    • Community (Virtual teams, organizations, social networks)
    • Science (e-Science)
    • Business (..e-Banking..)
    • Entertainment (YouTube, e-Friends)

11. Q&A from the tutorial

  1. Provide a definition of a Distributed System

    1. A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing message ———— Coulouris
    2. A collection of independent computers that appears to its users as a single coherent system ———— Tanenbaum
  2. Briefly explain the difference between a computer network and a distributed system.

    A Computer Network: Is a collection of spatially separated (空间分隔得), interconnected computers that exchange messages based on specific protocols. Computers are addressed by IP addresses.

    A Distributed System: Multiple computers on the network working together as a system. The spatial separation of computers and communication aspects are hidden from users.

  3. List reasons for using a distributed system.

    • Economy (cost effective)
    • Reliability (fault tolerance)
    • Availability (high uptime)
    • Scalability (extendible)
    • Functional Separation (Modularity(模块化))

    • The main motivation to build and use distributed systems is Resource Sharing
      • Hardware Resources (Disks, printers, scanners etc.)
      • Software Resources (Files, databases etc)
      • Other (Processing power, memory, bandwidth)
  4. Briefly explain four consequences when using distributed systems, i.e. issues that arise that are not present otherwise.

    • Concurrency
    • Heterogeneity
    • No Global Clock
    • Independent Failures
  5. What are the things to consider when building a distributed system

    • Geography: Will this system be global, or will it run in “silos” per region?

    • Data segregation(数据隔离): Will this system offer a single- or multi-tenancy model?

    • SLAs: Availability, latency(延迟), throughput(吞吐量), consistency, and durability guarantees(耐久性保证) must all be defined.

    • Security: IAAA (identity, authentication, authorization, and audit(审核)), data confidentiality(保密性), and privacy must all be considered.

    • Usage tracking(使用情况跟踪): Understanding usage of the system is necessary for at minimum day-to-day operations of the system, as well as capacity planning. It may also be used to perform billing for use of the system and/or governance (quota/rate limits).

    • Deployment and configuration management: How will updates to the system be deployed?

知识共享许可协议本作品采用知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议进行许可。 欢迎转载,并请注明来自:Yuyao 的博客 同时保持文章内容的完整和以上声明信息!