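[Translation] Brave Clojure, Chapter 9: Concurrent and Parallel Programming

This is my translation of Chapter 9, The Sacred Art of Concurrent and Parallel Programming, of the Clojure book CLOJURE FOR THE BRAVE AND TRUE. The format is bilingual: each English passage is quoted first and followed by its Chinese translation. Mistakes are inevitable, and corrections are welcome.

Translations of the other chapters are here

The translation begins.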


If I were the lord of a manor and you were my heir, I would sit you down on your 13th name day and tell you, “The world of computing is changing, lass, and ye must be prepared for the new world of multi-core processors lest ye be trampled by it.

“Listen well: In recent years, CPU clock speeds have barely increased, but dual-core and quad-core computers have become common. The laws of physics are cruel and absolute, and they demand that increasing clock speed requires exponentially more power. The realm’s best engineers are unlikely to overcome this limitation anytime soon, if ever. Therefore, you can expect the trend of increasing cores on a single machine to continue—as will the expectation that you as a programmer will know how to make the most of modern hardware.

“Learning to program in this new paradigm will be fun and fascinating, verily. But beware: it is also fraught with peril. You must learn concurrent and parallel programming, which is the sacred art of structuring your application to safely manage multiple, simultaneously executing tasks.

“You begin your instruction in this art with an overview of concurrency and parallelism concepts. You’ll then study the three goblins that harry every practitioner: reference cells, mutual exclusion, and dwarven berserkers. And you’ll learn three tools that will aid you: futures, promises, and delays.”

And then I’d tap you on the shoulder with a keyboard, signaling that you were ready to begin.

这章先讲并发与并行的概念。然后学习每个从业者都会遇到的三个问题:引用单元(reference cells),互斥(mutual exclusion),矮人狂暴者问题(dwarven berserkers)。然后学习三个辅助工具:未来(futures),承诺(promises),延期(delays)。

Concurrency and Parallelism Concepts

并发与并行概念

Concurrent and parallel programming involves a lot of messy details at all levels of program execution, from the hardware to the operating system to programming language libraries to the code that springs from your heart and lands in your editor. But before you worry your head with any of those details, in this section I’ll walk through the high-level concepts that surround concurrency and parallelism.

并发与并行编程在程序执行的所有层次上都涉及大量乱七八糟的细节:从硬件到操作系统,到编程语言库,到你编写代码。在考虑这些细节前,先领略一下并发与并行的高层次概念。

Managing Multiple Tasks vs. Executing Tasks Simultaneously

管理多任务与同时执行多任务

Concurrency refers to managing more than one task at the same time. Task just means “something that needs to get done,” and it doesn’t imply anything regarding implementation in your hardware or software. We can illustrate concurrency with the song “Telephone” by Lady Gaga. Gaga sings,
I will put down this drink to text you, then put my phone away and continue drinking, eh

并发 指同时管理超过一个任务。任务 指“需要完成的事情”,它不意味着任何软件或硬件的相关实现。我们用Lady Gaga的歌“Telephone”演示并发。她唱到:

我将放下饮料,给你发短信,然后放下电话,继续喝饮料。

In this hypothetical universe, Lady Gaga is managing two tasks: drinking and texting. However, she is not executing both tasks at the same time. Instead, she’s switching between the two, or interleaving. Note that, while interleaving, you don’t have to fully complete a task before switching: Gaga could type one word, put down her phone, pick up her drink and take a sip, and then switch back to her phone and type another word.

在这个假想的世界里,Lady Gaga正在管理两个任务:喝饮料和发短信。但她不是同时执行两个任务。而是在两个任务间切换或 交替。注意,交替进行时,切换前不必完全完成一个任务:Gaga可以输入一个单词,放下电话,拿起饮料喝一口,然后拿起电话输入另一个单词。

Parallelism refers to executing more than one task at the same time. If Madame Gaga were to execute her two tasks in parallel, she would sing,

I can text you with one hand while I use the other to drink, eh

并行 指同时执行超过一个任务。如果Gaga并行执行两个任务,她将唱到:

我能用一只手发短信,同时用另一只手喝饮料。

Parallelism is a subclass of concurrency: before you execute multiple tasks simultaneously, you first have to manage multiple tasks.

并行是并发的一个子类:同时执行多个任务之前,必须管理多个任务。

Clojure has many features that allow you to achieve parallelism easily. While the Lady Gaga system achieves parallelism by simultaneously executing tasks on multiple hands, computer systems generally achieve parallelism by simultaneously executing tasks on multiple processors.

Clojure有很多特性,让你容易地实现并行。Lady Gaga在两只手上同时执行两个任务,从而实现了并行。与此相比,计算机系统实现并行的方法通常是:在多个处理器同时执行多个任务。

It’s important to distinguish parallelism from distribution. Distributed computing is a special version of parallel computing where the processors are in different computers and tasks are distributed to computers over a network. It’d be like Lady Gaga asking Beyoncé, “Please text this guy while I drink.” Although you can do distributed programming in Clojure with the aid of libraries, this book covers only parallel programming, and here I’ll use parallel to refer only to cohabiting processors. If you’re interested in distributed programming, check out Kyle Kingsbury’s Call Me Maybe series at https://aphyr.com/.

区分并行与分布式计算很重要。分布式计算是并行计算的特殊版本,分布式计算的处理器处于不同的计算机中,并且任务被分配到网络上不同的计算机上。就像Lady Gaga请求Beyoncé,“我喝饮料时候,请给这个人发短信”。虽然可以通过库在Clojure中进行分布式编程,本书只覆盖并行编程,并且 并行 只用来指同一台计算机内的处理器。如果你对分布式编程感兴趣,可以去看 https://aphyr.com/

Blocking and Asynchronous Tasks

阻塞与异步任务

One of the major use cases for concurrent programming is for blocking operations. Blocking really just means waiting for an operation to finish. You’ll most often hear it used in relation to I/O operations, like reading a file or waiting for an HTTP request to finish. Let’s examine this using the concurrent Lady Gaga example.

并发编程的一个主要使用场合是处理阻塞操作。阻塞的意思是等待一个操作完成。它最常出现在I/O相关操作中,比如读取文件或等待HTTP请求完成。下面用并发版的Lady Gaga举例。

If Lady Gaga texts her interlocutor and then stands there with her phone in her hand, staring at the screen for a response and not drinking, then you would say that the read next text message operation is blocking and that these tasks are executing synchronously.

如果Lady Gaga发短信给对话者,然后站在那,手里拿着手机,盯着屏幕等响应,不喝饮料,那么就应该说 读下一条短信 操作是阻塞的,并且这个任务是 同步 执行的。

If, instead, she tucks her phone away so she can drink until it alerts her by beeping or vibrating, then the read next text message task is not blocking and you would say she’s handling the task asynchronously.

相反,如果为了能喝饮料,她把手机放在一边,直到手机响铃或震动提醒她。那么 读下一条短信 任务是非阻塞的,并且这个任务是 异步 处理的。
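The two styles can be sketched in Clojure with `future` and `realized?`, both of which are introduced later in this chapter. This is only an illustration: the function names `read-next-text`, `wait-synchronously`, and `wait-asynchronously` and all the timings are invented here, not taken from the book.

```clojure
(defn read-next-text
  "Hypothetical stand-in for a slow I/O operation: the reply takes 200 ms."
  []
  (Thread/sleep 200)
  "On my way!")

(defn wait-synchronously
  "Blocking: this thread does nothing else until the reply arrives."
  []
  (let [reply (read-next-text)]
    {:sips 0 :reply reply}))

(defn wait-asynchronously
  "Non-blocking: the read runs on another thread while this one keeps sipping."
  []
  (let [reply (future (read-next-text))]
    (loop [sips 0]
      (if (realized? reply)
        {:sips sips :reply @reply}
        (do (Thread/sleep 20)        ; take a sip, then check the phone again
            (recur (inc sips)))))))
```

Both functions return the same reply; only the asynchronous one gets any drinking done while it waits.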

Concurrent Programming and Parallel Programming

并发编程与并行编程

Concurrent programming and parallel programming refer to techniques for decomposing a task into subtasks that can execute in parallel and managing the risks that arise when your program executes more than one task at the same time. For the rest of the chapter, I’ll use the two terms interchangeably because the risks are pretty much the same for both.

并发编程与并行编程指两类技术:把任务分解成可以并行执行的子任务,以及管理程序同时执行多个任务时出现的风险。本章的剩余部分,这两个术语可以交换使用,因为它们的风险几乎一样。

To better understand those risks and how Clojure helps you avoid them, let’s examine how concurrency and parallelism are implemented in Clojure.

为了更好理解这些风险和Clojure如何帮助你避免这些风险,让我们看看Clojure如何实现并发与并行。

Clojure Implementation: JVM Threads

Clojure实现:JVM线程

I’ve been using the term task in an abstract sense to refer to a series of related operations without regard for how a computer might implement the task concept. For example, texting is a task that consists of a series of related operations that are totally separate from the operations involved in pouring a drink into your face.

到现在为止,我一直在以一种抽象的方式使用任务这个词,任务是指一系列相关操作,与计算机如何实现这个任务无关。比如,发短信是一个由一系列相关操作构成的任务,这些操作与喝饮料的那些操作完全分开。

In Clojure, you can think of your normal, serial code as a sequence of tasks. You indicate that tasks can be performed concurrently by placing them on JVM threads.

在Clojure里,可以把连续的代码当成一系列的任务。通过把他们放在JVM线程里,表明这些任务可以并发执行。

What’s a Thread?

什么是线程?

I’m glad you asked! A thread is a subprogram. A program can have many threads, and each thread executes its own set of instructions while enjoying shared access to the program’s state.

很高兴你问这个!线程是子程序。一个程序能有很多线程,每个线程执行自己的指令集,同时拥有对程序状态共同的访问权。

Thread management functionality can exist at multiple levels in a computer. For example, the operating system kernel typically provides system calls to create and manage threads. The JVM provides its own platform-independent thread management functionality, and since Clojure programs run in the JVM, they use JVM threads. You’ll learn more about the JVM in Chapter 12.

计算机的很多层次都有线程管理功能。例如,操作系统内核通常提供系统调用用来创建和管理线程。JVM提供自己的独立于平台的线程管理功能,由于Clojure程序运行在JVM里,所以使用JVM线程。第12章会更详细地介绍JVM。
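As a rough sketch of what "using JVM threads" looks like from Clojure, the snippet below starts a `java.lang.Thread` directly through Java interop. The names `shared-state` and `run-on-new-thread` are made up for this example, and the atom (a Clojure reference type this chapter does not cover) is used only as a safe container for state that both threads can see.

```clojure
(def shared-state (atom []))   ; program state visible to every thread

(defn run-on-new-thread
  "Starts a JVM thread that runs the zero-argument function f; returns the Thread."
  [f]
  (doto (Thread. ^Runnable f)  ; Clojure functions implement Runnable
    (.start)))

(let [t (run-on-new-thread
         #(swap! shared-state conj (.getName (Thread/currentThread))))]
  (.join t)                    ; wait for the new thread to finish
  (println "Threads that touched the shared state:" @shared-state))
```

The name that gets recorded belongs to the new thread, not to the thread that evaluated the `let`, which shows that the function body really ran elsewhere while sharing the same program state.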

You can think of a thread as an actual, physical piece of thread that strings together a sequence of instructions. In my mind, the instructions are marshmallows, because marshmallows are delicious. The processor executes these instructions in order. I picture this as an alligator consuming the instructions, because alligators love marshmallows (true fact!). So executing a program looks like a bunch of marshmallows strung out on a line with an alligator traveling down the line and eating them one by one. Figure 9-1 shows this model for a single-core processor executing a single-threaded program.

可以把线程想象成一根真正的线,把一系列指令串了起来。在我的脑海里,这些指令是软糖,因为软糖很美味。处理器按顺序执行这些指令。就像一个鳄鱼挨个吃掉串成一串的软糖一样。图9-1显示了这个单核处理器执行单一线程的程序的模型。

图9-1:单核处理器执行单线程程序

A thread can spawn a new thread to execute tasks concurrently. In a single-processor system, the processor switches back and forth between the threads (interleaving). Here’s where potential concurrency issues get introduced. Although the processor executes the instructions on each thread in order, it makes no guarantees about when it will switch back and forth between threads.

一个线程能 派生(spawn) 一个新线程用于并发执行任务。在单处理器的系统中,处理器在线程间来回切换(交替)。潜在的并发问题由此产生。尽管处理器按顺序执行每个线程上的指令,但不保证什么时候在线程间切换。

Figure 9-2 shows an illustration of two threads, A and B, and a timeline of how their instructions could be executed. I’ve shaded the instructions on thread B to help distinguish them from the instructions on thread A.

图9-2演示了两个线程,A和B,和一个可能的指令执行时间线。线程B的指令画上了阴影以便与线程A的指令区分。

图9-2:线程A和B的指令在单处理器上的一种可能的执行时间线

Note that this is just one possible order of instruction execution. The processor could also have executed the instructions in the order A1, A2, A3, B1, A4, B2, B3 for example. This makes the program nondeterministic. You can’t know beforehand what the result will be because you can’t know the execution order, and different execution orders can yield different results.

注意,这只是一个可能的指令执行顺序。例如,处理器也可能按照,A1, A2, A3, B1, A4, B2, B3,的顺序执行这些指令。这使程序变得 不确定 。因为不能确定执行顺序,所以无法预先知道结果,并且不同的执行顺序可能产生不同的执行结果。
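The nondeterminism can be observed with a small experiment. The sketch below is an assumption-laden illustration rather than anything from the book: `run-two-threads` and `thread-order` are invented names, `future` (introduced later in the chapter) is used to get two threads, and an atom is used as a thread-safe log.

```clojure
(defn run-two-threads
  "Runs instructions A1..A3 and B1..B3 on two threads and returns the order
  in which they were recorded."
  []
  (let [log (atom [])
        run (fn [ids] (future (doseq [id ids] (swap! log conj id))))
        a   (run [:A1 :A2 :A3])
        b   (run [:B1 :B2 :B3])]
    @a @b                      ; wait for both threads to finish
    @log))

(defn thread-order
  "Keeps only the instructions whose name starts with the character prefix."
  [prefix order]
  (filter #(= prefix (first (name %))) order))
```

Calling `(run-two-threads)` repeatedly may produce different interleavings from one run to the next, but `(thread-order \A result)` is always `(:A1 :A2 :A3)`: each thread's own instructions stay in order, and only the switching between threads is unpredictable.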

This example shows concurrent execution on a single processor through interleaving, whereas a multi-core system assigns a thread to each core, allowing the computer to execute more than one thread simultaneously. Each core executes its thread’s instructions in order, as shown in Figure 9-3.

上面的例子演示了单核系统通过穿插实现并发执行。而多核系统把线程分配到每个处理器上使计算机能同时执行多个线程,每个处理器按顺序执行线程指令。如图9-3

图9-3:多核系统中每个核按顺序执行各自线程的指令

As with interleaving on a single core, there are no guarantees for the overall execution order, so the program is nondeterministic. When you add a second thread to a program, it becomes nondeterministic, and this makes it possible for your program to fall prey to three kinds of problems.

同单核上穿插执行一样,整体执行顺序不能确定,所以程序是不确定的。当往程序里加入第二个线程时候,程序变的不确定了,使程序容易受到三类问题的影响。

The Three Goblins: Reference Cells, Mutual Exclusion, and Dwarven Berserkers

三只哥布林:引用单元,互斥,矮人狂暴者

There are three central challenges in concurrent programming, also known as The Three Concurrency Goblins. To see why these are scary, imagine that the program in the image in Figure 9-3 includes the pseudoinstructions in Table 9-1.

并发编程中有三个主要挑战,也叫 三只并发哥布林 。为了说明其令人困扰之处,假设图9-3包含表9-1的伪指令:

表9-1

ID   Instruction
A1   WRITE X = 0
A2   READ X
A3   WRITE X = X + 1
B1   READ X
B2   WRITE X = X + 1

If the processor follows the order A1, A2, A3, B1, B2, then X will have a value of 2, as you’d expect. But if it follows the order A1, A2, B1, A3, B2, X’s value will be 1, as you can see in Figure 9-4.

如果处理器的执行顺序是:A1, A2, A3, B1, B2,那么X的值将是2,如你所愿。但如果顺序是:A1, A2, B1, A3, B2,X的值将是1,如图9-4所示。

图9-4:两个线程与一个引用单元交互

We’ll call this the reference cell problem (the first Concurrency Goblin). The reference cell problem occurs when two threads can read and write to the same location, and the value at the location depends on the order of the reads and writes.

我们管这个叫 引用单元 问题(第一个并发哥布林)。引用单元问题发生在:两个线程能读写同一个位置,那个位置的值取决于读写顺序。
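The two orderings can be replayed deterministically with a tiny simulation of Table 9-1. This is a sketch under one stated assumption: `WRITE X = X + 1` is taken to mean "write back one more than the value this thread last read", which is what makes the second ordering lose an update. The names `instructions` and `final-x` are invented for the example.

```clojure
(def instructions
  "Table 9-1 as functions of the machine state. :x is the shared cell;
  :a and :b hold the value that thread A and thread B last read."
  {:A1 (fn [s] (assoc s :x 0))
   :A2 (fn [s] (assoc s :a (:x s)))
   :A3 (fn [s] (assoc s :x (inc (:a s))))
   :B1 (fn [s] (assoc s :b (:x s)))
   :B2 (fn [s] (assoc s :x (inc (:b s))))})

(defn final-x
  "Executes the instructions in the given order and returns the final value of X."
  [order]
  (:x (reduce (fn [state id] ((instructions id) state)) {} order)))

(final-x [:A1 :A2 :A3 :B1 :B2]) ; => 2
(final-x [:A1 :A2 :B1 :A3 :B2]) ; => 1
```

In the second ordering thread B reads X before thread A has written its increment, so both threads write 1 and one increment is lost.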

The second Concurrency Goblin is mutual exclusion. Imagine two threads, each trying to write a spell to a file. Without any way to claim exclusive write access to the file, the spell will end up garbled because the write instructions will be interleaved. Consider the following two spells:

第二个并发哥布林是互斥。假设两个线程都尝试向一个文件写入内容。如果没有任何方法获得文件的独占写权限,由于写指令穿插执行,文件内容将是混乱的。比如下面两段文字:

By the power invested in me
by the state of California,
I now pronounce you man and wife

Thunder, lightning, wind, and rain,
a delicious sandwich, I summon again

If you write these to a file without mutual exclusion, you could end up with this:

如果不带互斥地写入,最终可能得到这个:

By the power invested in me
by Thunder, lightning, wind, and rain,
the state of California,
I now pronounce you a delicious man sandwich, and wife
I summon again
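One way to claim exclusive write access on the JVM is Clojure's `locking` macro, which holds an object's monitor lock while its body runs. The sketch below is an illustration only: a `StringBuilder` stands in for the file, and `write-spell!`, `wedding`, `sandwich`, and `write-both` are names invented here.

```clojure
(defn write-spell!
  "Appends every line of spell to the shared StringBuilder `file` while holding
  the monitor lock on it, so no other writer can interleave its own lines."
  [^StringBuilder file spell]
  (locking file
    (doseq [line spell]
      (.append file (str line "\n"))
      (Thread/sleep 5))))      ; give the other thread a chance to interfere

(def wedding ["By the power invested in me"
              "by the state of California,"
              "I now pronounce you man and wife"])

(def sandwich ["Thunder, lightning, wind, and rain,"
               "a delicious sandwich, I summon again"])

(defn write-both
  "Writes both spells from separate threads and returns the file contents."
  []
  (let [file (StringBuilder.)
        a    (future (write-spell! file wedding))
        b    (future (write-spell! file sandwich))]
    @a @b                      ; wait for both writers
    (str file)))
```

Which spell lands first is still not determined, but each spell comes out whole because only one thread at a time can be inside the `locking` body for the same object.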

The third Concurrency Goblin is what I’ll call the dwarven berserker problem (aka deadlock). Imagine four berserkers sitting around a rough-hewn, circular wooden table comforting each other. “I know I’m distant toward my children, but I just don’t know how to communicate with them,” one growls. The rest sip their coffee and nod knowingly, care lines creasing their eye places.

第三个并发哥布林是矮人狂暴者问题(即死锁)。想象一下四个狂暴者围着一个木头桌子聊天。

Now, as everyone knows, the dwarven berserker ritual for ending a comforting coffee klatch is to pick up their “comfort sticks” (double-bladed war axes) and scratch each other’s backs. One war axe is placed between each pair of dwarves, as shown in Figure 9-5.

每对矮人之间都放有一把斧头。如图9-5

图9-5:每对矮人之间放着一把战斧

Their ritual proceeds thusly:

  1. Pick up the left war axe, when available.
  2. Pick up the right war axe, when available.
  3. Comfort your neighbor with vigorous swings of your “comfort sticks.”
  4. Release both war axes.
  5. Repeat.

仪式的过程是这样的:

  1. 可用时,拿起左边的战斧。
  2. 可用时,拿起右边的战斧。
  3. 用它们给邻居挠痒痒。
  4. 放下两把斧头。
  5. 重复。

Following this ritual, it’s entirely possible that all the dwarven berserkers will pick up their left comfort stick and then block indefinitely while waiting for the comfort stick to their right to become available, resulting in deadlock. (By the way, if you want to look into this phenomenon further, it’s usually referred to as the dining philosophers problem, but that’s a more boring scenario.) This book doesn’t discuss deadlock in much detail, but it’s good to know the concept and its terminology.

按照这个仪式,完全有可能所有矮人都举着左手斧头,然后等着右边的斧头,陷入无限等待,导致死锁。(顺便提一下,如果你想进一步研究这个问题,这个问题通常叫进餐的哲学家问题)。本书不对死锁进行细节讨论,但知道概念和术语是件好事。
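The standoff can be reproduced without actually hanging the program by giving every berserker a time limit on the second axe. The following is a sketch, not something the book provides: it models axes as `java.util.concurrent.locks.ReentrantLock` objects, uses `CountDownLatch` to force the unlucky schedule in which everyone grabs the left axe first, and the name `ritual-round` and the 50 ms timeout are assumptions made for the example.

```clojure
(import '(java.util.concurrent CountDownLatch TimeUnit)
        '(java.util.concurrent.locks ReentrantLock))

(defn ritual-round
  "Seats n berserkers (n >= 2) around a table with one axe between each pair.
  Everyone picks up the left axe, then waits at most 50 ms for the right one.
  Returns a vector saying whether each berserker obtained the right axe."
  [n]
  (let [axes       (vec (repeatedly n #(ReentrantLock.)))
        all-left   (CountDownLatch. n)   ; opens once everyone holds a left axe
        all-tried  (CountDownLatch. n)   ; opens once everyone has tried the right axe
        berserkers (doall
                    (for [i (range n)]
                      (future
                        (let [^ReentrantLock left  (axes i)
                              ^ReentrantLock right (axes (mod (inc i) n))]
                          (.lock left)
                          (.countDown all-left)
                          (.await all-left)
                          (let [got-right? (.tryLock right 50 TimeUnit/MILLISECONDS)]
                            (.countDown all-tried)
                            (.await all-tried)
                            (when got-right? (.unlock right))
                            (.unlock left)
                            got-right?)))))]
    (mapv deref berserkers)))

(ritual-round 4) ; => [false false false false]
```

Every berserker ends up holding one axe and waiting for a neighbor who will not let go. With a plain `.lock` in place of the timed `.tryLock`, that wait would never end, which is the deadlock described above.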

Concurrent programming has its goblins, but with the right tools, it’s manageable and even fun. Let’s start looking at the right tools.

并发编程有这些哥布林,但用正确的工具,他们是可管理,甚至有趣的。让我们开始查看这些正确的工具。

Futures, Delays, and Promises

未来(future),延期(delay)和承诺(promise)

Futures, delays, and promises are easy, lightweight tools for concurrent programming. In this section, you’ll learn how each one works and how to use them together to defend against the reference cell Concurrency Goblin and the mutual exclusion Concurrency Goblin. You’ll discover that, although simple, these tools go a long way toward meeting your concurrency needs.

未来,延期和承诺是简单轻量的并发编程工具。这节里,你将了解他们是如何工作的,并学习如何使用他们防止引用单元问题和互斥问题。你会发现,这些工具虽然简单,但对你的并发编程有很大帮助。

They do this by giving you more flexibility than is possible with serial code. When you write serial code, you bind together these three events:

相比连续代码,他们为你提供了更大的灵活性。写连续代码时候,把这三个事件绑定在一起了:

  1. Task definition
  2. Task execution
  3. Requiring the task’s result

  1. 任务定义
  2. 任务执行
  3. 请求任务结果

As an example, look at this hypothetical code, which defines a simple API call task:

作为例子,看下这个假定的代码,定义了一个简单的API调用任务:

(web-api/get :dwarven-beard-waxes)

As soon as Clojure encounters this task definition, it executes it. It also requires the result right now, blocking until the API call finishes. Part of learning concurrent programming is learning to identify when these chronological couplings aren’t necessary. Futures, delays, and promises allow you to separate task definition, task execution, and requiring the result. Onward!

一旦Clojure遇到这个任务定义,就会执行,并立刻请求结果,一直等待直到API调用完成。学习并发编程需要学会识别这些时间上的藕合何时不是必须的。未来,延期和承诺让你能分离任务定义,任务执行,和请求返回结果。

Futures

未来

In Clojure, you can use futures to define a task and place it on another thread without requiring the result immediately. You can create a future with the future macro. Try this in a REPL:

在Clojure中,可以用未来定义一个任务,并且把它放进另一个线程,不用马上请求结果。用future宏建立未来。在REPL里试一下:

(future (Thread/sleep 4000)
        (println "I'll print after 4 seconds"))
(println "I'll print immediately")

Thread/sleep tells the current thread to just sit on its bum and do nothing for the specified number of milliseconds. Normally, if you evaluated Thread/sleep in your REPL, you wouldn’t be able to evaluate any other statements until the REPL was done sleeping; the thread executing your REPL would be blocked. However, future creates a new thread and places each expression you pass it on the new thread, including Thread/sleep, allowing the REPL’s thread to continue, unblocked.

Thread/sleep告诉当前线程在指定的时间内什么都不干。通常,如果你在你的REPL里执行Thread/sleep,你将无法求值任何其他语句,sleep结束前你的REPL都将处于阻塞状态。但future新建了一个线程,把传给future的每个表达式都放在新线程上,包括Thread/sleep,这使你的REPL线程继续,保持非阻塞。

You can use futures to run tasks on a separate thread and then forget about them, but often you’ll want to use the result of the task. The future macro returns a reference value that you can use to request the result. The reference is like the ticket that a dry cleaner gives you: at any time you can use it to request your clean dress, but if your dress isn’t clean yet, you’ll have to wait. Similarly, you can use the reference value to request a future’s result, but if the future isn’t done computing the result, you’ll have to wait.

你可以用未来把任务放在独立的线程上运行,然后不再管它,但你经常需要任务运行的结果。future宏返回一个引用值,你可以用这个引用值请求结果。这个引用就像干洗店员发给你的票据:任何时候你都可以用它来取回你的干净衣服,但如果衣服还没洗好,你必须等待。类似地,你可以用这个引用值来请求未来的结果,但如果这个未来还没计算出结果,你必须等待。

Requesting a future’s result is called dereferencing the future, and you do it with either the deref function or the @ reader macro. A future’s result value is the value of the last expression evaluated in its body. A future’s body executes only once, and its value gets cached. Try the following:

请求一个未来的结果叫做对未来取值,可以用deref函数或@读取宏取值。未来的结果的值,是其主体里最后一个表达式的值。一个未来的主体只执行一次,其结果被缓存。试试下面这个:

(let [result (future (println "this prints once")
                     (+ 1 1))]
  (println "deref: " (deref result))
  (println "@: " @result))
; => this prints once
; => deref:  2
; => @:  2

Notice that the string "this prints once" indeed prints only once, even though you dereference the future twice. This shows that the future’s body ran only once and the result, 2, got cached.

注意字符串"this prints once"确实只打印了一次,即使你取值了两次。这说明这个未来的主体只运行了一次,并且结果,2被缓存了。

Dereferencing a future will block if the future hasn’t finished running, like so:

取值一个未运行完成的未来会阻塞,像这样:

(let [result (future (Thread/sleep 3000)
                     (+ 1 1))]
  (println "The result is: " @result)
  (println "It will be at least 3 seconds before I print"))
; => The result is:  2
; => It will be at least 3 seconds before I print

Sometimes you want to place a time limit on how long to wait for a future. To do that, you can pass deref a number of milliseconds to wait along with the value to return if the deref times out:

可以设置时间限制,限制等待一个未来的时间上限。给deref传一个等待的毫秒数和一个超时后返回的值即可:

(deref (future (Thread/sleep 1000) 0) 10 5)
; => 5

This code tells deref to return the value 5 if the future doesn’t return a value within 10 milliseconds.

这段代码告诉deref:如果未来在10毫秒内没有返回值,就返回5。

Finally, you can interrogate a future using realized? to see if it’s done running:

最后,可以用realized?查询一个未来是否运行完成:

(realized? (future (Thread/sleep 1000)))
; => false

(let [f (future)]
  @f
  (realized? f))
; => true

Futures are a dead-simple way to sprinkle some concurrency on your program.

未来是超级简单的并发编程方法。

On their own, they give you the power to chuck tasks onto other threads, which can make your program more efficient. They also let your program behave more flexibly by giving you control over when a task’s result is required.

就其本身而言,他赋予你把任务放在其他线程运行的能力,这使程序更高效。也让你能控制何时请求任务结果,这使你的程序更加灵活。
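A small timing comparison makes the efficiency claim concrete. The sketch below uses an invented slow task, `slow-inc`, whose 300 ms sleep merely simulates work such as an API call; `serial-sum` and `concurrent-sum` are likewise names made up for this example.

```clojure
(defn slow-inc
  "Hypothetical slow task: waits 300 ms, then returns n + 1."
  [n]
  (Thread/sleep 300)
  (inc n))

(defn serial-sum
  "Runs both tasks one after the other on the calling thread."
  []
  (+ (slow-inc 1) (slow-inc 2)))

(defn concurrent-sum
  "Starts both tasks on other threads, then requires both results."
  []
  (let [a (future (slow-inc 1))
        b (future (slow-inc 2))]
    (+ @a @b)))

(time (serial-sum))     ; roughly 600 ms
(time (concurrent-sum)) ; roughly 300 ms
```

Both calls return 5. The concurrent version finishes sooner because the two waits overlap; the results are only required at the point where `@a` and `@b` are dereferenced.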

When you dereference a future, you indicate that the result is required right now and that evaluation should stop until the result is obtained. You’ll see how this can help you deal with the mutual exclusion problem in just a bit. Alternatively, you can ignore the result. For example, you can use futures to write to a log file asynchronously, in which case you don’t need to dereference the future to get any value back.

对一个未来取值时,表示现在就需要结果,而且求值会停止,直到获得结果。你会看到他如何帮你处理互斥问题。另外,也可以忽略结果。比如,可以用未来异步地写日志文件,这种情况下不需要取回任何值。

The flexibility that futures give you is very cool. Clojure also allows you to treat task definition and requiring the result independently with delays and promises.

未来赋予你的灵活性很酷。Clojure也允许你用延期和承诺分别处理任务定义和结果请求。

Delays

延期

Delays allow you to define a task without having to execute it or require the result immediately. You can create a delay using delay:

Delays允许你定义一个任务,但不用立刻执行并请求结果。用delay建立一个延期:

(def jackson-5-delay
  (delay (let [message "Just call my name and I'll be there"]
           (println "First deref:" message)
           message)))

In this example, nothing is printed, because we haven’t yet asked the let form to be evaluated. You can evaluate the delay and get its result by dereferencing it or by using force. force behaves identically to deref except that it communicates more clearly that you’re causing a task to start as opposed to waiting for a task to finish:

这个例子什么都没打印,因为我们还没让let被求值。用取值或用force,可以对延期求值并得到结果。force的功能与deref一样,但它更清晰地表明你是在使一个任务开始,而不是等待一个任务完成:

(force jackson-5-delay)
; => First deref: Just call my name and I'll be there
; => "Just call my name and I'll be there"

Like futures, a delay is run only once and its result is cached. Subsequent dereferencing will return the Jackson 5 message without printing anything:

与未来类似,延期只运行一次,并且结果被缓存。后续的取值不会打印任何东西:

@jackson-5-delay
; => "Just call my name and I'll be there"

One way you can use a delay is to fire off a statement the first time one future out of a group of related futures finishes. For example, pretend your app uploads a set of headshots to a headshot-sharing site and notifies the owner as soon as the first one is up, as in the following:

延期的一个用法是:当第一次,一组相关的未来中的某个完成时,执行一条语句。例如,假设你的app上传一组头像照片到一个头像照片分享网站,而且当第一张上传完成时通知用户,如下所示:

(def gimli-headshots ["serious.jpg" "fun.jpg" "playful.jpg"])
(defn email-user
  [email-address]
  (println "Sending headshot notification to" email-address))
(defn upload-document
  "Needs to be implemented"
  [headshot]
  true)
(let [notify (delay (email-user "and-my-axe@gmail.com"))] ; ➊
  (doseq [headshot gimli-headshots]
    (future (upload-document headshot)
            (force notify)))) ; ➋

In this example, you define a vector of headshots to upload (gimli-headshots) and two functions (email-user and upload-document) to pretend-perform the two operations. Then you use let to bind notify to a delay. The body of the delay, (email-user "and-my-axe@gmail.com") ➊, isn’t evaluated when the delay is created. Instead, it gets evaluated the first time one of the futures created by the doseq form evaluates (force notify) ➋. Even though (force notify) will be evaluated three times, the delay body is evaluated only once. Gimli will be grateful to know when the first headshot is available so he can begin tweaking it and sharing it. He’ll also appreciate not being spammed, and you’ll appreciate not facing his dwarven wrath.

例子定义了一组要上传的头像(gimli-headshots)和两个函数(email-user和upload-document)模拟执行两个操作。然后用let把notify绑定到一个延期。延期的主体,(email-user "and-my-axe@gmail.com")➊,在延期创建时没有求值。相反,求值发生的时间是:doseq创建的某个未来第一次求值(force notify)➋的时候。虽然(force notify)将被求值三次,但延期主体只求值一次。Gimli知道第一张照片上传完成会很高兴,他可以开始编辑和分享照片;他也会因为没收到重复的垃圾邮件而高兴。

This technique can help protect you from the mutual exclusion Concurrency Goblin—the problem of making sure that only one thread can access a particular resource at a time. In this example, the delay guards the email server resource. Because the body of a delay is guaranteed to fire only once, you can be sure that you will never run into a situation where two threads send the same email. Of course, no thread will ever be able to use the delay to send an email again. That might be too drastic a constraint for most situations, but in cases like this example, it works perfectly.

这个技巧可以防止互斥并发问题,确保一个时间只有一个线程能访问一个特定资源。这个例子里,延期保护了邮件服务器资源。由于延期主体保证只执行一次,你能确信绝不会有两个线程发送同样的邮件。当然,没有线程能再次使用这个延期发送邮件。大多数情况下这个限制可能太激进,但这个例子的情况,这个限制工作很完美。
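The once-only guarantee is easy to check by counting how often a delay's body runs when many threads force it. This is a minimal sketch with invented names (`notifications-sent`, `sent`); the counter stands in for the email server, and an atom is used so the count itself is safe to update from any thread.

```clojure
(defn notifications-sent
  "Forces the same delay from n futures and returns how many times its body ran."
  [n]
  (let [sent    (atom 0)
        notify  (delay (swap! sent inc))   ; pretend this sends the email
        uploads (doall (repeatedly n #(future (force notify))))]
    (run! deref uploads)                   ; wait for every future to finish
    @sent))

(notifications-sent 10) ; => 1
```

However many futures race to force the delay, the body runs once and every later `force` just returns the cached value.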

Promises

承诺

Promises allow you to express that you expect a result without having to define the task that should produce it or when that task should run. You create promises using promise and deliver a result to them using deliver. You obtain the result by dereferencing:

承诺 让你能表达:期望一个结果,但不用定义产生这个结果的任务或不用指定那个任务应该什么时候运行。用promise创建承诺,用deliver交付承诺结果。用取值获得这个结果:

(def my-promise (promise))
(deliver my-promise (+ 1 2))
@my-promise
; => 3

Here, you create a promise and then deliver a value to it. Finally, you obtain the value by dereferencing the promise. Dereferencing is how you express that you expect a result, and if you had tried to dereference my-promise without first delivering a value, the program would block until a value was delivered, just like with futures and delays. You can only deliver a result to a promise once.

这里创建了一个承诺然后对它交付了一个值。最后用取值获得了那个值。取值是表达了你期望一个结果,如果不先交付一个值,就对my-promise取值,程序将会阻塞,直至一个值被交付,与未来(futures)和延期(delay)一样。对一个承诺只能交付一次结果。
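The deliver-once rule can be seen directly by delivering twice and checking what the promise holds. The helper name `deliver-twice` and the keys of the map it returns are invented for this sketch.

```clojure
(defn deliver-twice
  "Delivers two different values to one promise and reports what happened."
  []
  (let [p      (promise)
        before (realized? p)]   ; nothing has been delivered yet
    (deliver p :first)
    (deliver p :second)         ; ignored: the promise already has a value
    {:realized-before? before
     :realized-after?  (realized? p)
     :value            @p}))

(deliver-twice)
; => {:realized-before? false, :realized-after? true, :value :first}
```

The second `deliver` does not throw an error; it simply has no effect, so every reader of the promise sees the same value.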

One use for promises is to find the first satisfactory element in a collection of data. Suppose, for example, that you’re gathering ingredients to make your parrot sound like James Earl Jones. Because James Earl Jones has the smoothest voice on earth, one of the ingredients is premium yak butter with a smoothness rating of 97 or greater. You have a budget of $100 for one pound.

承诺的一个用处是从一个数据集合里找到第一个满足条件的成员。例如,假设你正在收集原料,想让你的鹦鹉发出James Earl Jones那样的声音。因为James Earl Jones拥有世界上最圆润的嗓音,其中一种原料是光滑度评分为97或更高的优质牦牛黄油。你有100美元的预算用于购买一磅。

You are a modern practitioner of the magico-ornithological arts, so rather than tediously navigating each yak butter retail site, you create a script to give you the URL of the first yak butter that meets your needs.

你决定写一个脚本用来获得符合需求的第一个牦牛油的URL。

The following code defines some yak butter products, creates a function to mock up an API call, and creates another function to test whether a product is satisfactory:

下面的代码定义了一些牦牛黄油产品,创建了一个函数用来模拟API调用,创建了另一个函数测试产品是否满足需要:

(def yak-butter-international
  {:store "Yak Butter International"
   :price 90
   :smoothness 90})
(def butter-than-nothing
  {:store "Butter Than Nothing"
   :price 150
   :smoothness 83})
;; This is the butter that meets our requirements
(def baby-got-yak
  {:store "Baby Got Yak"
   :price 94
   :smoothness 99})

(defn mock-api-call
  [result]
  (Thread/sleep 1000)
  result)

(defn satisfactory?
  "If the butter meets our criteria, return the butter, else return false"
  [butter]
  (and (<= (:price butter) 100)
       (>= (:smoothness butter) 97)
       butter))

The API call waits one second before returning a result to simulate the time it would take to perform an actual call.

API调用返回结果前等待了一秒,用来模拟实际调用花费的时间。

To show how long it will take to check the sites synchronously, we’ll use some to apply the satisfactory? function to each element of the collection and return the first truthy result, or nil if there are none. When you check each site synchronously, it could take more than one second per site to obtain a result, as the following code shows:

为显示同步检查这些网站需要多久,我们对产品集合的每项产品模拟api调用,取得产品,然后判断是否符合要求。并用some取出第一个符合要求的产品。如果没有符合的产品,返回nil。如下代码所示:

(time (some (comp satisfactory? mock-api-call)
            [yak-butter-international butter-than-nothing baby-got-yak]))
; => "Elapsed time: 3002.132 msecs"
; => {:store "Baby Got Yak", :smoothness 99, :price 94}

Here I’ve used comp to compose functions, and I’ve used time to print the time taken to evaluate a form. You can use a promise and futures to perform each check on a separate thread. If your computer has multiple cores, this could reduce the time it takes to about one second:

这里使用了comp组合函数,使用了time打印求值时间。你可以使用一个承诺和未来在单独的线程执行每个检查。如果你的计算机是多核的,可以把花费的时间减少到大约一秒:

(time
 (let [butter-promise (promise)]
   (doseq [butter [yak-butter-international butter-than-nothing baby-got-yak]]
     (future (if-let [satisfactory-butter (satisfactory? (mock-api-call butter))]
               (deliver butter-promise satisfactory-butter))))
   (println "And the winner is:" @butter-promise)))
; => "Elapsed time: 1002.652 msecs"
; => And the winner is: {:store Baby Got Yak, :smoothness 99, :price 94}

In this example, you first create a promise, butter-promise, and then create three futures with access to that promise. Each future’s task is to evaluate a yak butter site and to deliver the site’s data to the promise if it’s satisfactory. Finally, you dereference butter-promise, causing the program to block until the site data is delivered. This takes about one second instead of three because the site evaluations happen in parallel. By decoupling the requirement for a result from how the result is actually computed, you can perform multiple computations in parallel and save some time.

这个例子里,建立了一个承诺,butter-promise,然后创建了三个未来用于访问这个承诺。每个未来的干的事情是:求值一个网站的牦牛黄油,并且如果满足需要,就把它交付给这个承诺。最后,对butter-promise取值,使程序阻塞,直到有数据交付。由于求值并行运行,所以花费了一秒而不是三秒。通过把结果计算和请求结果分离,可以并行执行多个计算并节约时间。

You can view this as a way to protect yourself from the reference cell Concurrency Goblin. Because promises can be written to only once, you prevent the kind of inconsistent state that arises from nondeterministic reads and writes.

你可以把这个当作防止引用单元问题的一个方法。因为承诺只能被写一次,所以防止了不确定的读写中出现的数据不一致状态。

You might be wondering what happens if none of the yak butter is satisfactory. If that happens, the dereference would block forever and tie up the thread. To avoid that, you can include a timeout:

你可能想知道,如果没有符合要求的牦牛黄油怎么办?如果发生这种情况,取值将永久阻塞,并占用那个线程。为避免这样,可以增加一个超时:

(let [p (promise)]
  (deref p 100 "timed out"))

This creates a promise, p, and tries to dereference it. The number 100 tells deref to wait 100 milliseconds, and if no value is available by then, to use the timeout value, "timed out".

这个例子创建了一个承诺,p,并且对它取值。数字100告诉deref等待100毫秒,如果到时仍没有可用的值,就返回超时值"timed out"。
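Applied to the yak butter search, the timeout might look like the sketch below. Everything here is illustrative: `first-satisfactory`, the `:none-found` fallback, and the two sample maps are invented, and the predicate is passed in so the example does not depend on the definitions earlier in the chapter.

```clojure
(defn first-satisfactory
  "Checks every butter on its own future and returns the first one that
  satisfies ok?, or :none-found if nothing is delivered within timeout-ms."
  [ok? butters timeout-ms]
  (let [winner (promise)]
    (doseq [butter butters]
      (future (when (ok? butter)
                (deliver winner butter))))
    (deref winner timeout-ms :none-found)))

(def sample-butters
  [{:store "A" :smoothness 90}
   {:store "B" :smoothness 99}])

(first-satisfactory #(>= (:smoothness %) 97) sample-butters 500)
; => {:store "B", :smoothness 99}

(first-satisfactory #(>= (:smoothness %) 100) sample-butters 100)
; => :none-found
```

The second call returns after about 100 milliseconds instead of blocking forever, because no future ever delivers a value.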

The last detail I should mention is that you can also use promises to register callbacks, achieving the same functionality that you might be used to in JavaScript. JavaScript callbacks are a way of defining code that should execute asynchronously once some other code finishes. Here’s how to do it in Clojure:

最后一个细节是可以用承诺注册回调函数,获得与JavaScript回调函数相同的功能。JavaScript回调函数定义了这样的代码:一旦某些其他代码完成,这些代码就会异步执行。在Clojure里可以这么做:

(let [ferengi-wisdom-promise (promise)]
  (future (println "Here's some Ferengi wisdom:" @ferengi-wisdom-promise))
  (Thread/sleep 100)
  (deliver ferengi-wisdom-promise "Whisper your way to success."))
; => Here's some Ferengi wisdom: Whisper your way to success.

This example creates a future that begins executing immediately. However, the future’s thread is blocking because it’s waiting for a value to be delivered to ferengi-wisdom-promise. After 100 milliseconds, you deliver the value and the println statement in the future runs.

这个例子创建了一个立刻开始执行的未来。但这个未来的线程是阻塞的,因为等待一个值交付给ferengi-wisdom-promise。100毫秒后,值被交付,未来里的println语句运行。
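The same pattern can be wrapped in a small helper so that registering a callback reads as one step. This is a sketch, not an API that Clojure provides: `on-delivery` and `shout-wisdom` are hypothetical names introduced here.

```clojure
(require '[clojure.string :as str])

(defn on-delivery
  "Registers callback to run on another thread once promise p has a value.
  Returns the future that will hold the callback's result."
  [p callback]
  (future (callback @p)))

(defn shout-wisdom
  "Registers an upper-casing callback, delivers the wisdom, and returns the
  callback's result."
  []
  (let [wisdom (promise)
        shout  (on-delivery wisdom str/upper-case)]
    (deliver wisdom "Whisper your way to success.")
    @shout))

(shout-wisdom) ; => "WHISPER YOUR WAY TO SUCCESS."
```

The callback's thread stays blocked on `@p` until the value arrives, so this approach ties up one thread per registered callback.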

Futures, delays, and promises are great, simple ways to manage concurrency in your application. In the next section, we’ll look at one more fun way to keep your concurrent applications under control.

未来,延期和承诺是应用程序里管理并发的好方法。下节我们会看到更多有趣的控制并发程序的方法。

Rolling Your Own Queue

自建队列

So far you’ve looked at some simple ways to combine futures, delays, and promises to make your concurrent programs a little safer. In this section, you’ll use a macro to combine futures and promises in a slightly more complex manner. You might not necessarily ever use this code, but it’ll show the power of these modest tools a bit more. The macro will require you to hold runtime logic and macro expansion logic in your head at the same time to understand what’s going on; if you get stuck, just skip ahead.

到现在你已经看到了一些简单的,用来组合未来,延期,和承诺,使并发程序更加安全的方法。这节,你将用一个宏以更复杂些的方式组合未来和承诺。你不一定要使用这个代码,但它能让你进一步看到这些朴素工具的力量。为理解这个宏,你头脑中要同时想着宏展开和运行时逻辑,如果你卡住了,跳过即可。

One characteristic The Three Concurrency Goblins have in common is that they all involve tasks concurrently accessing a shared resource—a variable, a printer, a dwarven war axe—in an uncoordinated way. If you want to ensure that only one task will access a resource at a time, you can place the resource access portion of a task on a queue that’s executed serially. It’s kind of like making a cake: you and a friend can separately retrieve the ingredients (eggs, flour, eye of newt, what have you), but some steps you’ll have to perform serially. You have to prepare the batter before you put it in the oven. Figure 9-6 illustrates this strategy.

三个并发问题的一个共同特点是:它们都涉及多个任务以不协调的方式并发访问共享资源(变量,打印机,矮人战斧)。如果你想确保一次只有一个任务访问某个资源,你可以把任务中访问资源的部分放进一个顺序执行的队列。这有点像做蛋糕:你和一个朋友可以分别取得制作材料(鸡蛋,面粉,蝾螈眼睛等东西),但某些步骤你们必须按顺序执行。你必须先准备好面糊,再放入烤箱。图9-6演示了这个策略。

图9-6
Dividing tasks

To implement the queuing macro, you’ll pay homage to the British, because they invented queues. You’ll use a queue to ensure that the customary British greeting “Ello, gov’na! Pip pip! Cheerio!” is delivered in the correct order. This demonstration will involve an abundance of sleeping, so here’s a macro to do that more concisely:

为了实现这个队列宏,你要向英国人致敬,因为队列是他们发明的。你将使用一个队列来确保英式问候语 “Ello, gov’na! Pip pip! Cheerio!” 按正确的顺序被交付。这个示例里有很多sleep,所以用下面这个宏使代码更简洁:

(defmacro wait
  "Sleep `timeout` milliseconds before evaluating body"
  [timeout & body]
  `(do (Thread/sleep ~timeout) ~@body))

All this code does is take whatever forms you give it and insert a call to Thread/sleep before them, all wrapped up in do.

这段代码的全部工作就是在所有形式前插入一个调用 Thread/sleep,并全部用do包起来。
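To see what the macro produces, it can help to expand a call by hand. The sketch below restates `wait` so that it runs on its own and then inspects one expansion. Note that `Thread/sleep` takes its argument in milliseconds.

```clojure
(defmacro wait
  "Sleep `timeout` milliseconds before evaluating body"
  [timeout & body]
  `(do (Thread/sleep ~timeout) ~@body))

;; Expand one call without evaluating it. Syntax quote resolves
;; Thread/sleep to its fully qualified class name.
(macroexpand-1 '(wait 100 (println "Pip pip!") "Cheerio!"))
;; => (do (java.lang.Thread/sleep 100) (println "Pip pip!") "Cheerio!")

;; Evaluating a call sleeps first and then returns the last body form.
(wait 100 "Cheerio!") ; => "Cheerio!"
```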

The code in Listing 9-1 splits up tasks into a concurrent portion and a serialized portion:

列表9-1的代码把任务分成并发部分和串行部分:

Listing 9-1. The expansion of an enqueue macro call:

列表 9-1。enqueue宏调用的展开:

(let [saying3 (promise)]
  (future (deliver saying3 (wait 100 "Cheerio!")))
  @(let [saying2 (promise)]
     (future (deliver saying2 (wait 400 "Pip pip!")))
➊    @(let [saying1 (promise)]
        (future (deliver saying1 (wait 200 "'Ello, gov'na!")))
        (println @saying1)
        saying1)
     (println @saying2)
     saying2)
  (println @saying3)
  saying3)

The overall strategy is to create a promise for each task (in this case, printing part of the greeting) and a corresponding future that will deliver a concurrently computed value to the promise. This ensures that all of the futures are created before any of the promises are dereferenced, and it ensures that the serialized portions are executed in the correct order. The value of saying1 is printed first—"'Ello, gov'na!"—then the value of saying2, and finally saying3. Returning saying1 in a let block and dereferencing the let block at ➊ ensures that you’ll be completely finished with saying1 before the code moves on to do anything to saying2, and this pattern is repeated with saying2 and saying3.

代码的整体策略是为每个任务(这里是打印问候语的一部分)建立一个承诺,并建立一个对应的未来,这个未来将把并发计算出的值交付给那个承诺。这确保了所有未来都在任何承诺被取值之前建立,同时确保了串行部分按正确的顺序执行。saying1的值"'Ello, gov'na!"首先被打印,然后是saying2的值,最后是saying3的值。let代码块返回saying1,并在➊处对这个let代码块取值,这确保了代码开始处理saying2之前,saying1已经完全处理完,saying2和saying3也重复了这个模式。
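The same ordering guarantee can be sketched without nested let blocks. This is an alternative illustration, not the chapter's technique: `say-in-order` is a hypothetical helper that starts every future up front and then dereferences them one by one.

```clojure
(defn say-in-order
  "Hypothetical helper. `tasks` is a sequence of [sleep-ms text] pairs.
  Every future is started before any of them is dereferenced, and the
  results are then printed one at a time, in the order given."
  [tasks]
  (let [futures (mapv (fn [[ms text]]
                        (future (Thread/sleep (long ms)) text))
                      tasks)]   ; mapv is eager, so all futures start here
    (doseq [f futures]
      (println @f))))           ; blocks until this future's value is ready

(say-in-order [[200 "'Ello, gov'na!"]
               [400 "Pip pip!"]
               [100 "Cheerio!"]])
```

Even though the third task finishes sleeping first, its text is printed last, because printing happens only on the calling thread and in sequence order.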

It might seem silly to dereference the let block, but doing so lets you abstract this code with a macro. And you will definitely want to use a macro, because writing out code like the previous example would drive you mental (as the British would say). Ideally, the macro would work as shown in Listing 9-2:

对let代码块取值可能看起来很傻,但这么做使你能用宏抽象这段代码。而且你肯定想用宏,因为像前面那样写代码会把你逼疯。理想情况下,这个宏的用法如列表9-2所示:

Listing 9-2:

(-> (enqueue ➊saying ➋(wait 200 "'Ello, gov'na!") ➌(println @saying))
   ➍(enqueue saying (wait 400 "Pip pip!") (println @saying))
    (enqueue saying (wait 100 "Cheerio!") (println @saying)))

The macro lets you name the promise that gets created ➊, define how to derive the value to deliver that promise ➋, and define what to do with the promise ➌. The macro can also take another enqueue macro call as its first argument, which lets you thread it ➍. Listing 9-3 shows how you can define the enqueue macro. After defining enqueue, the code in Listing 9-2 will expand into the code in Listing 9-1, with all the nested let expressions:

这个宏让你可以命名所创建的承诺(➊),定义如何得到交付给承诺的值(➋),以及定义用这个承诺做什么(➌)。这个宏也能接受另一个enqueue宏调用作为其第一个参数,以便把它们串起来(➍)。列表9-3展示了如何定义enqueue宏。定义enqueue之后,列表9-2的代码将被展开成列表9-1的代码,带着所有的嵌套let表达式:

Listing 9-3:

(defmacro enqueue
➊ ([q concurrent-promise-name concurrent serialized]
➋  `(let [~concurrent-promise-name (promise)]
      (future (deliver ~concurrent-promise-name ~concurrent))
➌     (deref ~q)
      ~serialized
      ~concurrent-promise-name))
➍ ([concurrent-promise-name concurrent serialized]
   `(enqueue (future) ~concurrent-promise-name ~concurrent ~serialized)))

Notice first that this macro has two arities in order to supply a default value. The first arity ➊ is where the real work is done. It has the parameter q, and the second arity does not. The second arity ➍ calls the first with value (future) supplied for q; you’ll see why in a minute. At ➋, the macro returns a form that creates a promise, delivers its value in a future, dereferences whatever form is supplied for q, evaluates the serialized code, and finally returns the promise. q will usually be a nested let expression returned by another call to enqueue, like in Listing 9-2. If no value is supplied for q, the macro supplies a future so that the deref at ➌ doesn’t cause an exception.

首先注意,这个宏有两套参数,以便提供默认值。第一套参数(➊处)做了真正的工作。它有个参数q,第二套没有。第二套参数(➍处)调用了第一套,并为q提供了值(future),马上就会解释这么做的用处。在➋处,这个宏返回了一个形式,这个形式创建了一个承诺,在一个未来里交付这个承诺的值,对提供给q的形式取值,求值串行代码,最后返回这个承诺。通常情况下,q是另一个enqueue调用返回的嵌套let表达式,像列表9-2那样。如果没有为q提供值,这个宏会提供一个未来,使➌处的deref不会造成异常。
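The default-value trick can be checked at the REPL. The sketch below restates the macro without the callout markers so it can be pasted as is, then looks at what the shorter arity expands to. The forms `(+ 1 2)` and `n` are placeholders chosen for this illustration.

```clojure
(defmacro enqueue
  ([q concurrent-promise-name concurrent serialized]
   `(let [~concurrent-promise-name (promise)]
      (future (deliver ~concurrent-promise-name ~concurrent))
      (deref ~q)
      ~serialized
      ~concurrent-promise-name))
  ([concurrent-promise-name concurrent serialized]
   `(enqueue (future) ~concurrent-promise-name ~concurrent ~serialized)))

;; A future with an empty body completes with nil, so dereferencing
;; it is harmless. That is what makes it a usable default for q.
@(future) ; => nil

;; One expansion step turns the three-argument call into the
;; four-argument one, with (clojure.core/future) standing in for q.
(macroexpand-1 '(enqueue saying (+ 1 2) (println @saying)))

;; The whole form evaluates to the promise, so it can be dereferenced.
@(enqueue n (+ 1 2) nil) ; => 3
```

Because each call evaluates to its promise, an outer `enqueue` can dereference an inner one and will block until the inner promise has been delivered and the inner serialized step has run.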

Now that we’ve written the enqueue macro, let’s try it out to see whether it reduces the execution time!

现在有了enqueue宏,试一下它能否减少执行时间!

(time @(-> (enqueue saying (wait 200 "'Ello, gov'na!") (println @saying))
           (enqueue saying (wait 400 "Pip pip!") (println @saying))
           (enqueue saying (wait 100 "Cheerio!") (println @saying))))
; => 'Ello, gov'na!
; => Pip pip!
; => Cheerio!
; => "Elapsed time: 401.635 msecs"

Blimey! The greeting is delivered in the correct order, and you can see by the elapsed time that the “work” of sleeping was handled concurrently.

啊!问候语按正确的顺序交付,而且从耗时可以看出,休眠这部分“工作”是并发处理的。
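The elapsed time is close to the longest single sleep (400 ms) rather than the sum of all three (700 ms). A rough way to see the difference is to time both approaches. The helpers `elapsed-ms`, `run-serially`, and `run-concurrently` are hypothetical names for this sketch, and exact timings will vary by machine.

```clojure
(defn elapsed-ms
  "Hypothetical helper: call the zero-argument function `f` and
  return how many milliseconds the call took."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn run-serially
  "Sleep for each duration in turn, on the calling thread."
  [sleeps]
  (doseq [ms sleeps]
    (Thread/sleep (long ms))))

(defn run-concurrently
  "Start one future per duration, then wait for all of them."
  [sleeps]
  (let [fs (mapv (fn [ms] (future (Thread/sleep (long ms)))) sleeps)]
    (doseq [f fs] @f)))

(elapsed-ms #(run-serially [200 400 100]))     ; roughly 700 ms
(elapsed-ms #(run-concurrently [200 400 100])) ; roughly 400 ms
```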

Summary

总结

It’s important for programmers like you to learn concurrent and parallel programming techniques so you can design programs that run efficiently on modern hardware. Concurrency refers to a program’s ability to carry out more than one task, and in Clojure you achieve this by placing tasks on separate threads. Programs execute in parallel when a computer has more than one CPU, which allows more than one thread to be executed at the same time.

学习并发和并行编程技术,以便设计出能在现代硬件上高效运行的程序,对于你这样的程序员是很重要的事情。并发指程序执行不止一个任务的能力,在Clojure里,可以通过把任务放在独立的线程上获得这种能力。当计算机有不止一个CPU时候,程序并行执行,这样同一时间就可以执行不止一个线程。

Concurrent programming refers to the techniques used to manage three concurrency risks: reference cells, mutual exclusion, and deadlock. Clojure gives you three basic tools that help you mitigate those risks: futures, delays, and promises. Each tool lets you decouple the three events of defining a task, executing a task, and requiring a task’s result. Futures let you define a task and execute it immediately, allowing you to require the result later or never. Futures also cache their results. Delays let you define a task that doesn’t get executed until later, and a delay’s result gets cached. Promises let you express that you require a result without having to know about the task that produces that result. You can only deliver a value to a promise once.

并发编程指用于管理三种并发风险(引用单元,互斥,死锁)的技术。Clojure提供了三个基本工具帮助你减少这些风险:未来,延期,承诺。每个工具都让你能把定义任务,执行任务,获取任务结果这三件事解耦。未来定义并立刻执行一个任务,结果可以以后再请求,也可以永不请求。未来也缓存结果。延期定义了一个以后才执行的任务,延期的结果也会被缓存。承诺让你可以请求一个结果,而不必知道产生这个结果的任务。对一个承诺只能交付一次值。
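The caching and deliver-once behaviour described above can be observed in a short REPL session. This is a sketch with made-up names (`future-runs`, `delay-runs`, `f`, `d`, `p`), and the counters exist only to show how many times each body ran.

```clojure
(def future-runs (atom 0))
(def delay-runs  (atom 0))

;; Future: the body starts on another thread right away and its
;; result is cached, so a second deref does not run the body again.
(def f (future (swap! future-runs inc) :future-result))
@f           ; => :future-result
@f           ; => :future-result
@future-runs ; => 1

;; Delay: nothing runs until the first deref; the result is then cached.
(def d (delay (swap! delay-runs inc) :delay-result))
@delay-runs  ; => 0
@d           ; => :delay-result
@d           ; => :delay-result
@delay-runs  ; => 1

;; Promise: only the first delivery takes effect.
(def p (promise))
(deliver p :first)
(deliver p :second) ; has no effect
@p                  ; => :first
```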

In the next chapter, you’ll explore the philosophical side of concurrent programming and learn more sophisticated tools for managing the risks.

下章将探索并发编程的哲学层面,并学习更多用于管理这些风险的更复杂的工具。

Exercises

练习

  1. Write a function that takes a string as an argument and searches for it on Bing and Google using the slurp function. Your function should return the HTML of the first page returned by the search.
  2. Update your function so it takes a second argument consisting of the search engines to use.
  3. Create a new function that takes a search term and search engines as arguments, and returns a vector of the URLs from the first page of search results from each search engine.

  1. 写一个函数,接受一个字符串参数,用slurp函数在Bing和Google上搜索它。函数应返回搜索最先返回的那个页面的HTML。
  2. 更新这个函数,使它接受第二个参数,表示要使用的搜索引擎。
  3. 新建一个函数,接受一个搜索项和多个搜索引擎作为参数,返回每个搜索引擎第一页搜索结果里的URL组成的vector。


译文结束。