You are given 3 chances to drop a non-biased dice (with number from 1-6). You can stop earlier, but the number of final drop is the money you can get. Give a strategy to maximize the money you can get.
So you may stop after the first or second roll. To decide whether you should go on or not, you should look at the probability of "getting bigger point in subsequent roll(s)" and the probability of "getting smaller rolls(s)". If the earlier is bigger, proceed; else, stop. If they are the same, going on and stopping have the same degree of judiciousness ;-)
Let's consider the second roll now (since it's easier). Once you have done the second roll, the first roll does not matter anymore. There are six outcomes from the second roll:
dice Pr("getting better") Pr("getting worse")
1 5/6 0
2 4/6 1/6
3* 3/6 2/6
4 2/6 3/6
5 1/6 4/6
6 0 1
Clearly the tipping point is 3. So if you get 3 or less at the second roll, proceed; else, stop.
Now let's think about the situation after we've done the first roll. Sure we have two rolls ahead, but once we are done with the second roll, we need to follow the rule we've already established above. It means that we only need to consider the second roll here. This is the same as what we've already done above.
So the conclusion is, at each roll, proceed only if you get <= 3.
posted at: 19:29 | path: / | permanent link to this entry
\Omega_1 = {goat, goat, prize}.
Afterwards the sample space becomes
\Omega_2 = {goat, prize}.
You can see the "whole" sample space of the game as the Cartesian
product of the two:
\Omega = {(goat, goat), (prize, prize), (goat, prize), (prize, goat)}.
To clearly see this problem, you must consider "switch" and "do not
switch" as two totally different games. Though they lead to the same
sample space, they do affect how the probability distribute among
events. Let's take a look.
1. Do not switch.
Pr((goat, goat)) = 2/3 Pr((prize, prize)) = 1/3 Pr((goat, prize)) = 0 Pr((prize, goat)) = 0
2. Switch.
Pr((goat, goat)) = 0 Pr((prize, prize)) = 0 Pr((goat, prize)) = 2/3 Pr((prize, goat)) = 1/3Winning in this game means choosing prize in the second sample space.
Pr("Winning!")
= Pr((prize, prize) \cup (goat, prize))
= Pr((prize, prize)) + Pr((goat, prize))
Note that we can do the addition only because they are disjoint - they
are simple events (or basic events).
Clearly if you don't switch, this value is 1/3 but if you do, the value is 2/3.
Another way to see this problem is through the law of total probability.
Pr("get prize by not switching")
= Pr("get prize in the 2nd stage" | "get prize in the 1st stage") +
Pr("get prize in the 2nd stage" | "get goat in the 1st stage")
= 1/3 + 0
= 1/3
Pr("get prize by switching")
= Pr("get prize in the 2nd stage" | "get prize in the 1st stage") +
Pr("get prize in the 2nd stage" | "get goat in the 1st stage") +
= 0 + 2/3
= 2/3
Both ways work. Just wanna emphasize the Cartesian product thing - it
really makes the problem a whole lotta clearer for me.
posted at: 14:39 | path: / | permanent link to this entry
fibs = 0:1:zipWith (+) fibs (tail fibs) fib = fibs !! n
I have no problem imagining infinite/cyclic structures like this but it has always been vague for me about its implementation. Then I feel that it is much easier to think about it with "names". So let's try.
When Haskell sees this, a thunk is generated for fibs. Let's call it "fibs_thunk". At first, fibs_thunk is just fibs without any further reduction/computation:
fibs_thunk = 0:1:zipWith (+) fibs_thunk (tail fibs_thunk)
Note how fibs_thunk points to itself. This is important because that's the way we implement (and think about) this infinite/cyclic structures.
If we call fib with 0 or 1, fibs_thunk will stay what it is.
The interesting thing happens when we call fib n where n > 1 ---- clearly Haskell has to push fibs_thunk to compute the third element.
Suppose zipWith is defined in this way (which it should be!):
zipWith f [] _ = [] zipWith f _ [] = [] zipWith f (x:xs) (y:ys) = f x y : zipWith f xs ys
Then we have:
fibs_thunk = 0:1:zipWith (+) fibs_thunk (tail fibs_thunk) -- visualize fibs_thunk -- "..." represents some infinite structure, in this case, fibs_thunk = 0:1:zipWith (+) (0:1:...) (tail (0:1:...)) -- applying tail so we can apply zipWith = 0:1:zipWith (+) (0:1:...) (1:...) -- applying zipWith -- the last two expressions are actually fibs_thunk without the first -- elemfent and fibs_thunk without the first two elements = 0:1:(0+1):zipWith (+) (1:...) (...) -- by now we suddenly resize that we know the third element of -- fibs_thunk! = 0:1:1:zipWith (+) (1:1:...) (1...) -- and we continue a few more steps = 0:1:1:(1+1):zipWith (+) (1:...) (...) = 0:1:1:2:zipWith (+) (1:2:...) (2:...) = 0:1:1:2:(1+2):(zipWith (+) (2:...) (...) = 0:1:1:2:3:(zipWith (+) (2:3...) (3:...) = 0:1:1:2:3:(2+3):(zipWith (+) (3...) (...) = 0:1:1:2:3:5:(zipWith (+) (3:5:...) (5:...) -- the computation ends when the nth elemfent is found ...
The important observation is that these infinite structures nested in its own definition are KEPT UPDATED in the process of computation!!! I bet it's referred by pointers internally. So before running out of data when Haskell pushes it, new data always come in to feed along with the computation process.
Let's look at one more example, sieve, which is used to find out primes.
primes = sieve 2 where sieve n = n : [x | x <- sieve(n + 1), x `mod` n /= 0]
Let's expand sieve 2:
sieve 2
-- from here we can only easily get the first element 2
-- to get any furthur, we need to push the computation
= 2 : [x | x <- sieve 3,
x `mod` 2 /= 0]
-- getting further
= 2 : [x | x <- (3 : [x | x <- sieve 4,
x `mod` 3 /= 0]),
x `mod` 2 /= 0]
= 2 : [x | x <- (3 : [x | x <- (4 : [x | x <- sieve 5,
x `mod` 4 /= 0]),
x `mod` 3 /= 0]),
x `mod` 2 /= 0]
Another version of sieve and its expansion:
primes = sieve [2..] where sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0] sieve [2..] = 2 : sieve [x | x <- [3..], x `mod` 2 /= 0] -- we name the parameter to sieve [x | x <- [3..], x `mod` 2 /= 0] "p1" = 2 : (head p1) : sieve [x | x <- (tail p1), x `mod` (head p1) /= 0] -- we name the parameter to sieve [x | x <- (tail p1), x `mod` (head p1) /= 0] "p2" = 2 : (head p1) : (head p2) : sieve [x | x <- (tail p2), x `mod` (head p2) /= 0] ...
posted at: 15:58 | path: / | permanent link to this entry
Caches exploit temporal locality by storing recently accessed data.
Caches transfer data from main memory in contiguous blocks that
encompass multiple words and, consequently, benefit from spatial
locality.
1. Cache is divided/organized into lines. Line size = L bytes.
2. The memory side counterpart of a cache line is a paragraph (with the
additional requirement for alignment, see below). So the cache controller
"streams in" a paragraph into a cache line, or "streams out" a cache line into
a memory paragraph. Paragraph size = line size.
3. A set is a logical collection of memory which corresponds to some fixed,
specific cache lines. If the paragraphs in one set are continuous, we call it
"blocked"; if they're scattered (we only discuss the typical case where
adjacent paragraphs are assigned to different sets in a round-robin fashion),
"cyclic". Also used to call the collection of cache lines which correspond to
a memory set. If number of sets > 1, there must be multiple cache lines or it
doesn't make any sense.
4. Alignment.
(size of the memory space) mod (line/paragraph size) = 0
(starting address of a paragraph) mod (line/paragraph size) = 0
5. Associativity. N-way associative: one paragraph in memory can be mapped to
N different lines, within a set though. Direct mapped = 0-way associative.
Set-Associative: associative within each set.
6. Lines cannot just store data - there are overhead bits for other information:
- a valid bit indicating whether this line is in use
- a dirty bit indicating whether this line is out of sync (only necessary
for write back policy)
- a few bits for (relative) aging. Number = log2 (number of lines for one
set) because we only need to rank those who belong to the same set
- some bits for tag. Tag size = M - log2(L) - log2(N), where M
is memory address length in bits, L is the line size and N is the number
of sets. See below for explanation.
7. Tag is some minimum bits stored in a line, so that the cache controller can
identify which address this line maps to. Say we have a 32-bit address ... and
we want to store its data in cache. Say line size L = 32 bytes. So take any
paragraph (32 continuous bytes whose starting address is aligned), every
addressable byte in it should reside in the same line, which means that, for
this particular line, to see where it maps to, only their first (32 - log2(L))
bits matter.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| | | | | | | | | | | | | | | | | | | | | | | | | | | |X|X|X|X|X|
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Now we consider sets. Assume that we have 4 blocked sets, by looking at the 2
most significant bits we know which set this address belongs to.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
|X|X| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If, however, we have 4 cyclic sets, by looking at these 2 bits, we know which
set this address belongs to. Because change in these 2 bites means a different
paragraph, which belongs to another set, for it's cyclic.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| | | | | | | | | | | | | | | | | | | | | | | | | |X|X| | | | | |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
So we can save another 2 bits, for when the cache controller must find if an
address is cached or not, it can use the 2 bits to decide which set it belongs
to, thus it knows the corresponding cache lines to check with.
8. LRU. Replacement policy. Using age (say 8 lines, youngest 000, oldest
111). When hit, that line becomes 000 and every line who was younger than it
before gets +1).
9. Write Back/Once/Through
When there's a write-miss: Write By, Allocate-on-Write
posted at: 17:54 | path: / | permanent link to this entry
A few points for me to remember... The assumption is that the reader understand the basic idea of Monad:
The definition of State, its >>= and return are defined as
data State s a = State {runState :: s -> (a, s)}
instance Monad (State s) where
return a = State $ \s -> (a, s)
m >>= k = State $ \s ->
let (a, s') = runState m s
in runState (k a) s'
Note that the instance is (State s) not State itself so the wrapped value is of type a. Also note that the runState function inside State is constructed by >>=. The initial state
To understand this, first look at runState. Yeah it's the name of the only field of State. The tricky part is that it can be used as a function which takes an instance of State and returns this instance's runState field, which is a function of type "s -> (a, s)". Okay read the last sentence twice before we move on.
"return a" takes the unwrapped value of type a and gives the State instance.
For >>=, m is the previous computational result (a monad value of course) and k is the next computation. The new result (a new State instance) can be seen as a thunk. Basically (runState m) extracts the function of type (s -> (a, s)) from the previous monad. Then the new state s'
posted at: 22:22 | path: / | permanent link to this entry