Tuesday, October 9, 2007

Corrupt Official

A corrupted Chinese official brings $10 mln to Macau... and lost $7 mln of it. He'll be executed anyway, so he must win back $7 mln. His strategy is to play a 50/50 game, and bet $1 mln each time (not optimal, but that's not the point here). When are the chances of him winning back $7 mln?

Follow-up: what's the expected number of times this official will be playing this game before he learns his fate? (i.e. either lose it all or win back $7 mln)

Solution:

The gambler's ruin problem. The are many elegant solutions to this, but I like this the most. Based on the martingale stopping time theorem.

The expected value of his money is always the same--$3 mln, no matter when he stops playing the game, i.e. either when he's lost all or now back at $10 mln.

So, E[Money= ] $3 mln = Prob(going bust) * $0 mln + Prob (Get to $10 mln) * $10 mln

provided such probabilities exist (they do!). Prob (Get to $10 mln) = 1 - Prob(going bust), so we have:

3 = p*0 + (1-p)* 10 = 10 - 10p, 10p = 7/10 , p = 0.7.

In general, it's just a/(a+b). In this case, the chances of him getting executed is 70%.


Solution to the follow-up:

Yet another application of martingales. Martingales are just such wonderful animals.

The trick is to find a suitable martingale. You need to know that B2(t)-t is a martingale, where B(t) is a Brownian motion. This martingale is application here because we're dealing with a random walk.

So, we have:

E[B2(t) - t] = 0
E[B2(T) - T] = 0 T is when game ends; this substitution okay thanks to stopping time theorem
E[B2(T)] - E[T] = 0 expectations are linear
E[T] = E[B2(T)] rearrange

t is time. In this context, it's the number of times our corrupt official plays the game.

So, down to evaluating the left hand side. Substituting the numbers in:

E[B2(T)]
= P(win back $7 mln) (72) + P(lose $3 mln) (32)
= 3/10 * 49 + 7/10 * 9 = 21

So the expected number of games this official plays would be 21. Notice that we're essentially calculating the variance in the last step.

People Lining up at Bus Station

People line up at a bus station according to a Poisson distribution with 5 people / minute. The next bus will arrive some time in the next 2 minutes, with "some time" being a uniform distribution. How many people will get onto the next bus? What is the standard deviation of the number of people?


Solution:

The expected number of people waiting at time t is: E[Poisson(n)|t] = 5t.
Taking the expectation of this value over t = [0,2] we have: E[ E[Poisson(n)|t] ] = E [ 5t ] = 5.

So expect 5 people to get onto the next bus.

The standard deviation: use the variance formula, Var(Y) = Var( E[Y|X] ) + E[ Var(Y|X) ] with Y = number of people and X = time.

We have: Var(people) = Var ( E[People | t ] ) + E[ Var( People | t ) ]
The first term = Var ( E[Poisson(n, t)] ) = Var ( 5t ) = Int(0,2) { 1/2 * (5t - 5)2 }dt = 0.5 * 25 * 1/3 * (t-1)3 | (2, 0) = 25/3.
Second term = E[ Var( Poisson (n,t) | t ] = E[ 5t ] = 5. Variance of Poisson process is just 5 for this case.


Total Variance: 5+ 25/3 = 13 1/3.

So standard deviation is ~ 3.65.

Transformation of Probability Distributions

Given X ~ f(x). What distribution is g(X)?

Solution:

Think in terms of F(x), the CDF, instead of f(x).
Let F_g be the CDF of g(X). Then:


F_g(x) = Prob ( g(X) x ) = Prob ( X g-1(x) )


So the RHS is just: F( g-1(x) ). Now to get PDF, it's just differentiation of this value.

Examples:

Find the CDF of X2, given X ~ f(x), with CDF F(x).

F_x2(x) = Prob (X2 ≤ x) = Prob ( -|x| ≤ X |x|) = F(x) - F(-x) , noting that domain for X2 is [0, ∞).

Taking derivatives, we have:


½ * 1/x * ( F´(√x) - F´(-√x) ) = ½ * 1/x * ( f(x ) - f(-√x ))


Find the CDF of eX, given X ~ f(x), with CDF F(x).

F_eX(x) = Prob (eX ≤ x) = Prob ( X ≤ ln (x) ) = F(ln (x))


Taking derivative of RHS to get PDF:

d/dx (F (ln(x)) = 1/x d/d(ln(x)) F (ln (x)) = 1/x f(ln (x)) (since d/dx F(x) = f(x))


Thursday, June 14, 2007

Swapping

Write a one liner in C/C++ for swapping 2 integers.


Solution:

void swap(int &a, int &b) {
a^=(b^=(a^=b));
}

^ is the bitwise XOR operator. Yeah, it looks like magic... but it's actually not too tough to figure out once you get the gist of the idea. It's tricky, definitely.

Sunday, June 10, 2007

Rainy Day

The probability of it raining on Saturday is 60%, on Sunday 30%. What's the range of probability that it doesn't rain over the weekend?


Solution:

Doesn't rain on Saturday: 40%. Let this be event A.
Doesn't rain on Sunday: 70%. Let this be event B.

If the two events are independent: 28%.
Else: P(A & B) = P(A intersect B).
min P(A intersect B): clearly 10%.
max P(A intersect B): clearly 40%.

So the range is 10% to 40%.

Saturday, June 9, 2007

Conditional Expectation

X is taken from [0,Pi]. What is E[X | sin X] ?

Solution:

Tests the understanding of conditional expectations.

First off, E[X|sin X] is a function, namely E[X|sin X] is E[X|sin X=y](y).

Figure out the sigma algebra generated by sin X. The interval [0,Pi] is our sample space, so sin X belongs to [0,1]. So sigma(sin X) is gonna be some Borel subset of the complete Borel set on [0, Pi].

The elements of this generated Borel subset are symmetric about Pi/2--for instance, take sin X = 0.5 --> X belongs to {Pi/6, 5*Pi/6}. This is evident, just graph sin X over [0, Pi]. So,

E[X|sin X=Y for any Y] = E[ X belongs to a set that is in the borel sets symmetric around Pi /2 ] = Pi/2.

which is independent of Y, the constant function Pi/2.

Maximizing Variance

X is an r.v. from the interval [0,1]. What's the maximum variance for X?


Solution:

Intuitively, you'll guess it's 1/4--just binomial from {0,1}. The proof:

Instead of looking at the interval [0,1], let's look at the inter [-1/2, 1/2]. This does not change the variance properties. We have Var[X] = E[X^2] - E[X]^2 <= E[X^2] <= 1/4. So the upper bound is 1/4. Can the upper bound be reached? Sure, take bionomial from {-1/2, 1/2}.

Flipping Coins

There's a machine that keeps flipping two unbaised coins at once (i.e. 4 possible outcome per operation). Two guys, A and B. A wins if a (H,H) combination comes up. B wins if two consecutive (H,T) combinations come up (by (H,T) I mean one head and one tail from either of the coins). What's the probability of A winning?



Solution:

This one is tricky is you condition by when the game ends, because the states get mangled up. Easier is to do the following.

Let P(A) be the probability that A wins. Then, we have:

P(A) = 0.25 + 0.5 * (0.25 * 1 + 0.5 * 0 + 0.25 * P(A) ) + 0.25 * P(A)

explanation:

1st term on right: 0.25: Prob of (H,H)
2nd term on right: if either (H,T) or (T,H), which has prob 0.5, A wins if (H,H) (0.25 prob), loses if either (H,T) or (T,H) again (0.5 prob), or the game essentially resets if (T,T), (0.25 prob).
3rd term on right: if (T,T), game essentially restarts, since (T,T) doesn't help B.

Solving for P(A) we get 60%.

Thursday, June 7, 2007

Probability

Given n uniform random variables that takes value between 0 and 1, what is the probability that they sum up to less than 1?


Solution:

If you think about it, it's just the volume of the n-dimensional triangle. That volume would be 1/n!, hence the 1/n! is the answer.

Monday, February 26, 2007

Linearity of Expectation

100 people arrive at a party. Each of them puts his hat away in a common area. When they leave, they randomly pick a hat from the common area. What is the expected number of people leaving with the hats they arrived with?

Solution:

This is a classic. The answer is 1, essentially because of linearity of expectations:

E[total#matching hats] = Sum {i} E[ith guy leaving with matching hat]

Each E[ith guy leaving with matching hat] is equal, with value == 1/100.

A priori, there's no reason to assume E[ith guy leaving with matching hat] would be identical... but turns out they are. The proof is relatively straightforward and the application of this result is quite surprising in less obvious cases.

Burning the chopsticks

You have two identical pieces of chopsticks... they burn at different rates and not at steady rates. BUT! They only burn for one hour.

You have to use them to show 45 minutes exactly. How would you do it?

Crossing the bridge

You have four people at a bridge... one HAS to carry a flashlight as they cross, but only two can cross at a time. And nobody can go alone.

It only takes the first person 1 minute to cross. The second person takes 2 minutes. The third person takes 5 minutes. And the last one takes 10 minutes.

If the person who takes 1 minute goes with the person who takes 10, it automatically takes 10 minutes.

How many minutes will it take them all to cross at the quickest?

Check whether a singly linked list contains a loop

Given a singly linked list, determine whether it contains a loop or not.

ANS. (a) Start reversing the list. If you reach the head, gotcha! there is a loop!
But this changes the list. So, reverse the list again.
(b) Maintain two pointers, initially pointing to the head. Advance one of them one node at a time. And the other one, two nodes at a time. If the latter overtakes the former at any time, there is a loop!

p1 = p2 = head;

do {
p1 = p1->next;
p2 = p2->next->next;
} while (p1 != p2);

Jars of Pills

You have 5 jars of pills. Each pill weighs 10 gram, except for contaminated pills contained in one jar, where each pill weighs 9 gm. Given a scale, how could you tell which jar had the contaminated pills in just one measurement?

ANS. 1. Mark the jars with numbers 1, 2, 3, 4, and 5.
2. Take 1 pill from jar 1, take 2 pills from jar 2, take 3 pills from jar 3, take 4 pills from jar 4 and take 5 pills from jar 5.
3. Put all of them on the scale at once and take the measurement.
4. Now, subtract the measurement from 150 ( 1*10 + 2*10 + 3*10 + 4*10 + 5*10)
5. The result will give you the jar number which has contaminated pill.

Apples, Oranges and Mixes

There are 3 baskets. one of them have apples, one has oranges only and the other has mixture of apples and oranges. The labels on their baskets always lie. (i.e. if the label says oranges, you are sure that it doesn't have oranges only,it could be a mixture) The task is to pick one basket and pick only one fruit from it and then correctly label all the three baskets.

HINT. There are only two combinations of distributions in which ALL the baskets have wrong labels. By picking a fruit from the one labeled MIXTURE, it is possible to tell what the other two baskets have.

Cake Cutting Technique

Given a rectangular (cuboidal for the puritans) cake with a rectangular piece removed (any size or orientation), how would you cut the remainder of the cake into two equal halves with one straight cut of a knife?

ANS. Join the centers of the original and the removed rectangle. It works for cuboids too! BTW, I have been getting many questions asking why a horizontal slice across the middle will not do. Please note the "any size or orientation" in the question! Don't get boxed in by the way you cut your birthday cake :) Think out of the box.

Reverse a linked list

Reverse a singly linked list recursively.
The function prototype is node * reverse (node *) ;

node * reverse (node * n)
{
node * m ;

if (! (n && n -> next))
return n ;

m = reverse (n -> next) ;
n -> next -> next = n ;
n -> next = NULL ;
return m ;
}


Alternatively, we can use 3 points to reverse the list:

void reverselist(void)
{
if(head==0)
return;
if(head->next==0)
return;
if(head->next==tail)
{
head->next = 0;
tail->next = head;
}
else
{
node* pre = head;
node* cur = head->next;
node* curnext = cur->next;
head->next = 0;
cur->next = head;
for(; curnext!=0; )
{
cur->next = pre;
pre = cur;
cur = curnext;
curnext = curnext->next;
}
curnext->next = cur;
}
}

Find the permutations of a string using recursion

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <iostream>
using namespace std;

void swap(char* src, char* dst)
{
char ch = *dst;
*dst = *src;
*src = ch;
}
/* permute [set[begin], set[end]) */
int permute(char* set, int begin, int end)
{
int range = end - begin;
if (range == 1) {
printf("set: %s\n", set);

} else {
for(int i=0; i<range; i++) {
swap(&set[begin], &set[begin+i]);
permute(set, begin+1, end);
swap(&set[begin], &set[begin+i]); /* set back */
}
}
return 0;
}
int main()
{
char str[] = "abcd";
permute(str, 0, strlen(str));
return 0;
}

Sunday, February 25, 2007

Two Normal Distributions

Given X,Y, both normal random distributions (N(0,1)). Their correlation is rho. If I tell you X is negative, what can you tell me about Y?

Solution:

Draw All Balls

There are 100 distinguishable balls in an urn. Each time you draw one ball and put it back. How many times do you expect you need to draw in order to have drawn each ball at least once?

Solution:

So on your first draw, you obviously draw a ball you've never drawn before. On your next draw, there's a 99/100 chance you draw a different one. After you've drawn 2 distinct balls, there's a 98/100 chance you draw a different one from those 2. And so on. The expected number of draws is just the reciprocal of the probability of that event happening.

So we simply have: 1 + 100/99 + 100/98 + ... +100/3+100/2+100.

I wasn't asked to evaluate this sum during the actual interview, and quite frankly I wouldn't know how to do this exactly. I probably would have taken the integral of 1/x over 1 to 100, which is ln 100 ~= 4.61 so about 461 draws.

The actual answer is 518.74 so that approximation would be way off--not surprising. I'm gonna be anal here and look at the graph--okay, so half of the 1st bar of the sum I'm approximating by the integral isn't included. That got an area of 50! Add 50 to 461 and we have 511. That's much better.

I suppose at this point, if you have gone this route, an interviewer could start talking about numerical methods, especially if he happened did his thesis on something related.

ps. check out the harmonic series page from mathworld. the closed form formula for the sum is... certainly not fair game for an interview.

Saturday, February 24, 2007

Random Reduction

This one I like a lot.

N_1 = 10^100.
N_2 = random number picked from 1 to N_1.
...
N_k = random number picked from 1 to N_(k-1).

For what k do you expect to see N_k = 1?

Solution:

The following is my solution, passing to using continuous probabilities. It's a little long-winded, so maybe there's a better solution, especially during an interview. Let me know!

Basically, we're applying some random number X of uniform distribution k-1 number of times:

N_k = X_1 * X_2 * ... X_(k-1) * N_1

Dealing with products is tough, so the natural thing to do would be to take the log of both sides:

log (N_k) = log (X_1) + ... + log (X_(k-1)) + log (N_1)

which leads to:

- log (N_1) = Sum{i from 1 to k-1} log (X_i)

The X_i's are independent, so the question now becomes:

Find k such that:

(k-1) E[log(X)] <= -log(N_1)

Where the minus signs are seriously annoying.

So what's E[log(X)], X being a uniform distribution from [0,1]? That's an integration exercise, taking the integral of X*logX from 0 to 1, dX...

which yields a value of -1. Surprising, no?

which means you would expect N_1 by a factor of e every time you multiply it by a uniform random variable. So 100/(1/e) = 100e ~= 271.8.

Add 1 to that and you have k = 272.8, or 273 to take the next greatest integer.

There has gotta be a more elegant solution to this!

Friday, February 23, 2007

Sorting

A favorite category of quant interview questions is sorting and complexity. Obviously, if you're doing quant, most of the time you'll be coding, and a good understanding of computer science (algorithms in particular) is crucial.

Here's one that's more computer science oriented than most, but fair game nonetheless:

You're given 1 billion integers to sort. You have 1Gb in memory. How would you achieve the task? Complexity?

Would you do it differently if the integers are guaranteed to be unique? Complexity?


Solution:

Basically, some kind of merge sort. Assume memory M and integers N. Step 1: sort N/M length-M lists. Step 2: merge the N/M lists.

Yeah, for non-CS people, it's time to review the various ways to sort. For step 2, review heaps. The answer is straightforward if you remember what a heap is. The complexity is some combination of N log M and N log (N/M).

If the integers are guaranteed to be unique... this is kinda neat. So you have 1Gb in memory, which is more than enough bits to use each bit as a flag to indicate the existence of a, say 32-bit integer. So simply go through your list of N integers and populate the corresponding bit in memory for each of those integers.

This will yield a O(N) solution. Essentially a simple bucket sort.

Wednesday, February 7, 2007

Remove Node in Linked List

This is apparently a question from Microsoft.

Given a singly linked list. You're given a pointer to a node in the list. Give me an algorithm that efficiently removes that node from the list.

Solution:

Some might consider this solution a cheat, but really it's quite elegant. Obviously, you can traverse the entire list up to that node, but equally obvious is that it's not the solution they'll be looking for. You should do this: copy the data from the next node into this node. Now erase the next node and update where the given node is pointing to.

Unless copying data is the bottleneck, this should be the best solution in most cases. Perhaps there are times when the address of the node is critical, but not in an interview question, and certainly not in this interview question.

Scarabs on a Dodecagon

From a problem set, but doesn't mean it's not a good problem.

Consider a regular and convex dodecagon (a polygon with 12 sides). Two scarabs initially located on two opposite vertices (summit) move at each time step along an dege. They ove independently from each other clockwise or counter-clockwise with probability 1/2. What is the proportion of time that the two scarabs spend together (on the same summit, after a large number of steps)?

Solution:

An "official" solution would be: draw your transition graph, write down your transition matrix, calculate your stationary distribution.

A much faster solution would be to simply observe that the distance between the two scarabs would always be even, and since the two scarabs move independently of each other, the expected proportion of time they spend together would simply be 1/6.

Wednesday, January 24, 2007

Crazy Guy On The Airplane

Crazy Guy On The Airplane

A line of 100 airline passengers is waiting to board a plane. they each hold a ticket to one of the 100 seats on that flight. (for convenience, let's say that the nth passenger in line has a ticket for the seat number n.)

Unfortunately, the first person in line is crazy, and will ignore the seat number on their ticket, picking a random seat to occupy. all of the other passengers are quite normal, and will go to their proper seat unless it is already occupied. if it is occupied, they will then find a free seat to sit in, at random.

What is the probability that the last (100th) person to board the plane will sit in their proper seat (#100)?

Solution

The first guy might chose his own seat (1/100) or the 100th person's seat (1/100). These cases have equal probability, and whatever happens the rest of the way won't matter.

The interesting cases are when he picks some seat that's neither his own or #100. Now, when the next guy comes in, he'll pick either his unoccupied seat or a random seat. What he does depends on the first guy, but after he's done we have the same problem from the start minus 1 person. To spell it out, if his seat is unoccupied, he takes it, and the crazy guy is still occupying someone else's seat. Effectively now we're down to 99 seats since 1 seat has been properly taken. The other case is if his seat is occupied, and he'll pick the crazy guy's seat (1/99) or the #100's seat (1/99) with equal probability, the uninteresting cases. The rest of the time, he picks some other seat. Notice that this again reduces the size of the problem by 1--there's only one wrongly occupied seat.

There's always a maximum of 1 wrongly occupied seat, so the problem is recursive (in a backward sense). The base case is 2, and the probability is 1/2 either way and thus there's a 1/2 chance for the last guy to take his own seat.

Graphically, the "search path" is:

{1/100 or 1/100} or {
{1/99 or 1/99} or {
{1/98 or 1/98} or {
...
}}}

I suppose it's much clearer this way. Just keep in mind the statement before "or" happens P=2/100 of the time and P=98/100 for the statement after "or", not that it matters in this problem.