Here's an intereting little optimization problem that occured to me while working on a side job I have of labeling the contents of audio tapes. Basically, the job involves archiving cassette tapes that have multiple concerts or sets recorded on them. I have to record the time into the tape at which the next recorded set starts. The only way to accomplish this is to manually rewind / ff, and hit play. Most of the time there's at most one additional recording per 45 minute side. To simplify the problem, I assumed that this is the case.

If you are getting paid on a task basis, maximal efficiency is of utmost importance, especially if you are busy. In reality, I'm getting paid on an hourly basis, so I want to do the opposite, but I just as easily could get paid that way. The problem is then, how do you get the most tapes done in the least amount of time? In other words if the location of the beginning of the second set on the tape is completely unknown and random for all intents and purposes, what is the best method of hitting fast forward, play and rewind to use that will ensure you find the location of the beginning of the second set quickest?

It simplifies the problem somewhat to assume that the second set continues to play until the end of the tape, so that you're guaranteed to reach some point in the second set if you fast forward all the way to the end of the tape. Also, there may be different answers depending upon whether efficiency is gauged in terms of # of stops on the tape (or times you hit play) versus amount of time spent on the search. It also depends on how realistically you are going to model the actual time it takes to press buttons, reverse direction etc. I'm not absolutely sure there's an optimal strategy if the set placement on the tape is completely random. But I have some preliminary thoughts on at least creating a model that might lead to an answer:

  • It's good to fast forward not too far into the tape and hit a point where there is music recorded. This means the beginning of the set is between where you are (which is not too far), and where you started. It's equally good to fast forward far into the tape and hit a point where there isn't music recorded. This means the beginning of the tape is between where you are, and the end of the tape (which is not too far away). But, you're just as likely to find music recorded far into the tape than to not find music recorded not too far into the tape, so neither choice is more likely to pay off.
  • However, all other things being equal, if you choose to fast forward not too far into the tape, you have wasted less time. So the general algorithm is something like always choose to fast forward to the point not too far into the tape.
  • I can't prove that it's best, but it simplifies the problem a lot if you just assume that the initial move is to fast forward exactly half-way through the tape. If you do this, when you push play, then you are faced with two equally likely options:
1) You hear music
2) You do not hear music
As I said, I can't prove that this is the best fast forwarding / rewinding regime, but it seems equivalent to any other. Since if you initially fast-forward less than half-way into the tape, say 30% of the way, then if you happen to hear music, then you have effectively diminished the length on the tape you have to explore to 30% of the original length of the tape; however, assuming that the placement of the set on the tape is random, you have only a 30% chance of attaining this outcome, so the advantage seems to be nullified. After you fast-forward to where you are going to stop, you push play and listen for a period of time. To simplify the problem, I assumed you always listen for the same amount of time.
As suggested before, if you are to choose as your first move fast forwarding to exactly the half-way mark, then it logically follows that you would choose the same protocol as your second move. This follows from the fact that after your first move, you are logically faced with the same problem you are faced with on the first move: you are given a length of tape over which the beginning of a set of music is equally likely to appear. The only difference is that the movement can now take place in two directions: either backward or forward depending on whether you hear music or not. However this doesn't affect the amount of time you spend, so you don't need to take it into account. The pattern equally applies to the next step, and the next step, and so on. Therefore you can express the total time, which you want to optimize, in terms of time spent fast-forwarding/rewinding and spent listening to the tape. If "f" is the speed of fast-forwarding/rewinding, "w" is the length of the whole tape, and "d" is the standard displacement due to listening to the tape - which is also equal to the time spent listening to the tape, since the play function operates in real time...
T = (1/2(w/f) + d) + (1/4(w/f) + d) + (1/8(w/f) + d) + ...
The first parentheses indicates the time you spend searching and listening on the first attempt, the second parentheses expresses this for the second time, etc. I assume that the displacement on the tape, d, is negligible compared to the displacement traversed in fast-forwarding/rewinding. Of course, this is a worse and worse approximation with each round of fast-forwarding.
The total time T is just the sum from n=1 to n=N of [1/2^n(w/f)] + Nd, where N is the number of times you apply the protocol. However the sum doesn't continue infinitely. There comes a point where the displacement you fast-forward into the tape is less than the displacement into the tape that you pass by from listening to it, and at this point it will be useless to continue the cycle further, since you will have accounted for all the space on the tape. This occurs when d > 1/2^n(w/f) . If you solve for n you get n > log2(w/fd). Therefore, this is your upper limit to the sum, N. Stating the sum again
T = sum from n=1 to log2(w/fd) of [1/2^n(w/f)] + log2(w/fd)d
You can actually figure out that sum based on a simple formula. To spare you the tedium, unless I've made a mistake, the sum comes out to
T = w/f - d + log2(w/fd)d
At this point you can find the listening time, d, at which the total time, T, is lowest by solving a simple max/min problem. Once you take the derivative of T with respect to d and set it equal to 0, you can solve for d... You get
d = (w/f)/2^(1/ln2 + 1), which is somehow a pleasing answer. Unfortunately, this is a maximum, not a minimum. So this tells the optimal amount of time you would want to be listening if you wanted it to take as long as possible to find the spot in the tape. Basically it says if you reduce the total time it takes you to fast-forward through the entire side by a factor of 1/2^(ln2 + 1) you get the optimal time you should spend listening. The speed of the fast-forward mechanism will vary by tape recorder, but mine has a speed of 16 1/2 minutes tape displacement per minute. Therefore, working with my tape recorder, with a 45 minute-sided tape, I will optimally want to spend 3o seconds listening to the tape per round before I choose fast-forward or rewind. This is the protocol you would want to use if you were getting paid by the hour, as I am, and if you for some reason had to appear as if you were trying to complete the task as quickly as possible. However, it should be noted that in the real situation, this is only a local maximum, as one easy way to spend much more time is to just set d so that it exceeds w/f.
There is a minimum of the function for T, and it occurs when the time spent listening to the tape approaches 0. It's interesting that the value of the equation T = w/f - d + log2(w/fd)d in the limit where d --> 0 is T= w/f, which is the original time it takes to fast-forward through the entire tape. This constitues the maximum time you would spend searching for the spot in the tape if you were using the minimizing protocol. On my tape recorder, it takes 2 minutes and 45 seconds to fast-forward through an entire tape. This is probably the one actually useful result to come out of all this, since it seems intuitively obvious that you would want to minimize the time spent listening to the tape as much as possible if your goal was to find a spot on the tape the fastest. This is the protocol you would want to use if you are getting paid by the amount of work you get done.
Looking back, you can judge whether the "d is much less than w/2^n" approximation is a good one. Using the maximize protocol, you can go about 5 rounds before the distance you displace on the next fast-forward/rewind is comparable to the distance you displace listening to the tape. You can calculate that probability that 5 rounds will be sufficient to locate the spot on the tape. Since you always spend 30 seconds, the chance of finding it on the first blind attempt is 30/(45*60), on the second attempt, 30/1350...and the total sum of all these is somewhere around 30%. This is obviously not a very good approximation. A new model is clearly needed. Of course when d=0 the approximation is still good, so the minimize protocol is definitely good.
Pretty much the only valuable thing to come out of all this inquiry is the realization that your optimum minimize strategy is to listen for as little as possible. In fact I'm not sure if any of these results are even meanginful with all the approximations I made. Perhaps someone will come to another answer, or one that's based on fewer approximations?

1 comment:

Grobstein said...

Actually, I got confused pretty early on and so can't offer any good answers to the questions you pose at the end. Here's my confusion: when you play the tape and either here music or don't, how do you know where you are relative to the one or two sets that are recorded on the tape?

If you hear music, it could be the music either of set 1 or set 2 (right?), which means the point at which you hear music could be bounded by the start of set 1 and the end of the tape, or by the start of set 1 and the start of set 2, or by the start of set 2 and the end of the tape. How do you disambiguate?

There seems to be a similar problem if you don't hear music.

I know you probably actually use these techniques in your job, so I'm sure they work, but I'm having a hard time figuring out how.