
09-22-12 - Input Streams and Focus Changes

Apps should have an input/render thread which takes input and immediately responds to simple actions even when the app is busy doing processing.

This thread should be able to display the current state of the "world" (whatever the app is managing) and let you do simple things like move/resize windows, scroll, etc. without blocking on complex processing.
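To make that concrete, here's a minimal sketch (all names made up; the input polling and rendering are stubbed as comments). The only point is that the input/render thread never waits on the worker :

    // Minimal sketch of the split : the UI thread services input and
    // redraws every frame; anything slow goes onto a worker queue.
    #include <atomic>
    #include <chrono>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::mutex g_lock;
    std::queue<std::function<void()>> g_slowWork; // jobs the UI thread must not run
    std::atomic<bool> g_quit{false};

    void WorkerThread() // runs the big slow operations
    {
        while (!g_quit)
        {
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lk(g_lock);
                if (!g_slowWork.empty()) { job = std::move(g_slowWork.front()); g_slowWork.pop(); }
            }
            if (job) job();
            else std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    void InputRenderThread() // takes input + draws; never blocks on the worker
    {
        while (!g_quit)
        {
            // 1. drain OS input; act on the cheap stuff (move/resize/scroll) right now
            // 2. push anything slow :
            //    { std::lock_guard<std::mutex> lk(g_lock); g_slowWork.push(slowJob); }
            // 3. draw the current world state, even if the worker is mid-job
            std::this_thread::sleep_for(std::chrono::milliseconds(16)); // ~60 Hz frame
        }
    }

    int main() // runs until killed; a real app would set g_quit on exit
    {
        std::thread worker(WorkerThread);
        InputRenderThread();
        worker.join();
    }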

Almost every app gets this wrong; even the ones that try (like some web browsers) just don't actually do it; eg. you should never ever get into a situation where you browse to a slow page that has some broken script or something, and that causes your other tabs to become unresponsive. (part of the problem with web browsers these days of course is that scripts are allowed to do input processing, which never should have been allowed, but anyhoo).

Anyway, that's just very basic and obvious. A slightly more advanced topic is how to respond to input when the slow processing causes a change in state which affects input processing.

That is, we see a series of input commands { A B C D ... } and we start doing them, but A is some big slow operation. As long as the commands are completely independent (like "pure" functions) then we can just fire off A, then while it's still running we go ahead and execute B, C, D ...

But if A is something like "open a new window and take focus" , then it's completely ambiguous whether we should go ahead and execute B,C,D now or not.
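To make the ambiguity concrete, here's a toy dispatcher (everything hypothetical, not from any real framework). Independent commands just get fired off; the choice of policy below is exactly the open question :

    #include <deque>
    #include <functional>

    struct Command
    {
        std::function<void()> execute;
        bool changesFocus; // eg. "open a new window and take focus"
    };

    enum class FocusPolicy { RunImmediately, EnqueueUntilFocusSettles };

    // focusChangeInFlight is set when a focus-changing command starts and
    // cleared (elsewhere) once the new window/dialog actually has focus.
    void Dispatch(std::deque<Command>& pending, bool& focusChangeInFlight,
                  FocusPolicy policy)
    {
        while (!pending.empty())
        {
            if (focusChangeInFlight && policy == FocusPolicy::EnqueueUntilFocusSettles)
                return; // hold B,C,D until A's focus change lands

            Command cmd = std::move(pending.front());
            pending.pop_front();
            if (cmd.changesFocus)
                focusChangeInFlight = true;
            cmd.execute(); // independent commands just run, even while A is busy
        }
    }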

I can certainly make arguments for either side.

Argument for "go ahead and process B C D immediately" :

Say for example you're in a web browser and you click on a link as action A. The link is very slow to load so you decide you'll do something else and you center-click some other links on the original page to open them in new tabs. Clearly these inputs should be acted on immediately.

Argument for "delay processing B C D until A is done" :

For symmetry we'll assume a web browser again. Say you are trying to log into your bank, which you have done many times. You type in your user name and hit enter. You know that this will load the next page which will put you at a password prompt, so you go ahead and start typing your password. Of course those key presses should be enqueued until the focus change is done.

A proponent of this argument could outline two clear principles :

1. User input should be race free. That is, the final result should not depend on a race between my fingers and the computer. I should get a consistent result even if the processing of commands is subject to random delays. One way to do this is :

2. For keyboard input, any keyboard command which changes key focus should cause all future keyboard input to be enqueued until that focus change is done (sketched just below).
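A sketch of that rule, with made-up names : once a focus-changing key command is seen, later keys are held, and they get replayed to whoever actually ends up with focus :

    #include <queue>

    struct Key { int code; };

    class KeyboardGate
    {
        std::queue<Key> held;
        bool waitingOnFocus = false;

        static bool ChangesKeyFocus(const Key& k) { return k.code == 'F'; } // stand-in for ctrl-F etc.
        static void DeliverToFocusTarget(const Key&) { /* route to the focused widget */ }

    public:
        void OnKey(const Key& k)
        {
            if (waitingOnFocus) { held.push(k); return; } // enqueue; don't race my fingers
            if (ChangesKeyFocus(k)) waitingOnFocus = true;
            DeliverToFocusTarget(k);
        }

        void OnFocusChangeComplete() // called when the dialog is actually up
        {
            waitingOnFocus = false;
            while (!held.empty() && !waitingOnFocus)
            {
                Key k = held.front(); held.pop();
                OnKey(k); // replay; may hit another focus change and re-gate
            }
        }
    };

With this gate, a fixed key sequence gives the same result no matter how slow the focus change is.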

The lack of this certainly bugs me on a nearly daily basis. The most common place I hit it is in MSVC because that's where I spend most of my life, and I've developed muscle-memory for common things. So I'll frequently do something like hit "ctrl-F stuff enter" , expecting to do a search for "stuff" , only to be dismayed to see that for some inscrutable reason the find dialog box took longer than usual to open, and instead of searching for "stuff" I typed it into my source code and am left with an empty find box.

I think in the case of pure keyboard input in a source code editor, the argument for race-freeness of user input is the right one. I should be able to develop instinctive keyboard actions which have consistent results.

However, the counter-example of the slow web browser means that this is not an obvious general rule for user inputs.

The thing I ask myself in these scenarios is "if there was a tiny human inside my computer that was making this decision, could they do it?". If the answer to that question is yes, then it means that there is a solution in theory, it just may not be easy to express as a computer algorithm.

I believe that in this case, 99% of the time a human would be able to tell you if the input should be enqueued or not. For example in the source code "ctrl-F stuff" case - duh, of course he wants stuff to be in the find dialog, not typed into the source code; the human computer would get that right (by saying "enqueue the input, don't process immediately"). Also in the web browser case where I click a slow link and then click other stuff on the original page - again a human would get that right (by saying "don't enqueue the input, do process it immediately").

Obviously there are ambiguous cases, but this is an interesting point that I figured out while playing poker that I think most people don't get : the hard decisions don't matter !

Quickly repeating the point for the case of poker (I've written this before) : in poker you are constantly faced with decisions, some easy (in that the right answer is relatively obvious) and some very hard, where the right answer is quite unclear, maybe the right answer is not what the standard wisdom thinks it is, or maybe it requires deep thought. The thing is, the hard decisions don't matter. The reason they are hard is because the EV (expected value) of either line is very close; eg. maybe the EV of raise is 1.1 BB and the EV of call is 1.05 BB ; obviously in analysis you aren't actually figuring out the EV, but just the fact that it's not clear tells you that either line is okay.

The way that people lose value in poker is by flubbing the *easy* decisions. If you fold a full house on the river because you were afraid your opponent had quads, that is a huge error and gives up tons of value. When you fail to do something that is obviously right (like three-betting often enough from the blinds against aggressive late position openers) that is a big error. When you are faced with tricky situations that poker experts would have to debate for some time and still might not agree what the best line is - those are not important situations.

You can of course apply the same reasoning to politics, and here to algorithms. People love to debate the tricky situations, or to say that "that solution is not the answer because it doesn't work 100% of the time". That's stupid non-productive nit picking.

A common debate game is to make up extreme examples that prove someone's solution is not universal or not completely logical or self-consistent. That's not helpful. Similarly, if you have a good solution for case A, and a good (different) solution for case B, a common debate game is to interpolate the cases and find something in the middle where it's ambiguous or neither solution works, and the sophomoric debater contends that this invalidates the solutions. Of course it doesn't, it's still a good solution for case A and case B and if those are the common cases then who cares.

What actually matters is to get the answer right *when there is obviously a right answer*.

In particular with user input response, the user expects the app to respond in the way that it obviously *should* respond when there is an obvious response. If you do something that would be very easy for the app to get right, and it gets it wrong, that is very frustrating. However if you give input that you know is ambiguous, then it's not a bad thing if the app gets it wrong.

7 comments:

Unknown said...

I hope you'll never use OS X... three-finger-sweeps to switch between fullscreen apps ignores all input until the slow animation finishes. Great fun for when you already know what you want to type into the terminal you sweep to.

cbloom said...

I also hope I'll never use OS X.

cbloom said...

So clearly to some extent the horrible modern GUI design (led by Apple primarily) is because they are emphasizing aesthetics and clarity for casual users over efficiency.

Okay, I don't agree with that, and I wish they provided power user modes, but at least it sort of makes sense.

But that's only part of it.

Some of the problem is that "designers" have been called in, and Designers (with a capital D) always ruin products. They're egotistical morons who don't actually learn the way the product is supposed to work before they "improve it". You can take almost any much-lauded product design that makes magazine covers and it is 99% terrible and worse than the product it supposedly improved.

Part of the problem with GUI designers is that they generally suck at using computers, so they design GUIs for people who don't actually use computers, people who are all thumbs and easily confused. Sure, if you don't know how to type and you don't know what a hard disk is, then you might think that the new GUIs are better.

johnb said...

My ideal is:

There should be a single serial input stream, which is processed synchronously in the foreground. Any further input received during processing is queued.

Any action which often takes long enough for me to care about the pause gets done in the background, allowing the foreground to continue processing the serial input stream. No background action is allowed to steal focus: if it's finished it can sit there and wait for me to go look at it, or it can flash some indicator somewhere to notify me that it's done and prompt me to look. It *cannot* steal focus when it's done.

Actions that are "usually" fast, but *might* take a long time should be done in the foreground but with a time limit. If they hit the time limit and they're not done, they get pushed to the background and lose the ability to steal focus when they do complete. For example, if I open a program, it gets 100 milliseconds to get its shit sorted and present a window for me to use. If it's not ready in 100 millis then it gets pushed to the background and when its window is ready it *does not* steal focus, it sits there in the background with its taskbar icon flashing until I explicitly bring it to the foreground.

johnb said...

Some further details:

If an action is on a time limit, input is queued while the action is going on in the foreground. If the time limit is reached and the action gets pushed to the background, *that input is deliberately dropped*. The idea that input should never be dropped is wrong. If the system can't keep up with my input then I'd rather it *drop* some input than send it to the wrong place.

This all requires UI design to accommodate it, of course, it's not just "do everything in background threads". The concept of foreground vs. background actions here is part of the UI design, not the implementation.

This is all a consequence of two rules: 1) never steal focus, ever; 2) keep the UI responsive.
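Pulling johnb's scheme together into one hypothetical sketch (the 100 ms budget is his; the names and the std::async plumbing are just illustrative) :

    #include <chrono>
    #include <functional>
    #include <future>
    #include <queue>

    struct Input { int key; };

    // Run an action with a foreground time budget. Input arriving during
    // the action is assumed to have been collected into queuedInput.
    std::future<void> RunAction(std::function<void()> action,
                                std::queue<Input>& queuedInput,
                                std::function<void(Input)> replay)
    {
        using namespace std::chrono;
        std::future<void> done = std::async(std::launch::async, std::move(action));

        if (done.wait_for(milliseconds(100)) == std::future_status::ready)
        {
            // finished in budget : deliver the input the user raced ahead with
            while (!queuedInput.empty()) { replay(queuedInput.front()); queuedInput.pop(); }
        }
        else
        {
            // demoted to background : deliberately drop the queued input
            // rather than send it somewhere the user didn't intend. The
            // caller keeps polling 'done' and flashes an indicator when it
            // completes -- it never steals focus.
            std::queue<Input>().swap(queuedInput);
        }
        return done;
    }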

cbloom said...

@ johnb - I certainly agree that after a certain lapse of time, apps should never steal focus.

Like if I double-click an icon to open a program, and it takes over 2 seconds or so to open, then it must not take focus when it starts up.

Another clue would be if I do some user input which starts a delayed reaction (eg. open a new program), and I then do *other* user input before that program starts - then don't steal focus.

It's terrible when you start a bunch of slow UI ops, and then you go and open your text editor and are typing some text, and then finally the slow ops catch up and they all start stealing focus from each other and you don't know where your key presses are going.
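Those two clues fit in a few lines; a hypothetical sketch (the 2-second threshold is from the comment above, everything else is made up) :

    #include <chrono>

    using Clock = std::chrono::steady_clock;

    // May a newly-ready window take focus? Only if it launched recently
    // AND the user hasn't done any other input since launching it.
    bool MayStealFocus(Clock::time_point launchedAt,
                       Clock::time_point lastOtherUserInput)
    {
        const bool recentLaunch = (Clock::now() - launchedAt) < std::chrono::seconds(2);
        const bool noInterveningInput = lastOtherUserInput < launchedAt;
        return recentLaunch && noInterveningInput;
    }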

Unknown said...

IIRC, BeOS was designed with this decoupling. Every app had at least a UI thread and a worker thread, and the kernel scheduler was designed for responsive, interactive, near-realtime workloads.

Of course, you could also make the claim that it is a bad idea to frontload such concurrency issues in the main framework API. Clearly, this is an argument for layering -- there should be both the traditional synchronous message pump (that you can access without any sort of inversion of control) AND a standard decoupled programming interface (which is more like a framework that calls back into your code, asynchronously).

The problem is that you have the worst of both worlds. The standard API tends to be a synchronous framework.
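The layering might look something like this (a hypothetical interface, not any real OS API) : the blocking pump is the primitive, and the inversion-of-control framework is an optional layer built on top of it, rather than the only way in :

    #include <functional>
    #include <optional>

    struct Message { int what; };

    // Layer 0 : traditional synchronous pump; the app keeps control flow.
    std::optional<Message> PumpMessage() { return std::nullopt; } // stub; the real one talks to the OS

    // Layer 1 : callback framework built *on* the pump, for apps that want it.
    void RunEventLoop(std::function<void(const Message&)> onMessage)
    {
        for (;;)
            if (auto m = PumpMessage())
                onMessage(*m); // framework calls back into the app
    }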
