<feed xmlns="http://www.w3.org/2005/Atom"><title>Chromium Notes</title><id>tag:neugierig.org,2009:chromium-notes</id><link href="http://neugierig.org/software/chromium/notes/" /><link href="http://neugierig.org/software/chromium/notes/atom.xml" rel="self" /><updated>2011-08-31T21:34:00Z</updated><author><name>Evan Martin</name><email>evan@chromium.org</email></author><entry><id>tag:neugierig.org,2009:chromium-notes/2011-08-31/windows-hookers</id><updated>2011-08-31T21:34:00Z</updated><title>Tracking down a mysterious Windows crash</title><link href="http://neugierig.org/software/chromium/notes/2011/08/windows-hookers.html" /><content type="html">&lt;p&gt;&lt;em&gt;Today's post is a guest post from Eric Roman.  He wrote a slightly
more expletive-laden version of this post inside Google and I asked
him if I could post it here.  It serves as a good illustration of how
deploying software on more than a hundred million different users'
computers is a nearly-biological fight.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Understanding stability of Windows applications is really hard.&lt;/p&gt;
&lt;p&gt;Recently, @apatrick has concluded a heroic investigation into one of
Chrome's most mysterious top crashes for Windows. He just committed a
"fix" for it on the canary channel, which seems to be working!&lt;/p&gt;
&lt;p&gt;This debugging journey is pretty epic, so I'm giving a blow-by-blow
summary of it below. If you want to jump straight to the conclusion,
see &lt;a href="http://crrev.com/96807"&gt;r96807&lt;/a&gt; for the spoiler (you can also read
the comments in &lt;a href="http://crbug.com/81449"&gt;bug 81449&lt;/a&gt; for the full novella).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Prelude&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Our story begins a year ago, when we first began tracking crashes in a
generic &lt;code&gt;RunnableFunction&amp;lt;&amp;gt;::Run()&lt;/code&gt; location. These crashes had
established themselves as a top browser crash for Windows Chrome. At
the time, Huan and I unsuccessfully looked into the issue, but
couldn't make heads or tails of it (&lt;a href="http://crbug.com/54307"&gt;bug
54307&lt;/a&gt;). The basic format of the crash looked
something like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;0377fd78 0377fdb0 0x0
0377fd7c 0254c3c3 0x377fdb0
0377fd84 021ba1ae chrome_1c30000!RunnableFunction&amp;lt;void (__cdecl*)(void *),Tuple1&amp;lt;void *&amp;gt; &amp;gt;::Run+0xc
0377fd8c 021baa21 chrome_1c30000!`anonymous namespace'::TaskClosureAdapter::Run+0xb
0377fdb0 021baaa6 chrome_1c30000!MessageLoop::RunTask+0x81
0377fdc0 021bae47 chrome_1c30000!MessageLoop::DeferOrRunPendingTask+0x28
0377fdf8 021d1b24 chrome_1c30000!MessageLoop::DoWork+0x80
0377fe24 021ba960 chrome_1c30000!base::MessagePumpDefault::Run+0xc2
0377fe30 021ba8e5 chrome_1c30000!MessageLoop::RunInternal+0x31
0377fe38 021ba7d9 chrome_1c30000!MessageLoop::RunHandler+0x17
0377fe58 021c6530 chrome_1c30000!MessageLoop::Run+0x15
0377fe5c 021c6643 chrome_1c30000!base::Thread::Run+0x9
0377ffa8 021c1a2f chrome_1c30000!base::Thread::ThreadMain+0xa1
0377ffb4 7c80b713 chrome_1c30000!base::`anonymous namespace'::ThreadFunc+0x16
0377ffec 00000000 kernel32!BaseThreadStart+0x37
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Above we can see that the crash is due to jumping to an instruction
pointer of 0. And judging by the top frames, it looks like there was
some stack corruption at work (notice how the first frame's alleged
return address is actually a stack location).&lt;/p&gt;
&lt;p&gt;The callstack itself isn't terribly helpful though, since we can't
tell what code was actually running prior to the crash (it was gobbled
up by the stack corruption).&lt;/p&gt;
&lt;p&gt;Moreover, the source location of &lt;code&gt;RunnableFunction&amp;lt;&amp;gt;::Run()&lt;/code&gt; doesn't
help narrow things in the slightest, since pretty much all of Chrome's
code runs through this path (Chrome relies heavily on message passing
to post asynchronous tasks to another thread's message loop).&lt;/p&gt;
&lt;p&gt;Really, all we know at this point is that "some task" got posted to
"some thread", and then crashed at "some point" after running this
task.&lt;/p&gt;
&lt;p&gt;There is one interesting piece of information that can be inferred
from the minidumps: based on the crashed thread's index, it is likely
the crashing task was running on Chrome's "Child process launcher"
thread. In fact, a subsequent instrumentation
(&lt;a href="http://crrev.com/58786"&gt;r58786&lt;/a&gt;) confirmed that all of these crashes were
happening on the child process launcher thread. Since there is very
little code that legitimately runs on this thread, we did a full
code-flow analysis of all the paths that could post tasks. But we did
not discover any problematic codepaths (I was hoping to stumble across
something bad like a use-after-free).&lt;/p&gt;
&lt;p&gt;Without any extra leads (and with a temporary dip in the crash's
frequency lowering its priority), the mystery bug got pushed onto the
back burner.&lt;/p&gt;
&lt;p&gt;It lay dormant for the next 9 months, waiting for a new champion to
take up arms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Part II&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In May, Al Patrick (an innocent bystander) is assigned the bug on
suspicion that it is a regression. (At this point the callstack has
morphed a bit due to various optimization ambiguities, but it is
essentially the same bug I had failed to solve earlier.)&lt;/p&gt;
&lt;p&gt;Our new hero starts off by adding some instrumentation trying to see
if a bad function pointer is ever being directly passed to a runnable
function (&lt;a href="http://crrev.com/85359"&gt;r85359&lt;/a&gt;). This doesn't turn up anything
salient.&lt;/p&gt;
&lt;p&gt;Next he instruments posted tasks to retain the location where they got
posted from (&lt;a href="http://crrev.com/85991"&gt;r85991&lt;/a&gt;). This change is absolutely
brilliant! I have benefited from it many times since it was
introduced, to help debug crash dumps.&lt;/p&gt;
&lt;p&gt;The instrumentation is cheap yet effective: whenever posting a task,
the &lt;code&gt;FROM_HERE&lt;/code&gt; macro (that was being used in debug builds to save
filename/line numbers) now saves the instruction pointer into the
&lt;code&gt;PendingTask&lt;/code&gt;. Later when the task is de-queued by the target thread's
message loop, this same instruction pointer (i.e. the birthplace of
the task) is pushed onto the stack prior to calling the task's virtual
function. That way should a crash happen later, you can poach the
address of the birthplace off the stack during postmortem dump
analysis!&lt;/p&gt;
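&lt;p&gt;To make the idea concrete, here is a loose Python analogy rather than
Chrome's actual C++.  The names &lt;code&gt;post_task&lt;/code&gt;, &lt;code&gt;run_pending&lt;/code&gt; and
&lt;code&gt;bad_task&lt;/code&gt; are invented for illustration, and where Chrome pushes the
saved instruction pointer onto the stack so that it survives into a
minidump, this sketch merely keeps the caller's file and line next to the
task:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import collections
import sys

# Illustrative stand-in for Chrome's PendingTask: the work plus where it was posted.
PendingTask = collections.namedtuple("PendingTask", "func birthplace")

queue = collections.deque()


def post_task(func):
    # Plays the role of FROM_HERE: remember the caller's location.
    # sys._getframe is CPython-specific, which is fine for a sketch.
    caller = sys._getframe(1)
    birthplace = "%s:%d" % (caller.f_code.co_filename, caller.f_lineno)
    queue.append(PendingTask(func, birthplace))


def run_pending():
    failures = []
    while queue:
        task = queue.popleft()
        try:
            task.func()
        except Exception as error:
            # The birthplace is still at hand when the task blows up.
            failures.append((task.birthplace, repr(error)))
    return failures


def bad_task():
    raise RuntimeError("boom")


post_task(bad_task)
print(run_pending())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The failure report names the line that called &lt;code&gt;post_task&lt;/code&gt;, which is
exactly the piece of information the original callstacks were missing.&lt;/p&gt;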
&lt;p&gt;This instrumentation reveals that the problem tasks were posted by
&lt;code&gt;ChildProcessLauncher::Context::Terminate()&lt;/code&gt;, suggesting that the task
itself was a runnable function on
&lt;code&gt;ChildProcessLauncher::Context::TerminateInternal()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So far so good, but we still have no idea why it is crashing.&lt;/p&gt;
&lt;p&gt;Next, our hero instruments the base runnable function/method tasks in
Chrome to detect use-after-frees (&lt;a href="http://crrev.com/86447"&gt;r86447&lt;/a&gt;), as well
as other memory molestation (by preserving the value of the function
pointer into the minidump). This instrumentation reveals that not only
was the task alive and well at the time it was run, but the function
pointer was also untouched. This is a major breakthrough in the
investigation, since it tells us conclusively that whatever craziness
has happened, it occurred while executing
&lt;code&gt;ChildProcessLauncher::Context::TerminateInternal()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This is where things start to get weird. Looking at &lt;code&gt;TerminateInternal&lt;/code&gt;
(the function that is blowing up) there is hardly any code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;chrome_1c30000!ChildProcessLauncher::Context::TerminateInternal:
025e0cea push ebp
025e0ceb mov ebp,esp
025e0ced mov eax,dword ptr [ebp+8]
025e0cf0 mov dword ptr [ebp+8],eax
025e0cf3 test eax,eax
025e0cf5 je chrome_1c30000!ChildProcessLauncher::Context::TerminateInternal+0x16 (025e0d00)
025e0cf7 push 0
025e0cf9 push eax
025e0cfa call dword ptr [chrome_1c30000!_imp__TerminateProcess (02ba56f4)]
025e0d00 push esi
025e0d01 lea esi,[ebp+8]
025e0d04 call chrome_1c30000!base::Process::Close (0289e055)
025e0d09 pop esi
025e0d0a pop ebp
025e0d0b ret
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the code for &lt;code&gt;base::Process::Close()&lt;/code&gt; which it references is also
pretty simple:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;chrome_1c30000!base::Process::Close:
0289e055 cmp dword ptr [esi],0
0289e058 je chrome_1c30000!base::Process::Close+0x1d (0289e072)
0289e05a push edi
0289e05b mov edi,dword ptr [esi]
0289e05d call dword ptr [chrome_1c30000!_imp__GetCurrentProcess (02ba56f0)]
0289e063 cmp edi,eax
0289e065 je chrome_1c30000!base::Process::Close+0x19 (0289e06e)
0289e067 push edi
0289e068 call dword ptr [chrome_1c30000!_imp__CloseHandle (02ba561c)]
0289e06e and dword ptr [esi],0
0289e071 pop edi
0289e072 ret
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So basically all that we are doing is killing the process, by calling
a handful of win32 API functions. Hmmm.&lt;/p&gt;
&lt;p&gt;Al theorizes that someone may be hooking one of the winapi calls
(perhaps &lt;code&gt;kernel32!TerminateProcess&lt;/code&gt;), and that whatever code it is
running in response to that call is responsible for the stack
corruption. For instance if the hooker used the wrong calling
convention (not &lt;code&gt;stdcall&lt;/code&gt;), that could be corrupting our stack upon
return!&lt;/p&gt;
&lt;p&gt;So how is this guy "hooking" the API call?&lt;/p&gt;
&lt;p&gt;There are a lot of different ways you could intercept Windows API
calls, and I am definitely no expert to explain them all. You could
for instance do things like directly patch the code in the system DLL
(in user land, or even on disk). Or re-write the binary to substitute
all the target function calls with your new one. But definitely the
simplest and most intuitive way to hook is to just patch the import
address table. (You can see how that works in the code above -- our
compiled code doesn't call directly into kernel32, but rather it jumps
to the address listed in the import table... that is the
address you would be patching if you wanted to intercept the call.)&lt;/p&gt;
&lt;p&gt;To this end, Al adds yet more instrumentation, this time to try and
detect if &lt;code&gt;TerminateProcess&lt;/code&gt; is being hooked via the import address
table (&lt;a href="http://crrev.com/96266"&gt;r96266&lt;/a&gt;). Unfortunately this instrumentation
doesn't reveal any smoking gun yet.&lt;/p&gt;
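&lt;p&gt;As a toy model of the idea (this is not the instrumentation from the
changelist, and every name in it is made up), picture the import address
table as a dictionary that callers consult on every call.  Hooking is
then just overwriting an entry, and detection is checking whether the
entry is still the function you expected:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def real_terminate_process(handle, exit_code):
    # Stand-in for the genuine system function.
    return "terminated %s with code %d" % (handle, exit_code)


# Stand-in for the import address table: callers look the function up
# here on every call instead of calling the real function directly.
import_table = {"TerminateProcess": real_terminate_process}


def call_imported(name, *args):
    return import_table[name](*args)


def is_hooked(name, expected):
    # A hook shows up as a table entry that is no longer the expected function.
    return import_table[name] is not expected


def hook(handle, exit_code):
    # A hypothetical interceptor: does its own thing, then forwards the call.
    return "hooked: " + real_terminate_process(handle, exit_code)


print(is_hooked("TerminateProcess", real_terminate_process))
import_table["TerminateProcess"] = hook
print(is_hooked("TerminateProcess", real_terminate_process))
print(call_imported("TerminateProcess", "child", 0))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A check like this only catches hooks installed through the table; a
hooker that patches the code of the system DLL itself would leave the
table looking untouched.&lt;/p&gt;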
&lt;p&gt;Ricardo makes a good observation -- a hook on &lt;code&gt;CloseHandle()&lt;/code&gt; is
perhaps more likely than a hook on &lt;code&gt;TerminateProcess&lt;/code&gt;, based on the
position of the 0 that appears on the stack (&lt;a href="http://crbug.com/81449"&gt;bug
81449&lt;/a&gt;). Plus, there is probably more value
in hooking &lt;code&gt;CloseHandle&lt;/code&gt; over &lt;code&gt;TerminateProcess&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Finally, Al commits a changelist that bypasses the address table
altogether for both &lt;code&gt;CloseHandle&lt;/code&gt; and &lt;code&gt;TerminateProcess&lt;/code&gt;, and instead
calls the underlying implementation in &lt;code&gt;ntdll.dll&lt;/code&gt; directly:
&lt;a href="http://crrev.com/96807"&gt;r96807&lt;/a&gt;. This is not quite as efficient, but
not a huge deal either due to the low frequency of these calls.&lt;/p&gt;
&lt;p&gt;This workaround appears to have thwarted the bad hooker, and so far
there hasn't been a single crash of this sort in the Windows Chrome
canary!&lt;/p&gt;
&lt;p&gt;It remains unclear which of the two was being hooked or why. The
workaround is a pretty ridiculous thing to have to do, but the bypass
could shave off as much as 10% of our Windows browser crashes (yes,
this crash really was that high in some releases)! It is likely the
hooker is malware -- we could copy a fragment of the hooked code into
our minidump to try and learn more.&lt;/p&gt;
&lt;p&gt;Ideally we want to alert the user about these sorts of problems,
rather than papering-over them with workarounds (since they indicate a
real problem with their underlying system). But we don't have a good
mechanism to do that yet (see &lt;a href="http://crbug.com/72239"&gt;bug 72239&lt;/a&gt; for
some proposals).&lt;/p&gt;
&lt;p&gt;In summary, kudos to @apatrick for his excellent and persistent
debugging investigation over the past three months.&lt;/p&gt;
&lt;p&gt;Also this shows how powerful Chrome's Canary channel is, since it
allows you to do this style of experimental debugging with very quick
turn-arounds (mere days).&lt;/p&gt;</content></entry><entry><id>tag:neugierig.org,2009:chromium-notes/2011-08-29/static-initializers</id><updated>2011-08-29T19:42:00Z</updated><title>Static initializers</title><link href="http://neugierig.org/software/chromium/notes/2011/08/static-initializers.html" /><content type="html">&lt;p&gt;Globals and singletons are already well-known as a design antipattern,
but they have an interesting additional cost.  Consider a global (I
include file-level static in this category) value that has
initialization code.  That code must be run at startup (which leads to
the &lt;a href="http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.14"&gt;static initialization order fiasco&lt;/a&gt;, though that is not
the point of this post).&lt;/p&gt;
&lt;p&gt;Because this initialization code is run at startup, before even
&lt;code&gt;main()&lt;/code&gt; is entered, it is in the critical path for startup.  It turns
out that even simple code must be paged in off disk, which can lead to
disk seeks, and disk seeks murder your startup performance.&lt;/p&gt;
&lt;p&gt;This is not hypothetical: with ChromeOS we found that
innocuous-seeming static initializers in Chrome were actually
affecting the bottom line of startup performance.  (Note: that
observation comes from a coworker; I'm not sure whether he was using a
non-SSD machine at the time or if it also happens on SSDs.  Just
guessing, but paging in more code, especially code that is
non-contiguous, must have some non-zero cost even on the SSDs that
ChromeOS relies upon.)&lt;/p&gt;
&lt;p&gt;Because of this cost we attempt to track static initialization on our
performance bots and prevent new checkins from adding more.  (Ideally
we'd remove them all but progress is slow.)  I recently looked into
how this works and I thought it'd be useful to write it down before I
forget.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How constructors are implemented&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The compiler creates, for each object file, a function that contains
the constructors for the file.  Pointers to these functions are
collected in a table at link time.  At startup,
&lt;code&gt;__do_global_ctors_aux&lt;/code&gt; iterates through the table and calls each
function.  (&lt;a href="http://vxheavens.com/lib/viz00.html"&gt;Here's a nice page that walks through the
disassembly&lt;/a&gt;.)  Conceptually, to judge the cost of all static
constructors you might want to do something like sum the size of
all of these functions, but for our purposes we care about disk seeks;
even doing more work in a single static constructor is fine if we
reduce the total number of functions paged in, which means the size
of the constructor table is the statistic of interest.&lt;/p&gt;
&lt;p&gt;The table of functions shows up as the &lt;code&gt;.ctors&lt;/code&gt; section of the
executable.  You can dump the table via commands like (note that the first
entry is -1, the rest are addresses):&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ objdump --full-contents --section=.ctors path/to/binary&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;or in gdb,&lt;/p&gt;
&lt;p&gt;&lt;code&gt;(gdb) x/1000xg &amp;amp;__CTOR_LIST__&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The gdb output is perhaps useful since it will decode little-endian
for you.  (N.b. that "g" trailing the "x" command prints 64-bit
pointers; adjust as necessary locally.)&lt;/p&gt;
&lt;p&gt;For a Chrome binary I glanced at, the ctor list appears to be in
pointer order, which means you can see how much of the resulting
binary they span by subtracting the first entry from the last.  From
my random debugging build: 30 MB, not good.&lt;/p&gt;
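&lt;p&gt;If you'd rather not do the little-endian decoding and the subtraction
by hand, a few lines of Python will do.  This is only a sketch: the helper
names and the sample addresses are made up, it assumes 64-bit pointers,
and it also skips zero entries on the assumption that the table may end
with a null terminator.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def decode_ctors(raw, pointer_size=8):
    """Decode the raw bytes of a .ctors section into constructor addresses."""
    all_ones = 2 ** (8 * pointer_size) - 1  # how -1 looks as an unsigned pointer
    entries = []
    for offset in range(0, len(raw), pointer_size):
        value = int.from_bytes(raw[offset:offset + pointer_size], "little")
        # Skip the leading -1 marker and any zero terminator.
        if value not in (0, all_ones):
            entries.append(value)
    return entries


def ctor_span(entries):
    # Distance between the lowest and highest constructor function.
    return max(entries) - min(entries)


# Made-up table: the -1 marker, two addresses, and a zero terminator.
sample = [2 ** 64 - 1, 0x401000, 0x402040, 0]
raw = b"".join(value.to_bytes(8, "little") for value in sample)

entries = decode_ctors(raw)
print([hex(entry) for entry in entries])
print(hex(ctor_span(entries)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using the minimum and maximum rather than the first and last entries
means the span still comes out right if the list turns out not to be in
pointer order.&lt;/p&gt;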
&lt;p&gt;&lt;strong&gt;Constructors versus static initialization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Note that data that is initialized to a constant is implemented in a
different way: the constant value can just be placed in the right
place at compile time, so there is no cost.  In contrast, C++ objects
that have constructors involve code and must be computed at runtime.
You'll also sometimes encounter code that initializes variables with
function calls (like &lt;a href="http://neugierig.org/software/chromium/notes/2011/01/plugin-conflict.html"&gt;we did with the mysterious IcedTea
crash&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;You might also notice that static data can be shared between multiple
instances of the same executable, while initialized memory is private;
see &lt;a href="http://neugierig.org/software/blog/2011/05/memory.html"&gt;my post about how memory works&lt;/a&gt; for more on that.&lt;/p&gt;
&lt;p&gt;I noticed with some interest that the Go programming language,
designed in part by compiler hackers, neatly sidesteps some of the
above problems: by defining initialization order carefully ("The
importing of packages, by construction, guarantees that there can be
no cyclic dependencies in initialization.") and by only allowing
simple values as constant initializers.  See &lt;a href="http://golang.org/doc/go_spec.html#Program_initialization_and_execution"&gt;their manual&lt;/a&gt;
for more.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What to do about it&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Mozilla hackers &lt;a href="http://blog.mozilla.com/tglek/2010/05/27/startup-backward-constructors/"&gt;have found that Linux is pathologically bad in how it
runs the resulting ctor list&lt;/a&gt;, and it looks like they have
at least &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=606137"&gt;considered fixing that manually&lt;/a&gt;.  We have chatted
about doing the same, but fundamentally I believe the way to keep
startup fast is to &lt;em&gt;do less&lt;/em&gt;. &lt;a href="http://neugierig.org/software/chromium/notes/2010/05/fast.html"&gt;See also my earlier post about
performance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It appears that the generated functions that run these constructors
get names starting with &lt;code&gt;_GLOBAL__I_&lt;/code&gt;.  This means a call like&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ nm out/Debug/chrome | grep _GLOBAL__I&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;will dump a list of all files that have a global constructor.  Go delete
some code!&lt;/p&gt;</content></entry><entry><id>tag:neugierig.org,2009:chromium-notes/2011-08-03/zygote</id><updated>2011-08-03T17:59:00Z</updated><title>The zygote process and software updates</title><link href="http://neugierig.org/software/chromium/notes/2011/08/zygote.html" /><content type="html">&lt;p&gt;When you make a new tab Chrome (usually) starts a new process for that
tab.  How is this done?  It would seem natural to just &lt;code&gt;fork()&lt;/code&gt;, but
&lt;code&gt;fork&lt;/code&gt; can't be used safely in the presence of threads.  &lt;code&gt;fork&lt;/code&gt; only
forks the current thread but other threads may be holding locks
(including e.g. inside glibc or in the allocator) which would never
be released after the fork.&lt;/p&gt;
&lt;p&gt;If you are careful to not touch anything after a fork, it can be safe
to immediately &lt;code&gt;exec&lt;/code&gt;.  This matches the process launching model on
Windows (no fork, only fork+exec), with the negative that it forces
the overhead of startup again on each new process.  (Code reference:
&lt;a href="http://codesearch.google.com/codesearch#search/&amp;amp;exact_package=chromium&amp;amp;q=launchprocess&amp;amp;type=cs"&gt;&lt;code&gt;LaunchProcess()&lt;/code&gt;&lt;/a&gt;, which also knows to e.g. use &lt;code&gt;_exit&lt;/code&gt; instead
of &lt;code&gt;exit&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;Forking and execing ourselves is how we spawn subprocesses on Mac (I
believe; there may be some trickery related to how app bundles work
that complicates this).  On Linux it is unfortunately more complicated.
Updates on Linux are managed by a systemwide package manager that runs
independently of other software, which effectively means at any point
any file you rely upon can be silently clobbered.  (This problem even
affects single-process apps like Firefox; an update will clobber some
JavaScript used in the UI and suddenly things will either crash or get
weird.)  In Chrome's case, if the Chrome binary itself is updated while
the browser is running, processes spawned by the running Chrome would
be the newer Chrome, which may have made an incompatible change to the
interface between Chrome processes.&lt;/p&gt;
&lt;p&gt;Instead, at startup, before we spawn any threads, we fork off a helper
process.  This process opens every file we might use and then waits
for commands from the main process.  When it's time to make a new
subprocess we ask the helper, which forks itself again as the new
child.  By virtue of always forking from the same initial process, we
guarantee that we are always running the same code; even if the files
we opened are replaced by a system update our handle on them is the
handle for the previous file.  (That works as long as nobody
overwrites the contents of the file we have open; thankfully, package
updates write a new file and rename it over the old name, leaving our
open copy the only remaining reference to the old file.)&lt;/p&gt;
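&lt;p&gt;The rename trick in that parenthetical is easy to see for yourself.
The following is a small Python sketch, unrelated to Chrome's own code and
using made-up file names, meant to be run on Linux or another POSIX
system: it opens a file, replaces it the way a package update would, and
then reads through the old handle.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "resource.txt")
    with open(path, "w") as f:
        f.write("version 1")

    # The long-running process opens the file up front and keeps the handle.
    held = open(path)

    # The updater writes a new file and renames it over the old name.
    staging = path + ".new"
    with open(staging, "w") as f:
        f.write("version 2")
    os.replace(staging, path)

    # The old handle still refers to the old file...
    old_view = held.read()
    held.close()

    # ...while anything that opens the path now gets the new one.
    with open(path) as f:
        new_view = f.read()

print(old_view)
print(new_view)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On POSIX this prints "version 1" and then "version 2".  Had the updater
opened the existing path and written into it instead, the old handle
would have seen the new bytes, which is the failure case the parenthetical
warns about.&lt;/p&gt;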
&lt;p&gt;(Code reference: &lt;a href="http://codesearch.google.com/codesearch#OAMlx_jo-ck/src/content/browser/child_process_launcher.cc&amp;amp;exact_package=chromium&amp;amp;q=launchinternal&amp;amp;type=cs&amp;amp;l=104"&gt;&lt;code&gt;ChildProcessLauncher&lt;/code&gt;'s &lt;code&gt;LaunchInternal()&lt;/code&gt;&lt;/a&gt;,
the gory &lt;code&gt;ifdef&lt;/code&gt; soup used when launching a subprocess.  Truly some
ugly code.)&lt;/p&gt;
&lt;p&gt;This solution is both clever and an ugly hack.  Any time someone adds
code to Chrome that interacts with a file on disk they either need to
be aware that they need to preemptively open it or they will produce
mysterious failures across updates (in practice, usually the latter;
e.g. &lt;a href="http://code.google.com/p/chromium/issues/detail?id=35793"&gt;bug 35793: Devtools stop working when chrome gets
updated&lt;/a&gt;).  An interesting question to ask is: why is this not a
problem on Windows and Mac?&lt;/p&gt;
&lt;p&gt;On Windows, files are locked if any process is using them, which
forces a design where updates install into a separate directory.  But
-- annoyingness of locking aside -- in fact I think that design is
preferable.  To start with, a given version of Chrome will know its
files will remain unmolested by updates.  Furthermore, when an update
happens, the updater can write out a separate "update succeeded"
sentinel after writing all the files out, making it impossible for an
aborted update to leave both the previous and next version in a
half-working state.  (On Mac, we take a similar approach; I don't know
enough about Macs to know whether the versioned directories within
bundles make this magically work.)&lt;/p&gt;
&lt;p&gt;With all this in mind you might reasonably ask why Linux needs to be
special: why we waste memory on this zygote process launcher and have
extra buggy codepaths just to support an inferior update model.  (Note
that by using &lt;code&gt;.deb&lt;/code&gt; files we also lose &lt;a href="http://neugierig.org/software/chromium/notes/2009/05/courgette.html"&gt;our tiny incremental
updates&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;And to that I can only answer the thinking we had at the time: one, we
wanted to be good citizens on Linux; one distinction between "lame
port of a Windows app" and "real Linux software" is exactly whether
you distribute as a tarball or as a package.  Secondly, and more
importantly, we knew that regardless of what we did for Google Chrome
the Linux distros would attempt to stuff Chromium into their package
manager even when they know it breaks the app, much like they've done
to Firefox.  Now that I've summarized it in these terms it sounds a
little depressing, but there it is; with ChromeOS where we control the
stack we have more intelligent updates.&lt;/p&gt;</content></entry><entry><id>tag:neugierig.org,2009:chromium-notes/2011-07-29/datavis</id><updated>2011-07-29T23:32:00Z</updated><title>Data visualization and d3</title><link href="http://neugierig.org/software/chromium/notes/2011/07/datavis.html" /><content type="html">&lt;p&gt;Lately I've been learning the &lt;a href="http://mbostock.github.com/d3/"&gt;awesome d3 library&lt;/a&gt; for data
visualization.  I usually only publish my toys internally, but
only because it's convenient; I'd rather put them online so others
can play with them.&lt;/p&gt;
&lt;p&gt;First up: &lt;a href="http://neugierig.org/software/datavis/lines-spent/"&gt;lines spent&lt;/a&gt;.&lt;/p&gt;</content></entry><entry><id>tag:neugierig.org,2009:chromium-notes/2011-04-19/rtl-titles</id><updated>2011-04-19T18:26:00Z</updated><title>RTL titles</title><link href="http://neugierig.org/software/chromium/notes/2011/04/rtl-titles.html" /><content type="html">&lt;p&gt;&lt;em&gt;(Here's a post from some months ago.  I think I'm not writing new posts
because I've been sitting on this one for so long, so perhaps it's for
the best that I just publish it.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I've been away for a while.  Part of my travels involved a hackfest in
Tel Aviv for right-to-left text in WebKit.  My RTL knowledge of WebKit
is minor -- I have done a decent bit of hacking on the Linux Chrome
rendering code for complex text in WebKit -- but as a language
enthusiast (I like to tell people: by college degrees, I am as
qualified a linguist as I am a computer scientist!), I am already
familiar with the bidi algorithm and I've studied some Arabic.&lt;/p&gt;
&lt;p&gt;(I know some properties of Hebrew but not the alphabet, but in Tel
Aviv all the street signs are Rosetta stones of Hebrew/Arabic/Roman
scripts; by the end of my week I proudly identified and boarded the
shared taxi to Jerusalem by reading the Hebrew sign in the window.)&lt;/p&gt;
&lt;p&gt;Upon arriving in Tel Aviv, we assigned the important bugs to the more
talented WebKit hackers, while I picked a relatively minor bug in the
hope that I could churn through a few of them.  By the end of the week
I didn't even fix that one bug.  I did, however, make two refactoring
pre-changes and got another one in just at the finish line that
touched 53 files.  Somehow with me it is always a yak shave.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;The bug seemed pretty simple: WebKit doesn't understand &lt;code&gt;&amp;lt;title
dir="rtl"&amp;gt;RTL titles&amp;lt;/title&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To start with, what does this actually mean?  Here's an attempt to
explain as briefly as possible, glossing over details; bidirectional
text is actually pretty gnarly.&lt;/p&gt;
&lt;p&gt;First, some background for bidi beginners.  You should know that text
is always stored in its logical order: the first letter of an
in-memory string of Hebrew is the first letter of where you'd start
reading, on the right.  The same is true for the order of the
characters as written in an HTML document.  When rendering a string
that contains both right-to-left (RTL) and left-to-right (LTR) text,
you end up "reversing" bits of it.  The algorithm for this reversal is
called the bidi (bidirectional) algorithm, and it is complex and
interesting but out of scope for this post.&lt;/p&gt;
&lt;p&gt;In discussions of bidi, the convention is to write the characters that
should be RTL (representing a language like Hebrew) in uppercase, and
the LTR characters in lowercase.  So, for example, the in-memory
string &lt;code&gt;foo BAR XYZ&lt;/code&gt; should appear as &lt;code&gt;foo ZYX RAB&lt;/code&gt;.  Critically, note
that the last word of the source string ends up in the &lt;em&gt;middle&lt;/em&gt; of the
rendered string -- omitting many details, at a conceptual level
you're laying out a string of chunks left to right and the BAR XYZ
chunk should be rendered right to left as part of that.&lt;/p&gt;
&lt;p&gt;I've already made an assumption there, though: I wrote that you're
laying out the chunks left to right, but in a right to left document
the overall layout order goes the other way.  (The document starts
from the right, after all.)  &lt;code&gt;foo BAR XYZ&lt;/code&gt; in a Hebrew document should
render as &lt;code&gt;ZYX RAB foo&lt;/code&gt;.  This extra bit of metadata about the text
-- inventing a term, the direction &lt;em&gt;context&lt;/em&gt; of the string -- is just
what the &lt;code&gt;dir&lt;/code&gt; attribute of the title is for.&lt;/p&gt;
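&lt;p&gt;Here is that chunk-level picture as a few lines of Python.  To be
clear, this is a toy and not the real bidi algorithm: it uses the
uppercase-means-RTL convention from above, treats whole words as the
unit, and ignores everything that makes real bidi hard (neutral
characters, numbers, nesting).  The function names are my own
invention.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from itertools import groupby


def is_rtl(word):
    # Toy convention from the text: uppercase words stand in for RTL script.
    return word.isupper()


def toy_bidi(logical, base_rtl=False):
    """Return the left-to-right visual order of a logical-order string."""
    runs = []
    for rtl, words in groupby(logical.split(" "), key=is_rtl):
        run = " ".join(words)
        # An RTL run reads right to left, so its characters appear reversed.
        runs.append(run[::-1] if rtl else run)
    # In an RTL context the runs themselves are laid out right to left.
    if base_rtl:
        runs.reverse()
    return " ".join(runs)


print(toy_bidi("foo BAR XYZ"))
print(toy_bidi("foo BAR XYZ", base_rtl=True))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first call prints &lt;code&gt;foo ZYX RAB&lt;/code&gt; and the second prints
&lt;code&gt;ZYX RAB foo&lt;/code&gt;: same string, same characters, and the only thing that
changed is the direction context passed in alongside it.&lt;/p&gt;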
&lt;p&gt;WebKit can display plenty of RTL sites just fine, so it necessarily
gets all of the above details correct in web content.  The problem
that my bug was about is that WebKit generally isn't responsible for
rendering title tags -- they're handled by the browser.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Aside: amusingly, a stylesheet containing&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;head { display: block } title { display: block }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;will&lt;/em&gt; cause the title to display in the page.  It sorta makes sense
as soon as you see it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For the browser to render the title correctly, it needs the same
information that WebKit has -- the text &lt;em&gt;and&lt;/em&gt; its direction.  (We had
many discussions about the proper way to display an RTL title in an
LTR browser -- e.g., do you move the favicon too?)  To fix this bug I
"just" needed to (1) get the direction as specified on the tag (or any
parent of the tag, or CSS, or ...); (2) plumb that extra metadata out
to the browser; (3) make use of that extra metadata browser-side (e.g.
flip the text layout direction when appropriate).&lt;/p&gt;
&lt;p&gt;Unfortunately, titles are used in a variety of places within WebCore
-- as an attribute of documents, of course, but also in data
structures related to loading pages, history, and in APIs used to
communicate state information up to the hosting environment.  Touching
all of these resulted in a larger patch than I'd anticipated.&lt;/p&gt;
&lt;p&gt;I first landed some manageable chunks of it: &lt;a href="http://trac.webkit.org/changeset/82090"&gt;a small refactoring&lt;/a&gt;
and then &lt;a href="http://trac.webkit.org/changeset/82422"&gt;a larger one&lt;/a&gt;; it's good I landed them separately
because I got the logic in the latter wrong and needed to &lt;a href="http://trac.webkit.org/changeset/82425"&gt;quick-fix
it&lt;/a&gt; (in my defense, part of the problem is that the code &lt;a href="https://bugs.webkit.org/show_bug.cgi?id=57537"&gt;has a
related bug&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Then comes the &lt;a href="http://trac.webkit.org/changeset/82580"&gt;monster change-the-world patch&lt;/a&gt; where I swap out
the type of a core object; despite my best efforts at modifying nine
WebKit ports simultaneously I managed to break the &lt;a href="http://trac.webkit.org/changeset/82582"&gt;GTK build and Qt
build&lt;/a&gt; and the &lt;a href="http://trac.webkit.org/changeset/82586"&gt;Qt build a second time&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That got me to the point where the data was exposed to the
WebKit-internal platform layer, but not through any public APIs; so
next was &lt;a href="http://trac.webkit.org/changeset/84199"&gt;exposing that through the Chromium WebKit API&lt;/a&gt;, which is
also the layer the testing interface uses so it allowed me to write &lt;a href="http://trac.webkit.org/browser/trunk/LayoutTests/fast/dom/title-directionality.html?rev=84199"&gt;a
test&lt;/a&gt;.  And finally, I screwed up that patch too, &lt;a href="http://trac.webkit.org/changeset/84276"&gt;necessitating
another quick fix&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;And with all that in place, I then turned to the Chrome-side
implementation...&lt;/p&gt;
&lt;p&gt;...and discovered the chicken-and-egg problem of new HTML specs:
because nobody implements &lt;code&gt;&amp;lt;title dir&amp;gt;&lt;/code&gt;, few sites made use of it.  In
fact, I did find a site that frequently had titles that mixed LTR and
RTL text, exactly the sort that would benefit from my change, and the
site &lt;em&gt;did&lt;/em&gt; use the &lt;code&gt;dir&lt;/code&gt; attribute on a &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; tag, despite it not
having any effect in browsers -- but the site used it in such a way
that was exactly &lt;em&gt;backwards&lt;/em&gt;; my locally patched browser that obeyed
the attribute made the site strictly worse.  The site?  Google Israel.&lt;/p&gt;
&lt;p&gt;And with no better conclusion than that, this post has sadly
languished unpublished on my laptop and the work is in a similar
state.  Now that I look, it seems my bug report about Google
Israel was fixed, so perhaps it's worth trying again to land my patch.
But at this point I am suspicious of the sunk cost fallacy: I had
picked this bug because it was so minor that we had guessed it would
be easy, and it's perhaps not worth much more time when I have a
limitless stream of more important bugs.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Your reward for reading this long and anticlimactic post is an example
of one of the many ways software can get RTL wrong.  In this image, I
constructed a page with a specially-crafted title, which Chrome
naively formats as "$PAGE_TITLE - Google Chrome" and then hands it on
to the OS.  The image shows what happens when I alt-tab.&lt;/p&gt;
&lt;img src='http://neugierig.org/software/chromium/notes/static/2011-06-rtl.png'&gt;</content></entry></feed>
