Blog





Your Baby and Child
2019-09-14 05:38:28

Book #27 of 2019 is Your Baby and Child by Penelope Leach. I was recommended the book on the grounds that it explained child development processes, which is something I am very interested in. Sadly this turned out not to be the case; it is just another opinionated/prescriptive parenting book from which you have to tease out the development process yourself.

That being said, it's a pretty comprehensive book and covers a lot of ground (I just skimmed some of it, especially the later sections). I did like how it splits the material into different stages (newborn/settled baby/toddler/child/etc.) rather than using explicit age ranges, because those age ranges vary a lot in practice. But the book is dated, and some of the material is no longer "best practice" or has been rejected by the latest scientific research. And that material is mixed in with everything else, so it's hard to take anything the book says at face value.


Joy in the Morning
2019-08-28 11:02:39

Book #26 of 2019 is Joy in the Morning, last of my Wodehouse binge. Certainly very similar to the previous books, in that the same scenarios appear over and over but are glued together in different sequences. While it was fun I'm glad I'm done with this series for now.


More Wodehouse
2019-08-26 14:55:00

Books #22, #23, #24, and #25 of 2019 are Something Fresh, Heavy Weather, The Inimitable Jeeves, and The Code of the Woosters, all by P. G. Wodehouse. The last of the four I found the best, but they all were pretty good. Amusing as they are, there does appear to be some amount of repetition of themes, more than I would expect.


The Information
2019-08-21 04:39:50

Book #21 of 2019 is The Information by James Gleick. It's a comprehensive but easy-to-read book on information. It starts with the transition from oral to written history, and goes all the way to quantum information theory concepts, spending the most time on Claude Shannon's work on developing information theory. I found it quite good, although it took me a while to get through as I had to stop periodically and absorb stuff. There was a bunch of stuff in there that made for interesting thought-fodder. I wouldn't recommend it to the general public though; it's good for somebody with a general interest in information theory.

As a tangent, here's a (variant of a) puzzle I was forwarded not too long ago on WhatsApp. Usually I dislike those things, but this puzzle intrigued me as it seemed impossible to solve at first and took me a few days to figure out.

You, your friend, and the Devil play a game. You and the Devil are in a room with an 8x8 chess board with 64 tokens on it, one on each square. Meanwhile, your friend is outside of the room. Each token can be in either an up position or a down position, and the difference in position is distinguishable to the eye. The Devil randomizes the tokens on the board (so it's a random mix of up and down), chooses one of the 64 squares, and calls it the magic square. Next, you may choose one token on a square and flip its position. Then, you leave the room, and your friend comes in and must guess what the magic square was by looking at the state of the board. You and your friend may agree on some strategy beforehand, but there are no "side channels" for leaking information other than the tokens on the chessboard.

Bonus points if you can explain the solution without using concepts from information theory (I couldn't).
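
Spoiler for anyone who wants to work it out alone: one commonly cited strategy for this kind of puzzle numbers the squares 0 to 63 and XORs together the numbers of all squares whose token is up. This may or may not be the solution I had in mind when posting; the function names below are made up for the sketch.

```python
import random
from functools import reduce
from operator import xor


def board_code(board):
    """XOR together the indices (0-63) of every square whose token is up."""
    return reduce(xor, (i for i, up in enumerate(board) if up), 0)


def square_to_flip(board, magic):
    """Your side: flipping square f changes the board's code by XOR f,
    so flipping (code XOR magic) makes the code equal to magic."""
    return board_code(board) ^ magic


def guess_magic(board):
    """The friend's side: the code of the board they see is the guess."""
    return board_code(board)


# One round of the game with a random board and a random magic square.
board = [random.choice([True, False]) for _ in range(64)]
magic = random.randrange(64)
flip = square_to_flip(board, magic)
board[flip] = not board[flip]
print(guess_magic(board) == magic)
```

If the board's code already equals the magic square, the square to flip comes out as 0, and flipping square 0 never changes the code, so the strategy works whether or not the flip is mandatory.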


Thank you, Jeeves
2019-08-05 06:39:50

Book #20 of 2019 is Thank you, Jeeves by P. G. Wodehouse. Hi-liarious! The writing style kind of reminded me of Douglas Adams or Terry Pratchett, but the content is somewhat different. I very much enjoyed it though, and parts had me LOL'ing.


Investing: The Last Liberal Art
2019-07-31 11:21:08

Book #19 of 2019 is Investing: The Last Liberal Art. I saw this randomly while browsing in a library and it sounded interesting so I picked it up. It was a bit of a rollercoaster, because:

(1) What I expected based on the jacket was that it would give a quick overview of the main ideas from different disciplines in a way that would encourage me to learn more about them.

(2) After reading the first chapter, I was very disappointed, because it seemed like it was really "pick a concept from a discipline and shoehorn it into some theory/explanation of how the stock market works". To be specific, the first chapter chose the concept of "equilibrium" from physics. Which just really rubbed me the wrong way, because it seemed like he was taking ideas totally out of context and mis-applying them.

(3) After reading the rest of the chapters, I understand a bit more what the author was trying to do. I still don't think he did a particularly good job, but at least the book pointed me to some interesting ideas that I hadn't thought about before, and can guide me to other interesting books.

Still not a book I would recommend overall, but I'm glad I didn't quit after the first chapter since the later ones redeemed the book a bit.


The Gecko Hacker's Guide to Taskcluster
2019-07-15 09:22:44

Don't panic.

I spent a good chunk of this year fiddling with taskcluster configurations in order to get various bits of continuous integration stood up for WebRender. Taskcluster configuration is very flexible and powerful, but can also be daunting at first. This guide is intended to give you a mental model of how it works, and how to add new jobs and modify existing ones. I'll try and cover things in detail where I believe the detail would be helpful, but in the interest of brevity I'll skip over things that should be mostly obvious by inspection or experimentation if you actually start digging around in the configurations. I also try and walk through examples and provide links to code as much as possible.

Based on my experience, there are two main kinds of changes most Gecko hackers might want to do:
(1) modify existing Gecko test configurations, and
(2) add new jobs that run in automation.
I'm going to explain (2) in some detail, because once that's understood, explaining (1) is a lot easier.

Overview of fundamentals

The taskcluster configuration lives in-tree in the taskcluster/ folder. The most interesting subfolders there are the ci/ folder, which contains job definitions in .yml files, and the taskgraph/ folder, which contains Python scripts to do transforms. A quick summary of the process is that the job definitions are taken from the .yml files and run through a series of "transforms", finally producing a task definition. The task definitions may have dependencies on other task definitions; together this set is called the "task graph". That is then submitted to Taskcluster for execution. Conceptually you can think of the stuff in the .yml files as a higher-level definition, which gets "compiled" down to the final taskgraph (the "machine code") that Taskcluster actually understands and executes.
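
As a rough mental model, the "compile" step can be pictured as a chain of generator functions, each taking the job definitions and yielding modified ones. The sketch below is a toy in plain Python, not the real taskgraph code: the attribute names, the b-linux worker type, and the labelling rule are all made up for illustration.

```python
def set_defaults(config, jobs):
    """Toy transform: fill in a default worker type unless the job sets one."""
    for job in jobs:
        yield {"worker-type": config["default-worker-type"], **job}


def expand_label(config, jobs):
    """Toy transform: replace the short job name with a full task label."""
    for job in jobs:
        job = dict(job)
        job["label"] = f"{config['kind']}-{job.pop('name')}"
        yield job


def apply_transforms(config, jobs, transforms):
    """Feed the jobs through each transform in turn, like a compiler pipeline."""
    for transform in transforms:
        jobs = transform(config, jobs)
    return list(jobs)


# Hypothetical "kind" configuration and job definitions (stand-ins for a kind.yml).
config = {"kind": "webrender", "default-worker-type": "b-linux"}
jobs = [
    {"name": "linux-release"},
    {"name": "macos-debug", "worker-type": "t-osx-1010"},
]

tasks = apply_transforms(config, jobs, [set_defaults, expand_label])
for task in tasks:
    print(task)
```

Each transform only needs to know about the attributes it cares about, which is why job definitions in different folders can look so different and still end up as the same kind of task definition.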

It's helpful to walk through a quick example of a transform. Consider the webrender-linux-release job definition. At the top of the kind.yml file, there are a number of transforms listed, so each of those gets applied to the job definition in turn. The first one is use_toolchains, the code for which you can find in use_toolchains.py. This transform takes the toolchains attributes of the job definition (in this example, these attributes), and figures out the corresponding toolchain jobs and artifacts (artifacts are files that are outputs of a task). The toolchain jobs are defined in taskcluster/ci/toolchains/, so we can see that the linux64-rust toolchain is here and the wrench-deps toolchain is here. The transform code discovers this, then adds dependencies for those tasks to webrender-linux-release, and populates the MOZ_TOOLCHAINS env var with the links to the artifacts. Taskcluster ensures that tasks only get run after their dependencies are done, and the MOZ_TOOLCHAINS env var is used by ./mach artifact toolchain to actually download and unpack those toolchains when the webrender-linux-release task runs.
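
To make the walkthrough above concrete, here is a toy version of what such a transform does. It is not the code in use_toolchains.py; the artifact paths, the toolchain- label prefix, and the exact format of the MOZ_TOOLCHAINS string are assumptions for illustration.

```python
def use_toolchains(config, jobs):
    """Toy transform: turn a job's `toolchains` list into task dependencies
    plus a MOZ_TOOLCHAINS env var pointing at the toolchain artifacts."""
    for job in jobs:
        job = dict(job)
        dependencies = dict(job.get("dependencies", {}))
        artifact_refs = []
        for name in job.pop("toolchains", []):
            label = f"toolchain-{name}"  # assumed labelling scheme
            dependencies[label] = label
            artifact_refs.append(f"{config['artifacts'][name]}@<{label}>")
        job["dependencies"] = dependencies
        if artifact_refs:
            env = dict(job.get("env", {}))
            env["MOZ_TOOLCHAINS"] = " ".join(artifact_refs)
            job["env"] = env
        yield job


# Hypothetical lookup table: which artifact each toolchain job produces.
config = {
    "artifacts": {
        "linux64-rust": "public/build/rustc.tar.xz",
        "wrench-deps": "public/build/wrench-deps.tar.bz2",
    }
}
job = {"label": "webrender-linux-release", "toolchains": ["linux64-rust", "wrench-deps"]}

result = next(use_toolchains(config, [job]))
print(result["dependencies"])
print(result["env"]["MOZ_TOOLCHAINS"])
```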

Similar to this, there are lots of other transforms that live in taskcluster/taskgraph/transforms/ that do other transformations on the job definitions. Some of the job definitions in the .yml files look very different in terms of what attributes are defined, and go through many layers of transformation before they produce the final task definition. In a sense, each folder in taskcluster/ci is a domain-specific language, with the transforms eventually compiling them down to the same machine code.

One of the more important transforms is the job transform, which determines which wrapper script will be used to run your job's commands. This is usually specified in your job definition using the run.using attribute. Examples include mach or toolchain-script. These both get transformed (see transforms for mach and toolchain-script) into commands that get passed to run-task. Or you can just use run-task directly in your job, and specify the command you want to have run. In the end most jobs will boil down to run-task commands; the run-task wrapper script just does some basic abstraction over the host machine/VM and then runs the command.
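
The dispatch on run.using can be pictured roughly like this. It is a simplified sketch rather than the real job transform: the attribute names (mach, script, arguments, command) and the bare "run-task --" prefix are assumptions, and the real run-task invocation takes more options than shown.

```python
import shlex


def run_using_mach(run):
    """`using: mach` becomes a ./mach invocation wrapped in run-task."""
    return ["run-task", "--", "./mach"] + shlex.split(run["mach"])


def run_using_toolchain_script(run):
    """`using: toolchain-script` becomes a script invocation wrapped in run-task."""
    return ["run-task", "--", run["script"]] + list(run.get("arguments", []))


def run_using_run_task(run):
    """`using: run-task` passes the given command straight to run-task."""
    return ["run-task", "--"] + shlex.split(run["command"])


RUN_USING = {
    "mach": run_using_mach,
    "toolchain-script": run_using_toolchain_script,
    "run-task": run_using_run_task,
}


def job_transform(config, jobs):
    """Toy transform: replace the `run` section with a concrete command."""
    for job in jobs:
        job = dict(job)
        run = job.pop("run")
        job["command"] = RUN_USING[run["using"]](run)
        yield job


# Hypothetical job definitions, one per wrapper type.
jobs = [
    {"label": "lint", "run": {"using": "mach", "mach": "lint -l eslint"}},
    {"label": "build-tool", "run": {"using": "toolchain-script", "script": "build-tool.sh"}},
    {"label": "custom", "run": {"using": "run-task", "command": "make check"}},
]
for task in job_transform({}, jobs):
    print(task["label"], task["command"])
```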

Host environment and Docker

Taskcluster can manage many different underlying host systems - from Amazon Linux VMs, to dedicated physical macOS machines, and everything in between. I'm not going to go into details of provisioning but there's the notion of a "worker" which is useful to know. This is the binary that runs on the host system, polls taskcluster to find new tasks scheduled to be run on that kind of host system, and executes them in whatever sandboxing is available on that host system. For example, a docker-worker instance will start a specified docker image and execute the task commands in that. A generic-worker instance will instead create a dedicated work folder and run stuff in there, cleaning up afterwards.

If you're adding a new job (e.g. some sort of code analysis or whatever) most likely you should run on Linux using docker. This means you need to specify a docker image, which you can also do using the taskcluster configuration. Again I will use the webrender-linux-release job as an example. It specifies the webrender docker image to use, which is defined with an in-tree Dockerfile. The docker image is itself built using taskcluster with the job definition here (this job definition is an example of one that looks very different from other job definitions, because the job description is literally two lines and transforms do most of the work in compiling this into a full task definition).

Circling back to generic-worker, this is the default for jobs that need to run on Windows and macOS, because Docker doesn't support either of those platforms as a target. An example is the webrender-macos-debug job, which specifies using: run-task on a t-osx-1010 worker type, which will cause it to go through this transform and eventually run using the run-task wrapper under generic-worker on the macOS instance. For the most part you probably won't need to care about this but it's something to be aware of if you're running jobs targeted at Windows or macOS.

Caching

As we've seen from previous sections, jobs can use docker images and toolchain artifacts that are produced by other tasks in the taskgraph. Of course, we don't want to rebuild these docker images and toolchains on every single push, as that would be quite expensive. Taskcluster provides a mechanism for caching and reusing artifacts from previous pushes. I'm not going to go into too much detail on this, but you can look at the cached_tasks transform if you're interested. I will, however, point to the %include comments in e.g. this Dockerfile and note that these include statements are special because they mark the docker image as dependent on the given file. So if the included file changes, the docker image will be rebuilt. For toolchain tasks you can specify additional dependencies on inputs using the resources attribute; this also triggers toolchain rebuilds if the dependent inputs change.
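
The general idea behind this kind of caching can be sketched as hashing every input the image depends on and only rebuilding when the hash changes. This is a toy model under that assumption, not the cached_tasks implementation; the file names, the in-memory index, and the function names are invented.

```python
import hashlib


def inputs_digest(files):
    """Hash the names and contents of every input file (name -> bytes)."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode())
        h.update(b"\0")
        h.update(files[name])
        h.update(b"\0")
    return h.hexdigest()


def find_or_build(files, index, build):
    """Reuse the artifact recorded for this digest, or build and record it."""
    digest = inputs_digest(files)
    if digest not in index:
        index[digest] = build()
    return index[digest]


builds = []  # records each time an image is actually rebuilt


def build_image():
    builds.append("built")
    return f"image-{len(builds)}"


index = {}
# Hypothetical Dockerfile plus one file pulled in via an %include comment.
inputs = {"Dockerfile": b"FROM debian\n", "setup.sh": b"apt-get install -y curl\n"}

first = find_or_build(inputs, index, build_image)   # first push: builds
second = find_or_build(inputs, index, build_image)  # unchanged inputs: reused
inputs["setup.sh"] = b"apt-get install -y curl git\n"
third = find_or_build(inputs, index, build_image)   # included file changed: rebuilds
print(first, second, third, len(builds))
```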

The other thing to keep in mind when adding a new job is that you want to avoid too much network traffic or redundant work. So if your job involves downloading or building stuff that usually doesn't change from one push to the next, you probably want to split up your job so that the mostly-static part is done by a toolchain or other job, and the result of that is cached and reused by your main job. This will reduce overall load and also improve the runtime of your per-push job.

Even if you don't need caching across pushes, you might want to refactor two jobs so that their shared work is extracted into a dependency, and the two dependent jobs then just do their unique postprocessing bits. This can be done by manually specifying the dependencies and pulling in artifacts from those dependencies using the fetches attribute. See here for an example. In this scenario taskcluster will again ensure the jobs run in the right order, and you can use artifacts from the dependencies, but no caching across pushes takes place.
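
The ordering guarantee amounts to a topological sort over the dependency graph. Here is a minimal sketch using Python's standard graphlib module (Python 3.9+); the job labels, the "build" alias, and the artifact name are hypothetical, and the dictionaries only loosely mirror what a job definition with dependencies and fetches looks like.

```python
from graphlib import TopologicalSorter

# Two jobs share one upstream job and fetch its artifact; nothing is cached across pushes.
jobs = {
    "shared-build": {"dependencies": {}},
    "report-a": {
        "dependencies": {"build": "shared-build"},
        "fetches": {"build": ["output.tar.gz"]},
    },
    "report-b": {
        "dependencies": {"build": "shared-build"},
        "fetches": {"build": ["output.tar.gz"]},
    },
}


def run_order(jobs):
    """Return the job labels in an order where dependencies always come first."""
    graph = {
        label: set(job.get("dependencies", {}).values())
        for label, job in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(jobs)
print(order)
```

The two report jobs may come out in either order relative to each other, since neither depends on the other; only their position after shared-build is guaranteed.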

Adding new jobs

So hopefully with the above sections you have a general idea of how taskcluster configuration works in mozilla-central CI. To add a new job, you probably want to find an existing job kind that fits what you want, and then add your job to that folder, possibly by copy/pasting an existing job with appropriate modifications. Or if you have a new type of job you want to run that's significantly different from existing ones, you can add a new kind (a new subfolder in taskcluster/ci and documented in kinds.rst). Either way, you'll want to ensure the transforms being used are appropriate and allow you to reuse the features (e.g. toolchain dependencies) that you need and that already exist.

Gecko testing

The Gecko test jobs are defined in the taskcluster/ci/test/ folder. The entry point, as always, is the kind.yml file in that folder, which lists the transforms that get applied. The tests transform is one of the largest and most complex transforms. It does a variety of things (e.g. generating fission-enabled and fission-disabled tasks for jobs), but thankfully you probably won't need to fiddle with that too much, unless you find your test suite is behaving unexpectedly. Instead, you can mostly do copy-pasting in the other .yml files to enable test suites on particular platforms or adjust options. The test-platforms.yml file allows you to define "test platforms" which show up as new rows on TreeHerder and run sets of tests on a particular build. The sets of tests are defined in test-sets.yml, which in turn reference the individual test jobs defined in the various other .yml files in that folder. Enabling a test suite on a platform is generally as easy as adding the test to the test set that you care about, and maybe tweaking some of the per-platform test attributes (e.g. number of chunks, or what trees to run on) to suit your new platform. I found the .yml files to be mostly self-explanatory so I won't walk through any examples here.

What I will briefly mention is how the tests are actually run. This is not strictly part of taskcluster but is good to know anyway. The test tasks generally run using mozharness, which is a set of scripts in testing/mozharness/scripts and configuration files in testing/mozharness/configs. Mozharness is responsible for setting up the Firefox environment and then delegates to the actual test harness. For example, running reftests on desktop linux would run testing/mozharness/scripts/desktop_unittest.py with the testing/mozharness/configs/unittests/linux_unittest.py config (which are indicated in the job description here). The config file, among other things, tells the mozharness script where the actual test harness entrypoint is (in this case, runreftest.py) and the mozharness script will invoke that test harness after doing some setup. There are many layers here (run-task, mozharness, test harness, etc.) with the number of layers varying across platforms (mozharness scripts for Android device/emulator are totally different than desktop), and I don't have as good a grasp on all this as I would like, but hopefully this is sufficient to point you in the right direction if you need to fiddle with tests at this level.

Debugging

As always, when modifying configs you might run into unexpected problems and need to debug. There are a few tools that are useful here. One is the ./mach taskgraph command, which can run different steps of the "decision" task and generate taskgraphs. When trying to debug task generation issues my go-to technique would be to download a parameters.yml file from an existing decision task on try or m-c (you can find it in the artifacts list for the decision task on TreeHerder), and then run ./mach taskgraph target-graph -p parameters.yml. This runs the taskgraph code and emits a list of tasks that would be scheduled given the taskcluster configuration in your local tree and the parameters provided. Likewise, ./mach taskcluster-build-image and ./mach taskcluster-load-image are useful for building and testing docker images for use with jobs. You can use these to e.g. run a docker image with your local Docker installation, and see what all you might need to install on it to make it ready to run your job.

Another useful debugging tool, as you start doing try pushes to test your new tasks, is the task/group inspector at tools.taskcluster.net. This is easily accessible from TreeHerder by clicking on a job and using the "Inspect task" link in the details pane (bottom left). TreeHerder provides some information, but the Taskcluster tools website provides a much richer view including the final task description that was submitted to Taskcluster, its dependencies, environment variables, and so on. While TreeHerder is useful for looking at overall push health, the Taskcluster web UI is better for debugging specific task problems, especially as you're in the process of standing up a new task.

In general the taskcluster scripts are pretty good at identifying errors and printing useful error messages. Schemas are enforced, so it's rare to run into silent failures because of typos and such.

Conclusion

There's a lot of power and flexibility afforded by Taskcluster, but with that goes a steep learning curve. Thankfully once you understand the basic shape of how Taskcluster works, most of the configuration tends to be fairly intuitive, and reading existing .yml files is a good way to understand the different features available. grep/searchfox in the taskcluster/ folder will help you out a lot. If you run into problems, there are always people willing to help - as of this writing, :tomprince is the best first point of contact for most of this, and he can redirect you if needed.


The Girl in the Plain Brown Wrapper
2019-07-11 20:36:11

Book #18 of 2019 is The Girl in the Plain Brown Wrapper, the last one of the Travis McGee books available at my local library. Now I need to find something else. This one was slightly better than the average of the lot, I'd say.


The Green Ripper
2019-07-11 20:34:35

Book #17 of 2019 is The Green Ripper, yet another of the Travis McGee books. This one was a little meh. There's a similar Jack Reacher book, although of course this one came first.


The Lonely Silver Rain
2019-07-01 15:18:41

Book #16 of 2019 is The Lonely Silver Rain, again of the Travis McGee series. Also good. My local library only has a couple more of the series (which is part of the reason I've been reading them out of order) so I might as well finish those before moving on to something else.



 
 
(c) Kartikaya Gupta, 2004-2019. User comments owned by their respective posters. All rights reserved.