Intent to Implement System Add-on: SHIELD/Normandy
mkelly at mozilla.com
Thu Sep 29 18:30:16 UTC 2016
On 9/27/16 8:08 AM, Gijs Kruitbosch wrote:
> You stick code and/or a patch somewhere people in this list can see
> it, you pick one or more of them, and they either approve it or tell you
> what to change before it gets approved, lands, and becomes part of
> Firefox-the-product. I am curious what you mean by "mixed answers".
>  https://wiki.mozilla.org/Modules/Firefox
As an example, one question I had in the past was "How do I get security
review for stuff I want to merge into Firefox, and where is that
checked?" and I've gotten answers like "You pretty much handle getting
security review on your own" or "RelMan will ask for the bug that you
got review in.".
What I'm getting from your answer is that it is the Firefox peer's
responsibility to judge whether what they're merging needs review from
someone else besides the peer and whether it has received that review /
approval. Is that accurate?
> OK, so for this one the answer is "no". This concerns me because that
> code ultimately affects how Firefox-the-product behaves (outside of what
> webpages can do) and normally such changes are subject to Firefox's
> review policy. This is the case for other go-faster add-ons (e.g. I have
> reviewed a number of changes to the pocket "go faster" add-on in the
> last few weeks). When this was not the case for self-repair code, that
> burned us.
Changes to the sandbox require a Firefox review since they're part of
the system add-on code. The intent is that reviewers will use that
opportunity to ensure only new behaviors that they're comfortable with
are added. For example, instead of adding a "change any pref" function,
we'd add a function that can change prefs from a whitelist of approved
prefs. This helps reduce the impact of bugs in the JS code.
Also, all the behavior changes S&I put out via SHIELD involve approvals
via bugs and emails with external stakeholders and product owners.
Experiments have a process for data and implementation review, and the
sampling we use on these recipes makes the impact of issues pretty small.
As for being burned by a lack of review in the past, SHIELD has both
more review than self-repair had and better mitigation when things go
wrong. If a recipe is broken, we can disable it in one step with the
admin interface, whereas self-repair required writing and shipping a
patch to the self-repair repo.
> Followup question: does this code in any way follow a train model (ie
> get tested against Firefox Nightly users first) ? The self-repair code
> didn't and just ran against 10% of release users to test it, which,
> again, burned us in the past.
It doesn't follow a train model, but we often run new recipes / actions
against small percentages (0.001%, for example) to test if it performs
as we expect before bumping up the sample rate. We are also capable of
running recipes on specific channels if the risk is deemed high enough
to warrant that.
> Related, why can't the experiments people "want" to run be run against
> Nightly users first to test the viability of the system add-on and the
> rest of the platforms involved with running these
> recipes/experiments/snippets? Why the need to immediately uplift to
We have ways of doing basic "Does it work, does it break everything"
testing on Nightly that we plan to perform, but that won't take terribly
long to run. The interesting tests we want to do that require real
surveys we want to test are useful because they involve testing the
entire pipeline from downloading and displaying a Heartbeat prompt to
analyzing the results.
We also sample these prompts (to avoid spamming users with repeated
prompts), and rate of engagement for the prompt is low enough that
running a survey on just the Nightly population won't result in much
actual testing of the full flow through the prompt.
Also also, there is some level of "omg the data from this system is
useful enough that if we can get it 6 weeks sooner that'd be awesome"
behind wanting to uplift. The backlog of requested studies is
looooooooong, and landing the system add-on is key to increasing the
rate of studies. These studies help us improve Firefox in ways we know
make a difference, since we've measured them.
> What's required to "back out" or "disable" or "turn off" a recipe or
> this JS code / configuration? The docs say enabling a recipe takes 2
> people, so I'm assuming we're talking about the same thing. How many
> people get access to the "turn it off" switch and where are they located
> (both team/org-wise and timezone/location-wise) ? Again, I don't want to
> end up with a situation where something gets approved and then needs
> changing and nobody is available to do that.
A checkbox in the admin, and it only requires a single person. Approval
is intended for approving a given recipe to be sent out, but once a
recipe and it's filters/configuration have been approved, enabling or
disabling the recipe does not require approval so as to avoid not be
able to pull a recipe quickly.
Time zone distribution of people who can pull recipes is a good point
that we haven't addressed yet. Right now all the people with access are
in North America. What's the concern about org-wise distribution,
though? Are you concerned about getting approval for pulling a recipe?
> Related: are there controls to stop us from deploying stuff to release
> late PDT (or on Friday or whatever) so that the world burns while the US
> and Europe are asleep or on holiday?
There's at least an implicit policy to not deploy when everyone who can
monitor and disable recipes is about to be unavailable for an extended
period of time, but an explicit policy for this would be good. I think
this is really just a symptom of the greater issue of having an explicit
group of people with recipe access and documented policies for that
group. I'll work on getting a better answer for this.
- Mike Kelly
More information about the firefox-dev