Intent to Implement System Add-on: SHIELD/Normandy

Gijs Kruitbosch gijskruitbosch at gmail.com
Fri Sep 30 13:41:23 UTC 2016


On 29/09/2016 19:30, Michael Kelly wrote:
> On 9/27/16 8:08 AM, Gijs Kruitbosch wrote:
>> You stick code and/or a patch somewhere people in this list[0] can see
>> it, you pick one or more of them, and they either approve it or tell you
>> what to change before it gets approved, lands, and becomes part of
>> Firefox-the-product. I am curious what you mean by "mixed answers".
>> [0] https://wiki.mozilla.org/Modules/Firefox
> As an example, one question I had in the past was "How do I get security
> review for stuff I want to merge into Firefox, and where is that
> checked?" and I've gotten answers like "You pretty much handle getting
> security review on your own" or "RelMan will ask for the bug that you
> got review in.".
Security review is separate from Firefox's "regular" code review. 
One-line patches won't normally need a separate security review, but 
will still (almost) always need code review. I'm not really speaking 
about the security review aspect of it, just about Firefox code review. 
If the practices and steps for security review are unclear, we should 
get that sorted.

In this specific case, it sounds like you're already talking to the 
security team. They would be the best people to judge if you (still) 
need a formal security review to happen on the code you're landing. If 
you haven't talked to them about this, now would be a good time. For 
other projects, a quick web search gets me: 
https://wiki.mozilla.org/Security#Request_a_Security_or_Privacy_Review 
which seems fairly straightforward to me.
> What I'm getting from your answer is that it is the Firefox peer's
> responsibility to judge whether what they're merging needs review from
> someone else besides the peer
No, I would say that the responsibility for "who needs to look at this 
patch" is on the patch author. If the patch author is not sure, they 
should ask for help (and probably ask prospective reviewers!), but it 
shouldn't be assumed that setting r?<peer> on Bugzilla is equivalent to 
delegating all the due diligence to said peer.

Reviewers will generally not review code that they don't know well; 
instead, they will redirect you to someone else within the regular peer 
review system (so if you give me a patch that touches 
browser.js/tabbrowser.xml but also code in widget/, I will review the 
frontend bits but likely ask someone more appropriate to review the 
widget/ bits). Peers might additionally proactively flag "I think you 
should get UX/security review" or "have you talked to team X", but 
strictly speaking the responsibility lies with the patch author (and 
again, they should feel free to ask for help if not sure).
>> OK, so for this one the answer is "no". This concerns me because that
>> code ultimately affects how Firefox-the-product behaves (outside of what
>> webpages can do) and normally such changes are subject to Firefox's
>> review policy. This is the case for other go-faster add-ons (e.g. I have
>> reviewed a number of changes to the pocket "go faster" add-on in the
>> last few weeks). When this was not the case for self-repair code, that
>> burned us.
> Changes to the sandbox require a Firefox review since they're part of
> the system add-on code. The intent is that reviewers will use that
> opportunity to ensure only new behaviors that they're comfortable with
> are added. For example, instead of adding a "change any pref" function,
> we'd add a function that can change prefs from a whitelist of approved
> prefs. This helps reduce the impact of bugs in the JS code.
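
For concreteness, such a whitelist-based setter might look roughly like 
the sketch below; the names and the pref whitelist are illustrative, 
not the actual sandbox API:

  // Hypothetical sandbox helper: recipes can only change approved prefs.
  Components.utils.import("resource://gre/modules/Services.jsm");

  const APPROVED_PREFS = new Set([
    "browser.myexperiment.enabled",  // made-up pref names
    "browser.myexperiment.variant",
  ]);

  function setApprovedPref(name, value) {
    if (!APPROVED_PREFS.has(name)) {
      throw new Error("Pref not in the approved whitelist: " + name);
    }
    Services.prefs.setCharPref(name, String(value));
  }
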
Yes, but I wasn't talking about bugs in the sandbox JS code that 
implements the atomic functionality.

SHIELD will likely need primitives like "show a thing to the user". If 
"show a thing to the user" happens on every startup, or not at all, or 
at some other wrong time, that is bad. If the recipe controls when we 
call the "show a thing to the user" method, and the recipe has a bug so 
we end up not showing the thing at the right times, that's not a bug 
you can catch by reviewing all the sandbox code that just handles "if 
this method is called, show the thing".

Obviously we will want to have logic in the recipes, because the whole 
point of the model in use here is flexibility: we won't have to 
hand-code a new add-on for every study people want to run. That's fine; 
I'm just saying that the logic in the recipes is important, can be 
buggy and cause problems, and therefore needs appropriate review.

> Also, all the behavior changes S&I put out via SHIELD involve approvals
> via bugs and emails with external stakeholders and product owners.
Sure, but that is not the same as code review. Approval for "it should 
do X" is not the same as "I've reviewed the code and am confident it 
does X rather than Y".
> Experiments have a process for data and implementation review, and the
> sampling we use on these recipes makes the impact of issues pretty small.
Can you elaborate on this? In the past these recipes have gone out to 
the entire (en-US) user population, and any sampling had to happen 
*within* the equivalent of the 'recipe' (so you generate a random 
number in the recipe code and check if it's in some bucket before doing 
the thing the recipe says to do). If that code was/is buggy, the impact 
could still be severe. Is the current setup architected differently, so 
that this happens within the server software (and some clients will ask 
for a recipe and get "nope, nothing here")? Or something else?
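
For reference, the in-recipe sampling I mean amounted to something like 
the sketch below (illustrative, not actual self-repair code):

  // Every client fetches the recipe; sampling happens client-side,
  // inside the recipe's own code. A typo in this check (say, ">"
  // instead of "<") would run the action for ~99% of users instead
  // of the intended ~1%.
  const SAMPLE_RATE = 0.01;

  function isInSample() {
    return Math.random() < SAMPLE_RATE;
  }

  if (isInSample()) {
    runTheActionTheRecipeDescribes(); // placeholder for the real work
  }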
> As for being burned by a lack of review in the past, SHIELD has both
> more review than self-repair had and better mitigation when things go
> wrong.
Good! :-)
> If a recipe is broken, we can disable it in one step with the
> admin interface, whereas self-repair required writing and shipping a
> patch to the self-repair repo.
Maybe, but the upside there was that when things went pear-shaped I 
could find someone to give me access to said GitHub repo so I could 
revert something and it got auto-deployed, even when it wasn't really 
normal daytime in the US.

Can we ensure IT/ops/moc have access to this thing so they can turn 
things off (day or night) if things go pear-shaped (which is what we do 
for other web things, AIUI)?

>> Followup question: does this code in any way follow a train model (ie
>> get tested against Firefox Nightly users first) ? The self-repair code
>> didn't and just ran against 10% of release users to test it, which,
>> again, burned us in the past.
> It doesn't follow a train model, but we often run new recipes / actions
> against small percentages (0.001%, for example) to test if it performs
> as we expect before bumping up the sample rate. We are also capable of
> running recipes on specific channels if the risk is deemed high enough
> to warrant that.
Great! Still interested in how the sampling is happening.

>> Related, why can't the experiments people "want" to run be run against
>> Nightly users first to test the viability of the system add-on and the
>> rest of the platforms involved with running these
>> recipes/experiments/snippets? Why the need to immediately uplift to
>> Aurora?
>
> We have ways of doing basic "Does it work, does it break everything"
> testing on Nightly that we plan to perform, but that won't take terribly
> long to run. The interesting tests we want to do require real surveys,
> and are useful because they involve testing the entire pipeline from
> downloading and displaying a Heartbeat prompt to analyzing the results.
>
> We also sample these prompts (to avoid spamming users with repeated
> prompts),
I'm really sorry, but this makes it sound like the only way you have of 
avoiding a user being shown a prompt twice is by using low sampling 
percentages. That... that wouldn't be good (also because of the 
birthday paradox). Please tell me I'm reading too much into this and 
you have better ways of avoiding "repeat" showings? (FWIW, as I raised 
this exact issue earlier in the email, this was one of the issues the 
self-repair stuff had, and I hope we've learned from that...)
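
The kind of thing I'd hope for is explicit per-profile bookkeeping, 
roughly along these lines (the pref name and helpers are hypothetical, 
just to show the shape):

  // Remember which prompts this profile has already seen, so that
  // repeat-avoidance doesn't depend on sampling percentages at all.
  Components.utils.import("resource://gre/modules/Services.jsm");

  const SHOWN_PREF = "extensions.shield.prompts.shown"; // made-up pref

  function getShownIds() {
    try {
      let val = Services.prefs.getCharPref(SHOWN_PREF);
      return val ? val.split(",") : [];
    } catch (e) {
      return []; // pref not set yet
    }
  }

  function markShown(promptId) {
    let ids = getShownIds();
    if (!ids.includes(promptId)) {
      ids.push(promptId);
      Services.prefs.setCharPref(SHOWN_PREF, ids.join(","));
    }
  }

  // Before showing: if (!getShownIds().includes(id)) { show it, then
  // call markShown(id); }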

>   and rate of engagement for the prompt is low enough that
> running a survey on just the Nightly population won't result in much
> actual testing of the full flow through the prompt.
I assume we could run things at a higher sample rate against Nightly to 
avoid this problem and still get testing of the full flow, also by QA, 
on "production" Nightlies + servers?
> Also also, there is some level of "omg the data from this system is
> useful enough that if we can get it 6 weeks sooner that'd be awesome"
> behind wanting to uplift. The backlog of requested studies is
> looooooooong, and landing the system add-on is key to increasing the
> rate of studies. These studies help us improve Firefox in ways we know
> make a difference, since we've measured them.
Right. We're relatively early in the cycle - I'm not opposed to uplift 
per se, I would just like to see this tested on Nightly (like most other 
things) before that happens.

>> What's required to "back out" or "disable" or "turn off" a recipe or
>> this JS code / configuration? The docs say enabling a recipe takes 2
>> people, so I'm assuming we're talking about the same thing. How many
>> people get access to the "turn it off" switch and where are they located
>> (both team/org-wise and timezone/location-wise) ? Again, I don't want to
>> end up with a situation where something gets approved and then needs
>> changing and nobody is available to do that.
> A checkbox in the admin, and it only requires a single person. Approval
> is intended for approving a given recipe to be sent out, but once a
> recipe and its filters/configuration have been approved, enabling or
> disabling the recipe does not require approval, so as to avoid not being
> able to pull a recipe quickly.
Great, this sounds sensible, thanks for clarifying.
> Time zone distribution of people who can pull recipes is a good point
> that we haven't addressed yet. Right now all the people with access are
> in North America. What's the concern about org-wise distribution,
> though? Are you concerned about getting approval for pulling a recipe?
Bad phrasing on my part, I think, sorry. From what I recall, the team 
that actively worked on self-repair (and therefore SHIELD, I'm 
guessing?) is relatively small, so it just seemed like having more 
people with access to pull things would be a Useful Thing. As noted 
further up, I think IT/ops/#moc would be good candidates.

>> Related: are there controls to stop us from deploying stuff to release
>> late PDT (or on Friday or whatever) so that the world burns while the US
>> and Europe are asleep or on holiday?
> There's at least an implicit policy to not deploy when everyone who can
> monitor and disable recipes is about to be unavailable for an extended
> period of time, but an explicit policy for this would be good. I think
> this is really just a symptom of the greater issue of having an explicit
> group of people with recipe access and documented policies for that
> group. I'll work on getting a better answer for this.
Great, thank you.

~ Gijs


