Packaging Google Chrome extensions

I’ve been working at Google for about three years now, and was fortunate enough to transfer onto the Chrome extensions team about a year ago. Mostly, I support developers working on Chrome extensions, but from time to time I work on projects for the team to keep my sanity. A good example of this is the Chrome extensions samples browser. The extension docs are built and hosted automatically from the Chromium source tree so I modified the docs build script to generate the gallery and zip each sample into an easily-downloadable archive.

Of course zips are fine if you want to peek into sample code, but not so good if you just want to quickly test a sample to see what it does. To address this, I’ve been working on a method to offer each zip as a packaged crx.

Some considerations:

  • I didn’t want to check the .crx files into the source tree because of the hassles involved with binary files in source control. We’ve had some issues with the generated zip files, so managing two sets of archives seemed like it could be trouble.

  • The extension docs are hosted on App Engine, meaning there are some restrictions on what kind of libraries I could use. While there are already solutions for packing extensions in Python, they rely on OpenSSL, which isn’t available on App Engine.

I decided to write a Python library which could run on App Engine and convert a directory of files into a Chrome extension crx archive. I didn’t find a ton of information online to help me do this automatically, so I decided to write up my findings for anyone heading down this road in the future. It should be pretty useful if you ever want to host a CRX from an app engine app for whatever reason (offering a debug/trusted tester version, for example).

Figuring out the format

From the CRX format documentation I knew I needed to create a binary file containing a header, an RSA public key, an RSA signature, and the bytes of a zip file with the extension contents.

The RSA key is used to generate a signature of the zip file contents, so I needed to figure out how to get a zip of a directory first.

Obtaining a zip file

Technically, the Chrome extension documentation samples are already zipped and checked into source control, but it’s fairly easy to zip up a directory in Python. Most projects probably won’t have a zip handy, so I’m including the step here.

I wanted some code that wouldn’t have to write the zip to the filesystem, so I used the StringIO module to generate a zip file in memory:

import StringIO
import os
import zipfile

...

zip_buffer = StringIO.StringIO()
zip_file = zipfile.ZipFile(zip_buffer, 'w')
path = 'path/to/extension/directory'
try:
  for root, dirs, files in os.walk(path):
    for file in files:
      # Absolute path to the file to be added.
      abspath = os.path.realpath(os.path.join(root, file))
      # Write a relative path into the zip file.
      relpath = abspath.replace(path + '/', "")
      zip_file.write(abspath, relpath)
except RuntimeError, msg:
  raise Exception("Could not write zip!")
finally:
  zip_file.close()

zip_string = zip_buffer.getvalue()

This way zip_string contains the bytes of a zip file containing the specified directory. You’ll see that I’m using a simple form of generating a relative path:

# Absolute path to the file to be added.
abspath = os.path.realpath(os.path.join(root, file))
# Write a relative path into the zip file.
relpath = abspath.replace(path + '/', "")

There’s actually an os.path.relpath function which would do this a bit more directly, but according to the Python docs, this function is only available in Python 2.6 on Windows and Unix, so I try not to rely on it.

Generating an RSA key

For the rest of the file contents, I needed to sign the zip file with an RSA key, which had to be generated. While App Engine doesn’t have OpenSSL module support (which I would normally use), it does include a simple RSA package which can be used to generate a key:

from Crypto.PublicKey import RSA
import os

...

rsakey = RSA.generate(1024, os.urandom)

This is computationally intensive, so I usually generate a key if needed and store it in the data store for reuse. For local development, you can install the package from the PyCrypto homepage.

At this point I had a zip file and a key, so I needed to figure out exactly how to sign a piece of data according to the RSA specification in order to obtain the signature.

Figuring out the RSA signature payload format

From the packaging instructions, I knew I could use OpenSSL to generate a signature, but wasn’t really sure what it was doing under the covers. So I figured the best approach would be to sign an existing extension and see what the signature was using OpenSSL’s command line tools. I packaged an extension using Chrome to generate a .pem key, then zipped the sources and ran the following:

$ openssl sha1 -sign key.pem extension.zip > extension.sig
$ openssl rsautl -verify -in extension.sig -inkey key.pem -raw -hexdump
0000 - 00 01 ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0010 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0020 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0030 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0040 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0050 - ff ff ff ff ff ff ff ff-ff ff ff ff 00 30 21 30   .............0!0
0060 - 09 06 05 2b 0e 03 02 1a-05 00 04 14 2d bf 9f 85   ...+........-...
0070 - 4a a1 68 a9 a0 64 b2 c7-11 36 da ce 92 17 e9 29   J.h..d...6.....)

This gave me a raw hex dump of the signature, so I started going through the RSA-related specifications to figure out what I was looking at. Turns out this is actually formatted according to Section 8.1 Encryption-block formatting of the PKCS#1 specification:

A block type BT, a padding string PS, and the data D shall be
formatted into an octet string EB, the encryption block.

          EB = 00 || BT || PS || 00 || D .           (1)

The block type BT shall be a single octet indicating the structure of
the encryption block. For this version of the document it shall have
value 00, 01, or 02. For a private- key operation, the block type
shall be 00 or 01. For a public-key operation, it shall be 02.

The padding string PS shall consist of k-3-||D|| octets. For block
type 00, the octets shall have value 00; for block type 01, they
shall have value FF; and for block type 02, they shall be
pseudorandomly generated and nonzero. This makes the length of the
encryption block EB equal to k.

The format of the hext dump corresponds with block type 01. Knowing that the octet string started at the first 30 octet, I thought a good approach would be to write a script to decode that data and see what exactly was stored there. I’ve done some work with RSA signatures before and know that everything is encoded using the ASN.1 format. Luckily there’s a great Python pyasn library which will decode this data and work on App Engine, to boot.

Here’s the script I wrote:

from pyasn1.codec.der import decoder

raw_obj = ('3021300906052b0e03021a050004142dbf9'
           'f854aa168a9a064b2c71136dace9217e929').decode('hex')
der_obj = decoder.decode(raw_obj)

You’ll see that the string I’m decoding corresponds with the contents of the hex dump listed above. Dumping the resulting der_obj gave me this ASN.1 structure:

(Sequence()
  .setComponentByPosition(0,
    Sequence()
      .setComponentByPosition(0, ObjectIdentifier('1.3.14.3.2.26'))
      .setComponentByPosition(1, Null('')))
  .setComponentByPosition(1, OctetString(
    '-\xbf\x9f\x85J\xa1h\xa9\xa0d\xb2\xc7\x116\xda\xce\x92\x17\xe9)'))
, '')

It might not look promising, but this was pretty good. The OctetString towards the end was actually the raw hex bytes of a SHA1 hash of the same zip file. You can calculate this in python with:

import hashlib
hashlib.sha1(zip_string).digest()

where zip_string is the contents of a zip file read from the filesystem, or generated using the StringIO approach I listed above.

I wanted to know what the ObjectIdentifier string represented. I had a hunch it was the signing algorithm, and from PKCS #1 I found that the digest information should be encoded in the following way:

DigestInfo ::= SEQUENCE {
   digestAlgorithm DigestAlgorithmIdentifier,
   digest Digest }
DigestAlgorithmIdentifier ::= AlgorithmIdentifier
Digest ::= OCTET STRING

Sure enough, the compnents before the OctetString were algorithm identifiers. I found it strange that SHA1 wasn’t actually listed as an AlgorithmIdentifier (maybe it came after?) but later the spec talks about the md# algorithms:

These object identifiers are intended to be used in the algorithm
field of a value of type AlgorithmIdentifier. The parameters field of
that type, which has the algorithm-specific syntax ANY DEFINED BY
algorithm, would have ASN.1 type NULL for these algorithms.

So cool, at least that explains the .setComponentByPosition(1, Null(“)) in the algorithm identifier.

At this point I was pretty sure that ‘1.3.14.3.2.26’ was the algorithm identifier for SHA1, but I wanted to make sure. Looking a bit more, I found the identifier in the X.509 algorithms spec:

The signature algorithm with SHA-1 and the RSA encryption algorithm
is implemented using the padding and encoding conventions described
in PKCS #1 [RFC 2313].  The message digest is computed using the
SHA-1 hash algorithm.

The RSA signature algorithm, as specified in PKCS #1 [RFC 2313]
includes a data encoding step.  In this step, the message digest and
the OID for the one-way hash function used to compute the digest are
combined.  When performing the data encoding step, the md2, md5, and
id-sha1 OIDs MUST be used to specify the MD2, MD5, and SHA-1 one-way
hash functions, respectively:

  ...
  id-sha1  OBJECT IDENTIFIER ::= {
    iso(1) identified-organization(3) oiw(14) secsig(3)
    algorithms(2) 26 }

Sure enough, iso(1) identified-organization(3) oiw(14) secsig(3) algorithms(2) 26 matches ‘1.3.14.3.2.26’.

That explained the payload format. Now I needed a way to build up the signature from my own code to match this structure.

Generating the RSA signature

Thankfully pyasn1 makes generating the appropriate structures pretty easy.

import hashlib
from pyasn1.codec.der import encoder
from pyasn1.type import univ

...

# Obtain the hash of the zip file contents
zip_hash = hashlib.sha1(zip_string).digest()

# Get the SHA1 AlgorithmIdentifier
sha1identifier = univ.ObjectIdentifier('1.3.14.3.2.26')
sha1info = univ.Sequence()
sha1info.setComponentByPosition(0, sha1identifier)
sha1info.setComponentByPosition(1, univ.Null(''))

# Get the DigestInfo sequence, composed of the SHA1 id and the zip hash
digestinfo = univ.Sequence()
digestinfo.setComponentByPosition(0, sha1info)
digestinfo.setComponentByPosition(1, univ.OctetString(zip_hash))

# Encode the sequence into ASN.1
digest = encoder.encode(digestinfo)

Now digest contains the raw bytes of the ASN.1 DigestInfo structure. I needed to pad it with ff octets according to the PKCS#1 specification.

paddinglength = 128 - 3 - len(digest)
paddedhexstr = "0001%s00%s" % (paddinglength * 'ff', digest.encode('hex'))

Finally, pycrypto supports a method to generate a RSA signature given a key.

from Crypto.PublicKey import RSA
import os

...

rsakey = RSA.generate(1024, os.urandom)
signature_bytes = rsakey.sign(paddedhexstr.decode('hex'), "")[0]
signature = ('%X' % signature_bytes).decode('hex')

There are a lot of conversions back and forth between hex and binary here. I feel that could be cleaned up a bit and everything could probably be kept in binary, but it’d be a bit harder to follow what was going on. At the end of the day, the signature variable contains the raw bytes of a RSA signature of the zip file contents.

Obtaining a public key

At this point I had the contents of the zip file and an RSA signature of that. The last major component of the CRX file that I needed to calculate was the public key portion of the RSA key.

The script in the CRX format documentation uses OpenSSL to generate a public key:

$ openssl rsa -pubout -outform DER

From the documentation the -outform DER option states:

The DER option uses an ASN1 DER encoded form compatible with the PKCS#1
RSAPrivateKey or SubjectPublicKeyInfo format.

Strangely, I couldn’t find a reference to SubjectPublicKeyInfo in PKCS #1, but it was in the X.509 certificate profile spec:

SubjectPublicKeyInfo  ::=  SEQUENCE  {
    algorithm            AlgorithmIdentifier,
    subjectPublicKey     BIT STRING  }

and the appropriate algorithm identifier and key format was once again found in the X.509 algorithms spec:

The OID rsaEncryption identifies RSA public keys.

  pkcs-1 OBJECT IDENTIFIER ::= { iso(1) member-body(2) us(840)
                 rsadsi(113549) pkcs(1) 1 }

  rsaEncryption OBJECT IDENTIFIER ::=  { pkcs-1 1}

The rsaEncryption OID is intended to be used in the algorithm field
of a value of type AlgorithmIdentifier.  The parameters field MUST
have ASN.1 type NULL for this algorithm identifier.

The RSA public key MUST be encoded using the ASN.1 type RSAPublicKey:

  RSAPublicKey ::= SEQUENCE {
     modulus            INTEGER,    -- n
     publicExponent     INTEGER  }  -- e

So I needed to get the bit string format of the RSAPublicKey version of my RSA key, and then encode that together into a SubjectPublicKey format:

from pyasn1.codec.der import encoder
from pyasn1.type import univ

...

# Get a RSAPublicKey structure
pkinfo = univ.Sequence()
pkinfo.setComponentByPosition(0, univ.Integer(rsakey.n))
pkinfo.setComponentByPosition(1, univ.Integer(rsakey.e))

#Convert the key into a bit string
def to_bitstring(self, num):
  buf = ''
  while num > 1:
    buf = str(num & 1) + buf
    num = num >> 1
  buf = str(num) + buf
  return buf
pklong = long(encoder.encode(pkinfo).encode('hex'), 16)
pkbitstring = univ.BitString("'00%s'B" % to_bitstring(pklong))

# Get the rsaEncryption identifier:
idrsaencryption = univ.ObjectIdentifier('1.2.840.113549.1.1.1')

# Get the AlgorithmIdentifier for rsaEncryption
idinfo = univ.Sequence()
idinfo.setComponentByPosition(0, idrsaencryption)
idinfo.setComponentByPosition(1, univ.Null(''))

# Get the SubjectPublicKeyInfo structure
publickeyinfo = univ.Sequence()
publickeyinfo.setComponentByPosition(0, idinfo)
publickeyinfo.setComponentByPosition(1, pkbitstring)

# Encode the public key structure
publickey = encoder.encode(publickeyinfo)

Writing the CRX format

Compared to the research and byte manipulating I needed to do earlier, actually getting the component pieces into the CRX format was incredibly easy. I used another StringIO instance to write the pieces in the following order (obtained from the CRX format docs):

  1. The string “Cr24”, a ‘magic number’ specific to the CRX format
  2. The number 2 (CRX file format version)
  3. The length of the public key in bytes
  4. The length of the signature in bytes
  5. The public key
  6. The signature
  7. The contents of the zip file

Here it is in python:

import StringIO
import struct

...

crx_buffer = StringIO.StringIO("wb")
crx_buffer.write("Cr24")  # Extension file magic number, from the CRX focs
crx_buffer.write(struct.pack('iii', 2, len(publickey), len(signature)))
crx_buffer.write(publickey)
crx_buffer.write(signature)
crx_buffer.write(zip_string)

crx_file = crx_buffer.getvalue()

Outputting crx_file to a file or serving it from a webserver as a binary file will deliver the CRX file in a package that can be installed into Chrome.

Conclusion

It was certainly a lot of research to accomplish the same effect as this 42 line script, but I find it pretty satisfying to be able to figure out the component parts of the CRX format. Having been on the author end of a specification, I really appreciate how much work went into making these RFCs comprehensive yet still understandable.

At the end of the day, I had a script that could run on App Engine and package a directory into a CRX file. If you’re interested in running it, I’ve included a finished version below. You can download the entire sample project on github.

main.py

#!/usr/bin/env python
#
# Copyright 2010 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
import crx
import os

class MainHandler(webapp.RequestHandler):
  def get(self):
    self.response.out.write('Download the '
                            'packaged extension.')

class CrxHandler(webapp.RequestHandler):
  def get(self):
    zipper = crx.Zipper()
    packager = crx.Packager()
    key = crx.SigningKey.getOrCreate()
    base_dir = os.path.realpath(os.path.dirname(__file__))
    extension_dir = os.path.join(base_dir, "extension-dir")
    extension = packager.package(zipper.zip(extension_dir), key)
    self.response.headers['Content-Type'] = 'application/x-chrome-extension'
    self.response.out.write(extension)

def main():
  application = webapp.WSGIApplication([
    ('/', MainHandler),
    ('/extension\.crx', CrxHandler),
  ], debug=True)
  util.run_wsgi_app(application)


if __name__ == '__main__':
    main()

crx.py

#!/usr/bin/env python
#
# Copyright 2010 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import StringIO
import os
import hashlib
import zipfile
import struct
import pickle

from pyasn1.codec.der import encoder
from pyasn1.type import univ
from Crypto.PublicKey import RSA

from google.appengine.ext import db

class Zipper(object):
  """ Handles creating zip files. """

  def zip(self, path):
    """ Returns the contents of a path as a binary string reprentation of a
    zip file."""
    zip_buffer = StringIO.StringIO()
    zip_file = zipfile.ZipFile(zip_buffer, 'w')
    try:
      for root, dirs, files in os.walk(path):
        for file in files:
          # Absolute path to the file to be added.
          abspath = os.path.realpath(os.path.join(root, file))
          # Write a relative path into the zip file.
          relpath = abspath.replace(path + '/', "")
          zip_file.write(abspath, relpath)
    except RuntimeError, msg:
      raise Exception("Could not write zip!")
    finally:
      zip_file.close()
    zip_string = zip_buffer.getvalue()
    return zip_string


class SigningKey(db.Model):
  """ Represents an RSA key that can be used to sign an extension.

  The first time getOrCreate is called, a new key is generated and stored in
  the data store.  Subsequent calls will return the original key."""

  blob = db.BlobProperty()

  def toBitString_(self, num):
    """ Converts a long into the bit string. """
    buf = ''
    while num > 1:
      buf = str(num & 1) + buf
      num = num >> 1
    buf = str(num) + buf
    return buf

  def getRSAKey(self):
    """ Gets a data structure representing an RSA public+private key. """
    return pickle.loads(self.blob)

  def getRSAPublicKey(self):
    """ Gets an ASN.1-encoded form of this RSA key's public key. """
    # Get a RSAPublicKey structure
    pkinfo = univ.Sequence()
    rsakey = self.getRSAKey()
    pkinfo.setComponentByPosition(0, univ.Integer(rsakey.n))
    pkinfo.setComponentByPosition(1, univ.Integer(rsakey.e))

    # Encode the public key info as a bit string
    pklong = long(encoder.encode(pkinfo).encode('hex'), 16)
    pkbitstring = univ.BitString("'00%s'B" % self.toBitString_(pklong))

    # Get the rsaEncryption identifier:
    idrsaencryption = univ.ObjectIdentifier('1.2.840.113549.1.1.1')

    # Get the AlgorithmIdentifier for rsaEncryption
    idinfo = univ.Sequence()
    idinfo.setComponentByPosition(0, idrsaencryption)
    idinfo.setComponentByPosition(1, univ.Null(''))

    # Get the SubjectPublicKeyInfo structure
    publickeyinfo = univ.Sequence()
    publickeyinfo.setComponentByPosition(0, idinfo)
    publickeyinfo.setComponentByPosition(1, pkbitstring)

    # Encode the public key structure
    publickey = encoder.encode(publickeyinfo)
    return publickey

  @staticmethod
  def getOrCreate():
    """ Returns a signing key from the data store or creates one if it doesn't
    already exist. """
    # See if there's already a key in the datastore
    key = SigningKey.get_by_key_name('signingkey')
    if not key:
      # Create one if not
      rsakey = RSA.generate(1024, os.urandom)
      key = SigningKey(key_name='signingkey', blob=pickle.dumps(rsakey))
      # Store it for use later
      key.put()
    return key


class Packager(object):
  """ Handles creating CRX files. """

  def package(self, zip_string, key):
    """ Packages a zip file into a CRX, given a signing key. """
    # Obtain the hash of the zip file contents
    zip_hash = hashlib.sha1(zip_string).digest()

    # Get the SHA1 AlgorithmIdentifier
    sha1identifier = univ.ObjectIdentifier('1.3.14.3.2.26')
    sha1info = univ.Sequence()
    sha1info.setComponentByPosition(0, sha1identifier)
    sha1info.setComponentByPosition(1, univ.Null(''))

    # Get the DigestInfo sequence, composed of the SHA1 id and the zip hash
    digestinfo = univ.Sequence()
    digestinfo.setComponentByPosition(0, sha1info)
    digestinfo.setComponentByPosition(1, univ.OctetString(zip_hash))

    # Encode the sequence into ASN.1
    digest = encoder.encode(digestinfo)

    # Pad the hash
    paddinglength = 128 - 3 - len(digest)
    paddedhexstr = "0001%s00%s" % (paddinglength * 'ff', digest.encode('hex'))

    # Calculate the signature
    signature_bytes = key.getRSAKey().sign(paddedhexstr.decode('hex'), "")[0]
    signature = ('%X' % signature_bytes).decode('hex')

    # Get the public key
    publickey = key.getRSAPublicKey()

    # Write the actual CRX contents
    crx_buffer = StringIO.StringIO("wb")
    crx_buffer.write("Cr24")  # Extension file magic number, from the CRX focs
    crx_buffer.write(struct.pack('iii', 2, len(publickey), len(signature)))
    crx_buffer.write(publickey)
    crx_buffer.write(signature)
    crx_buffer.write(zip_string)
    crx_file = crx_buffer.getvalue()
    return crx_file

Elsewhere

Twitter (@kurrik) Github (kurrik) Google+ (+kurrik) Linkedin (kurrik)

Tags

chrome (7) extensions (6) themanthemachine (6) twitter (4) reviews (3) presentations (3) go (3) appengine (2) google (2) html5 (2) books (2) internet (2) javascript (2) html (2) estonia (2) http (1) games (1) gtug (1) interviewing (1) space (1) recipes (1) googleio (1) freedomweek (1)