I’ve been working at Google for about three years now, and was fortunate enough to transfer onto the Chrome extensions team about a year ago. Mostly, I support developers working on Chrome extensions, but from time to time I work on projects for the team to keep my sanity.

A good example of this is the Chrome extensions samples browser. The extension docs are built and hosted automatically from the Chromium source tree so I modified the docs build script to generate the gallery and zip each sample into an easily-downloadable archive.

Of course zips are fine if you want to peek into sample code, but not so good if you just want to quickly test a sample to see what it does. To address this, I’ve been working on a method to offer each zip as a packaged crx. Some considerations:

* I didn’t want to check the .crx files into the source tree because of the hassles involved with binary files in source control. We’ve had some issues with the generated zip files, so managing two sets of archives seemed like it could be trouble.
* The extension docs are hosted on App Engine, meaning there are some restrictions on what kind of libraries I could use. While there are already solutions for packing extensions in Python, they rely on OpenSSL, which isn’t available on App Engine.

I decided to write a Python library which could run on App Engine and convert a directory of files into a Chrome extension crx archive. I didn’t find a ton of information online to help me do this automatically, so I decided to write up my findings for anyone heading down this road in the future. It should be pretty useful if you ever want to host a CRX from an App Engine app for whatever reason (offering a debug/trusted tester version, for example).
Figuring out the format
From the CRX format documentation I knew I needed to create a binary file containing a header, an RSA public key, an RSA signature, and the bytes of a zip file with the extension contents. The RSA key is used to generate a signature of the zip file contents, so I needed to figure out how to get a zip of a directory first.
Obtaining a zip file
Technically, the Chrome extension documentation samples are already zipped and
checked into source control, but it’s fairly easy to zip up a directory in
Python. Most projects probably won’t have a zip handy, so I’m including the
step here.
I wanted some code that wouldn’t have to write the zip to the filesystem, so I
used the StringIO module to generate a zip file in memory:
import StringIO
import os
import zipfile
…
zip_buffer = StringIO.StringIO()
zip_file = zipfile.ZipFile(zip_buffer, 'w')
path = 'path/to/extension/directory'
try:
  for root, dirs, files in os.walk(path):
    for file in files:
      # Absolute path to the file to be added.
      abspath = os.path.realpath(os.path.join(root, file))
      # Write a relative path into the zip file.
      relpath = abspath.replace(path + '/', "")
      zip_file.write(abspath, relpath)
except RuntimeError, msg:
  raise Exception("Could not write zip!")
finally:
  zip_file.close()
zip_string = zip_buffer.getvalue()
This way zip_string contains the bytes of a zip file containing the specified
directory. You’ll see that I’m using a simple form of generating a relative
path:
# Absolute path to the file to be added.
abspath = os.path.realpath(os.path.join(root, file))
# Write a relative path into the zip file.
relpath = abspath.replace(path + '/', "")
There’s actually an os.path.relpath function which would do this a bit more
directly, but according to the Python docs, this function is only available
from Python 2.6 onward (on Windows and Unix), so I try not to rely on it.
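For readers on a current interpreter, here is a minimal sketch of the same in-memory zip step. It assumes Python 3 (io.BytesIO in place of StringIO) and uses os.path.relpath; the function name zip_directory is hypothetical and not part of the original script.

```python
import io
import os
import zipfile


def zip_directory(path):
    """Return the bytes of a zip archive holding every file under path."""
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as zip_file:
        for root, _dirs, files in os.walk(path):
            for name in files:
                abspath = os.path.join(root, name)
                # Store each entry relative to the directory being zipped.
                relpath = os.path.relpath(abspath, path)
                zip_file.write(abspath, relpath)
    return zip_buffer.getvalue()
```

Because relpath is computed against the directory itself, this version does not depend on path being absolute.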
Generating an RSA key
For the rest of the file contents, I needed to sign the zip file with an RSA key, which had to be generated. While App Engine doesn’t have OpenSSL module support (which I would normally use), it does include a simple RSA package which can be used to generate a key:
from Crypto.PublicKey import RSA
import os
…
rsakey = RSA.generate(1024, os.urandom)
This is computationally intensive, so I usually generate a key if needed and store it in the data store for reuse. For local development, you can install the package from the PyCrypto homepage.
At this point I had a zip file and a key, so I needed to figure out exactly how to sign a piece of data according to the RSA specification in order to obtain the signature.
Figuring out the RSA signature payload format
From the packaging instructions, I knew I could use OpenSSL to generate a
signature, but wasn’t really sure what it was doing under the covers. So I
figured the best approach would be to sign an existing extension and see what
the signature was using OpenSSL’s command line tools. I packaged an extension
using Chrome to generate a .pem key, then zipped the sources and ran the
following:
$ openssl sha1 -sign key.pem extension.zip > extension.sig
$ openssl rsautl -verify -in extension.sig -inkey key.pem -raw -hexdump
0000 - 00 01 ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0010 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0020 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0030 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0040 - ff ff ff ff ff ff ff ff-ff ff ff ff ff ff ff ff   ................
0050 - ff ff ff ff ff ff ff ff-ff ff ff ff 00 30 21 30   .............0!0
0060 - 09 06 05 2b 0e 03 02 1a-05 00 04 14 2d bf 9f 85   ...+........-...
0070 - 4a a1 68 a9 a0 64 b2 c7-11 36 da ce 92 17 e9 29   J.h..d...6.....)
This gave me a raw hex dump of the signature, so I started going through the
RSA-related specifications to figure out what I was looking at. Turns out this
is actually formatted according to Section 8.1 Encryption-block
formatting of the PKCS#1 specification:
A block type BT, a padding string PS, and the data D shall be
formatted into an octet string EB, the encryption block.
EB = 00 || BT || PS || 00 || D . (1)
The block type BT shall be a single octet indicating the structure of
the encryption block. For this version of the document it shall have
value 00, 01, or 02. For a private- key operation, the block type
shall be 00 or 01. For a public-key operation, it shall be 02.
The padding string PS shall consist of k-3-||D|| octets. For block
type 00, the octets shall have value 00; for block type 01, they
shall have value FF; and for block type 02, they shall be
pseudorandomly generated and nonzero. This makes the length of the
encryption block EB equal to k.
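To make the layout EB = 00 || BT || PS || 00 || D concrete, here is a small sketch that splits a block of type 01 into its parts. It is written for Python 3 with only the standard library; the function name split_encryption_block is hypothetical, and the sample block is rebuilt from the hex dump above (90 ff octets followed by the 35 data octets).

```python
def split_encryption_block(eb):
    """Split a PKCS#1 v1.5 encryption block into (block_type, padding, data)."""
    if len(eb) < 3 or eb[0] != 0x00:
        raise ValueError("encryption block must start with a 00 octet")
    block_type = eb[1]
    if block_type != 0x01:
        raise ValueError("this sketch only handles block type 01")
    # For block type 01 the padding is all FF, so the first 00 octet after
    # the block type is the separator in front of the data D.
    separator = eb.index(0x00, 2)
    padding = eb[2:separator]
    if any(octet != 0xFF for octet in padding):
        raise ValueError("block type 01 padding must be FF octets")
    return block_type, padding, eb[separator + 1:]


# The 128-octet block from the hex dump: 00 01, 90 FF octets, 00, then D.
data_hex = ("3021300906052b0e03021a050004142dbf9f85"
            "4aa168a9a064b2c71136dace9217e929")
block = bytes.fromhex("0001" + "ff" * 90 + "00" + data_hex)
block_type, padding, data = split_encryption_block(block)
print(block_type, len(padding), len(data))  # prints: 1 90 35
```

The padding length agrees with the formula in the spec: k - 3 - ||D|| = 128 - 3 - 35 = 90.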
The format of the hex dump corresponds with block type 01. Knowing that the
octet string started at the first 30 octet, I thought a good
approach would be to write a script to decode that data and see what exactly
was stored there. I’ve done some work with RSA signatures before and know that
everything is encoded using the ASN.1 format. Luckily there’s a great Python
pyasn1 library which will decode this data and, to boot, works on App Engine.
Here’s the script I wrote:
from pyasn1.codec.der import decoder
raw_obj = ('3021300906052b0e03021a050004142dbf9'
           'f854aa168a9a064b2c71136dace9217e929').decode('hex')
der_obj = decoder.decode(raw_obj)
You’ll see that the string I’m decoding corresponds with the contents of the
hex dump listed above. Dumping the resulting der_obj gave me this
ASN.1 structure:
(Sequence()
  .setComponentByPosition(0,
    Sequence()
      .setComponentByPosition(0, ObjectIdentifier('1.3.14.3.2.26'))
      .setComponentByPosition(1, Null('')))
  .setComponentByPosition(1, OctetString(
    '-\xbf\x9f\x85J\xa1h\xa9\xa0d\xb2\xc7\x116\xda\xce\x92\x17\xe9)'))
, '')
It might not look promising, but this was pretty good. The OctetString
towards the end was actually the raw hex bytes of a
SHA1 hash of the same zip file. You can calculate this in Python with:
import hashlib
hashlib.sha1(zip_string).digest()
where zip_string is the contents of a zip file read from the
filesystem, or generated using the StringIO approach I listed
above.
I wanted to know what the ObjectIdentifier string represented. I
had a hunch it was the signing algorithm, and from PKCS #1 I
found that the digest information should be encoded in the following way:
DigestInfo ::= SEQUENCE {
digestAlgorithm DigestAlgorithmIdentifier,
digest Digest }
DigestAlgorithmIdentifier ::= AlgorithmIdentifier
Digest ::= OCTET STRING
Sure enough, the components before the OctetString were algorithm
identifiers. I found it strange that SHA1 wasn’t actually listed as an
AlgorithmIdentifier (maybe it came after?) but later the spec
talks about the md# algorithms:
These object identifiers are intended to be used in the algorithm
field of a value of type AlgorithmIdentifier. The parameters field of
that type, which has the algorithm-specific syntax ANY DEFINED BY
algorithm, would have ASN.1 type NULL for these algorithms.
So cool, at least that explains the .setComponentByPosition(1, Null(''))
in the algorithm identifier.
At this point I was pretty sure that '1.3.14.3.2.26' was the
algorithm identifier for SHA1, but I wanted to make sure. Looking a bit more,
I found the identifier in the X.509 algorithms spec:
The signature algorithm with SHA-1 and the RSA encryption algorithm
is implemented using the padding and encoding conventions described
in PKCS #1 [RFC 2313]. The message digest is computed using the
SHA-1 hash algorithm.
The RSA signature algorithm, as specified in PKCS #1 [RFC 2313]
includes a data encoding step. In this step, the message digest and
the OID for the one-way hash function used to compute the digest are
combined. When performing the data encoding step, the md2, md5, and
id-sha1 OIDs MUST be used to specify the MD2, MD5, and SHA-1 one-way
hash functions, respectively:
…
id-sha1 OBJECT IDENTIFIER ::= {
iso(1) identified-organization(3) oiw(14) secsig(3)
algorithms(2) 26 }
Sure enough, iso(1) identified-organization(3) oiw(14) secsig(3)
algorithms(2) 26 matches '1.3.14.3.2.26'.
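The same identifier can be read straight out of the hex dump: the octets 06 05 are the OBJECT IDENTIFIER tag and length, and the five octets after them (2b 0e 03 02 1a) are the encoded arcs. As a rough illustration, here is a Python 3 sketch of that decoding; decode_oid is a hypothetical helper, not something pyasn1 or the original script provides.

```python
def decode_oid(body):
    """Decode the content octets of a DER OBJECT IDENTIFIER to dotted form."""
    arcs = []
    value = 0
    for octet in body:
        # Each arc is stored base-128; the high bit means more octets follow.
        value = (value << 7) | (octet & 0x7F)
        if not octet & 0x80:
            arcs.append(value)
            value = 0
    # The first stored value packs the first two arcs as 40 * arc1 + arc2.
    first = min(arcs[0] // 40, 2)
    dotted = [first, arcs[0] - 40 * first] + arcs[1:]
    return ".".join(str(arc) for arc in dotted)


print(decode_oid(bytes.fromhex("2b0e03021a")))  # prints: 1.3.14.3.2.26
```

The first octet 2b is 43 in decimal, which is 40 * 1 + 3, giving the leading 1.3 of the identifier.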
That explained the payload format. Now I needed a way to build up the
signature from my own code to match this structure.
Generating the RSA signature
Thankfully pyasn1 makes generating the appropriate structures pretty easy.
import hashlib
from pyasn1.codec.der import encoder
from pyasn1.type import univ
…
# Obtain the hash of the zip file contents
zip_hash = hashlib.sha1(zip_string).digest()
# Get the SHA1 AlgorithmIdentifier
sha1identifier = univ.ObjectIdentifier('1.3.14.3.2.26')
sha1info = univ.Sequence()
sha1info.setComponentByPosition(0, sha1identifier)
sha1info.setComponentByPosition(1, univ.Null(''))
# Get the DigestInfo sequence, composed of the SHA1 id and the zip hash
digestinfo = univ.Sequence()
digestinfo.setComponentByPosition(0, sha1info)
digestinfo.setComponentByPosition(1, univ.OctetString(zip_hash))
# Encode the sequence into ASN.1
digest = encoder.encode(digestinfo)
Now digest contains the raw bytes of the ASN.1 DigestInfo
structure. I needed to pad it with ff
octets according to the PKCS#1 specification.
paddinglength = 128 - 3 - len(digest)
paddedhexstr = "0001%s00%s" % (paddinglength * 'ff', digest.encode('hex'))
Finally, pycrypto supports a method to generate an RSA signature
given a key.
from Crypto.PublicKey import RSA
import os
…
rsakey = RSA.generate(1024, os.urandom)
# sign() returns a tuple whose first item is the signature as a long
signature_bytes = rsakey.sign(paddedhexstr.decode('hex'), "")[0]
# Zero-pad to 256 hex digits (128 bytes) so leading zero bytes aren't lost
signature = ('%0256X' % signature_bytes).decode('hex')
There are a lot of conversions back and forth between hex and binary here. I
feel that could be cleaned up a bit and everything could probably be kept in
binary, but it’d be a bit harder to follow what was going on. At the end of
the day, the signature variable contains the raw bytes of an RSA
signature of the zip file contents.
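As an illustration of keeping everything in binary, here is a sketch of the padding step using Python 3 bytes. It assumes a 1024-bit key (128 octets) and hard-codes the 15-octet SHA-1 DigestInfo header seen in the hex dump instead of building it with pyasn1; the names build_signature_block and int_to_signature are hypothetical.

```python
import hashlib

# DER header of a SHA-1 DigestInfo: the 15 octets that precede the hash
# in the hex dump (30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14).
SHA1_DIGEST_INFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")


def build_signature_block(message, key_bytes=128):
    """Return 00 || 01 || FF..FF || 00 || DigestInfo for the given message."""
    digest_info = SHA1_DIGEST_INFO_PREFIX + hashlib.sha1(message).digest()
    padding_length = key_bytes - 3 - len(digest_info)
    if padding_length < 8:
        raise ValueError("key is too short for this DigestInfo")
    return b"\x00\x01" + b"\xff" * padding_length + b"\x00" + digest_info


def int_to_signature(signature_int, key_bytes=128):
    """Convert the integer result of the RSA operation to fixed-width bytes."""
    return signature_int.to_bytes(key_bytes, "big")
```

The fixed-width conversion in int_to_signature keeps any leading zero octets, so the signature is always exactly as long as the key modulus.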
Obtaining a public key
At this point I had the contents of the zip file and an RSA signature of that.
The last major component of the CRX file that I needed to calculate was the
public key portion of the RSA key.
The script in the CRX format documentation uses OpenSSL to generate a public
key:
$ openssl rsa -pubout -outform DER
The documentation for the -outform DER option states:
The DER option uses an ASN1 DER encoded form compatible with the PKCS#1
RSAPrivateKey or SubjectPublicKeyInfo format.
Strangely, I couldn’t find a reference to SubjectPublicKeyInfo in
PKCS #1, but it was in the X.509 certificate profile
spec:
SubjectPublicKeyInfo ::= SEQUENCE {
algorithm AlgorithmIdentifier,
subjectPublicKey BIT STRING }
and the appropriate algorithm identifier and key format was once again found in
the X.509 algorithms spec:
The OID rsaEncryption identifies RSA public keys.
pkcs-1 OBJECT IDENTIFIER ::= { iso(1) member-body(2) us(840)
rsadsi(113549) pkcs(1) 1 }
rsaEncryption OBJECT IDENTIFIER ::= { pkcs-1 1}
The rsaEncryption OID is intended to be used in the algorithm field
of a value of type AlgorithmIdentifier. The parameters field MUST
have ASN.1 type NULL for this algorithm identifier.
The RSA public key MUST be encoded using the ASN.1 type RSAPublicKey:
RSAPublicKey ::= SEQUENCE {
modulus INTEGER, – n
publicExponent INTEGER } – e
So I needed to get the bit string format of the RSAPublicKey
version of my RSA key, and then encode that together with the algorithm
identifier into a SubjectPublicKeyInfo structure:
from pyasn1.codec.der import encoder
from pyasn1.type import univ
…
# Get a RSAPublicKey structure
pkinfo = univ.Sequence()
pkinfo.setComponentByPosition(0, univ.Integer(rsakey.n))
pkinfo.setComponentByPosition(1, univ.Integer(rsakey.e))
# Convert the key into a bit string
def to_bitstring(num):
  buf = ''
  while num > 1:
    buf = str(num & 1) + buf
    num = num >> 1
  buf = str(num) + buf
  return buf
pklong = long(encoder.encode(pkinfo).encode('hex'), 16)
# The encoding starts with 0x30, so restore the two leading zero bits
pkbitstring = univ.BitString("'00%s'B" % to_bitstring(pklong))
# Get the rsaEncryption identifier:
idrsaencryption = univ.ObjectIdentifier('1.2.840.113549.1.1.1')
# Get the AlgorithmIdentifier for rsaEncryption
idinfo = univ.Sequence()
idinfo.setComponentByPosition(0, idrsaencryption)
idinfo.setComponentByPosition(1, univ.Null(''))
# Get the SubjectPublicKeyInfo structure
publickeyinfo = univ.Sequence()
publickeyinfo.setComponentByPosition(0, idinfo)
publickeyinfo.setComponentByPosition(1, pkbitstring)
# Encode the public key structure
publickey = encoder.encode(publickeyinfo)
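For comparison, the same SubjectPublicKeyInfo bytes can be produced without pyasn1 by writing the DER tag-length-value encoding by hand. The following is only a sketch under a few assumptions: Python 3, non-negative integers, and hypothetical helper names (der_length, der_tlv, der_integer, encode_subject_public_key_info). The AlgorithmIdentifier for rsaEncryption with NULL parameters is hard-coded as its 15-octet DER form.

```python
def der_length(length):
    """Encode a DER length (short form below 128, long form otherwise)."""
    if length < 0x80:
        return bytes([length])
    octets = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(octets)]) + octets


def der_tlv(tag, content):
    """Wrap content octets in a tag and a DER length."""
    return bytes([tag]) + der_length(len(content)) + content


def der_integer(value):
    """Encode a non-negative INTEGER; a leading 00 keeps the sign bit clear."""
    return der_tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


# SEQUENCE { OID 1.2.840.113549.1.1.1 (rsaEncryption), NULL }
RSA_ENCRYPTION_ALGORITHM = bytes.fromhex("300d06092a864886f70d0101010500")


def encode_subject_public_key_info(n, e):
    """Return the DER SubjectPublicKeyInfo for an RSA modulus and exponent."""
    rsa_public_key = der_tlv(0x30, der_integer(n) + der_integer(e))
    # BIT STRING content starts with the number of unused bits (zero here).
    bit_string = der_tlv(0x03, b"\x00" + rsa_public_key)
    return der_tlv(0x30, RSA_ENCRYPTION_ALGORITHM + bit_string)
```

Working on whole octets sidesteps the conversion to a string of 0 and 1 characters, since a DER BIT STRING with zero unused bits is just the payload bytes behind a single 00 octet.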
Writing the CRX format
Compared to the research and byte manipulating I needed to do earlier, actually
getting the component pieces into the CRX format was incredibly easy. I used
another StringIO instance to write the pieces in the following
order (obtained from the CRX format docs):
1. The string “Cr24”, a ‘magic number’ specific to the CRX format
2. The number 2 (CRX file format version)
3. The length of the public key in bytes
4. The length of the signature in bytes
5. The public key
6. The signature
7. The contents of the zip file
Here it is in Python:
import StringIO
import struct
…
crx_buffer = StringIO.StringIO()
crx_buffer.write("Cr24")  # Extension file magic number, from the CRX docs
# Version and both lengths are little-endian unsigned 32-bit integers
crx_buffer.write(struct.pack('<III', 2, len(publickey), len(signature)))
crx_buffer.write(publickey)
crx_buffer.write(signature)
crx_buffer.write(zip_string)
crx_file = crx_buffer.getvalue()
Outputting crx_file to a file or serving it from a webserver as a
binary file will deliver the CRX file in a package that can be installed into
Chrome.
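The seven pieces listed above map onto a 16-byte header followed by three variable-length parts. Here is a short Python 3 sketch of writing and reading that layout with struct; build_crx and parse_crx are hypothetical names, and the little-endian unsigned 32-bit fields are an assumption based on the version 2 CRX format docs.

```python
import struct

CRX_MAGIC = b"Cr24"
CRX_VERSION = 2
# Magic number, version, public key length, signature length.
HEADER = struct.Struct("<4sIII")


def build_crx(public_key, signature, zip_bytes):
    """Concatenate the header, public key, signature and zip contents."""
    header = HEADER.pack(CRX_MAGIC, CRX_VERSION,
                         len(public_key), len(signature))
    return header + public_key + signature + zip_bytes


def parse_crx(crx_bytes):
    """Split a version 2 CRX back into (public_key, signature, zip_bytes)."""
    magic, version, key_len, sig_len = HEADER.unpack_from(crx_bytes)
    if magic != CRX_MAGIC or version != CRX_VERSION:
        raise ValueError("not a version 2 CRX file")
    key_end = HEADER.size + key_len
    sig_end = key_end + sig_len
    return (crx_bytes[HEADER.size:key_end],
            crx_bytes[key_end:sig_end],
            crx_bytes[sig_end:])
```

Reading a package back with parse_crx is a quick way to check that the lengths written into the header match the parts that follow.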
Conclusion
It was certainly a lot of research to accomplish the same effect as this 42 line script, but I find it pretty satisfying to be able to figure out the component parts of the CRX format. Having been on the author end of a specification, I really appreciate how much work went into making these RFCs comprehensive yet still understandable.

At the end of the day, I had a script that could run on App Engine and package a directory into a CRX file. If you’re interested in running it, I’ve included a finished version below. You can download the entire sample project on GitHub.

main.py

#!/usr/bin/env python
#
# Copyright 2010 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
import crx
import os

class MainHandler(webapp.RequestHandler):
  def get(self):
    self.response.out.write('Download the '
                            'packaged extension.')

class CrxHandler(webapp.RequestHandler):
  def get(self):
    zipper = crx.Zipper()
    packager = crx.Packager()
    key = crx.SigningKey.getOrCreate()
    base_dir = os.path.realpath(os.path.dirname(__file__))
    extension_dir = os.path.join(base_dir, "extension-dir")
    extension = packager.package(zipper.zip(extension_dir), key)
    self.response.headers['Content-Type'] = 'application/x-chrome-extension'
    self.response.out.write(extension)

def main():
  application = webapp.WSGIApplication([
      ('/', MainHandler),
      ('/extension.crx', CrxHandler),
  ], debug=True)
  util.run_wsgi_app(application)

if __name__ == '__main__':
  main()

crx.py

#!/usr/bin/env python
#
# Copyright 2010 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import StringIO
import os
import hashlib
import zipfile
import struct
import pickle
from pyasn1.codec.der import encoder
from pyasn1.type import univ
from Crypto.PublicKey import RSA
from google.appengine.ext import db

class Zipper(object):
  """ Handles creating zip files. """

  def zip(self, path):
    """ Returns the contents of a path as a binary string representation of a
    zip file."""
    zip_buffer = StringIO.StringIO()
    zip_file = zipfile.ZipFile(zip_buffer, 'w')
    try:
      for root, dirs, files in os.walk(path):
        for file in files:
          # Absolute path to the file to be added.
          abspath = os.path.realpath(os.path.join(root, file))
          # Write a relative path into the zip file.
          relpath = abspath.replace(path + '/', "")
          zip_file.write(abspath, relpath)
    except RuntimeError, msg:
      raise Exception("Could not write zip!")
    finally:
      zip_file.close()
    zip_string = zip_buffer.getvalue()
    return zip_string

class SigningKey(db.Model):
  """ Represents an RSA key that can be used to sign an extension. The first
  time getOrCreate is called, a new key is generated and stored in the data
  store. Subsequent calls will return the original key."""
  blob = db.BlobProperty()

  def toBitString_(self, num):
    """ Converts a long into the bit string. """
    buf = ''
    while num > 1:
      buf = str(num & 1) + buf
      num = num >> 1
    buf = str(num) + buf
    return buf

  def getRSAKey(self):
    """ Gets a data structure representing an RSA public+private key. """
    return pickle.loads(self.blob)

  def getRSAPublicKey(self):
    """ Gets an ASN.1-encoded form of this RSA key's public key. """
    # Get a RSAPublicKey structure
    pkinfo = univ.Sequence()
    rsakey = self.getRSAKey()
    pkinfo.setComponentByPosition(0, univ.Integer(rsakey.n))
    pkinfo.setComponentByPosition(1, univ.Integer(rsakey.e))
    # Encode the public key info as a bit string
    pklong = long(encoder.encode(pkinfo).encode('hex'), 16)
    pkbitstring = univ.BitString("'00%s'B" % self.toBitString_(pklong))
    # Get the rsaEncryption identifier:
    idrsaencryption = univ.ObjectIdentifier('1.2.840.113549.1.1.1')
    # Get the AlgorithmIdentifier for rsaEncryption
    idinfo = univ.Sequence()
    idinfo.setComponentByPosition(0, idrsaencryption)
    idinfo.setComponentByPosition(1, univ.Null(''))
    # Get the SubjectPublicKeyInfo structure
    publickeyinfo = univ.Sequence()
    publickeyinfo.setComponentByPosition(0, idinfo)
    publickeyinfo.setComponentByPosition(1, pkbitstring)
    # Encode the public key structure
    publickey = encoder.encode(publickeyinfo)
    return publickey

  @staticmethod
  def getOrCreate():
    """ Returns a signing key from the data store or creates one if it doesn't
    already exist. """
    # See if there's already a key in the datastore
    key = SigningKey.get_by_key_name('signingkey')
    if not key:
      # Create one if not
      rsakey = RSA.generate(1024, os.urandom)
      key = SigningKey(key_name='signingkey', blob=pickle.dumps(rsakey))
      # Store it for use later
      key.put()
    return key

class Packager(object):
  """ Handles creating CRX files. """

  def package(self, zip_string, key):
    """ Packages a zip file into a CRX, given a signing key. """
    # Obtain the hash of the zip file contents
    zip_hash = hashlib.sha1(zip_string).digest()
    # Get the SHA1 AlgorithmIdentifier
    sha1identifier = univ.ObjectIdentifier('1.3.14.3.2.26')
    sha1info = univ.Sequence()
    sha1info.setComponentByPosition(0, sha1identifier)
    sha1info.setComponentByPosition(1, univ.Null(''))
    # Get the DigestInfo sequence, composed of the SHA1 id and the zip hash
    digestinfo = univ.Sequence()
    digestinfo.setComponentByPosition(0, sha1info)
    digestinfo.setComponentByPosition(1, univ.OctetString(zip_hash))
    # Encode the sequence into ASN.1
    digest = encoder.encode(digestinfo)
    # Pad the hash
    paddinglength = 128 - 3 - len(digest)
    paddedhexstr = "0001%s00%s" % (paddinglength * 'ff', digest.encode('hex'))
    # Calculate the signature
    signature_bytes = key.getRSAKey().sign(paddedhexstr.decode('hex'), "")[0]
    # Zero-pad to 256 hex digits (128 bytes) so leading zero bytes aren't lost
    signature = ('%0256X' % signature_bytes).decode('hex')
    # Get the public key
    publickey = key.getRSAPublicKey()
    # Write the actual CRX contents
    crx_buffer = StringIO.StringIO()
    crx_buffer.write("Cr24")  # Extension file magic number, from the CRX docs
    # Version and both lengths are little-endian unsigned 32-bit integers
    crx_buffer.write(struct.pack('<III', 2, len(publickey), len(signature)))
    crx_buffer.write(publickey)
    crx_buffer.write(signature)
    crx_buffer.write(zip_string)
    crx_file = crx_buffer.getvalue()
    return crx_file