Jay Taylor | programmer notes


# Shut down both VMs.
VBoxManage controlvm gw-lab_mesos-primary1a poweroff
VBoxManage controlvm gw-lab_mesos-primary2a poweroff

# Add a SATA controller port to the target VM (the one where fsck will be run from).
VBoxManage storageattach gw-lab_mesos-primary2a --medium none --storagectl SATAController --port 1 --device 0 --type hdd

# Attach the other hard drive to the target VM.
VBoxManage storageattach gw-lab_mesos-primary2a --medium /mnt/VirtualBox\ VMs/gw-lab_mesos-primary1a/Snapshots/\{4695a86f-e9f3-4e4f-8b48-0336af217815\}.vmdk --storagectl SATAController --port 1 --device 0 --type hdd

# Start the target VM.
VBoxManage startvm --type headless gw-lab_mesos-primary2a

ssh mesos-primary2a sudo fsck /dev/sdb1
y
y
y
y
y
...
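The same sequence can be scripted. The following is a minimal sketch, assuming Python 3.8+ and that the controller is named SATAController as above; the function names `fsck_plan` and `render` are made up for illustration. It only builds and prints the commands, so nothing is run against VirtualBox until you choose to.

```python
import shlex


def fsck_plan(source_disk, source_vm, target_vm,
              controller="SATAController", port=1):
    """Return the VBoxManage invocations from the steps above as argv lists."""
    attach = ["VBoxManage", "storageattach", target_vm,
              "--storagectl", controller, "--port", str(port),
              "--device", "0", "--type", "hdd"]
    return [
        # Shut down both VMs.
        ["VBoxManage", "controlvm", source_vm, "poweroff"],
        ["VBoxManage", "controlvm", target_vm, "poweroff"],
        # Same two storageattach calls as above: first with no medium,
        # then with the other VM's disk image.
        attach + ["--medium", "none"],
        attach + ["--medium", source_disk],
        # Boot the target VM headless.
        ["VBoxManage", "startvm", "--type", "headless", target_vm],
    ]


def render(plan):
    """Quote each argv list so it can be pasted into a shell."""
    return [shlex.join(argv) for argv in plan]


if __name__ == "__main__":
    disk = ("/mnt/VirtualBox VMs/gw-lab_mesos-primary1a/Snapshots/"
            "{4695a86f-e9f3-4e4f-8b48-0336af217815}.vmdk")
    for line in render(fsck_plan(disk, "gw-lab_mesos-primary1a",
                                 "gw-lab_mesos-primary2a")):
        print(line)
```

Passing argv lists (for example to `subprocess.run(argv, check=True)`) avoids having to backslash-escape the spaces and braces in the disk path by hand.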

Note: At first I somehow managed to attach the drive to mesos-primary2a in such a way that it showed up in `showhdinfo` but wasn’t available in the target VM and couldn’t be removed. Rebooting the host got VBox out of the funky state.

jaytaylor@host:/mnt/VirtualBox VMs$ VBoxManage showhdinfo /mnt/VirtualBox\ VMs/gw-lab_mesos-primary1a/Snapshots/\{4695a86f-e9f3-4e4f-8b48-0336af217815\}.vmdk
UUID: 50d87b4c-2c8d-40df-aeba-2153cbb7066d
Parent UUID: base
State: created
Type: normal (base)
Location: /mnt/VirtualBox VMs/gw-lab_mesos-primary1a/Snapshots/{4695a86f-e9f3-4e4f-8b48-0336af217815}.vmdk
Storage format: VMDK
Format variant: dynamic default
Capacity: 40960 MBytes
Size on disk: 38072 MBytes
In use by VMs: gw-lab_mesos-primary1a (UUID: 2160cfb5-1b5b-4f32-81bf-385f3d7a796a)
gw-lab_mesos-primary2a (UUID: c7a80492-cc66-4460-9b5a-53572875653c)

jaytaylor@host:/mnt/VirtualBox VMs$ VBoxManage showvminfo c7a80492-cc66-4460-9b5a-53572875653c --details
Name: gw-lab_mesos-primary2a
...
Default Frontend:
Storage Controller Name (0): SATAController
Storage Controller Type (0): IntelAhci
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0): 30
Storage Controller Port Count (0): 2
Storage Controller Bootable (0): on
SATAController (0, 0): /mnt/VirtualBox VMs/gw-lab_mesos-primary2a/Snapshots/{7149016e-a75d-4612-b63e-52c8c5e45ad8}.vmdk (UUID: 64eb88f4-47cc-43b9-997a-6b1d440015da)
NIC 1: MAC: 0800278FD4DC, Attachment: NAT, Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
...


I needed the binary “grpc_python_plugin” to follow the Python gRPC tutorial.

I’ve hit quite a few snags.

And it appears I’m not the only one.

pip install grpcio-tools

...

grpc/tools/main.cc:33:10: fatal error: 'src/compiler/python_generator.h' file not found

And the grpc docs don’t include macOS instructions.

Let’s start hacking:

wget https://pypi.python.org/packages/7b/22/93b83676787ab07fb7f8d8dcea5351efd6ee62ca0dfba8799cc06f375b37/grpcio_tools-0.14.0.tar.gz#md5=18dd40dd0ffba48bbb8ab865b7fbd23a

tar zxvf grpcio_tools-0.14.0.tar.gz

cd grpcio_tools

I found the python_generator.h file at https://github.com/grpc/grpc/blob/master/src/compiler/python_generator.h, so:

git clone https://github.com/grpc/grpc grpc_root

python setup.py build

...

clang -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -DHAVE_PTHREAD=1 -I. -Igrpc_root -Igrpc_root/include -Ithird_party/protobuf/src -I/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c grpc/tools/main.cc -o build/temp.macosx-10.9-x86_64-2.7/grpc/tools/main.o -frtti -std=c++11
clang -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -DHAVE_PTHREAD=1 -I. -Igrpc_root -Igrpc_root/include -Ithird_party/protobuf/src -I/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c grpc_root/src/compiler/python_generator.cc -o build/temp.macosx-10.9-x86_64-2.7/grpc_root/src/compiler/python_generator.o -frtti -std=c++11
clang -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -DHAVE_PTHREAD=1 -I. -Igrpc_root -Igrpc_root/include -Ithird_party/protobuf/src -I/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c third_party/protobuf/src/google/protobuf/compiler/zip_writer.cc -o build/temp.macosx-10.9-x86_64-2.7/third_party/protobuf/src/google/protobuf/compiler/zip_writer.o -frtti -std=c++11
clang: error: no such file or directory: 'third_party/protobuf/src/google/protobuf/compiler/zip_writer.cc'
clang: error: no input files
error: command 'clang' failed with exit status 1


Okay, what a mess! Well okay, I found the set of files in question.

mkdir tmp
cd tmp
wget https://android.googlesource.com/platform/external/chromium_org/+archive/00d67fb/third_party/protobuf/src/google/protobuf.tar.gz
tar xzvf protobuf.tar.gz
rm protobuf.tar.gz
mkdir -p ../grpc_root/third_party/protobuf/src/google/protobuf
mv * ../grpc_root/third_party/protobuf/src/google/protobuf
cd ..

Sadly, even after locating the files and lovingly injecting them, it has no effect and the build still errors out with the same error.

Okay, it turns out that was all wrong. zip_writer.cc is included with the main grpc repository; it just comes from a submodule.

Let’s try just building that:

GRPC_PYTHON_BUILD_WITH_CYTHON=1 pip install .
...
commands.CommandError: could not find grpc_python_plugin (protoc plugin for GRPC Python)

Reviewing the relevant GitHub issue, #5378 “grpc_python_plugin is not included with pip install grpcio”, it became clear that @revantk hit the exact same problem and set of errors.

Here is my final solution:

Just in case it helps someone else..

If you're missing the `grpc_python_plugin` binary on macOS (Mac OS X?):

git clone https://github.com/google/protobuf.git
cd protobuf
./autogen.sh
./configure
make
make install
cd ..

Then:

git clone https://github.com/grpc/grpc
cd grpc
git submodule update --init --recursive
make grpc_python_plugin
cp bins/opt/grpc_python_plugin /usr/local/bin/

After this I was good to go!
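A quick way to confirm the plugin is actually reachable before running protoc is to look it up on PATH. This is only a sketch: `find_plugin` and `protoc_command` are hypothetical helper names, and the protoc flags follow the form the Python gRPC tutorial used at the time (`--grpc_out` plus `--plugin=protoc-gen-grpc=...`), so adjust them if your tutorial version differs.

```python
import shutil


def find_plugin(name="grpc_python_plugin"):
    """Return the absolute path of the plugin if it is on PATH, else None."""
    return shutil.which(name)


def protoc_command(proto_file, out_dir=".", plugin_name="grpc_python_plugin"):
    """Build the protoc argv list, failing early if the plugin is missing."""
    plugin = find_plugin(plugin_name)
    if plugin is None:
        raise FileNotFoundError(
            "could not find %s on PATH; build it and copy it somewhere "
            "like /usr/local/bin first" % plugin_name)
    return [
        "protoc", "-I", ".",
        "--python_out=" + out_dir,
        "--grpc_out=" + out_dir,
        "--plugin=protoc-gen-grpc=" + plugin,
        proto_file,
    ]


if __name__ == "__main__":
    print(find_plugin() or "grpc_python_plugin is not on PATH")
```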


Frozen Virtual Machines

Lately one of my testing lab Ubuntu Linux hosts has been hanging and/or freezing (requiring a hard system reset) when load was introduced to any of the guest VMs.

A bit of research revealed VBox Ticket #8511: “Regular crashes or freezing”:

One of the most important answers is right there - if you're on a
Linux host and doing heavy disk I/O, do not use the host cache
for the VMs, ever. The Linux I/O subsystem not very smart, it
batches gobs of dirty pages in the filesystem cache, and when it
runs out of free memory, flushes out everything to disk. That can
take quite a long time (minutes) and there's nothing VirtualBox
can do about it.

The asynchronous I/O in VirtualBox was designed explicitly to work
around this host OS deficiency. The I/O doesn't go through the
host's cache and is written to disk much more frequently in smaller
chunks. However, VirtualBox isn't necessarily the only process
running on the host and something else still may trigger the
undesirable behavior.

The corollary to the above is obvious: If your host can't cope with
the I/O load generated by the VMs plus the rest of the system,
there will be trouble. Virtualization isn't magic and can't turn a slow
disk into a fast one.

The operative portion being:

If on a Linux host ... do not use the host cache for the VMs, ever.

Digging into the documentation I found out how to disable host caching on a per-VM-controller basis:

VBoxManage storagectl VM-NAME-HERE --name SATAController --hostiocache off

After applying that to all VMs, voila! All fixed!
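Applying the setting to every VM by hand gets tedious, so here is a rough sketch of a loop over all registered VMs. It assumes that `VBoxManage list vms` prints one `"name" {uuid}` pair per line, and that every VM's controller is called SATAController like mine; check the controller name with `showvminfo` first. The function names are made up for this example, and VirtualBox may refuse the change while a VM is running.

```python
import re
import subprocess


def parse_vm_names(list_vms_output):
    """Extract VM names from lines shaped like: "name" {uuid}"""
    return re.findall(r'^"(.*)" \{[0-9a-f-]+\}$', list_vms_output,
                      flags=re.MULTILINE)


def hostiocache_off_commands(vm_names, controller="SATAController"):
    """Build one storagectl invocation per VM (argv lists, nothing is run)."""
    return [["VBoxManage", "storagectl", name,
             "--name", controller, "--hostiocache", "off"]
            for name in vm_names]


def apply_to_all_vms(controller="SATAController"):
    """Query VirtualBox for its VMs and disable the host I/O cache on each."""
    listing = subprocess.run(["VBoxManage", "list", "vms"], check=True,
                             capture_output=True, text=True).stdout
    for argv in hostiocache_off_commands(parse_vm_names(listing), controller):
        subprocess.run(argv, check=True)
```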

The VMs may still become slow under heavy load, but that is far better than the freeze-ups of the past.


Today after I pulled the latest from the Play 2.0 repository and rebuilt the project, the local Play20 repository was wiped out and then my play apps were no longer able to run! SBT and the Play SBT Plugin could no longer be found, or the versions I had were no longer compatible w/ the latest version of Play. Here are some of the errors I was getting:

$ play run
Getting org.scala-sbt sbt 0.11.3 ...

:: problems summary ::
:::: WARNINGS
		module not found: org.scala-sbt#sbt;0.11.3

	==== local: tried

	  /usr/local/Play20/repository/local/org.scala-sbt/sbt/0.11.3/ivys/ivy.xml

	==== Maven2 Local: tried

	  file:///Users/user/.m2/repository/org/scala-sbt/sbt/0.11.3/sbt-0.11.3.pom

	==== typesafe-ivy-releases: tried

http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt/0.11.3/ivys/ivy.xml

	==== Maven Central: tried

http://repo1.maven.org/maven2/org/scala-sbt/sbt/0.11.3/sbt-0.11.3.pom

		::::::::::::::::::::::::::::::::::::::::::::::
		::          UNRESOLVED DEPENDENCIES         ::
		::::::::::::::::::::::::::::::::::::::::::::::
		:: org.scala-sbt#sbt;0.11.3: not found      ::
		::::::::::::::::::::::::::::::::::::::::::::::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
unresolved dependency: org.scala-sbt#sbt;0.11.3: not found
Error during sbt execution: Error retrieving required libraries
  (see /usr/local/Play20/framework/sbt/boot/update.log for complete log)
Error: Could not retrieve sbt 0.11.3

which eventually further degraded into..

[warn] 	module not found: play#sbt-plugin;2.1-07132012
[warn] ==== typesafe-ivy-releases: tried
[warn]   http://repo.typesafe.com/typesafe/ivy-releases/play/sbt-plugin/scala_2.9.2/sbt_0.12/2.1-07132012/ivys/ivy.xml
[warn] ==== sbt-plugin-releases: tried
[warn]   http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases/play/sbt-plugin/scala_2.9.2/sbt_0.12/2.1-07132012/ivys/ivy.xml
[warn] ==== local: tried
[warn]   /Users/jay/sendhub/api/Play20/repository/local/play/sbt-plugin/scala_2.9.2/sbt_0.12/2.1-07132012/ivys/ivy.xml
[warn] ==== Typesafe repository: tried
[warn]   http://repo.typesafe.com/typesafe/releases/play/sbt-plugin_2.9.2_0.12/2.1-07132012/sbt-plugin-2.1-07132012.pom
[warn] ==== sbt-plugin-releases: tried
[warn]   http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases/play/sbt-plugin/scala_2.9.2/sbt_0.12/2.1-07132012/ivys/ivy.xml
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/play/sbt-plugin_2.9.2_0.12/2.1-07132012/sbt-plugin-2.1-07132012.pom
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	::          UNRESOLVED DEPENDENCIES         ::
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn] 	:: play#sbt-plugin;2.1-07132012: not found
[warn] 	::::::::::::::::::::::::::::::::::::::::::::::
[warn]
[warn] 	Note: Some unresolved dependencies have extra attributes.  Check that these dependencies exist with the requested attributes.
[warn] 		play:sbt-plugin:2.1-07132012 (sbtVersion=0.12, scalaVersion=2.9.2)
[warn]

Here is the content of the relevant project configuration files:

$ cat project/build.properties
sbt.version=0.11.3
$ cat project/plugins.sbt
// Comment to get more information during initialization
logLevel := Level.Warn

// The Typesafe repository
resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"

// Use the Play sbt plugin for Play projects
addSbtPlugin("play" % "sbt-plugin" % "2.1-07132012")

I dug around and did a little googling to find a more up to date repository for sbt/play.sbt-plugin, and found what seems like a new typesafe repo at http://typesafe.artifactoryonline.com/typesafe/repo. I added it to my project/Build.scala:

    val main = PlayProject(appName, appVersion, appDependencies, mainLang = SCALA).settings(
        resolvers ++= Seq(
            "Sonatype Releases" at "http://oss.sonatype.org/content/repositories/releases",
            "JBoss Repository" at "http://repository.jboss.org/nexus/content/groups/public",
            "CodaHale Repository" at "http://repo.codahale.com",
            "Scala.sh Releases" at "http://scala.sh/repositories/releases",
            "Scala.sh Snapshots" at "http://scala.sh/repositories/snapshots",
            "Maven1" at "http://repo1.maven.org/maven2",
            "Typesafe Artifactory" at "http://typesafe.artifactoryonline.com/typesafe/repo"
        )
    )

I also found a recent question on their discussion forum which helped me solve my problem, where Peter Hausel revealed that the new correct version of the play sbt plugin was “2.1-08072012”.

So I edited project/build.properties to contain:

sbt.version=0.12.0

Then edited project/plugins.sbt to contain:

// Comment to get more information during initialization
logLevel := Level.Warn

// The Typesafe repository
resolvers += "Typesafe repository" at "http://repo.typesafe.com/typesafe/releases/"

// Use the Play sbt plugin for Play projects
addSbtPlugin("play" % "sbt-plugin" % "2.1-08072012")

After doing all this, I am back up and running. Sometimes JVM jar dependencies can be quite the adventure.


Recently, I went a little too far with my usage of Scala’s syntactic (very sugary and sweet!) ability to allow:
SomeObject.someFunction(param)

to be written as:
SomeObject someFunction param

This is cool. However, it is also possible to do something which I have decided is difficult to read and understand:

SomeObject anotherFunction (param1, param2, param3)

Regrettable as the situation is, I wrote a quick line of sed to fix it in the affected files.

The first step was to identify which files had this ugliness:
jay@secretcode:~$ grep ' *[a-z0-9_\.]\+ \+[a-z0-9_]\+ \+(.*,.*) *$' app/* -r -n

Then it was a matter of formulating the regular expression transform to be evaluated by sed:
jay@secretcode:~$ sed -i.bak -e 's/\( *[a-z0-9_\.]\{1,\}\) \{1,\}\([a-z0-9_]\{1,\}\) \{1,\}\((.*,.*) *\)$/\1.\2\3/g' Perk.scala
jay@secretcode:~$ diff Perk.scala.bak Perk.scala
114c114
< val ch = ContentHelper apply (false, content.jsonData)
---
> val ch = ContentHelper.apply(false, content.jsonData)
137c137
< val hashtag = ch get ("hashtag", "html")
---
> val hashtag = ch.get("hashtag", "html")

NB: The above sed expression is compatible with both the OS X and Linux versions of sed.

Whew, catastrophe averted!
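For anyone who would rather do the rewrite from Python than from sed, here is a rough equivalent using the `re` module. The pattern is a direct translation of the sed expression above; the function name `dotify` is just something I made up for this sketch. As with the sed version, only calls with a comma-separated argument list at the end of the line are touched.

```python
import re

# Same three groups as the sed expression: receiver, method name, and the
# parenthesized argument list (which must contain at least one comma).
INFIX_CALL = re.compile(r"( *[a-z0-9_.]+) +([a-z0-9_]+) +(\(.*,.*\) *)$")


def dotify(line):
    """Rewrite `obj method (a, b)` at the end of a line as `obj.method(a, b)`."""
    return INFIX_CALL.sub(r"\1.\2\3", line)


if __name__ == "__main__":
    print(dotify("val ch = ContentHelper apply (false, content.jsonData)"))
    print(dotify('val hashtag = ch get ("hashtag", "html")'))
```

The two lines in the demo are the same ones shown in the diff above, and they come out in the same dotted form.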


The Advanced PHP Debugger (apd) PHP script profiler worked wonderfully once the module was built and installed. However, getting to that point was quite painful.

Compilation initially wasn’t working with the latest package code in the apd PECL repository. At first I thought the compilation problem was Ubuntu-specific, but after some googling I found this article by the apparently extremely capable jjf, in which the author goes into great detail about tracking down and fixing the compilation problems with this package. This saved me a GREAT deal of time, and to make it even easier to obtain a working package I created a build tool that automatically applies the changes required to “make it work.” The utility is available as “apdBuilder.sh” in the git repository.

Here are the full sources: https://github.com/jaytaylor/apd


HOWTO: Make IntelliJ + Scala run fast/smooth (i.e. without lockups) on Mac OS X (this worked for me on both Snow Leopard [10.6] and Lion [10.7]).

Technical Specs:
2011 15″ MBP w/ a normal 750G HDD and 8GB of RAM

Instructions:

Edit the Info.plist for IntelliJ:

vi /Applications/IntelliJ\ IDEA\ 10\ CE.app/Contents/Info.plist

Find the following line:

      <key>VMOptions.x86_64</key>

And then edit the line below to look like so:

      <string>-Xss2m -Xmn128m -Xms512m -Xmx2048m -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=64m -XX:+UseCompressedOops</string>

After I did this, the performance on the Scala plug-in became acceptable.


Mar/11

13

K-Means calculations in Python

So, having used K-Means in PHP in the past, I expected that it would be similarly straightforward in Python. Simply install numpy, scipy, Pycluster, and *yikes* that didn’t work quite like I hoped. Quite a bit more complicated than what I needed. So I ported the trusty PHP implementation to Python. A great big thank you goes to Jose Fonseca for providing the original implementation.

Both kmeans.py and its dependency, ordereddict.py, are available.

from math import ceil

from ordereddict import OrderedDict

"""
This code was originally created in PHP by Jose Fonseca
(josefonseca@blip.pt), and ported to Python by Jay Taylor
(jaytaylor.com/@jtaylor on Twitter).

Please feel free to use it in either commercial or non-commercial applications.
"""

def kmeans(data, k):
    """
    This function takes an array of integers and the number of clusters to create.
    It returns a multidimensional array containing the original data organized
    in clusters.

    @param array data
    @param int k

    @return array
    """
    cPositions = assign_initial_positions(data, k)
    clusters = OrderedDict()
    while True:
        changes = kmeans_clustering(data, cPositions, clusters)
        if not changes:
            return kmeans_get_cluster_values(data, clusters)
        cPositions = kmeans_recalculate_cpositions(data, cPositions, clusters)

def kmeans_clustering(data, cPositions, clusters):
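    """
    Assigns each data point to the cluster whose position is nearest, and
    returns the number of points whose cluster assignment changed.
    """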
    nChanges = 0
    for dataKey, value in enumerate(data):
        minDistance = None
        cluster = None
        for k, position in cPositions.items():
            dist = distance(value, position)
            if None is minDistance or minDistance > dist:
                minDistance = dist
                cluster = k
        if not clusters.has_key(dataKey) or clusters[dataKey] != cluster:
            nChanges += 1
        clusters[dataKey] = cluster
    return nChanges

def kmeans_recalculate_cpositions(data, cPositions, clusters):
    kValues = kmeans_get_cluster_values(data, clusters)
    for k, position in cPositions.items():
        if not kValues.has_key(k):
            cPositions[k] = 0
        else:
            cPositions[k] = kmeans_avg(kValues[k])
        #cPositions[k] = empty(kValues[k]) ? 0 : kmeans_avg(kValues[k])
    return cPositions

def kmeans_get_cluster_values(data, clusters):
    values = OrderedDict()
    for dataKey, cluster in clusters.items():
        if not values.has_key(cluster):
            values[cluster] = []
        values[cluster].append(data[dataKey])
    return values

def kmeans_avg(values):
    n = len(values)
    total = sum(values)
    if n == 0:
        return 0
    else:
        return total / (n * 1.0)

def distance(v1, v2):
    """
    Calculates the distance (or similarity) between two values. The closer
    the return value is to ZERO, the more similar the two values are.

    @param int v1
    @param int v2

    @return int
    """
    return abs(v1-v2)

def assign_initial_positions(data, k):
    """
    Creates the initial positions for the given
    number of clusters and data.
    @param array data
    @param int k

    @return array
    """
    small = min(data)
    big = max(data)
    num = ceil((abs(big - small) * 1.0) / k)
    cPositions = OrderedDict()
    while k > 0:
        k -= 1
        cPositions[k] = small + num * k
    return cPositions

if __name__ == '__main__':
    print kmeans([1, 3, 2, 5, 6, 2, 3, 1, 30, 36, 45, 3, 15, 17], 3)

$ python kmeans.py
OrderedDict({0: [1, 3, 2, 5, 6, 2, 3, 1, 3], 2: [30, 36, 45], 1: [15, 17]})

A simple port: after fixing a few spacing typos, it worked right out of the gate. If you see any problems, pretty please let me know!
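The listing above targets Python 2 (`has_key`, the `print` statement, the third-party ordereddict module). For Python 3.7+, where plain dicts keep insertion order, the same one-dimensional algorithm can be sketched more compactly. This is a condensed rewrite under those assumptions, not a drop-in replacement for kmeans.py, and the name `kmeans_1d` is made up here.

```python
from math import ceil


def kmeans_1d(data, k):
    """Cluster a list of numbers into at most k groups (1-D k-means)."""
    small, big = min(data), max(data)
    step = ceil((big - small) / k)
    # Evenly spaced starting positions, inserted from k-1 down to 0 so that
    # ties are broken the same way as in the listing above.
    positions = {i: small + step * i for i in range(k - 1, -1, -1)}
    assignment = {}
    while True:
        changes = 0
        for idx, value in enumerate(data):
            # min() returns the first key with the smallest distance.
            nearest = min(positions, key=lambda c: abs(value - positions[c]))
            if assignment.get(idx) != nearest:
                changes += 1
            assignment[idx] = nearest
        # Group the original values by the cluster they were assigned to.
        groups = {}
        for idx, cluster in assignment.items():
            groups.setdefault(cluster, []).append(data[idx])
        if not changes:
            return groups
        # Move each position to the mean of its members (0 if it has none).
        for c in positions:
            members = groups.get(c)
            positions[c] = sum(members) / len(members) if members else 0


if __name__ == "__main__":
    print(kmeans_1d([1, 3, 2, 5, 6, 2, 3, 1, 30, 36, 45, 3, 15, 17], 3))
```

On the sample data it produces the same three clusters as the run shown above.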

Now if someone would just make kmeans++ into a Python module..that would be cool! Hmm..


I found a great blog post on how to catch ctrl-c keyboard interrupt signals within multi-threaded Python programs: http://www.regexprn.com/2010/05/killing-multithreaded-python-programs.html

#!/usr/bin/python

import os, sys, threading, time

class Worker(threading.Thread):
  def __init__(self):
    threading.Thread.__init__(self)
    # A flag to notify the thread that it should finish up and exit
    self.kill_received = False

  def run(self):
      while not self.kill_received:
          self.do_something()

  def do_something(self):
      [i*i for i in range(10000)]
      time.sleep(1)

def main(args):

    threads = []
    for i in range(10):
        t = Worker()
        threads.append(t)
        t.start()

    while len(threads) > 0:
        try:
            # Join each thread with a timeout so the main thread stays
            # responsive to Ctrl-C, then keep only the threads still running.
            # (t.join() always returns None, so its result can't be collected.)
            for t in threads:
                t.join(1)
            threads = [t for t in threads if t.isAlive()]
        except KeyboardInterrupt:
            print "Ctrl-c received! Sending kill to threads..."
            for t in threads:
                t.kill_received = True

if __name__ == '__main__':
  main(sys.argv)

Worked like a charm.
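The same stop-flag idea can also be written for Python 3 with a shared `threading.Event` instead of a per-thread boolean. This is a sketch of that variant rather than the code from the linked post; `start_workers` and `wait_for` are names I picked for the example.

```python
import threading


class Worker(threading.Thread):
    def __init__(self, stop_event):
        super().__init__()
        # Shared flag telling every worker to finish up and exit.
        self.stop_event = stop_event

    def run(self):
        while not self.stop_event.is_set():
            sum(i * i for i in range(10000))
            # wait() acts as a sleep that ends early once the flag is set.
            self.stop_event.wait(0.1)


def start_workers(count, stop_event):
    workers = [Worker(stop_event) for _ in range(count)]
    for w in workers:
        w.start()
    return workers


def wait_for(workers, stop_event):
    """Block until all workers exit; on Ctrl-C, ask them to stop first."""
    try:
        while any(w.is_alive() for w in workers):
            for w in workers:
                # Short timeouts keep the main thread able to see Ctrl-C.
                w.join(0.2)
    except KeyboardInterrupt:
        print("Ctrl-c received! Asking threads to stop...")
        stop_event.set()
        for w in workers:
            w.join()


if __name__ == "__main__":
    stop = threading.Event()
    wait_for(start_workers(10, stop), stop)
```

One event shared by all workers means the interrupt handler sets a single flag instead of looping over the threads.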


So recently I found myself needing to minimize the number of external resources in a webpage, and I ended up resorting to encoding each CSS image resource into base64 and then pasting it in by hand. It took considerable effort and focus to do by hand, and I never want to do it that way again. So I wrote a little Python utility called python-inlinify-html to solve this kind of problem. I just made a repository on GitHub for it and it’s good to go~

Example usage:

jay@macpro:~/python-inlinify-html (master)$ ./inlinify.py -d jaytaylor.com -i ~/error.html

Output snippet:


...


...
<img src="" alt="" />
...

It uses PyQuery to minimize the included CSS rules to those that exist within the document. Not too shabby.. ;)
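The core step, turning an image into a base64 `data:` URI and swapping it into the CSS, looks roughly like this. It is a sketch of the idea and not the actual code in python-inlinify-html; `data_uri`, `inline_css_urls` and the `load_bytes` callback are hypothetical names.

```python
import base64
import mimetypes
import re


def data_uri(payload, mime_type):
    """Encode raw bytes as a data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def inline_css_urls(css, load_bytes):
    """Replace each url(...) reference in css with an inline data: URI.

    load_bytes is a caller-supplied function that maps a URL or path to the
    bytes of that resource (read from disk, fetched over HTTP, and so on).
    """
    def replace(match):
        target = match.group(1).strip("'\"")
        if target.startswith("data:"):
            # Already inlined; leave it alone.
            return match.group(0)
        mime_type = mimetypes.guess_type(target)[0] or "application/octet-stream"
        return "url({})".format(data_uri(load_bytes(target), mime_type))

    return re.sub(r"url\(([^)]+)\)", replace, css)
```

Keeping the fetching behind a callback means the rewriting logic can be tried out without touching the network.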
