Jay Taylor's notes


Pybind11 – Seamless operability between C++11 and Python | Hacker News

Original source (news.ycombinator.com)
Tags: python c++11 news.ycombinator.com
Clipped on: 2016-06-11

Pybind11 – Seamless operability between C++11 and Python (github.com)
121 points by jtravs 181 days ago | 39 comments

If you like to program in C++ and Python, the Nim language might appeal to you: C++-like features (generic types, operator overloading, function overloading, inline functions, optional O-O, optional data hiding, C pointers, bitwise-compatibility with C, your choice of manual memory management or GC) and C++-like run-time speed, combined with Python-like syntax and compile-time Lisp-like macros.

I spent most of a decade mastering pre-11 C++, learning to apply the recommended tricks & idioms (like the copy-swap idiom for strong exception safety guarantees), learning to sidestep the gotchas. Meyers books, Sutter books, GotW, "Modern C++", etc. Then, when C++11 came out, the language became even more complicated, not less. That was a breaking point for me.

As a long-time programmer (and fan) of both C++ and Python, Nim offers the best balance that I've yet found between C++'s ethos of thoughtful, precise control and Python's user-friendliness.

(And if you ever happen to need seamless Nim-Python compatibility, including native Nim support for Numpy arrays with C++-style iterators, my Pymod project may be of assistance: https://github.com/jboy/nim-pymod )


If only it didn't have some straight-up bizarro choices, e.g. with regard to case sensitivity.


Nim has made 4 syntax choices that might seem bizarre to a C++ programmer: 1. blocks by indentation instead of braces (aka "Off-Side Rule" syntax [0]); 2. Uniform Function Call Syntax [1][2]; 3. case-insensitivity for identifiers; and 4. dropping empty parens from a function call.

[0] https://en.wikipedia.org/wiki/Off-side_rule , [1] https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax , [2] http://nim-lang.org/docs/manual.html#procedures-method-call-...

Choice #2, Uniform Function Call Syntax (UFCS), allows you to write `a.someFunc(b)` or `someFunc(a, b)` interchangeably.

Choice #3 is case-insensitivity for identifiers: You can write `a.to_lower()`, `a.toLower()` or `a.tolower()` interchangeably.

Choice #4 is the ability to drop empty parens from the end of a function call: `a.len()` can be written as `a.len`. Combined with UFCS, this allows you to write `len(a)`, `a.len()` or `a.len` interchangeably.

I'd already been programming Python for years, so I wasn't surprised by choice #1 -- in fact, I was pleased. After initially being highly skeptical of Python's OSR syntax when I first encountered it, I've since come around completely. The most common complaint against Python's OSR syntax is that it allows both tabs & spaces, interchangeably, which has bitten just about everyone who has ever used Python in a team. Nim avoids this problem by allowing only spaces to be used for indentation, not tabs: http://nim-lang.org/docs/manual.html#lexical-analysis-indent...

My eyebrows certainly went up about #2, #3 and #4 when I first encountered them. But you know what? Much like Python's OSR syntax, I've now come around completely. Now I actually prefer #2, #3 and #4 the way Nim does them. When I'm back in Python, C++ or C, I wish they behaved the same way as Nim!

Think about it: How many stylistic debates have there been about whether an operation in C++ should be a function or a method? How many times have you pondered whether an object attribute should be a member or a method? It's just a distracting detail with no benefit. And now you don't need to care! Apparently Bjarne Stroustrup is a convert to UFCS too: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n417...

Nim is, above all, a pragmatic language. I think that in another 5 or so years, Nim's syntax choices #2, #3 and #4 will seem just as sensible as #1 seems to Python programmers today.


>Nim has made 4 syntax choices that might seem bizarre to a C++ programmer: 1. blocks by indentation instead of braces (aka "Off-Side Rule" syntax [0]); 2. Uniform Function Call Syntax [1][2]; 3. case-insensitivity for identifiers; and 4. dropping empty parens from a function call.

Most of those things are either totally accepted in the PL community (like "Uniform Function Call Syntax") or common and a matter of taste, but nothing bad in themselves (block indentation, paren-less function calls).

Case-insensitivity for identifiers WITH the underscore thing (underscores being ignored as well), though, is just plain bad, error-prone, hampers auto-completion and IDE intelligence, and is plainly bat-shit crazy. Not to mention that it opens the stage for tons of bike-shedding and holy style wars...

While there were languages with case-insensitive identifiers in the past (especially pre-80s), nobody thinks that it was such a great idea in the first place, and pretty much everybody rejoiced when such features were abandoned.


See, this is why I compare it to Python's OSR syntax.

People react with these strong responses ("plainly bat-shit crazy"). People focus on inconveniences due to the limitations of limited tools ("I don't want to press spacebar FOUR TIMES at the start of EVERY line of code!"). People come up with elaborate worst-case hypotheticals ("What if you want to share your code with someone using a pastebin service that removes leading whitespace?") that just don't happen in practice.

In practice, it's just all upside and no downside. Now I don't need to remember whether it's `openHTTPConnection()`, `openHTTPconnection()`, `open_http_connection()`, `openHttpConnection()`, etc. If I can say it, I know how to type it.

Any ambiguous overloads (same name, same parameter types -- which again, really doesn't occur by accident in practice) will be reported & resolved at compile-time. There's no more mystery in this than there is in any function overloading scenario.

And in practice, it seems to cause the opposite of holy wars: People realize how pointless all those identifier case-wars are in the first place.

There's really not much more that I can say. "In my experience, there's no downside to this feature, only upside."


Long-time Python dev here, so choice #1 is a no-brainer and #2 looks reasonable too, but I'm not so sure about the other two. Regarding #4, `obj.method` already has a different meaning than `obj.method()` in Python, so how does Nim express it?

As for #3, I kinda see the motivation in statically typed languages: integrating independent libraries and frameworks that use different naming conventions. In a dynamically typed language, though, refactoring is already hard enough, and this would make it even harder. My "refactoring tool" of choice is grep (or rather ag, https://github.com/ggreer/the_silver_searcher); with choice #3 you can't just grep for an identifier anymore.


In general in Nim (see footnote for fine print), `obj.someFunc` == `obj.someFunc()` == `someFunc(obj)` [0]. It's not a method bound to an object, as in `obj.method` in Python. So whenever there is some `obj.` in front of the function name, that's an argument being passed to a function call.

[0] http://nim-lang.org/docs/manual.html#procedures-method-call-...

You can pass functions as first-class values (which Nim calls "procedural types" [1]) by supplying the function name without any arguments or parentheses, eg, `someFunc` on its own.

[1] http://nim-lang.org/docs/manual.html#types-procedural-type

The base case occurs when the function takes no parameters: In this case, `someFunc` is a procedural type; `someFunc()` is a function invocation.

Footnote for interested readers: closures [2]; setter properties [3]; multi-methods that use dynamic dispatch [4].

[2] http://nim-lang.org/docs/manual.html#procedures-closures , [3] http://nim-lang.org/docs/manual.html#procedures-properties , [4] http://nim-lang.org/docs/manual.html#multi-methods


>Choice #2, Uniform Function Call Syntax (UFCS), allows you to write `a.someFunc(b)` or `someFunc(a, b)` interchangeably.

Interesting idea.

The topic of function call syntax reminded me of something:

I'm not a language creator, but years ago, when I was fairly new to programming, I had this idea that it would be cool and maybe useful for a language to have function call syntax something like this, to more closely resemble English sentences:


or just:


where ball and bat are arguments. (Not too good an example, I know, because in a cricket or baseball game the ball and bat would not vary at all, or not often, whereas arguments tend to vary between calls. But I think people will get the idea.) IOW, instead of having all the arguments grouped together inside one set of parentheses, make it more like an imperative English sentence, with one set of parentheses per argument, separated (if needed) by other words that are part of the function name.

The above example would be for a language where types of function arguments are not specified. For a language where they are, it could be like this:

hit_(ball: Ball)_with_(bat: Bat)

Of course, I've left out the syntax for something like "def" (as in Python) to say that this is a function definition and for the return value/type, if any.

Don't know what the complexity of implementing something like that would be.


Smalltalk, Objective-C and Swift (among the languages I know) have a syntax similar to that, only it doesn't need parentheses, so it's more like:

[self hit: ball with: bat]

The method is called 'hit:with:'.

Though they don't go overboard with the "like English" thing; they mostly use the extra keywords to disambiguate arguments, e.g.:

[fileWrapper writeToFile: path atomically: YES updateFilenames: YES];


Didn't know that. Cool, thanks for mentioning.


> Choice #3 is case-insensitivity for identifiers: You can write `a.to_lower()`, `a.toLower()` or `a.tolower()` interchangeably.

This looks like an enjoyable source for weird bugs: write two functions with similar names (e.g., to_me() and tome()) and be surprised when Nim doesn't distinguish between them.


If your functions take any parameters, Nim will use the parameter types to distinguish between them, just like C++ does when you overload functions.

OTOH, if your hypothetical is that programmer X writes `to_me()` while a different programmer Y writes `tome()`, and the two functions just happen to be identical in all parameter types... well, that can already happen anyway where two programmers each independently write a function with the same name.

Nim has a simple, clear specification of how identifiers will be compared: http://nim-lang.org/docs/strutils.html#normalize,string

There's no secret magic happening.


It's doubtful that "convert it to lower case" (as documented for the function you pointed to) actually means anything in this day and age, what with Unicode and all...

Though, looking at the code for "normalize" it appears to only support ASCII. That's even worse since now "lowercase" doesn't even apply uniformly if your identifiers have non-ASCII characters in them[0].

Doing a general normalization (as in Unicode) would probably also be bad, but for different reasons: You probably don't want the validity of programs to depend on the current locale of the machine you're compiling on. (See e.g. the "Turkish I" problem.)

In short: Case-insensitive identifiers[1] are a terrible, terrible idea.

[0] Not generally a good idea, but it happens and there are legitimate cases for it.

[1] Or, rather, "doing anything non-trivial to identifiers before lookup", I suppose.


> This looks like an enjoyable source for weird bugs

It's meant to prevent them: in Python applications it's not uncommon to see bugs introduced by an incorrect completion, e.g. updatePlayerstatus / updatePlayerStatus / update_player_status

With case/underscore insensitivity you know in advance that there can be only one "updateplayerstatus" in your codebase and write it according to your style, e.g. always update_player_status


... and underscores.


Looks neat, though the tooling could use a little work.

http://docs.cython.org/src/quickstart/build.html shows a few ways this could be integrated to be a little more... seamless.


Using Cython for simple C++ bindings is pretty great, especially if one takes care to use STL data structures at the API level as much as possible.


I can't even begin to describe how timely this project is to me. I'm working on a research project that had chosen C++ over Python for performance reasons. We started with boost and later decided to ditch it because, well, it's boost.

Later, when we decided to provide a Python binding, I found myself in the same place all over again: it looked like we might have to use Boost for the binding. Luckily I haven't written anything yet, and I spotted this library on HN :)


Similar situation here, but a bit too late for us :] We want C++ with Python on top, eventually able to run on some RTOS. CPython didn't quite cut it in terms of how easy it was to get running on small platforms (if it sees a Windows compiler it assumes there's a registry etc., go figure), but luckily we found http://micropython.org/ which supports enough of the Python 3.4 syntax yet is small, well written and C99. But because it's rather 'young' there are no binding solutions, so we quickly had to come up with one ourselves (quickly because at the time we still had to decide whether to really go with MicroPython or not). The result is here: https://github.com/stinos/micropython-wrap and the similarities with the far better written PyBind11 are still striking, so I guess we didn't do that bad of a job. We're definitely going to look into how PyBind11 solved keyword arguments, though (never got into that).


Similar situation here. We're working on a research project that uses C++ for MPI/RDMA/performance, but we would like to provide bindings in a more user-friendly language.


Why not Swig2?


You could expose the user to a C-style header?


As an experiment in doing something interesting with C++11, I attempted a "literal translation" of an algorithm for the medcouple that I initially prototyped in Python. I was very happy to see how remarkably easy this was:



Most Python idioms have an almost literal C++ translation. Even Python sugar like a,b = v can be done with tie(a,b) = move(v). C++11 is an almost pleasant language. If only it could completely break backwards compatibility and lose all of the ugly parts... I suppose this is what D is.


The rationale described there for pybind sums up my experiences with Boost. Now that many of the best parts of Boost are available in C++11, Boost ends up being more trouble than it is worth. In general, I think it is best avoided if you're writing software that you expect to still need supporting in a decade. And you can always borrow ideas from it; we have our own lexical_cast-like template, for example.

I also find from past experience that seamless interoperability is not always a good thing, because it obscures the interface boundaries. It can make it hard to replace one side of the interface and can lead to programmers not familiar with the full design of the software doing things inefficiently. I once worked on a huge project using CORBA and it wasn't always easy to know which calls were remote.


Yes, once something makes it from a third-party library into the language, the third-party dependency is no longer worth the hassle.

Of course, this "hassle" is offloaded onto the language, i.e. onto all C++11 users, who have to learn it.


Very interesting. With numpy I haven't felt the need to C-ify anything in a while, but making the process easier is always a plus (or plusplus). Does anyone know how this compares with CFFI or Cython? Is there a hello pybind11?




C++ support in Cython is still quite incomplete, especially when it comes to C++11 features.


This looks very interesting. My friend has been telling me how C++ had improved recently in terms of "pleasantness" to write, and now calling it from Python makes it all the better: https://github.com/wjakob/pybind11/blob/master/docs/basics.r...

This bridge will be great for numerically-intensive projects like implementing ML algos. I wonder how easy it would be to interop with scikit-learn data structures from the C++ side.


Very interesting. We use Boost.Python to provide Python bindings for Bond [0]. It works well, but having something lighter-weight would be great. One big issue with Boost.Python is long compilation times. Does your library improve on that?

[0] https://microsoft.github.io/bond/manual/bond_py.html


The benchmark page of the pybind11 documentation [0] compares build times and generated library size with Boost.Python.

[0] http://pybind11.readthedocs.org/en/latest/benchmark.html


Very cool, I'm a big fan of Wenzel Jakob's other major project, Mitsuba. Was this developed to support Python bindings with Mitsuba?


Yes, that's the plan! (eventually...) :)


Could Ruby have something like this too, or is MRI's C API just too limited? Rice is the closest thing I know of and it was pretty gnarly to use.


Maintainer of Rice here (https://github.com/jasonroelofs/rice). The problem set of trying to seamlessly integrate C++ and dynamic languages is not a simple or trivial thing to accomplish. I know Rice isn't particularly easy to use but it does have a lot of similarities to the OP's library and Boost.Python API wise.

I'm curious what you found gnarly or difficult to use, maybe there are some improvements that can be made?

Also as far as I know, Rice is the only library of its kind in the Ruby space (ignoring SWIG, of course).


The MRI C extension API is non-existent compared to Lua or CPython.




no pypy?


Does boost-python now support pypy?

