Jay Taylor's notes

A Git query language | Hacker News

Original source (news.ycombinator.com)
Tags: query git version-control sql news.ycombinator.com
Clipped on: 2016-12-14

A Git query language (github.com)
168 points by bryanrasmussen 5 hours ago | 34 comments

This might benefit from SQLite's Virtual tables: https://sqlite.org/vtab.html

With Virtual Tables you can expose any data source as a SQLite table -- then you can use every SQL feature that sqlite offers. You can just tell sqlite how to iterate through your data with a few functions, with an option to push down filtering information for efficiency.

You can also create your own aggregates, functions etc.

Here's an article where the author exposes redis as a table within sqlite: http://charlesleifer.com/blog/extending-sqlite-with-python/
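A real virtual table is written against SQLite's C API (the vtab page linked above), which is too much for a comment-sized example. A much cruder stand-in, sketched here as an assumption rather than as how gitql or SQLite work, is to materialize the history into an ordinary table: print one tab-separated row per commit, then import it with the sqlite3 shell. The function name `git_log_tsv` and its columns are hypothetical.

```shell
# Hypothetical helper: print the current repository's history as
# tab-separated rows, preceded by a header row that sqlite3's .import
# uses as column names. Extra arguments (e.g. --all) go to git log.
# Assumes commit subjects contain no tabs or double quotes.
git_log_tsv() {
  printf 'hash\tauthor\tdate\tsubject\n'
  # %H full hash, %an author name, %ad author date (YYYY-MM-DD because of
  # --date=short), %s subject; %x09 emits a literal tab.
  git log --date=short --format='%H%x09%an%x09%ad%x09%s' "$@"
}
```

For example, assuming the sqlite3 command-line shell is installed:

    git_log_tsv > commits.tsv
    printf '%s\n' '.mode tabs' '.import commits.tsv commits' \
      'SELECT author, count(*) FROM commits GROUP BY author;' | sqlite3 history.db

Unlike a virtual table, this is only a snapshot and has to be rebuilt after every new commit.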

My thoughts went straight to PostgreSQL Foreign Data Wrappers. Something like that would be really helpful!

Mercurial has a somewhat similar concept predating this (added circa 2010): revision sets (https://www.selenic.com/mercurial/hg.1.html#revsets) for selection, plus templates for formatting, though git has that built in, kind of, via log --format.

Mercurial's are completely general, though. Any Mercurial command that can accept a revision as an argument can also accept a revset expression. And templating isn't just for log, but for many other commands, such as grep or annotate (blame), and it's the same templating language for all of them. I also find hg templates a bit easier to read, because they're Djangoish/Jinjaish instead of being printf-ish like git's. Plus, you can save and compose Mercurial templates and revsets.
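To make the contrast concrete, here is roughly the same one-line log in each style. The git version is wrapped in a hypothetical `short_log` function; the Mercurial template is shown only as a comment, using its standard `short`, `person` and `firstline` filters.

```shell
# git: printf-style placeholders (%h short hash, %an author name, %s subject).
# Extra arguments (e.g. -5, a branch name) are passed through to git log.
short_log() {
  git log --format='%h %an: %s' "$@"
}

# Mercurial equivalent, with Jinja-like {keyword|filter} templates:
#   hg log --template '{node|short} {author|person}: {desc|firstline}\n'
```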

I was actually hoping that gitql had finally gotten inspiration from Mercurial and git would grow a general-purpose query language, but it's read-only. :-(

Revsets are a wonderful feature, and it's something I wish git had. Just being able to say I want to see what has changed between this branch head and its latest common ancestor with trunk is an incredibly simple and useful thing to be able to do.

You can do that! "master..experiment" means "all commits reachable by experiment that aren’t reachable by master" https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
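A minimal illustration using the branch names from the comment above; the wrapper name `only_on` is hypothetical, and the same `A..B` syntax works with any git command that accepts a revision range.

```shell
# Commits reachable from BRANCH but not from BASE, i.e. the "BASE..BRANCH"
# range described above, one per line.
# Usage: only_on BASE BRANCH   (e.g. only_on master experiment)
only_on() {
  git log --oneline "$1..$2"
}

# The same range, just counted:
#   git rev-list --count master..experiment
```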

Complete guess based on [1], but wouldn't this do it?

  git diff $(git merge-base master HEAD) HEAD
  # equivalent shorthand: git diff master...HEAD

[1]: https://stackoverflow.com/questions/1549146/find-common-ance...

In this case git can do the same thing, but notice you can only do it because git provides a special command for getting that revision. Revsets are really general (a greatest common ancestor function is provided, but can be easily synthesised from more primitive building blocks) and can be used everywhere, so you can bisect over changes you made that touched files matching a pattern, or whatever. They aren't something I use every day, but they are really useful on occasions and allow for some pretty robust tooling to be written.
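Git has no revset language, but some of the selections mentioned above can be approximated by stacking `git rev-list` filters. The sketch below (function name and argument order are hypothetical) lists commits by one author that touched paths matching a pattern. For the bisect case, `git bisect start` also accepts a pathspec after `--`, but it cannot restrict the search to one author.

```shell
# Commits by AUTHOR (a regex matched against "Name <email>") that touched
# any path matching PATHSPEC, newest first, reachable from REV (default HEAD).
# Usage: my_changes AUTHOR PATHSPEC [REV]   (e.g. my_changes Ann '*.c')
my_changes() {
  git rev-list --author="$1" "${3:-HEAD}" -- "$2"
}
```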

True all the way, but that may be a bit too much to chew on for people whose mind is already blown by gitql.

This is pretty cool. It looks local to the current repo, which makes sense for most usage. Having something like this across a swathe of repos would be useful in different ways (e.g., "What has Bob committed over all the repos for our projects that involves the string 'billing'?").
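For sibling repos under one folder, a small shell loop already answers the "What has Bob committed..." question. This is a sketch under assumptions: the helper name is hypothetical, it only looks one directory level deep, and it matches the pattern against commit messages.

```shell
# Hypothetical helper: search every git repository directly under ROOT for
# commits (on any branch) by AUTHOR whose message matches PATTERN.
# Usage: search_repos ROOT AUTHOR PATTERN
search_repos() {
  local root=$1 author=$2 pattern=$3 repo name
  # Glob results are not word-split, so names with spaces are safe here.
  for repo in "$root"/*/; do
    [ -d "$repo.git" ] || continue   # skip directories that aren't repos
    name=$(basename "$repo")
    # --author and --grep must both match; prefix each hit with the repo name.
    git -C "$repo" log --all --author="$author" --grep="$pattern" \
        --date=short --format='%h %ad %s' |
      while IFS= read -r line; do
        printf '%s\t%s\n' "$name" "$line"
      done
  done
}
```

For the example above, that would be `search_repos ~/projects Bob billing`. To match changed lines instead of the message, replace `--grep="$pattern"` with `-G"$pattern"`.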

Minor off topic rant about the animated example: Who doesn't put a space at the end of their prompt after the $?! Ugh!

Seconded on a multirepo tool for that.

I have a big "projects" folder with lots of different repos. I'd like to know all commits I've made in the past month, across all projects.

Currently I have a post-commit hook which sends the one-line shortlog to a common file, but it would be nicer to have a tool for ad-hoc queries.
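The post-commit hook approach could look like the sketch below. The log location, column order and function name are assumptions, not anything git prescribes; git simply runs an executable `.git/hooks/post-commit` after each commit.

```shell
# Append "repo <TAB> date <TAB> short hash <TAB> subject" for the commit just
# made to a shared log. The default path is a hypothetical choice.
# Usage: log_last_commit [LOGFILE]
log_last_commit() {
  local log=${1:-"$HOME/commit-log.tsv"} top
  top=$(git rev-parse --show-toplevel) || return
  git log -1 --date=short --format='%ad%x09%h%x09%s' |
    while IFS= read -r line; do
      printf '%s\t%s\n' "$(basename "$top")" "$line"
    done >> "$log"
}
```

Installed as an executable `#!/bin/bash` hook that defines the function and then calls `log_last_commit`, the last month's entries can be pulled out ad hoc with something like the line below (ISO dates compare correctly as strings):

    awk -F'\t' -v since=2016-11-14 '$2 >= since' ~/commit-log.tsv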

Git map would work for many cases:


  git map ql ...

Nice. I can see a need for this as a lot of my projects are structured like that (multiple sibling repos). Running 'git map log --grep ...' seems particularly useful.

From a quick eyeball of the source, it does not handle whitespace in directory names properly: the for loop would split such a name into separate, invalid entries.

Of course I'd also fire someone on the spot that commits a project directory with whitespace in it...
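A loop that avoids the word-splitting problem, sketched under the assumption that bash and a `find` supporting `-print0` (GNU and BSD both do) are available: NUL-delimited paths survive spaces, and even newlines, in directory names. The name `for_each_repo` is hypothetical, in the spirit of the git map idea rather than its actual code. It only finds repositories whose `.git` is a directory, so linked worktrees and submodules (where `.git` is a file) are skipped.

```shell
# Run "git ARGS..." in every repository found anywhere below ROOT.
# Usage: for_each_repo ROOT ARGS...   (e.g. for_each_repo ~/projects status -s)
for_each_repo() {
  local root=$1
  shift
  # -prune stops find from descending into each .git directory;
  # -print0 separates results with NUL bytes instead of newlines.
  find "$root" -type d -name .git -prune -print0 |
    while IFS= read -r -d '' gitdir; do
      printf '== %s\n' "${gitdir%/.git}"
      # </dev/null keeps git from consuming the loop's input.
      git -C "${gitdir%/.git}" "$@" </dev/null
    done
}
```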

Something seemingly related is Myrepos [1], for doing stuff like "update all these repos". Probably more capable, but also more complicated to use.

It's also "vcs agnostic".

[1]: https://myrepos.branchable.com/

> Who doesn't put a space at the end of their prompt after the $?! Ugh!

I've always had mixed feelings about that. I've resolved it now by using '>' with no space, which doesn't look cluttered since there's only ~1px right next to the first character entered.

I agree. Since we work with microservices, we have 10+ repos per project. It would be nice to be able to scan through all of them.

Plug: if you ever find yourself wanting to merge everything painlessly: https://github.com/unravelin/tomono :)

accompanying blog post: https://syslog.ravelin.com/multi-to-mono-repository-c81d004d...

You could submodule them all in an otherwise empty parent repo.

Bit of a hack, but could come in handy for other things (off the top of my head: "welcome to the team! Clone this one thing, it contains everything you need.").
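Setting that up takes only a few commands. A sketch, with a hypothetical function name and arguments:

```shell
# Create an umbrella repository at DIR that pins each URL as a submodule.
# Usage: make_umbrella DIR URL...
make_umbrella() {
  local dir=$1 url
  shift
  git init -q "$dir" || return
  for url in "$@"; do
    # Each submodule lands in a subdirectory named after its URL.
    git -C "$dir" submodule --quiet add "$url" || return
  done
  git -C "$dir" commit -q -m "Add project submodules"
}
```

A newcomer then needs one command, `git clone --recurse-submodules <umbrella-url>`, and `git submodule update --remote` later moves every submodule to the latest commit of its remote branch. Each submodule is pinned to a specific commit, so the umbrella repo needs a new commit whenever it should point somewhere newer.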

This one actually seems very promising.

    A Git query language (github.com)
    10 points by bryanrasmussen 1 hour ago
This ought to have (2014) in the title: Latest commit 49c1c17 on 22 Jun 2014.

That makes it even more interesting: this, or something similar, never got traction. I often have an idea of what I would like to know about my repo, but don't want to start hacking the answer together.

https://github.com/gitql/gitql (last commit 12 days ago)

Not the same project (look at the number of commits, and the top issue).

Sorry, I didn't notice the date. I found it because I needed something like it and was about to start building it, but figured I'd better check whether someone else had already done the work for me.

It would be nice to see a plugin for Presto.

Imagine if we could just have this automatically for every program that generates text output. It doesn't seem beyond the realm of possibility that every tool could either a) structure its text output in a way that guarantees simple piping to a general-purpose query-language processing tool, or b) given an "--output-json" flag, produce JSON that can then be easily queried.

Sounds like you'd like the object-based Powershell.

Or you could have a single address space (https://en.wikipedia.org/wiki/Single_address_space_operating...), and share objects directly.

Sounds like a security nightmare.

But is it really? Couldn't there be a way to isolate things at the system level?

Support for "SELECT DISTINCT" would be great!
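In the meantime, the shell can stand in for `SELECT DISTINCT`: emit one field per commit and de-duplicate it with `sort -u`. The function name is hypothetical.

```shell
# Roughly: SELECT DISTINCT author FROM commits; (sorted, one name per line)
# Extra arguments (e.g. --since="1 month ago") are passed to git log.
distinct_authors() {
  git log --format='%an' "$@" | sort -u
}
```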

But why does it have to look like SQL (and not like xpath or jquery)?

Not many people enjoy writing SQL statements on the command line. It's verbose, the order of things is arbitrary...

That's a great idea!
