Data journalism & street reporting at The New York World

Which New York City blocks had the most New York Police Department stops in 2011?

This week Columbia Journalism School’s The New York World published a nice feature that combines data journalism and street reporting to offer new insight into this question.

Here you can read the piece by Alice Brennan: The NYPD’s hottest stop-and-frisk spots. And this is the interactive map, using CartoDB. It shows the number of incidents recorded for every block in NYC throughout 2011.

We emailed the team behind the project to gather info about their work and the use of CartoDB for data journalism. 

One of the goals was to show the analysis of the data alongside personal interviews of people living in hotspots or involved in any New York Police Department stop, says data journalist Michael Keller.

Reporter Alice Brennan and Keller worked together to clean the data. Brian Abelson, a graduate student at Columbia University’s applied statistics program Quantitative Methods for the Social Sciences, helped to identify the hotspots.

Keller also worked on the map code — at the bottom of the map there’s the full list of contributors, including Mike Sullivan, deputy editor at The New York World, who worked on the map and the interface.  

The Spanish elections map from CartoDB, available here, was used as a template for the mouseovers. To combine the stop-and-frisk counts with the census block data, Keller ran an ST_Intersects query. You can check here for the PostGIS documentation about this function.

The data is fetched from CartoDB through the SQL API and displayed using the CartoDB library for Leaflet.
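For readers who want to try something similar, here is a minimal sketch of what such a spatial join could look like when sent through the SQL API. It is not the query Keller actually ran: the table names (stops, census_blocks), the column names, the account name and the URL layout are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical schema: the post does not name the real tables or columns.
QUERY = (
    "SELECT b.cartodb_id, count(s.cartodb_id) AS stop_count "
    "FROM census_blocks b JOIN stops s "
    "ON ST_Intersects(b.the_geom, s.the_geom) "
    "GROUP BY b.cartodb_id "
    "ORDER BY stop_count DESC"
)


def sql_api_url(account, query):
    """Build a SQL API request URL (the URL layout is an assumption)."""
    base = "https://{}.cartodb.com/api/v2/sql".format(account)
    return base + "?" + urlencode({"q": query})


# "example" is a placeholder account name.
url = sql_api_url("example", QUERY)
print(url)
```

The join counts one row per stop that falls inside each block, so ordering by stop_count gives a ranked list of blocks, which is roughly how a list of hotspots could be derived.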

The resulting map shows 685,724 stops throughout the city and allows users to visually browse all the data. Hovering over each block reveals the number of stops that took place there. The top 50 hotspots are highlighted in pink with the top six hotspot blocks listed on the left by their general location. Clicking on each of the top six will show the number of arrests for that block along with the number of stops.

Personal stories and reporting give a new, compelling perspective on the data. We love the approach. Great work!

Get a full report of your OpenPaths location data

Last weekend we participated in the Science Hack Day. This 24-hour marathon event brings together “software developers, designers, scientists, web enthusiasts, educators, and anyone with a passion for making cool things”. It started in London back in 2007 and this time it was held in Chicago. 

All participants gathered at the Adler Planetarium and Astronomy Museum, an awesome and totally inspiring place. Participants spent the night hacking at the planetarium, and they let us play with the most advanced planetarium theater in the world.

We were part of a team working on an app called Geo Stats that allows users to upload their location data from OpenPaths and analyze it. By looking at where you have been, the app detects how you travel and where you travel.

The app gives you a full report of your geo stats with some curious units, like the total distance you have traveled in “whales”, or the time you have lost due to the effects of general relativity. Additionally, it detects the flights that you have taken and calculates the carbon footprint of those flights, which is very shocking, by the way.
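As a rough illustration of the kind of calculation involved, the sketch below sums the great-circle (haversine) distance along a list of location points and expresses it in “whales”. It is not the app’s actual code: the whale length of 30 meters and the sample coordinates are assumptions, and the real app may measure distance differently.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters
WHALE_LENGTH_M = 30.0       # assumed whale length; the app's value is not stated


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def total_distance_m(track):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(
        haversine_m(a[0], a[1], b[0], b[1])
        for a, b in zip(track, track[1:])
    )


# Made-up track: two points in Chicago, then one in New York.
track = [(41.8663, -87.6068), (41.8781, -87.6298), (40.7128, -74.0060)]
meters = total_distance_m(track)
print(round(meters / WHALE_LENGTH_M), "whales")
```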

You can try it and check an example here: http://geostats.herokuapp.com/. The app uses CartoDB to store and analyze the data, of course ;)

An analysis of the bike-share program in NYC

If you live in NYC, you’ve probably heard about a new bike-share program that will come to life this summer. The program will be launched with around 420 stations and is scheduled to grow to some 10,000 bikes and 600 rental stations throughout the city by summer 2013. 

Last Friday, the Department of Transportation released a map with draft locations for Citi Bike stations based on suggestions collected through an online map. New Yorkers will be able to send their feedback and the stations map will be refined. 

The question is where the best spots for the bike stations are, considering the goal of the program: to facilitate the “first and last mile” of local commutes and tourist trips to and from their destinations.

Steven Romalewski, the director of the CUNY Graduate Center’s Mapping Service, used CartoDB to visualize the 413 bike-share stations proposed so far and check whether the draft locations fit New Yorkers’ needs. He posted his results and the resulting map on his Spatiality blog.

“Here’s what I found: most of the proposed bikeshare locations are relatively close to subway entrances, and even more are closer to bus stops. At least regarding the locations, the system seems right on track to meet its goals of facilitating New York’s commuter and tourist trips”, Steven concludes. 

Streetsblog.org and Gothamist have also posted interesting takes on Steven’s analysis.

“Before we talk about the money though, let’s talk about the map!”, says Gothamist.

CartoDB with a capital R

We love R users, and for a while now we have known that R users would probably love CartoDB. Getting the two together has taken some time and our commitment to that goal is far from over. But today, we are happy to show off our first attempt at helping more R users analyze and share their geospatial data: our new R package for CartoDB.

As we were developing the CartoDB-R package, the reasons why it would be useful started to pile up, but we’ll do our best to highlight some of the best here and leave the rest for you to discover. 

PostGIS is for lovers!

Using your CartoDB account with this package means you get all the power of PostgreSQL/PostGIS without any installation. For many R users, that is a huge bonus. It also means that you can host and share your datasets remotely, so you can pass URLs instead of huge files. Since the package has methods for passing queries to CartoDB, you can fetch only the portions of the data you need, or the outputs of server-based data analysis, instead of the entire dataset at any one time. It can also just be really fun to use R for mapping!

Simplicity is our hero

We are doing our best to simplify the process of interacting with your data using R. Right now, it is already pretty close to dead simple. No need for SQL unless you want it. In just a couple of lines you can set up a new connection to CartoDB and download your data to a dataframe. That’s right! CartoDB outputs go directly into a dataframe (though you can override that) so you can use them in your R analysis immediately.

After you are done cleaning, analyzing and transforming your data, use the CartoDB R package to write it back to your account. Not only does this give you a place to host your data, but it also gives you a way to then share maps of your results with collaborators, in blogs, or elsewhere.

Developers and dreamers

Anyone using CartoDB can give their users access to data directly in R! Because the R package can work as read-only, your users can now use your CartoDB SQL API to query and analyze data directly in R. For most projects, we think this will be a value-added service they can immediately share with their users. If you run a project hoping to serve data to scientists, hopefully we just made your life, and theirs, a heck of a lot easier!

Conclusion 

There are still improvements/enhancements to come, but if you start using the CartoDB-R package let us know and shoot us any feature requests! We are going to work on moving it to CRAN soon, but see the GitHub readme for instructions on installing it from there. We hope this helps a lot more people do cool things with their data. Onward and upward!

Easier authentication and new visualizations

We’ve just released a boatload of performance fixes, bug fixes and new features into CartoDB, along with a few toys…

The performance and bug fixes mean you’ll have a much swifter dashboard experience (especially with large datasets), shorter waits when importing shapefiles, faster map tile generation and better speeds for your published maps.

We know (believe us, we know…) that OAuth is not the right choice for all applications. This is why we’re really happy to release simple key-based authentication for our SQL API. Simple API key-based auth allows full SQL read/write access to public and private tables in CartoDB from the command line or browser, which is ultra handy when you are developing.

The API key auth is targeted at apps that use CartoDB as a geospatial backend, at trusted internal applications, and at those of us who love writing simple, hacky scripts to generate a map for all. You can find your Simple API key in “Your API Keys” inside CartoDB, or learn more about how you can use the SQL API to build geospatial applications here.
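To give an idea of what a script using key-based auth might look like, here is a minimal sketch that prepares a write request without sending it. The endpoint path, the parameter names (q and api_key), the account name and the my_points table are assumptions for illustration, so check the SQL API documentation for the exact details.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sql_request(account, api_key, query):
    """Prepare (but do not send) a POST request to the SQL API.

    The URL layout and parameter names are assumptions, not confirmed API.
    """
    url = "https://{}.cartodb.com/api/v2/sql".format(account)
    body = urlencode({"q": query, "api_key": api_key}).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical table and placeholder credentials.
INSERT_SQL = (
    "INSERT INTO my_points (name, the_geom) "
    "VALUES ('office', ST_SetSRID(ST_MakePoint(-74.0, 40.7), 4326))"
)
req = build_sql_request("example", "YOUR_API_KEY", INSERT_SQL)
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it. Since the key grants write access, it belongs in trusted scripts and backends rather than in code shipped to a browser.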

On to the toys… If you’ve seen our Carbon Calculator project you’ll know that we’re big fans of hexagon tessellation maps. Used in the right situation, they let you create very easy-to-digest choropleths, with a nod towards heat mapping. We developed a new function to create this kind of map with ease.

As a fun demo that actually has a little bit of use we used the new hexagon binning functionality to display ATM density within OpenStreetMap data (about 160,000 points). Check the map above. Techniques such as hexbinning can quickly give you an impression of data density when initially digging into a new dataset, or can draw out fresh insights from large datasets.

This is the query used to build the map:

WITH hgrid AS (
  SELECT CDB_HexagonGrid(
    ST_Expand(CDB_XYZ_Extent({x},{y},{z}), CDB_XYZ_Resolution({z}) * ({z}+1)),
    CDB_XYZ_Resolution({z}) * ({z}+1)
  ) as cell
)
SELECT hgrid.cell as the_geom_webmercator, count(i.cartodb_id) as prop_count
FROM hgrid, osm_atm i
WHERE ST_Intersects(hgrid.cell, i.the_geom_webmercator)
GROUP BY hgrid.cell
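The query is a template: we read {x}, {y} and {z} as the tile column, tile row and zoom level, filled in for each map tile that is requested. That reading is an assumption based on how map tile URLs usually work. The sketch below shows how the SQL for one tile could be produced; the query_for_tile helper is hypothetical and not part of CartoDB.

```python
# The hexbin query from the post, kept as a template with tile placeholders.
HEXBIN_SQL = (
    "WITH hgrid AS (SELECT CDB_HexagonGrid("
    "ST_Expand(CDB_XYZ_Extent({x},{y},{z}),CDB_XYZ_Resolution({z}) * ({z}+1)),"
    "CDB_XYZ_Resolution({z}) * ({z}+1)) as cell) "
    "SELECT hgrid.cell as the_geom_webmercator, "
    "count(i.cartodb_id) as prop_count "
    "FROM hgrid, osm_atm i "
    "WHERE ST_Intersects(hgrid.cell, i.the_geom_webmercator) "
    "GROUP BY hgrid.cell"
)


def query_for_tile(x, y, z):
    """Fill the tile placeholders for one tile (hypothetical helper)."""
    return HEXBIN_SQL.format(x=x, y=y, z=z)


# Example tile coordinates, chosen arbitrarily.
tile_sql = query_for_tile(4, 6, 4)
print(tile_sql)
```

Because the hexagon size is derived from the zoom level, each zoom level gets its own grid, and the counts are recomputed for the tile being drawn.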

We’ll be going into more detail about how to get the most out of the hexbinning functions in the near future, but for those intrepid explorers, the hexbin SQL functions are now live in all CartoDBs!

Join our mailing list if you want more info on this or other CartoDB topics. The mailing list is also a great place to share your suggestions for improvements to CartoDB. 

Have fun!

Balloon Mapping at the NY office

This week we’ve been experimenting with balloon mapping. Sense Maker (the team behind the Air Quality Egg and other projects) donated helium for EcoHackNYC, and there was some left. After the first tests, and thanks to the big help of Liz and Leaf from The Public Laboratory, we flew our own balloon over the office in NYC.

These are a couple of images from the experiment, taken with a Canon camera attached to the balloon:

The balloon mapping kit is developed by The Public Laboratory and offers “a low cost, easy to use, and safe methods for making maps and aerial images”. The project started by mapping the BP oil spill in the Gulf of Mexico and is mainly used for civic and environmental issues.

At EcoHack, The Public Laboratory worked on new sensors for air quality monitoring that transmit real-time data to the ground. We added them to our balloon and learned how to use the stabilizers and everything else needed.

We are planning another test in NYC and will keep you updated on the results of this exciting project.

Meanwhile, check out some of the pictures on Flickr.

Visualizing endangered species trades at EcoHackNYC

We spent yesterday’s Earth Day at our offices in NYC trying out balloon mapping, a nice way to end an intense EcoHackNYC weekend. It is the second time we have co-organized this (un)conference. Last fall we gathered at NYU and this time we met at Parsons.

The event started on Friday with a series of 5-minute ignite talks. On Saturday, we split into small groups and worked on solutions. Here is a sample of some of our favorite geospatial projects.

Visualizing Species Trading

A five-person team, including three people from Vizzuality, worked with a big and interesting dataset of 12 million endangered species trades to create an interactive visualization called the Species Sphere. All data is fetched from CartoDB and visualized using d3.js. Amazing work done in just 8 hours!

Improving the creation of Community-Supported Agriculture (CSA)

Another group worked on a simple map interface to communicate local demand for community-supported agriculture (CSA). The project is built on Heroku for hosting, MapBox for map tiles, Leaflet for the mapping interface, and jQuery with CartoDB for all the analysis. 

Visualizing deforestation

Also check out this map of global forest height showing deforestation in the Amazon, built at EcoHack:

We weren’t there alone

This time we also expanded the meeting beyond data and code, teaming up with The Public Laboratory to add hardware hacking to the event. The Public Laboratory has developed a balloon mapping kit that enables you to collect your own aerial photos from up to 1000 ft. Using the open source MapKnitter web-based software, you can stitch the resulting images into a web-viewable map.

The Air Quality Egg team also brought their DIY sensor box for real-time air column monitoring. It was customized and added to the balloon kit to gain additional capabilities. The system contains sensors to read NO2, CO, temperature, humidity, compass (to calculate wind direction), wind speed, dust (particulate matter), VOCs, altitude, as well as O3, and streams the data in real time via XBee to Pachube.com.

It was a great, brilliant event. We are looking forward to the next one!

Congratulations Scene Near Me! filming locations in NYC

The Award Ceremony for the NYC BigApps3.0 competition was held yesterday. Ninety-six apps were submitted to this city-run contest that encourages developers to play with government data. At least four of them, that we know of, use CartoDB:

Taxono.my | Scene Near Me | NYhousing.info | NYC Datascape

These are all great examples of applications showcasing what can be done with public data, especially geospatial data.

We are delighted to hear that Scene Near Me was awarded second place in the Popular Choice Award. 

Scene Near Me is a great example of how CartoDB can help you store and query data in real time. Whenever you check in with Foursquare within a quarter mile of an NYC film location, Scene Near Me will shoot you a text message letting you know what movie was filmed near you. All data about filming locations, made public by the NYC Open Data initiative, is stored in CartoDB and fetched in real time by the users.

We’ve written about Taxono.my, “your taxi traveling companion”, an amazing app developed by Alastair Coote. It uses CartoDB to store and query street intersections for any address in the city. Don’t forget to download it if you are in NYC or planning a trip to the city.

NYC BigApps3.0 is the largest open government initiative of its kind, and provides access to more than 700 city data sets. You can check the other 11 winners of NYC BigApps3.0. We are very happy to have participated as a partner API and to have helped several of the projects. We are more excited than ever about working with the NY tech scene, especially around Open Data.

See you next year!

Comparing Fusion Tables to Open Source CartoDB

It is no secret that one of the reasons we created CartoDB was the lack of alternatives for geospatial data visualization. For several years we developed our own custom solutions to visualize large amounts of dynamic data, or to develop location-aware applications. Many geospatial applications run on top of the fantastic PostgreSQL/PostGIS database, but most of the software built to use it on the web ends up being slow or doesn’t tap the full potential of PostGIS. Also, a lot of it is proprietary and comes with very tough terms of service. We wanted to create software that would exploit the possibilities of PostGIS while keeping it Open Source, scalable and customizable, and leaving you the owner of your own data.

So, one of the most common questions we get at conferences is: how does CartoDB compare to Fusion Tables? Today, we want to write about the comparison and show you some examples. We will try to be as fact-based as we can, but if you find any questionable items please let us know.

How are Fusion Tables and CartoDB similar

Both projects allow users to upload geospatial data and then use that data to create visualizations. If the data changes in the table, the maps automatically update on the website where they have been embedded. This is really powerful. Additionally, both products allow you to programmatically access data via a set of APIs.

How are Fusion Tables and CartoDB different

There is a lot to be said about the different licenses and terms of use. For example, Google reserves the right to include advertisements in your maps, and obviously Fusion Tables is not Open Source. Also, according to the license, Fusion Tables can probably only be used with the Google Maps API (needs confirmation), which would make it very hard to use together with OpenStreetMap.

UPDATE: This is not entirely correct. You can use Fusion Tables via KML or API queries without a restriction on it being displayed along with a Google Map. The FusionTablesLayer, which is part of the Google Maps API, then applies the terms of the API of course.

But we want to focus more on the limits of both projects to help people understand the differences from a technical point of view.

Amount of data limits

The limits on the amount of data in Fusion Tables are described here. This is how they compare to CartoDB:

Fusion Tables: You can use the Maps API to add up to five Fusion Tables layers to a map, one of which can be styled with up to five styling rules.
CartoDB: No limit on the number of layers you display or the number of styles you apply.

Fusion Tables: Only the first 100,000 rows of data in a table are mapped or included in query results.
CartoDB: No limit on the number of rows you can query through the SQL API or display on maps.

Fusion Tables: 500 vertices per tile limit.
CartoDB: No limit.

Fusion Tables: 250 MB size limit.
CartoDB: On dedicated servers you can store as much as 500GB of data (soft limit). That does not mean we can display 500GB of data on a map; the biggest limitation is RAM, which is 32GB on dedicated instances.

Fusion Tables: Fusion Tables layers are made available as part of the Google Maps API. We’ve seen that the cost of using the Maps API can quickly grow out of hand for businesses.
CartoDB: CartoDB comes at various prices, from our free Newbie server to our $300 fully dedicated instances.

Now, to see the implications of some of these limits, we loaded the same dataset into Fusion Tables and CartoDB: a total of 100,851 points from our OldWeather project. These are within the limits of Fusion Tables, but because of the restriction on the number of vertices per tile, the data does not look the same.

On the left is Fusion Tables, on the right CartoDB, for the EXACT same dataset. You can see how Fusion Tables removes points at low zoom levels due to the 500-vertex limit per tile.
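To see why a per-tile limit bites hardest at low zoom levels, you can count how many points land in each map tile. The sketch below uses the standard Web Mercator tile formula and made-up sample points. How Fusion Tables actually decides which points to drop is not described here, so this only shows which tiles would exceed a limit of 500.

```python
import math
from collections import Counter

MAX_LAT = 85.0511  # latitude range covered by Web Mercator tiles


def tile_for(lat, lon, zoom):
    """Return the (x, y) Web Mercator tile containing a lat/lon point."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still map to a valid tile.
    return min(x, n - 1), min(max(y, 0), n - 1)


def tiles_over_limit(points, zoom, limit=500):
    """Count points per tile and keep only the tiles above the limit."""
    counts = Counter(tile_for(lat, lon, zoom) for lat, lon in points)
    return {tile: c for tile, c in counts.items() if c > limit}


# Made-up data: a dense cluster near New York and a few points near London.
points = [(40.75, -73.99)] * 600 + [(51.5, -0.12)] * 10
crowded = tiles_over_limit(points, zoom=3)
print(crowded)
```

The lower the zoom level, the larger the area each tile covers, so a dense dataset puts more and more points into a single tile as you zoom out.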

Map customization limits

But this map is wrong in any case. You should not visualize 100,000 points on a map like this, because the markers are so big that they do not allow you to really understand the data. In Fusion Tables you have very little control over how the marker looks: it’s just a circle in different colors, like in the example. But in CartoDB, because we support a full styling language like Carto CSS, you can do many more things, like changing the opacity, using a symbol or, more importantly, changing how it looks at different zoom levels. Compare the previous map with this one, which uses a custom style in CartoDB on, again, the same dataset.

Now you can really see the trends in the data, and the more you zoom in, the bigger the dots become.

The next example is a dump of all roads from Belarus in OpenStreetMap. It is an 18MB shapefile downloaded from Geofabrik. You can see straight away how Fusion Tables does not show all features. Unlike the last case it would almost never make sense to use dots to represent the roads. Zoom out one zoom level and you will see how Fusion Tables automatically turns the roads into points.

Conclusion

We have only covered a small subset of the differences between Fusion Tables and CartoDB. But essentially, the main reasons we created CartoDB, and what differentiates it from Fusion Tables, are that it is Open Source, does not impose data size limits, lets you use full SQL from PostGIS, and uses Carto CSS to style your maps.

If you want to comment on this, do it on Twitter mentioning @cartodb.

CartoDB at the Open Government Partnership meeting

The Open Government Partnership (OGP) first annual conference started yesterday in Brasilia, Brazil. The meeting will welcome nearly 1,000 representatives from more than 60 countries to discuss the latest reforms, tools and innovations in the open government field.

The partnership has grown rapidly. Just last September, eight countries launched it to formalize their commitment to a more open, transparent use of information. Another 43 governments have joined the OGP in recent months.

The two-day Brasilia event is the biggest conference related to Open Government ever, and Dilma Rousseff and Hillary Clinton will be attending. You can check the agenda here.

Vizzuality will be presenting several projects, particularly the Open Government Experience Locator (image above), developed together with the OGP and the World Bank Institute. This tool allows the exploration of different Open Government implementation experiences. The Experience Locator features initiatives from around the world, with special emphasis on presenting the insights from practitioners involved, links to implementing partners and related resources for further exploration. It has been entirely developed on CartoDB.

Ruth del Campo will be at the Innovation Village in the Open Aid Register stand (another CartoDB-based platform) showing demos of the OGP Locator tool, together with CartoDB. She will also be showcasing how CartoDB can help you create maps with your data quickly and easily. This is especially interesting in the field of Open Government, where Open Data is crucial and there is so much to be visualized and geolocated.

So, if you are around in Brasilia, feel free to stop by our booth at the Innovation Village and say hi, or try to reach @ruthdelcampo at any time. We would love to show you how Open Source CartoDB can enable you to do much more with much less.