Services

Data-capture

We help you collect and organize information from public data sources or websites through web scraping.

Data-cleaning

We build structured databases from information held in multiple sources and formats, organizing the information and standardizing variables.

Visualization apps

We create public data visualization applications so that your users can get to know and explore databases. We use the latest technologies in data visualization to communicate information.

Algorithms

We implement artificial intelligence algorithms to facilitate your work with data, from predictive algorithms to pattern recognition.

Web specials

We develop interactive web specials based on data. The specials have different visual components to guide your readers. See examples of our specials.

About us

Datasketch is a digital platform for investigative and data journalism. Our portal allows journalists, data scientists, social scientists and citizens in general to learn about and consult data visualizations, tools, software and in-depth research on various current issues. We offer free data tools and run different projects to bridge the gap between data and citizens, facilitating the democratization of knowledge and a critical review of social realities based on contrasting information.

Our team

Juan Pablo Marín

Electronic engineer with a master's degree in computational statistics. Expert in data science with applications in multiple areas such as economics, hydrology and journalism.

Camila Achuri

Statistician and expert in the R programming language. She has developed various data visualization applications on mobility and open data subjects.

Juliana Galvis

Political scientist and candidate for a Master's in Digital Humanities. She is currently leading the development of the Who Is database, as well as supporting journalistic research and the creation of databases.

David Daza

Bachelor of Electronics. Expert in development of applications and websites with emphasis on data journalism and content management of multiple databases.

Verónica Toro

Anthropologist and researcher. Responsible for the management and organization of the data community in Colombia and Latin America, and for providing support in journalistic investigations and the creation of databases.

Andrea Cervera

Journalist responsible for writing articles, providing investigative support and community management.

Mariana Villamizar

Systems engineer and designer. Expert in user experience, data visualization and graphic communication. Feminist.


Open letter to the rstats community

September 22, 2019

Open source contributions can have a lot of impact; they can even help tackle difficult issues across a whole country. So thank you for your support, hours of work and dedication. We can do even better: let's build tools that are easier for non-techies to use.

 

 

Bogota, September 2019

 

 

Dear #rstats community

 

 

You may not know the impact you have

 

There have been around 86,000 forced disappearances in the Colombian conflict over the past 60 years. This map shows where people went missing (red) and where they were found years later (green). It was the first time anyone could see the pattern behind the cases known as “false positives”, in which the military kidnapped poor young men from Bogota, drove them 500 kilometers north, killed them and passed them off as guerrilla members in order to claim bounties.

 

 

This is an example of a very impactful use of the hours you have put into open source contributions. With your support, thousands of families might finally have closure about their missing loved ones. All the data processed with the tools you created might be used in the implementation of the peace agreement with FARC.

 

There are no words for me to thank you enough for your work. Everything you do is very important, not only for making work easier for people like me with your packages, but also for creating inclusive communities and even taking a stand when it is needed to protect women and others. Thank you for being so welcoming, open and non-judgmental.

 

I have been using R for about 10 years now. Even though I have a technical background, before R I hated programming. Had I been told about the impact one could have with easy-to-use data tools, I'd probably have picked them up earlier.

 

 

Imposter syndrome

 

With the amazing work you all do, many times I feel that what I do is not good enough. That’s why I keep postponing the publication of my R packages until they are "good enough". Yes, I’ve read your encouraging blog posts, I know every contribution counts, I know how welcoming the rstats community is. Still, I always have the feeling that my developments won’t be that useful for the community, and for some reason I never get to polish them to the point where they are useful for others. That’s on me, and that’s changing right now. I commit myself to taking the time to document my tools, contributing even the smallest thing and being more supportive of the community, not only because building software together is cool, but because I now have a few examples to show how impactful it can be, and this may motivate others to jump in.

 

Here you see two of my co-workers out at the March 8 demonstration with an R-built mosaic that shows a bunch of numbers, facts and pictures about brave women who have fought for gender equality. The lady in the middle holding the mosaic is a congresswoman in Colombia who approached us because of the sign.

 




The first time we built such a data mosaic was 2 years ago, for an intervention in a park in Bogota where a woman was brutally killed. It is because of this sad episode that we now have a femicide law in Colombia. That year we were able to reconstruct her face from the numbers of hundreds of victims of fatal crimes against women in Colombia. In the intervention, we partnered with a local NGO to present 20,000 signatures to the Ministry of Interior, hoping to advocate for policy changes to better protect women. We really liked this project: we went all the way from helping a citizen on Twitter build a database of femicides to making data-driven artistic interventions that promote citizen participation.

 

Tribute to Rosa Elvira Cely

 

 

 

 

Not enough

 

Even though these are, in my opinion, great projects, they had to be done with the help of highly trained data people. This leaves many small organizations out in the cold. They usually have a hard time maintaining a web page (if they can at all), so it is very difficult for them to access all the superpowers good data science can bring.

 

This is why, over the past few years, we have been building R-based tools to streamline the whole process, so that non-tech users can access data and communicate it properly. There are amazing teams working on very important issues, and they could use our help.

Corruption, gender equality, the climate crisis, illegal mining, clean water, and medicine pricing and accessibility are just a few of the issues worth tackling. Unfortunately, it is often very small NGOs and local, independent newsrooms who go out to battle to create a better world for all of us, armed only with their knowledge and hard work. I truly believe we can arm them with our knowledge too, but we need to work harder to make simpler tools for them: for journalists, for lawyers, for social workers, for activists.

 

It is us, those who can speak computer, data and visualization, who can help them build a better world. I see a future where non-tech users solve their data needs with easy-to-use Data Apps that do very specific things, so they do not have to learn a monster tool with thousands of options, or learn to program, when they only need a map (think of the map of forced disappearances from the beginning), access to public data, or a simple data cleaning routine. This is why we launched a campaign so that our team can concentrate for a few months on finishing and polishing the packages for simple point-and-click data visualization using Shiny apps.

 

We could not be more thankful to the community. Your work has helped us achieve very nice things, but we want to do more and we need your support to do it.

 

Would you support our campaign? Please watch the 4-minute video and donate. You can get one of our data-driven t-shirts. Plus, you get to enjoy our tidydance.

 

SUPPORT US NOW



Warm regards,

 

Juan Pablo Marín Díaz

@jpmarindiaz

 

 

Designs with data about mass-shootings in the US. Each dot/bullet represents a victim. Red are fatalities.

 

This one shows the temperature increase in the past 200 years.

 

Support our campaign and get your data-driven t-shirt now!

 

 

 

Special thanks to



All other contributors who developed tools in other languages (C, C++, Java, JavaScript, Rust, etc.) that have been wrapped for use within R. Big hug to all of you.

 

*** Note *** This list of thanks could go on forever, so I'll publish now and keep adding to it over the next few days...

 

Lucy D'Agostino McGowan @LucyStats and Maëlle Salmon @ma_salmon

For rladies mosaic, an inspiration for us to try other ways to make people interact with data

 

Yihui Xie @xieyihui 

For many things (knitr mostly), but more recently for blogdown: we have been able to implement cool journalism projects with it. Here is an app with custom PDF rendering on the fly; it is used to keep track of corruption cases in Colombia (http://monitorciudadano.co/datos/visor).

 

 

Hadley Wickham @hadleywickham

For ggplot, dplyr and the tidyverse in general, we couldn’t work without them and they are used in everything we do. In case you were wondering, we did include GATHER in our tidydance as a way to say good-bye. Thanks for the pivot_* !

 

 

RStudio @rstudio

For many things, leaflet for instance, but more importantly for the IDE. I have known I could count on you since 2011, when I opened a ticket so I could use browseURL straight from RStudio to see my first interactive viz in the browser, and you solved it right away. Now I use RStudio every day and cannot recommend it enough.

 

Dean Attali @daattali

 

For shinyjs. We were able to create very cool interactives within shiny, even when we didn’t know what we were doing with JavaScript.

 

Thomas Lin Pedersen @thomasp85

For gganimate. We were able to create cool animations to make people talk about important issues, like sexual abuse of minors and the Colombian government's failure to provide proper healthcare for those girls. More info here, in Spanish.




Mara Averick @dataandme

For making it fun and keeping us up to date and in contact with the community

 

Joshua Kunst @jbkunst

For highcharter. Due to its simplicity it quickly became our go-to package for interactive charts.

 

Kent Russell @timelyportfolio

This whole thing actually started because of you and your building widgets site back in 2015. Big thanks.

 

Bob Rudis @hrbrmstr

For your htmlwidgets and your tips on security

 

Mauricio Vargas @pachamaltese 

For your motivation to make rstats inclusive and for d3plus, even though I haven’t yet delivered on a proper collaboration.

 

Leonardo Collado @fellgernon 

You created a map of Mexico with datamaps. It helped me a lot to understand the power of going local with data science.

 

Julia Silge @juliasilge and David Robinson @drob 

For tidytext. It allowed us to come up with better ways to communicate the text of the Peace Agreement with FARC at a time of political uncertainty. See it here, in Spanish.

 

 

Gabriela de Queiroz @gdequeiroz 

For R-Ladies, for helping me see where I could be better as a male data scientist, and for creating a safe place for all by making data science more inclusive.

 

Joe Cheng @jcheng 

For shiny. Couldn't live without it. Back in 2012 I really liked the value of Meteor, and you brought it to life for me, a simple R user with very limited web dev experience.

 

Winston Chang @winston_chang 

For webshot and extrafont, pretty handy for allowing personalization and improving exporting options.

 

Jenny Bryan @JennyBryan 

For the purrr legos, which have changed the way I think about structuring programs, and for googlesheets, which is very useful for journalists.

 

Jeroen Ooms @opencpu 

For magick, which we have used a lot, from showing live election results to communicating the issues with data cleaning using the Britney Spears effect. But also for jsonlite, although I always forget to unbox ;)

 

 



Danielle Navarro @djnavarro 

For helping us think about inclusion and animation of texts, with some now-gone posts that were eye-openers for me.

 

Timo Grossenbacher @timo 

For motivating us to build better R-based tools for journalists.

 

Colin Fay @_ColinFay 

For neo4r, I use it a lot to make sense of linked data that is very relevant for journalistic investigations.

 

Gábor Csárdi @GaborCsardi 

You don't know how useful it has been for me. I've used it to analyze networks of organized crime and recruitment patterns, to understand corruption, and even to map networks of organizations that are working with data in multiple countries.

 

https://exploralat.am/mapa/ 

 

Ryan Hafen

For maintaining http://gallery.htmlwidgets.org/, a great source of inspiration.

 

B. Thieurmel https://github.com/bthieurmel 

For visNetwork, which quickly became our go-to tool for network viz. Here is a network of corruption cases in Colombia: http://www.monitorciudadano.co/datos/visor 

 

Ramnath Vaidyanathan https://github.com/ramnathv 

For htmlwidgets. Big fan since the good old rCharts days

 

 

If you made it this far, go ahead and donate to our campaign, and we will give you one of our data-driven notebooks. Make sure to remind us via email that you donated because of this post.







How the app works

 

The whole purpose of our campaign is to be able to finish our packages and polish them into end-user Data Apps, where users can easily upload, clean and visualize data with a point-and-click interface.

Here is a description of how it works. Download the full PDF here (in Spanish) or see the description of the packages used below.




 

datafringe

It adds dictionaries to a data frame, including types for humans (more on this below).

 

deduplicate

An attempt to make data deduplication easier.

 

d3plus

An htmlwidget to make tidy visualizations with d3plus.js. Needs to be updated to v2.

 

dmaps

An htmlwidget to build interactive maps based on datamaps.js. It will be renamed at some point.

 

dsAppModules

Modules to make development of shiny apps easier.

 

geodata

R package that has geodata ready to be plotted in different formats. Its advantage is that it aims to provide standard geographical codes for regions, taking into account each country's common formats.

 

imteractive

Make interactive images using d3. Add tooltips and other interactive elements to static images in html.

 

geomagic

Tidy wrapper for making maps in ggplot.

 

mop

A package to clean data

 

pseudoviz

A package/gallery of visualization types. It now has some basic functions to make graph recommendations based on some input data. 

 

homodatum

I believe in the tidyverse and the power of defining column types. However, many data types are still conceived and created for computers to understand, not for humans. Homodatum tries to solve this issue by introducing data types for humans. For instance, instead of strings, as computers think, we think in categories or texts. Instead of int or double we think in numbers; instead of strings that start with $, we think of currencies.
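To make the idea concrete, here is a minimal sketch in base R of what guessing "human" types for the columns of a data frame could look like. It is only an illustration under stated assumptions: the function name guess_human_type, the type labels and the rules are hypothetical and are not homodatum's actual API.

```r
# Hypothetical helper (NOT part of homodatum): guess a "human" type for a vector.
guess_human_type <- function(x) {
  # Anything stored as int or double is simply a number to a person.
  if (is.numeric(x)) {
    return("number")
  }
  x <- as.character(x)
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return("unknown")
  }
  # Strings such as "$1,200" or "$950.50" read as currencies.
  if (all(grepl("^\\$[0-9][0-9,]*(\\.[0-9]+)?$", x))) {
    return("currency")
  }
  # Repeated values look like categories; all-distinct values look like free text.
  if (length(unique(x)) < length(x)) {
    return("category")
  }
  "text"
}

# Made-up example data, for illustration only.
df <- data.frame(
  city = c("Bogota", "Cali", "Bogota", "Cali"),
  budget = c("$1,200", "$950.50", "$3,000", "$80"),
  cases = c(12L, 7L, 30L, 2L),
  notes = c("first report", "pending review", "confirmed by NGO", "needs cleaning"),
  stringsAsFactors = FALSE
)

# One guessed type per column, as a named character vector.
human_types <- vapply(df, guess_human_type, character(1))
print(human_types)
```

A real implementation would need many more rules (dates, geographic codes, percentages), but even this simple version shows how a tool can pick a sensible default chart or cleaning step from the column type alone.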

 

ggmagic

Tidy visualization wrapper for ggplot. It is an attempt to make visualizations work out of the box for different column types. Warning: it might be considered a stupid package because it only wraps ggplot functions. It does make questioning data way faster for me.

 

hgchmagic

Tidy visualization wrapper for all possible highcharter functions. Same idea behind ggmagic.

paletero

 

Its goal is to make it easier to create palettes and map different data types to colors. Still not sure if it is a good idea.

 

neo4rutils

Wrapper utilities to be used along with neo4r package.

 

dsAppLayout

Shiny layout to create apps with panels. 

 

dsAppWidgets

Custom shiny inputs for handling data and options to personalize charts.

 

shi18ny

Introduces internationalization to shiny apps using some basic translation and custom translations for different languages using YAML files.
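As a rough sketch of the idea, the snippet below looks up interface strings by language, falling back to a default language and then to the key itself. The translations list stands in for what would be read from one YAML file per language; the names translations and translate are hypothetical and are not shi18ny's actual API.

```r
# Hypothetical translation table; in practice this could be loaded from
# one YAML file per language.
translations <- list(
  en = list(upload = "Upload data", clean = "Clean data", visualize = "Visualize"),
  es = list(upload = "Cargar datos", clean = "Limpiar datos")
)

# Hypothetical helper: look up a key in the requested language, fall back to
# the default language, and finally to the key itself.
translate <- function(key, lang, dict = translations, default = "en") {
  value <- dict[[lang]][[key]]
  if (is.null(value)) {
    value <- dict[[default]][[key]]
  }
  if (is.null(value)) {
    value <- key
  }
  value
}

print(translate("upload", "es"))     # translated string
print(translate("visualize", "es"))  # missing in "es", falls back to "en"
print(translate("export", "es"))     # unknown key, returned as-is
```

Falling back instead of failing means a partly translated app still renders, which matters when translations are contributed gradually.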




 

 

 

Juan Pablo Marín Díaz

Juan Pablo is a data scientist. His work in computational statistics has been applied in fields like macroeconomic analysis, hydrology and data journalism.