Yahoo! Query Language (YQL) provides a rich, dynamic method for obtaining and manipulating data from any source or API on the internet. With YQL, the internet becomes your database. By coupling the data backend of YQL with the extensive data visualization and flow techniques of JavaScript (through libraries such as YUI), a developer can build powerful widgets and data systems using the simplified SQL syntax on which YQL is based.

The marriage of YQL and JavaScript brings a robust MVC interface to the browser. This article guides you through a highly scalable YQL/JavaScript use case.
YQL overview
YQL is a utility that’s akin to a massive open database system. Whether it’s for pulling data from wrapped APIs, feeds such as XML or CSV, or scraping results from an HTML page, YQL provides a simplified SQL-like syntax for accomplishing nearly any data-capture task — all without the need for the overhead of a database system.
Just to put this into context, a simple YQL statement will follow the standard SQL syntax of SELECT something FROM table WHERE field=value such as the one displayed in Listing 1.
SELECT * FROM flickr.photos.search WHERE text='san francisco'
Listing 1 – Simple YQL Query
There are many openly available data sources in the YQL library for developers to use. Yahoo! has opened up dozens of tables linking to its own data sources, such as Flickr, Geo, Maps, and Search.
On top of this, the YQL team has opened up the platform with “Open Data Tables”. Developers can use a standard XML syntax to wrap APIs or data sources for any data needs they may have. The tables can then be used in your query with a simple call to USE 'table url' AS table name, or a developer can issue a pull request to the YQL GitHub account to have their table added to the community table listing for everyone to use.
When a new API is being used as a data source for a project, much of the time is spent massaging the data that you obtain into a format that fits the needs of the project, and then dealing with the processing overhead on all requests within a server environment. When you’re creating your own open data table, you can embed server-side JavaScript to manipulate the data results from an API before they are ever returned from the Yahoo! servers. Those results are processed within the massive Yahoo! data pipe, so the overhead is never felt by the end user.
Going a step further, YQL not only has data-wrapping capabilities, but also includes the ability to use INSERT, UPDATE and DELETE functionality on any service that has the ability to process those requests.
The concept of YQL is a simple one: provide a highly scalable, responsive system for obtaining and manipulating any available data using a version of a well-known language standard. Using this concept, we can apply any number of design patterns and visualizations to the raw data source provided by YQL.
A functional Design Model use case
I was trained in the traditional computer science software engineering realm, but have been working with development projects in the front-end field for over 13 years. That said, there are numerous techniques that I appreciate from software engineering which I normally don’t see applied when developing JavaScript or front-end projects. Specifically, the idea of a Model View Controller (MVC) design pattern is one that I find I fall back on quite a bit in my software engineering. An MVC design pattern will basically just outline three core separations in the development of software. Our Model is the raw data source (e.g., YQL or a database), the View is the visualization (e.g., YUI charts, HTML/CSS, or Flash) of said data for end-user interaction, and the Controller is the data processor and event handler (e.g., YUI GET/Connection Manager, YUI Events).
Now, why is this important, and what does it have to do with YQL? Well, oddly enough, the use of YQL as a data source (or Model) makes it beautifully equipped for layering on Controller and View patterns through a multitude of JavaScript libraries available, such as YUI, jQuery, Dojo, etc. Not only does the coupling of a JavaScript library with YQL give you an incredibly powerful set of features at your command, but developing with a design pattern such as MVC makes any project highly scalable, makes the code easier to update, and lets numerous developers work on the same module within their own development silos. Browser MVC is a wonderfully simple and approachable pattern that I believe lends itself perfectly to YQL parsing and consumption.
A practical use example
We’ve gone through the “what” and “why”, so let’s dive into the “how” of this merger. What I’m going to run through is the process of how to build a purely HTML- and JavaScript-based generic component to pull data from YQL, parse it, and then display it on a page. I’ll use YUI for the data pipe and a custom string filter with some CSS for the data format.
The full code base for this widget can be found on GitHub.
The Markup
To initialize and display the YQL query and display structure, the widget uses a few configuration variables for everything from displaying debug information to the HTML formatting of results (see following listing).
Listing 2 – JavaScript YQL Widget Markup
Let’s take a closer look into this configuration snippet and run through what everything is doing here. The first thing we need to do is attach the JavaScript widget core file through a script source include. This will attach the yqlWidget object for all of the AJAX and parsing functionality.
The configuration script block contains the full feature customizations needed to set up this example. We start off by setting up a few initialization settings; these are really just for error logging to the Firebug console if something is failing within the system. The formatstring is the next piece of the puzzle, which outlines how each element returned by the YQL query will be rendered. For each result that is returned by YQL, this format will be applied to it. Every item of text in the format variable that is wrapped in curly brackets (e.g. {content}) directly maps to a piece of data returned by YQL. When the data is parsed, these references will be replaced with their real results.
This approach to the format variable in this snippet is a very simplified example – ideally, you would want to separate the view even further into its own file that is then stored into the variable. The actual YQL statement that we will be making is the next step – we’ll go into this in full detail in the next section. The insertEl variable stores the id of the DOM node that we will insert our rendered HTML into. Finally, we call the push() method to load the request to render this widget onto an array stack, and we then call the render() method to load all widgets on the stack.
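As a rough sketch of the kind of configuration described above, the settings could be gathered as follows. The property names (debug, format, query, insertEl as a key) and the container id are assumptions made for illustration, not the widget's actual variable names, and the {title} token assumes each result carries a title field.

```javascript
// Hypothetical configuration values; names are illustrative assumptions.
var sketchConfig = {
  // Log errors to the console while developing.
  debug: true,
  // Applied once per YQL result; each {token} maps to a field of that result.
  format: "<li>{title}</li>",
  // The YQL statement to run (here, the simple query from Listing 1).
  query: "SELECT * FROM flickr.photos.search WHERE text='san francisco'",
  // id of the DOM node that receives the rendered HTML.
  insertEl: "widgetContainer"
};

// With the widget script loaded in a page, the calls described above follow:
// yqlWidget.push(/* configuration values */);
// yqlWidget.render();
```

Keeping the format string in configuration, rather than in the parsing code, is what lets the same widget render any query.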
The Query
Now that we’ve gone through the markup structure for the widget, let’s take a closer look into the YQL sub-select portion of the query itself within Listing 3.
(SELECT match.place.centroid.latitude, match.place.centroid.longitude FROM geo.placemaker
WHERE documentURL = 'http://www.cnn.com/' AND documentType='text/html' AND appid='')
Listing 3 – Widget YQL Query: The Sub-Select
The inner query denoted in Listing 3 defines the core data obtained by YQL. For this query we are capturing the latitude and longitude of all geographical locations obtained by scraping the data within an external URL (in this case, http://www.cnn.com) using the Yahoo! Placemaker table. In the background, the Placemaker table is parsing through the text on the homepage of CNN to find geographically significant words (such as “Sunnyvale, California”). Once these locations are obtained, they are compared against our geo database to convert the real text name into geographically significant data, such as latitude and longitude. After running this, we are left with an array of latitudes and longitudes for all obtained geo locations, to which we can apply the next piece of the puzzle (Listing 4).
SELECT * FROM maps.map WHERE (latitude, longitude) IN
Listing 4 – Widget YQL Query: The map lookup
The second part of the query is a simple static map image lookup using the Yahoo! Maps table. What we are doing here is selecting the map data contents that have a latitude and longitude within the results returned by the inner query. The IN clause denotes a sub-select join. In the case of this maps table, the main data returned is “content”, which houses a png map image.
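Put together, the two listings form a single statement. A minimal sketch of assembling it as a JavaScript string, with variable names chosen purely for illustration:

```javascript
// Listing 3: the sub-select that yields latitude/longitude pairs.
var subSelect =
  "(SELECT match.place.centroid.latitude, match.place.centroid.longitude " +
  "FROM geo.placemaker " +
  "WHERE documentURL = 'http://www.cnn.com/' " +
  "AND documentType='text/html' AND appid='')";

// Listing 4: the map lookup that consumes those pairs through IN.
var mapLookup = "SELECT * FROM maps.map WHERE (latitude, longitude) IN ";

// The full statement sent to YQL is the outer query followed by the inner one.
var fullQuery = mapLookup + subSelect;
```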
The Public Widget methods
The yqlWidget object contains all parsing and AJAX methods needed to take the configuration variables defined in the markup and build them into visible results. yqlWidget utilizes closures to encapsulate the core methods; in said closures, we control which methods we want to hook into the object by returning them (Listing 5).
Listing 5 – Widget Public Methods
This list of methods will act as our looking glass into the parsing and connection features of the widget.
The push() method is going to do as the name implies; when called with the standard configuration settings, push will create an anonymous function call to init, which will begin the loading of the widget. This anonymous function is then pushed on the end of the array load stack, which follows standard FIFO (First In First Out) load order.
Much like the push method, render() will begin the widget loading process by trying to pop off one of the anonymous functions pushed to the stack in the push method. If one is available, it will immediately execute the function.
The init() method is in place to get the load process started. Init will be executed from the functions popped off of the stack in the render method. This will just verify that all data is set and then begin the widget initialization process.
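The interplay of push(), render() and init() can be sketched with a plain closure. This is an illustration under assumptions, not the widget's source: sketchWidget, loadStack and started are hypothetical names, init() here only validates and records the request, and shift() is used so that queued loaders run in first in, first out order.

```javascript
// Hypothetical stand-in for the widget: only the returned methods are public.
var sketchWidget = (function () {
  var loadStack = []; // private: queued loader functions
  var started = [];   // private: queries whose load has begun

  // Verify that the required settings exist, then start the load.
  function init(config) {
    if (!config || !config.query || !config.insertEl) {
      return false;
    }
    started.push(config.query);
    return true;
  }

  return {
    // Queue an anonymous function that calls init() when it is run.
    push: function (config) {
      loadStack.push(function () { return init(config); });
    },
    // Take the oldest queued loader, if any, and execute it immediately.
    render: function () {
      var next = loadStack.shift();
      return next ? next() : false;
    },
    init: init,
    // Exposed only so the example can be inspected.
    started: function () { return started.slice(); }
  };
}());

sketchWidget.push({ query: "first query", insertEl: "a" });
sketchWidget.push({ query: "second query", insertEl: "b" });
sketchWidget.render(); // starts "first query"
```

Because loadStack lives inside the closure, nothing outside the widget can reorder or clear the queue.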
The last of the public methods, getYQLDataCallback(), is the callback function that will be executed when our YQL request completes. One of the wonderful pieces of functionality built into YQL is that you can not only return data as XML or JSON, but also as JSONP and JSONP-X. This allows a user to wrap JSON / XML output in a callback function that is executed when the data is returned.
Let’s say you’re running a cross-domain access script that captures data from an external source through a server side proxy; if you use YQL, a JavaScript callback that you have defined on your page will be called once all of that data capturing is complete. That is exactly what this function is — a YQL callback that checks to make sure that data results are returned and then passes those results on to our parsing method.
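A callback of this kind might look like the sketch below. The function name and the parse argument are hypothetical; the only thing assumed about YQL is that its JSON response wraps the data in a query object whose results property is empty when nothing matched.

```javascript
// Hypothetical JSONP callback. YQL would invoke it with an object shaped like
// { query: { count: 2, results: { ... } } }.
function sketchYQLCallback(response, parse) {
  var hasResults = response && response.query && response.query.results;
  if (!hasResults) {
    // Empty result set or an error payload: nothing to render.
    return null;
  }
  // Hand the raw results over to the parsing step.
  return parse(response.query.results);
}
```

In a page, the function would be named in the callback parameter of the request so that the returned script calls it with the data.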
The Private Widget methods
When the public methods are called, they will have references in memory to where the private methods are located. Outlined below is the flow that the widget functionality takes, from the point of the public init() method, where we begin rendering the widgets that are pushed on to the load stack. Before I do that though, there are some private variables that we will need for this process which are displayed in Listing 6.
Listing 6 – Widget Private Variables
Besides the obvious configuration storage variables, there are two noteworthy items here. yqlPublicQueryURL is the URL string to which we will make our public YQL query. This will eventually get chained with the actual query, response format, and callback. The other item is regex, which has a specific purpose. That regex is used to replace the text between curly brackets with the resulting YQL data that was defined within our format variable in the markup section.
In addition to these variables, we need a few YUI libraries for cross-domain requests (Listing 7).
<script type='text/javascript' src='http://yui.yahooapis.com/2.7.0/build/yahoo/yahoo-min.js'></script>
<script type='text/javascript' src='http://yui.yahooapis.com/2.7.0/build/get/get-min.js'></script>
Listing 7 – Widget Script Source Includes
With that said, the first method that will be called by init is getYQLData() (Listing 8), which accepts the YQL query from our configuration step as its single argument.
Listing 8 – Widget Private Method: Get YQL Data
This function is really the meat of our controller within the widget. First, we concatenate the configuration values (query, response format, and callback) onto the public YQL query URL. Since you can use env files to denote a series of open data table references, we then check whether you would like to use your own set of tables within an env file. If you do, the env reference is appended to the public YQL query URL; otherwise, the standard and community tables will be used for the query. Lastly, we use the nice and easy YUI GET utility to make a cross-domain request to the URL.
There are two event methods attached to that GET request in order to handle a success or failure event. Since we’re using a callback within the YQL request (JSONP), that callback method will be called once the YQL request returns. Because parsing is already handled by that callback, these status methods are just being used to output debugging messages, as displayed in Listing 9.
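The URL assembly can be sketched as a small pure function. The function and variable names are hypothetical, and the endpoint and the q, format, callback and env parameters are assumed to be those of the public YQL service; the YUI call is shown only as a comment because it needs a browser with the Get utility loaded.

```javascript
// Assumed public YQL endpoint.
var sketchPublicQueryURL = "http://query.yahooapis.com/v1/public/yql";

// Chain the query, response format and JSONP callback onto the base URL,
// adding an env file reference only when one is supplied.
function sketchBuildRequestURL(query, callbackName, envURL) {
  var url = sketchPublicQueryURL +
    "?q=" + encodeURIComponent(query) +
    "&format=json" +
    "&callback=" + encodeURIComponent(callbackName);
  if (envURL) {
    url += "&env=" + encodeURIComponent(envURL);
  }
  return url;
}

var sketchRequestURL = sketchBuildRequestURL(
  "SELECT * FROM flickr.photos.search WHERE text='san francisco'",
  "yqlWidget.getYQLDataCallback"
);

// In the browser, the cross-domain request would then be made with YUI Get:
// YAHOO.util.Get.script(sketchRequestURL, { onSuccess: ..., onFailure: ... });
```

Encoding each value separately keeps spaces, quotes and equals signs in the query from corrupting the query string.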
Listing 9 – Widget Private Method: YUI GET Status Handlers
The data has returned from YQL, and our JSONP callback has been called, pushing the widget construction process to our next step — the parser. The purpose of the parser (illustrated in Listing 10) is to run through all results returned by YQL (the results variable passed to the method) and begin building up the data by applying the YQL results directly to the format HTML we created in the markup step.
Listing 10 – Widget Private Method: Parse YQL Results
There are several stages to note in the parseYQLResults() method. We start by capturing the first object node. This is not an ideal situation, but the result set returned by YQL follows a specific format. Once we have the root node for our results, we can then begin looping through all of the results returned by YQL. YQL returns an object for a single result, or an array for multiple, so in any event we’ll call parseFormat() on every result that is returned to us.
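The object-or-array handling is the part most easily gotten wrong, so here is a minimal sketch of it. The function name, the formatOne argument and the Result node name in the usage lines are assumptions for illustration; the real node name depends on the table being queried.

```javascript
// Hypothetical parser: find the root node, normalise it to an array,
// and build one HTML string from every row.
function sketchParseResults(results, formatOne) {
  var root = null;
  var key;
  // Take the first property of the results object as the root node.
  for (key in results) {
    if (Object.prototype.hasOwnProperty.call(results, key)) {
      root = results[key];
      break;
    }
  }
  if (root === null) {
    return "";
  }
  // A single result arrives as an object, several as an array.
  var rows = Object.prototype.toString.call(root) === "[object Array]" ? root : [root];
  var html = "";
  var i;
  for (i = 0; i < rows.length; i++) {
    html += formatOne(rows[i]);
  }
  return html;
}

var wrap = function (row) { return "[" + row.content + "]"; };
var single = sketchParseResults({ Result: { content: "a" } }, wrap);
var several = sketchParseResults({ Result: [{ content: "a" }, { content: "b" }] }, wrap);
```

Wrapping a lone object in an array means the loop body never has to care which shape YQL returned.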
The purpose of parseFormat() (Listing 11) is to take each result returned by YQL and mash it together with the HTML formatting defined in the markup step.
Listing 11 – Widget Private Method: Parse Format
The code for this is short and sweet; we use our resultFormat variable, and the curly bracket-finding regex statement, to replace any content within curly brackets with the evaluated JSON YQL result. This mashed up string is then returned, where it’s added to the overall HTML results.
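A sketch of the substitution step follows. The names are hypothetical, and rather than evaluating each token as code, this version walks the dotted path as property lookups, a deliberate swap that avoids executing strings taken from the format.

```javascript
// Matches {token} or {nested.token} placeholders in the format string.
var sketchTokenRegex = /\{([\w.]+)\}/g;

// Replace every placeholder with the matching field of one YQL result.
function sketchParseFormat(format, result) {
  return format.replace(sketchTokenRegex, function (match, path) {
    var parts = path.split(".");
    var value = result;
    var i;
    for (i = 0; i < parts.length; i++) {
      // Unknown fields render as an empty string instead of throwing.
      if (value === null || typeof value !== "object" || !(parts[i] in value)) {
        return "";
      }
      value = value[parts[i]];
    }
    return String(value);
  });
}

var sketchHTML = sketchParseFormat(
  "<img src='{content}' alt='map' />",
  { content: "http://example.com/map.png" }
);
```

Passing a function as the second argument to replace() lets one regular expression handle any number of placeholders in a single pass.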
Once we have our full HTML string created, we insert the content into the DOM node that we defined in the markup step. After this, we return to the beginning of the load process (the render method) to check for the next widget on the stack, repeating this cycle all over again.
Since the purpose of this query was to capture map images, the content that is inserted into the page consists of a series of maps (Figure 1). Each map represents a precise latitude/longitude location that the YQL query obtained by searching the content of the page whose URL we defined in the markup stage.
Conclusion
The power behind YQL is truly in the never-ending amount of data that can be obtained from any source available on the internet, both public and private. On its own, the data resources are quite impressive, but when you take that resource a step further and couple it with extensive visualization and event management utilities (like the ones found in YUI), you gain highly scalable and dynamic components in JavaScript, with very little effort.
[ Editor's Note: An earlier version of this article was published on JSMag in August 2009. ]
Jonathan LeBlanc (@jcleblanc)
Senior Software Engineer / Technology Evangelist, Yahoo! Developer Network