Microsoft has jumped into the free, self-service data analysis space with Power BI.
Power BI offers basic data wrangling capabilities similar to Excel's Power Query. It also lets you create interactive visualizations, reports and dashboards with a few clicks or drag-and-drops; type natural-language questions about your data on a dashboard; and handle files that are too large for Excel.
It can connect to dozens of data sources -- not only Excel, Access and CSV files, but also Salesforce, Google Analytics, MailChimp, GitHub, QuickBooks Online and many others. It will also run R scripts, meaning that any data you can pull in and massage via R you can import into Power BI.
A guide to Power BI
In this article, I've put together a step-by-step guide to getting started with Power BI, plus several other resources to help you along:
- Data files
- Power BI Desktop vs. cloud service
- Personal vs. Enterprise gateways
- 12 important Power BI visualization concepts
- Data cleaning via scripts
What is Power BI exactly?
Power BI includes both a downloadable desktop program and a cloud service, each of which offers different but overlapping capabilities. Data wrangling is desktop-only; visualizations and reports can be created in either; dashboards and report sharing are cloud-only. In addition, there are mobile apps for iOS, Android and Windows that let you view your Power BI or SQL Server Reporting Services (SSRS) reports and dashboards.
At least for now, you can take advantage of most Power BI capabilities without paying -- although Microsoft is clearly betting that you'll like the basic cloud service enough to spring for a $9.99/month paid account. Chief benefits of the paid account are increased data storage (10GB vs. 1GB), more timely automated data refreshes, the ability to create enterprise "content packs" and higher streaming capacity.
Be advised, though, that Microsoft wants a business email address when you sign up for Power BI cloud service -- while it can't screen out all non-commercial addresses, it won't accept known free consumer addresses like Gmail.com. Accounts from .gov and .mil addresses aren't supported for direct sign-up at powerbi.com either, although addresses at .edu and .org are.
And if you'd like to use any of Power BI's free mobile apps, you'll need a Power BI cloud account or access to your organization's SQL Server.
On the other hand, Power BI Desktop (at least for now) is not only free but doesn't require an account, an email address or a credit card -- just a Windows PC.
If you'd like to learn how to use this new, still-evolving tool to create reports and dashboards, read on.
Importing data into Power BI
Power BI Desktop is the better place to begin, unless you're sure that your data is already in the format you need for visualization. (Which may be the case if, like me, you prefer to do your data wrangling with a scripting language like R or Python.)
If you're used to Excel, you might think that selecting File > Open is the way to start analyzing your data in Power BI. But you'd be wrong -- File > Open is only for an already existing Power BI project.
Instead, to import new data, click the Get Data button on the Home tab, choose your data source type and click Connect.
This will bring up a familiar Windows file-selection dialog. Choose your file and you'll see a preview of your data. If it looks okay and there's nothing more you want to do to the data before starting to graph and chart, hit Load. Otherwise, click Edit, which brings up the Power BI Query Editor.
In this article, I'm going to use monthly files of airline flight-delay information from last summer that I downloaded from the U.S. Department of Transportation (DOT) website. I know -- especially where airlines are concerned, past performance is no guarantee of future results. But if you're going to book a flight this summer, it might be fun (if not necessarily predictive) to answer questions such as: Which airlines had the shortest and longest delays last summer? Are there any specific flights that do especially poorly or well? Power BI charts can help you answer these questions easily.
If you want to follow along, you can download your own data files from the Department of Transportation website. Alternatively, you can download the same files I'm using here -- the download is available to all members of the Computerworld Insider program; registration is free, so if you're not already an Insider, it's easy enough to sign up. The files cover domestic flights in the U.S. by month (so if you want to check flights to Paris, this won't help). There are separate files for June, July, August and September.
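If you're the scripting type, here is a minimal Python sketch of how monthly CSV files that share a header row could be stacked into a single table before importing. It uses only the standard library. The column names and rows are made-up stand-ins (the real files have many more columns), `combine_csv_texts` is a hypothetical helper, and none of this is meant to describe how any of the downloadable files were actually produced.

```python
import csv
import io

# Hypothetical stand-ins for two monthly files; not the real column layout.
JUNE = "CARRIER,FL_NUM,ARR_DELAY\nAA,42,-3\nDL,1580,7\n"
JULY = "CARRIER,FL_NUM,ARR_DELAY\nUA,310,15\n"


def combine_csv_texts(texts):
    """Stack CSVs that share a header into one list of rows (header first)."""
    combined = []
    header = None
    for text in texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty inputs
        if header is None:
            header = rows[0]
            combined.append(header)
        elif rows[0] != header:
            # Refuse to stack files whose columns differ.
            raise ValueError("CSV headers do not match")
        combined.extend(rows[1:])
    return combined


combined_rows = combine_csv_texts([JUNE, JULY])

# Write to an in-memory buffer here; for a real file you would use
# open("combined.csv", "w", newline="") instead.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(combined_rows)
print(out.getvalue())
```

The header check is the important design choice: if one month's file has a different column layout, it is safer to stop than to silently misalign the data.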
Start by loading in the June file (2015_06_ONTIME.csv): Go to Get Data > CSV in Power BI. Select and open your file, and you'll see a preview of your data. Then click Edit (not Load) to bring up the Query Editor. Now we can do some data wrangling.
At this point, it's worth checking whether number columns are loading as numeric (aligned to the right) or as text (aligned to the left). If you see numbers that are flush left in your data preview, they're not importing correctly -- which is one reason to choose Edit and bring up the Query Editor window even if you don't think you need to make changes in your data's structure.
Once in the Query Editor, you can right-click on a column header and select "Change Type" in order to manually select a data type such as whole number, decimal number, date, date/time, etc. But there's plenty more we can do with this data besides checking column types.
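The same kind of type check can be roughed out in a script before you import anything. Below is a minimal Python sketch, using only the standard library, that labels each column as number or text depending on whether every non-blank value parses as a number. The sample rows and column names are invented for illustration and are not the real headers of the flight-delay files; `guess_column_types` is a hypothetical helper, not anything Power BI provides.

```python
import csv
import io

# Hypothetical sample standing in for a few rows of a flight-delay CSV.
SAMPLE = """CARRIER,FL_NUM,DEP_DELAY,ARR_DELAY
AA,42,12,-3
DL,1580,,7
UA,310,n/a,15
"""


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_column_types(csv_text):
    """Label each column 'number', 'text' or 'empty', ignoring blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    values = {name: [] for name in names}
    for row in reader:
        for name in names:
            cell = (row.get(name) or "").strip()
            if cell:
                values[name].append(cell)
    types = {}
    for name, cells in values.items():
        if not cells:
            types[name] = "empty"
        elif all(is_number(c) for c in cells):
            types[name] = "number"
        else:
            types[name] = "text"
    return types


column_types = guess_column_types(SAMPLE)
for name, kind in column_types.items():
    print(f"{name}: {kind}")
```

In this made-up sample, a single stray "n/a" is enough to make the departure-delay column read as text, which is the kind of problem the flush-left check is meant to catch. The flight-number column comes back as numeric even though a flight number is a label rather than a quantity, so a check like this can only suggest types, not decide them for you.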
Note: If you're not interested in data wrangling and want to get started with charts and graphs, load the summer15delays.csv file instead and skip ahead to the Easy visualizations section -- but do make sure that the flight number is changed from numeric to text when you import the file.