Saturday, October 11, 2014

GIS with R

Initially I tried QGIS, thinking that a GUI-based system would be quicker than one where you have to write code and spend time figuring out syntax and data structures. WRONG! After crashes and a frustrating attempt to create a heat map, I switched to R. Yes, figuring out which libraries to use and understanding the syntax took time, but it was still faster than trying to use QGIS.

I overlaid a Google Maps image of San Francisco with the Police Districts shapefile, then colored the districts by the frequency of violent crime in each one. The part that took the most time was placing the district names at the center of each district, because I couldn't figure out how to get the latitude and longitude of the center of each shapefile polygon. Eventually Stack Overflow came to my rescue: http://stackoverflow.com/questions/16462290/obtaining-latitude-and-longitude-with-from-spatial-objects-in-r


San Francisco Police Department districts shapefile: https://data.sfgov.org/Public-Safety/SFPD-Districts-Zipped-Shapefile-Format-/8yyx-6uur

Here's the R Notebook: http://rpubs.com/LukasHalim/28754


I also did some non-GIS analysis, looking at whether crimes are reported more at certain times, which crimes are reported most frequently, and whether certain Police Districts have a higher proportion of unresolved incidents.

Crime Frequency

Crimes by Hour



Monday, September 15, 2014

Frugal Grocery Shopping in West Hartford

I'm trying to cut our grocery bill. August came in at over $600 for two adults, and that doesn't even include several restaurant trips. This Friday I visited AppleTree and Aldi and came home with the following haul:

AppleTree

Item              Unit Cost    Cost
Packham Pear      $0.99/lb     $3.40
Black Plum        $0.99/lb     $1.71
Green Squash      $0.99/lb     $3.86
Nectarines        $0.99/lb     $3.69
Loose Beets       $0.79/lb     $3.17
Yam               $0.99/lb     $2.72
Lettuce           $1.99/lb     $1.19
Peppers           $1.59/lb     $6.03
Green Cabbage     $0.99/lb     $2.32
Fuji Apples       $0.99/lb     $4.18
Carrots           $0.69/lb     $1.08
Romaine Lettuce   $1.79/lb     $1.20
Total                          $34.55



Aldi

Item                Price Per Unit   Quantity   Price
Corned Beef Hash    $1.59            1          $1.59
Frozen Peas         $0.95            2          $1.90
Avocados            $0.89            4          $3.56
33.9 oz Coffee      $5.49            1          $5.49
Whole Chicken       $5.40            1          $5.40
Flowers             $3.99            1          $3.99
Eggs                $1.29            3          $3.87
Gallon Whole Milk   $2.89            1          $2.89
Subtotal                                        $28.69
Total with Tax                                  $29.04

Thursday, August 8, 2013

Preparing for the Base SAS Certification Exam

I tried studying for the SAS exam using Anki decks, but I found that this wasn't a very effective way of learning the syntax.  So I tried a different strategy: going through the SAS Certification book and writing code snippets related to each concept.  The certification test questions frequently ask you to spot errors, so I created snippets that illustrate frequent mistakes as well as correct code.  When possible, I used the DATALINES statement to make very small data sets (a few rows are usually enough to illustrate the concept).

The code is available on GitHub: https://github.com/lukashalim/SASCert
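As an illustration of the approach (a hypothetical snippet written for this post, not taken from the repository), here is a common INPUT statement mistake next to the corrected code, each using a two-row DATALINES data set. The data set names grocery_wrong and grocery are made up.

```sas
/* Mistake: Item holds text but has no $ after it in the INPUT statement,
   so SAS reads it as numeric, writes an "Invalid data" note to the log,
   and sets Item to missing. The step still runs. */
data grocery_wrong;
  input Item Price;
  datalines;
Milk 3
Tofu 4
;
run;

/* Correct: the $ tells SAS that Item is a character variable. */
data grocery;
  input Item $ Price;
  datalines;
Milk 3
Tofu 4
;
run;

proc print data=grocery_wrong;
run;
```

The first step is the kind of "runs but gives wrong output" code the exam likes to ask about: there is no syntax error, but every value of Item comes out missing.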

Tuesday, July 16, 2013

Amazon Free Tier expiring? Too cheap for EC2? Need root? Head to Low End Box.

I didn't want to start paying $12 a month for an EC2 micro instance, so I did some googling and found a link through Low End Box to a company called Reliable Hosting Services offering a virtual server for $3/month. Obviously it doesn't have the same feature set as EC2, but it's good enough for me for now.

Similar offers are available on http://www.lowendbox.com/.

Saturday, July 13, 2013

Different Uses of SUM in SAS

There are at least four different types of SUM I have run across in SAS:

  • In PROC PRINT
    • The SUM statement, which causes PROC PRINT to display totals for the specified variable
  • In the DATA step
    • The sum statement, which adds a specified expression to an accumulator variable.  The value of the accumulator variable is retained through each iteration of the data step.  The sum statement treats missing values as 0.
    • The + operator.  This adds two numeric values, but if either value is missing the expression will evaluate to missing.
    • The SUM function, which takes the form SUM(val1, val2, ...) and returns the sum of its nonmissing arguments, so missing values are effectively treated as zero (if every argument is missing, the result is missing).
In PROC PRINT, the statement SUM variable-name prints a total for that variable, for example:

PROC PRINT DATA=work.grocery;
  var item;
  sum price;
run;

Obs    Item     Price
  1    Milk         3
  2    Tofu         4
  3    Bread        5
                   12

In a DATA step, on the other hand, you have the sum statement variable + expression.  The accumulator variable is initialized to 0, retained across iterations, and incremented by expression in each iteration of the DATA step.
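A minimal sketch of the sum statement as a running total, using a hypothetical version of the grocery data set in which the missing price for Juice is made up to show how a missing value is handled:

```sas
data running;
  input Item $ Price;
  /* Sum statement: Total starts at 0, is retained across
     iterations, and a missing Price is treated as 0. */
  Total + Price;
  datalines;
Milk 3
Tofu 4
Juice .
Bread 5
;
run;

proc print data=running;
run;
```

Total should read 3, 7, 7, 12: the Juice row leaves the running total unchanged instead of turning it missing, and no RETAIN statement is needed.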

The SUM function and + operator have the same results, except in the case of a missing value.
MyVar = SUM(Var1, Var2);
MyVar = Var1 + Var2;
These two statements produce the same result unless Var1 or Var2 is missing.  If, for example, Var2 is missing, the SUM function returns the value of Var1, while the + operator returns a missing value.  This is particularly confusing because the sum statement also uses the + symbol, yet the sum statement treats missing values as zeros.
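A small side-by-side sketch makes the difference visible. The data set name compare and the three rows are hypothetical; the third row shows the all-missing case.

```sas
data compare;
  input Var1 Var2;
  Plus = Var1 + Var2;       /* missing if either operand is missing */
  Func = SUM(Var1, Var2);   /* ignores missing arguments */
  datalines;
1 2
3 .
. .
;
run;

proc print data=compare;
run;
```

On the second row Plus is missing while Func is 3; on the third row both are missing, because SUM returns missing when every argument is missing. If you want 0 in that case, a common idiom is SUM(0, Var1, Var2).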


Wednesday, July 10, 2013

Study Strategy for the Base SAS Certification Exam


  • Work through the end-of-chapter quizzes in the SAS® Certification Prep Guide: Base Programming for SAS® 9, Third Edition
  • When you get a problem wrong, use the explanation or (when necessary) the material in the chapter to create an Anki flashcard.
    • Often, creating cloze deletion flashcards works well.  For example, the explanation, "Librefs must be 1 to 8 characters long, must begin with a letter or underscore, and can contain only letters, numerals, or underscores. After you assign a libref, you specify it as the first level in the two-level name for a SAS file." can become "Librefs must be [...] to [...] characters long, must begin with a [...] or [...], and can contain only [...], [...], or [...]."  Doing this will create 7 flashcards, one for each deleted phrase.
    • Other times, basic flashcards work well.  For example, chapter 3 in the certification guide lists a number of common SAS coding mistakes and their associated symptoms.  Each symptom can be the front of a flashcard, and the corresponding mistake can be the back.
  • My tendency when coding is to write the first thing that comes into my head, knowing that the syntax will probably be a bit off and that I'll have to fix a few errors before the code will run.  The SAS certification exam requires that I be able to read some code and figure out which snippets have syntax errors, which will run normally, and which will run but have logical errors that will cause incorrect output.  It's a hassle, but if you want to become really proficient with a language I think it's good to have practice looking at lines of code and detecting which will error and which will run properly.

Thursday, June 20, 2013

Avoid Online Distractions

Based on the tips in the wikiHow article below, I decided to add StayFocused to Chrome.  You tell StayFocused which websites you want to limit and set a maximum amount of time allowed on them.  It has definitely helped limit the time I spend on Facebook and Feedly.  One really nice feature: if you want to extend your time, you have to complete an irritating task, for example retyping a long paragraph character for character.  This is enough of a barrier to keep you from browsing when you don't need to, but still lets you get in when you really need to, for example if you received an important message on Facebook.
http://www.wikihow.com/Avoid-Distractions-Online#Tips