Sunnyface

Thursday, November 26, 2009

61 Free Desktop Applications, Webapps, and Tools We're Most Thankful For

Firefox (see also: Power User's Guide to Firefox 3, Top 10 Firefox 3.5 Features)
VLC (see also: Master Your Digital Media with VLC, VLC Hits 1.0 with Better Playback and File Support)
CCleaner (see also: Five Best Windows Maintenance Tools)
Dropbox (see also: Use Dropbox for More Than Just File Syncing, Sync Files and Folders Outside Your My Dropbox Folder)
7-Zip (see also: Five Best File Compression Tools)
OpenOffice.org (see also: OpenOffice.org 3.1's Usability Tweaks, OpenOffice.org Screenshots Preview a Ribbon-Like Toolbar)
Google Chrome (see also: The Power User's Guide to Google Chrome, 2009 Edition)
µTorrent (see also: Tweak uTorrent's Settings for Faster Downloads, Five Best BitTorrent Applications)
Notepad++ (see also: Five Best Text Editors, AutoSave Adds Reassurance to Notepad++ Editing)
Gmail (see also: Our full Gmail coverage)
GIMP (see also: Gimp 2.7 Beta Improves Text Editing, Streamlines Saving)
Paint.NET (see also: Paint.NET Releases Big Update, Still a Killer Photoshop Alternative, Paint.NET Plugin Lets You Open Photoshop Files)
Microsoft Security Essentials (see also: Microsoft Security Essentials Free Antivirus App Leaves Beta, Stop Paying for Windows Security; Microsoft's Security Tools Are Good Enough)
Revo Uninstaller (see also: Lifehacker Pack 2009: Our List of Essential Free Windows Downloads)
Evernote (see also: Evernote 3.5 Beta Brings Tons of Tiny Fixes to Windows, Expand Your Brain with Evernote)
Thunderbird (see also: Thunderbird 3 Release Candidate Available for Download)
Audacity (see also: Geek to Live: Make a ringtone from any MP3)
ImgBurn (see also: Turn Your PC into a DVD Ripping Monster, Five Best CD and DVD Burning Tools)
Picasa (see also: Picasa 3.5 Organizes Your Photos with Facial Recognition)
Skype (see also: Our full Skype coverage)
Pidgin (see also: Ten Must-Have Plug-ins to Power Up Pidgin, Five Best Instant Messengers)
Ubuntu (see also: First Look at Ubuntu 9.10 Karmic Koala, Dual-Boot Windows 7 and Ubuntu in Perfect Harmony)
iTunes (see also: iTunes 9 Improves Syncing, Network Sharing, More)
foobar2000 (see also: Screenshot Tour: The beautiful and varied world of foobar2000, Hack Attack: Roll your own killer audio player with foobar2000)
Foxit Reader (see also: Five Best PDF Readers, Lifehacker Pack 2009: Our List of Essential Free Windows Downloads)
FileZilla (see also: Five Best FTP Clients, Build a Home FTP Server with FileZilla)
VirtualBox (see also: The Beginner's Guide to Creating Virtual Machines with VirtualBox)
TrueCrypt (see also: Geek to Live: Encrypt your data, Five Best Portable Applications)
Avast! (see also: Five Best Antivirus Applications)
Defraggler (see also: Five Best Disk Defragmenters)
KeePass (see also: Eight Best KeePass Plug-Ins to Master Your Passwords, How to Use Dropbox as the Ultimate Password Syncer)
Opera (see also: Opera 10.10 with Unite Media Server Released)
AVG (see also: AVG 9 Free Now Available for Download)
Digsby (see also: Five Best Instant Messengers, Digsby Sees the Light, Removes (Some) Bundled Crapware)
Google Reader (see also: Our full Google Reader coverage)
Winamp (see also: Win7shell Adds Windows 7 Jump List Support to Winamp)
Google Earth (see also: Google Earth 5.1 Speeds Up Your World Browsing)
TeraCopy (see also: Five Best Alternative File Copiers)
Launchy (see also: Our full Launchy coverage)
Transmission (see also: Lifehacker Pack 2009: Our List of Essential Free Mac Downloads)
Eclipse IDE
SpyBot Search & Destroy (see also: Five Best Malware Removal Tools)
Adium (see also: Adium Updates with Security Fixes, Better Facebook Integration)
PuTTY (see also: Add Tabs to PuTTY with PuTTY Connection Manager)
Songbird (see also: Songbird 1.0 Release Official, Fixes Bugs, Plays iTunes Purchases, Killer Add-ons Make Songbird So Much Better)
Sumatra PDF (see also: Sumatra 1.0 is a Blazing Fast Replacement for Adobe Reader)
XBMC (see also: Build a Silent, Standalone XBMC Media Center On the Cheap, Customize XBMC with These Five Awesome Skins, Turbo Charge Your New XBMC Installation)
Blender (see also: Learn Blender with free e-book)
CDBurnerXP (see also: Five Best CD and DVD Burning Tools)
Everything (see also: Everything Finds Windows Files As You Type, Top 10 Tiny & Awesome Windows Utilities)
HandBrake (see also: HandBrake Updates to 0.9.4 with Over 1,000 Changes, 64-Bit Support)
Rainmeter (see also: Rainmeter 1.0 Brings the Enigma Desktop to Everyone)
AutoHotkey (see also: Turn Any Action into a Keyboard Shortcut, Hack Attack: Knock down repetitive email with AutoHotKey)
Google Calendar (see also: Our full Google Calendar coverage)
MediaMonkey (see also: MediaMonkey 3.2 Syncs with More Devices, Adds Auto Folder Watching)
Quicksilver (see also: A beginner's guide to Quicksilver)
WinSCP
Google Voice (see also: Make Unlimited Free Calls on Your Cellphone with Google Voice, How to Ease Your Transition to Google Voice)
Boxee (see also: Build a Cheap But Powerful Boxee Media Center, Boxee to Launch Beta with Loads of New Features)
AdBlock Plus (see also: Top 10 Must-Have Firefox Extensions, 2009 Edition)
Media Player Classic (see also: Five Best Video Players)

From http://huangry.spaces.live.com/blog/cns!11F27CD37F710403!2071.entry.

Wednesday, June 24, 2009

Another way of life

"Education gives you freedom," Jerry Buss says. Growing up poor in Kemmerer's coal mining region, Buss decided early on that life underground was not for him. "I realized that most of the kids who grew up in the mining camps stayed in those towns and worked in the mines. I didn't see myself doing that; for one thing, I didn't like the idea of being a couple of miles underground with all that stuff over my head. So, freedom became the most important thing in my life, and education became my way out."

As most people know, Jerry Buss is the owner of the Los Angeles Lakers and a former owner of the National Hockey League's Los Angeles Kings. However, fewer people may know that he earned his Ph.D. in chemistry at the University of Southern California at age 24. I remember that in one interview he was asked how much difference his Ph.D. experience had made to his later success. He said, well, it did not make much difference to his career, and it did not mean that he was smarter either, but it did teach him how to handle loneliness, and that is very important.

I guess this is so TRUE when you decide to pursue a Ph.D. You should know that it is a tough journey on which you have to move ahead on your own. You make your own choice at every crossroads, choosing your own direction step by step and leaving behind the footprints of your own perseverance. And then, one day, you will find yourself a proud and strong person inside. This is the kind of person I want to become, and this is also the attitude anyone who wants to succeed should have.

Remember, sweet sugar will not lead you to success, but loneliness may.

Friday, July 18, 2008

What am I searching for?

Yesterday I went to a talk at MSR (Microsoft Research) in Redmond. A guy from Germany was talking about "suggested search". His idea is pretty cool, but the implementation is sort of "lousy". I remember that Google has done something similar, from my own experience with the Google Toolbar in my Firefox: as you type your query, the search engine automatically suggests the closest likely queries based on the keywords you have typed so far, which can "guide" you through the search process.
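A minimal sketch of this kind of suggestion, assuming the suggestions simply come from a log of past queries ranked by how often each one was issued. The log contents and the `suggest` function are hypothetical; this is not how the Google Toolbar or the speaker's system actually works.

```python
from collections import Counter

# Hypothetical query log: past queries and how many times each was issued.
QUERY_LOG = Counter({
    "cheap flights new york": 120,
    "cheap flights new york to shanghai": 45,
    "cheap hotels new york": 80,
    "chinese restaurants new york": 60,
})


def suggest(prefix: str, log: Counter, k: int = 3) -> list[str]:
    """Return up to k logged queries that start with what the user has typed,
    most frequent first (ties broken alphabetically)."""
    prefix = prefix.lower()
    matches = [q for q in log if q.startswith(prefix)]
    matches.sort(key=lambda q: (-log[q], q))
    return matches[:k]


print(suggest("cheap", QUERY_LOG))
# prints ['cheap flights new york', 'cheap hotels new york', 'cheap flights new york to shanghai']
```

A real system would likely index the log in a trie or sorted structure instead of scanning every query, and would match keywords anywhere in a query rather than only as a prefix.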

I like this guided search a lot, because most of the time I don't know exactly how best to describe what I want to search for. For example, if I want to find a cheap air ticket from New York to Shanghai, should I search for "cheap New York air tickets", or "cheap Shanghai air tickets", or, more precisely, "cheap air ticket from New York to Shanghai"? Unfortunately, in my experience the last query is the one most likely to FAIL, since it seems to contain too many keywords. See, more is not always better :-(

Having the search engine offer potential pre-defined questions that help you define your query and find what you need seems likely to become a popular trend. There would need to be a pre-processing procedure that clusters past queries and then classifies incoming ones into one of the resulting categories. These new queries can in turn help improve the clustering before the next round. Mmm... sounds a bit like "active learning". This work is quite challenging, since it needs semantic-level natural language analysis to interpret the meaning of the words, instead of just simple string matching (it seemed to me that the German guy did only string matching with some distance computation).
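Here is a minimal sketch of that kind of string matching, assuming each cluster is represented by one typical query and a new query is assigned to the nearest representative by Levenshtein edit distance. The cluster names and queries are made up for illustration; this is an illustrative guess at the general idea, not the speaker's actual method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical clusters, each represented by one typical query.
CLUSTERS = {
    "flights": "cheap flights new york",
    "hotels": "cheap hotels new york",
}


def classify(query: str, clusters: dict[str, str]) -> str:
    """Put a new query into the cluster whose representative query is
    closest in edit distance. Pure string matching: no notion of meaning."""
    q = query.lower()
    return min(clusters, key=lambda name: levenshtein(q, clusters[name]))


print(classify("cheap motels new york", CLUSTERS))  # prints hotels
print(classify("cheap flats new york", CLUSTERS))   # prints flights
```

Note that "cheap flats new york" (someone looking for an apartment) ends up in the flights cluster, because "flats" is only three edits away from "flights" but five away from "hotels". That is exactly the kind of mistake semantic-level analysis would have to fix.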

"Search, search, search~~" We do keyword searches every day.
However, before we rush to the search bar, should we think twice about what exactly we are searching for? Or should we not?

Perhaps one day, others will know it better than ourselves.

(PS: Something from the Official Google Blog.)

Technologies behind Google ranking
7/16/2008 10:53:00 AM

In my previous post, I introduced the philosophies behind Google ranking. As part of our effort to discuss search quality, I want to tell you more about the technologies behind our ranking. The core technology in our ranking system comes from the academic field of Information Retrieval (IR). The IR community has studied search for almost 50 years. It uses statistical signals of word salience, like word frequency, to rank pages. (See "Modern Information Retrieval: A Brief Overview" for a quick overview of IR technology.) IR gave us a solid foundation, and we have built a tremendous system on top using links, page structure, and many other such innovations.

Search in the last decade has moved from give me what I said to give me what I want. User expectations from search have rightly increased. We work hard to fulfill the expectations of each and every user, and to do that we need to better understand the pages, the queries, and our users. Over the last decade we have pushed the technologies for understanding these three components (of the search process) to completely new dimensions.

When we talk about queries at Google, we use square brackets [ ] to mark the beginning and end of queries (see "How to write queries" by Matt Cutts), a notation I will use throughout this post. (Pages and search results change frequently, so in time, some examples used here may not behave as explained.)

Understanding pages: Over the years we have invested heavily in our crawl and indexing system. As a result we have a very large and very fresh index. In addition to size and freshness, we have improved our index in other ways. One of the key technologies we have developed to understand pages is associating important concepts to a page even when they are not obvious on the page. We find the official homepage for Sprovieri Gallery in London for the Italian query [galleria sprovieri londra], even though the official page does not have either London or Londra on it. In the U.S., a user searching for [cool tech pc vancouver, wa] finds the homepage www.cooltechpc.com even though the page does not mention anywhere that they are in Vancouver, WA. Other technologies we have developed include distinctions between important and less important words in the page and the freshness of the information on the page.

Understanding queries: It is critical that we understand what our users are looking for (beyond just the few words in their query). We have made several notable advances in this area including a best-in-class spelling suggestion system, an advanced synonyms system, and a very strong concept analysis system.

Most users have used our spelling suggestion system at one time or another. It knows that someone searching for [kofee annan] is really searching for Mr. Kofi Annan, and is prompted: Did you mean: kofi annan; whereas someone searching for [kofee beans] is actually looking for coffee beans. Doing this internationally with very high accuracy is hard, and we do it well.

Synonyms are the foundation of our query understanding work. This is one of the hardest problems we are solving at Google. Though sometimes obvious to humans, it is an unsolved problem in automatic language processing. As a user, I don't want to think too much about what words I should use in my queries. Often I don't even know what the right words are. This is where our synonyms system comes into action. Our synonyms system can do sophisticated query modifications, e.g., it knows that the word 'Dr' in the query [Dr Zhivago] stands for Doctor whereas in [Rodeo Dr] it means Drive. A user looking for [back bumper repair] gets results about rear bumper repair. For [Ramstein ab], we automatically look for Ramstein Air Base; for the query [b&b ab] we search for Bed and Breakfasts in Alberta, Canada. We have developed this level of query understanding for almost one hundred different languages, which is what I am truly proud of.

Another technology we use in our ranking system is concept identification. Identifying critical concepts in the query allows us to return much more relevant results. For example, our algorithms understand that in the query [new york times square church] the user is looking for the well-known church in Times Square and not for articles from the New York Times. We don't just stop at identifying concepts; we further enhance the query with the right concepts when, for instance, someone looking for [PC and its impact on people] is in fact looking for impact of computers on society, or someone who searches for [rainforest instructional activities for vocabulary] is really looking for rain forest lesson plans. Our query analysis algorithms have many such state-of-the-art techniques built into them, and once again, we do this internationally in almost every language we serve.

Understanding users: Our work on interpreting user intent is aimed at returning results people really want, not just what they said in their query. This work starts with a world class localization system, and adds to it our advanced personalization technology, and several other great strides we have made in interpreting user intent, e.g. Universal Search.

Our clear focus on "best locally relevant results served globally" is reflected in our work on localization. The same query typed in multiple countries may deserve completely different results. A user looking for [bank] in the US should get American banks, whereas a user in the UK is either looking for the Bank Fashion line or for British financial institutions. The results for this query should return local financial institutions in other English speaking countries like Australia, Canada, New Zealand, South Africa. The fun really starts when this query is typed in non-English-speaking countries like Egypt, Israel, Japan, Russia, Saudi Arabia, Switzerland. Likewise the query [football] refers to entirely different sports in Australia, the UK, and the US. These examples mostly show how we get the localized version of the same concept correctly (financial institution, sport, etc.). However, the same query can mean entirely different things in different countries. For example, [Côte d'Or] is a geographic region in France - but it is a large chocolate manufacturer in neighboring French-speaking Belgium; and yes, we get that right too :-).

Personalization is another strong feature in our search system which tailors search results to individual users. Users who are logged-in while searching and have signed up for Web History get results that are more relevant for them than the general Google results. For example, someone who does a lot of football-related searches might get more football-related results for [giants], while other users might get results related to the baseball team. Similarly, if you tend to prefer results from a particular shopping site, you will be more likely to get results from that site when you search for products. Our evaluation shows that users who get personalized results find them to be more relevant than non-personalized results.

Another case of user intent can be observed for the query [chevrolet magnum]. Magnum is actually made by Dodge and not Chevrolet. So we present the results for Dodge Magnum with the prompt See results for: dodge magnum in our result set.

Our work on Universal Search is another example of how we interpret user intent to give them what they (sometimes) really want. Someone searching for [bangalore] not only gets the important web pages, they also get a map, a video showing street life, traffic, etc. in Bangalore -- watching this video I almost feel I am there :-) -- and at the time of writing there is relevant news and relevant blogs about Bangalore.

Finally let me briefly mention the latest advance we have made in search: Cross Language Information Retrieval (CLIR). CLIR allows users to first discover information that is not in their language, and then using Google's translation technology, we make this information accessible. I call this advance: give me what I want in any language. A user looking for Tony Blair's biography in Russia who types the query in Russian [Тони Блэр биография] is prompted at the bottom of our results to search the English web.

Similarly a user searching for Disney movie songs in Egypt with the query [أغاني أفلام ديزني] is prompted to search the English web. We are very excited about CLIR as it truly brings us closer to our mission to organize the world's information and make it universally accessible and useful.

I could go on and on showing examples of state-of-the-art technology that we have developed to make our ranking system as good as it is, but the fact is that search is nowhere close to being a solved problem. Many queries still don't get satisfactory results from Google, and each such query is an opportunity to improve our ranking system. I am confident that with numerous techniques under development in our group, we will make large improvements to our ranking algorithms in the near future.

I hope my two posts about Google ranking have made it clear that we live and breathe search, and we are more passionate than ever about it. Our fervor for serving all our users worldwide is unprecedented. We pride ourselves in running a very good ranking system, and are working incredibly hard every day to make it even better.

Posted by Amit Singhal, Google Fellow.
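The [kofee annan] versus [kofee beans] example in the post shows why spelling correction needs context. Here is a toy sketch of that idea, assuming a tiny made-up vocabulary and made-up counts of how often word pairs occur: generate vocabulary words within two edits of the misspelled first word, then pick the one that most often appears before the second word. The data and function names are hypothetical, and this is certainly not Google's actual system.

```python
from collections import Counter

# Hypothetical vocabulary and word-pair counts from some imaginary corpus.
VOCAB = {"kofi", "coffee", "annan", "beans"}
BIGRAMS = Counter({("kofi", "annan"): 50, ("coffee", "beans"): 80})
MAX_EDITS = 2


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest_correction(query: str) -> str:
    """Correct the first word of a two-word query, using the second word as context."""
    first, second = query.lower().split()  # assumes exactly two words
    if first in VOCAB:
        return f"{first} {second}"
    candidates = sorted(w for w in VOCAB if edit_distance(first, w) <= MAX_EDITS)
    if not candidates:
        return f"{first} {second}"
    # Prefer the candidate seen most often before the next word; on a tie,
    # prefer fewer edits (max keeps the first best candidate in sorted order).
    best = max(candidates, key=lambda w: (BIGRAMS[(w, second)], -edit_distance(first, w)))
    return f"{best} {second}"


print(suggest_correction("kofee annan"))  # prints kofi annan
print(suggest_correction("kofee beans"))  # prints coffee beans
```

Both candidates are exactly two edits from "kofee", so string distance alone cannot choose between them; only the context word breaks the tie.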

Sunday, June 8, 2008

The SIGMOD Jim Gray Doctoral Dissertation Award

I was reading my RSS feeds in Google Reader today and found this:

"SIGMOD has established the annual SIGMOD Jim Gray Doctoral Dissertation Award to recognize excellent research by doctoral candidates in the database field. Until 2008, this award was known as the "SIGMOD Doctoral Dissertation Award." In 2008, SIGMOD, with the unanimous approval of ACM Council, decided to rename the award to honor Dr. Jim Gray. SIGMOD Jim Gray Doctoral Dissertation Award winners and runners-up will be recognized at the SIGMOD conference, and their dissertations will be included in the SIGMOD DiSC and on the SIGMOD Online web site. The award winner will also receive a plaque and present his or her work together with the winners of the SIGMOD Innovations and Test of Time awards."

This reminded me of a respected person, Jim Gray of Microsoft Research, who has been missing at sea since early last year. People are still looking for him, but there is no good news yet. That is really sad. What made me sadder was that not until today did I realize that he also helped develop Virtual Earth, an advanced online mapping service that helps us locate ourselves, and I am using it right now! Suddenly I feel that he was not a Turing Award winner far away in California, but a person who was very close to me! I can't believe that something so great is part of my life, yet one of its creators can no longer enjoy it with us.

I do not know whether I have a chance to win this award, since databases are not exactly my field. But I am very encouraged that it has been renamed in his honor. Thanks to his efforts, we will never get lost in the future.

Saturday, May 31, 2008

A taste of teaching

Last week we had our last class of the semester with Prof. Foster Provost. It is a PhD-level seminar covering all kinds of topics related to data mining and machine learning. As the only three registered PhD students from Stern, Xiaohan, Mihaela and I were "pushed" to give (bi-)weekly presentations and lead the discussion of every paper on those topics.

Oh, god! That was hard! I couldn't understand it. When I sit in class and listen to the professors, they are all talking and smiling, making all kinds of jokes, writing gracefully and drawing nice pictures on the board. They teach as if they were doing something really, really, really easy. However, when I stood in front of the class, no matter how hard I had prepared, I felt nervous and awkward, and then suddenly forgot what I was supposed to say. My voice wavered and then froze. My confidence quickly faded... In fact, I had been pretty confident in my presentation skills, because I already had some experience presenting at conferences and workshops. I was always proud of how cool I stayed in front of a group of people. But now the truth was that it did not work here at all! Teaching a class is totally different from giving a short 20-minute talk! That is why I really admire Foster! He is such a sharp person and a great professor. He always notices the key point in our thoughts and helps us sort it out right away. Often his questions are actually helpful and informative "hints", which inspire us to think about what we have neglected and then organize our thoughts better.

Prof. Anindya Ghose once told me that when you talk to people, you should try to make your point as clear as you can the first time. Do not wait for people to get confused and then ask you. I believe this is important, but it is not easy to achieve. When we explain something, we tend either to say too much, which creates redundancy, or too little, which creates ambiguity. (It seems that the distribution of how much we explain is "bimodal": either too high or too low.) I like Prof. Panos Ipeirotis's teaching, because his approach is highly logical. You feel as if you are led into a room and then get to explore it by yourself, with encouragement from time to time. He does not show the whole picture at once, but leaves it to us to find it out. That is the coolest part. You never know how big the picture is! Just like an adventure game!

I sometimes imagine myself in the future: can I do this well when I become a real professor? Will my students enjoy my teaching too? Yeah, I believe so! That is my goal, so I will just keep going :-)

Tuesday, May 20, 2008

Data Mining Blogs: The Big List (reposted from Sandro Saitta)

Sandro Saitta has compiled a full list of data mining blogs. It is very nice, so I am sharing it here :)

Abbott Analytics: both industry- and research-oriented posts covering any topic related to data mining (Will Dwinnell and Dean Abbott)
Crime Analysis and Data Mining: everything is in the title (Shyam Varan Nath)
Data Miners Blog: data analysis and visualization from an industry point of view (Data Miners Team)
Data Mining, Analytics and Artificial Intelligence: this blog gives news about data mining and AI very frequently (Alberto Roldan)
Data Mining et al.: a new blog about data mining with details on particular applications in this field (Georg Russ)
Data Mining Lab: the blog of the data mining laboratory at Brigham Young University, mainly about social communities and meta-learning (Data Mining Lab)
Data Mining: Text Mining, Visualization and Social Media: a focus on data visualization and the blogosphere (Matthew Hurst)
Data Mining in MATLAB: posts related to the use and possibilities of MATLAB for data mining problems (Will Dwinnell)
DataSciences Analytics: discusses statistics and predictions, among other interesting topics (John Aitchison)
Data Strategy: this new blog (started in June) discusses data strategy in general; data acquisition, visualization and data mining are example topics (Chuck Lam)
Data Wrangling: comprehensive posts on technology and news related to data mining and machine learning, plus a lot of very useful resources (Pete Skomoroch)
Diamond Information and Analytics: analytics and its applications in marketing and operations (Amaresh Tripathy)
Foraging in the Data Forest: although not updated recently, this blog has interesting posts about data visualization and statistics (Donald Farmer)
Intelligent Machines: news related to data mining, machine learning and artificial intelligence (Damien François)
Jamie's Junk: a blog that focuses on data mining using Microsoft SQL Server (Jamie Mac)
Juice Analytics: data analytics with an emphasis on data visualization and corresponding tools (Juice Team)
Machine Learning, etc: theory behind machine learning and news related to this field (Yaroslav Bulatov)
Machine Learning (Theory): a strong emphasis on theoretical aspects of machine learning (John Langford)
Machine Learning Thoughts: philosophical and theoretical discussions about machine learning in general (Olivier Bousquet)
Math Stats and Data Mining: data mining from a statistics point of view (Rachel Graham)
MineThatData: data mining from the marketing point of view (Kevin Hillstrom)
Oracle Data Mining and Analytics: a blog focusing on the use of Oracle for data mining, covering news, code and applications related to Oracle (Marcos M. Campos)
Shane's Blog: a personal view on data mining with posts on different applications and news (Shane Butler)
Smart (Enough) Systems: data mining and analytics (among other things) for decision management (James Taylor)
Undirected Grad: a machine learning blog from a PhD student at Cambridge (Jurgen Van Gael)
Yet Another Machine Learning Blog: more machine-learning oriented, but contains a lot of useful information (Pierre Dangauthier)

Sunday, May 18, 2008

Congratulations on my new space!

Cheers~