Seznam.cz Opens Part of Its Full Text Search Technology for the Next Year of StartupYard

10 December 2013

Seznam.cz is supporting StartupYard, the Prague-based accelerator, for the third year running and has decided to deepen its co-operation with it.

Seznam.cz has decided to open part of its full text search technology to the teams that apply for the 2014 round, to help them with large data projects that need data extracted from the Internet.

Seznam.cz started as a one-man band and over the past 17 years has become a major, influential Czech technology company and media house in one, and it has kept Google from gaining a majority or a monopoly on the Czech market. For the past few years, Seznam.cz has been sharing its experience and know-how with start-ups operating in the Czech Republic. In line with this aim, Seznam.cz has also been working with StartupYard. The co-operation began two years ago, and for the coming third year Seznam.cz has decided to open part of its full text search technology to all teams with an interesting project that needs pre-crawled data from the Internet to start their business.

“It is a challenging project for us. We have developed our own full text search technology, which nowadays brings Seznam.cz about 30% of its revenues of up to 1 billion CZK (40 million EUR). Nevertheless, we have decided to partly open our full text search to teams from all over Europe that apply for next year’s StartupYard. We have done so because we believe the data we can provide can help the teams come up with very useful projects. We hope new projects, such as media monitoring or predictive marketing tools, will come to life next spring,” says Pavel Zima, General Manager of Seznam.cz, explaining the reasons for partly opening the technology that Seznam.cz uses for its full text search.

“At StartupYard, we have a simple but ambitious goal for 2014: to attract the most promising European data, search or analytics projects and accelerate them towards becoming viable businesses. To achieve this, we rely on our specialized mentors and will fly founders to Prague to work in person with a wide range of experts and advisors, providing them with accommodation for 3 months. Seznam.cz is adding tremendous value to the package we offer to future teams and I can’t wait to see what projects our start-ups will be able to create using this technology,” adds Cedric Maloux, Managing Director of StartupYard.

Seznam.cz's full text search technology is based on Hadoop and HBase. The 2014 StartupYard teams will have access to a test cluster of up to 100 million documents from the Internet, all of them pre-crawled and sorted into entities such as domains, webservers and URLs. Each of these entities has its own attributes for fast analysis and sorting of each web page in the cluster. Seznam.cz will not provide all the signals it uses for its search, as that could harm the quality of its search engine and Seznam.cz cannot hand over its entire know-how.

“We have made a basic analysis of each web page in the cluster, so the teams get its content in a derived form with many parameters, such as the language used and meta-descriptions. We have data about all the back links and forward links and many other attributes of the web pages. All of the documents in the cluster are regularly updated, and more parameters and content can be added if the teams need and request them,” adds Marek Nový, Head of Business Development at Seznam.cz, describing the cluster Seznam.cz will provide to the 2014 StartupYard teams.
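
For teams wondering what working with such data might look like, here is a minimal Python sketch. It is purely illustrative: Seznam.cz has not published a schema in this release, so the record layout, field names and helper functions below (PageRecord, pages_per_domain, most_linked) are assumptions made for the example, and the sample URLs are invented. It only models the attributes named above (language, meta-description, back links, forward links) and shows a simple grouping by domain and a sort by number of back links.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlsplit


@dataclass
class PageRecord:
    """Hypothetical record for one pre-crawled URL (field names are assumptions)."""
    url: str
    language: str
    meta_description: str = ""
    back_links: List[str] = field(default_factory=list)     # pages linking to this URL
    forward_links: List[str] = field(default_factory=list)  # pages this URL links to

    @property
    def domain(self) -> str:
        # Host part of the URL, used here as a stand-in for the "domain" entity.
        return urlsplit(self.url).hostname or ""


def pages_per_domain(pages: List[PageRecord]) -> Counter:
    """Count how many page records belong to each domain."""
    return Counter(page.domain for page in pages)


def most_linked(pages: List[PageRecord], language: str, top: int = 3) -> List[PageRecord]:
    """Return up to `top` pages in the given language, most back links first."""
    matching = [page for page in pages if page.language == language]
    return sorted(matching, key=lambda page: len(page.back_links), reverse=True)[:top]


# Invented sample data, only to exercise the functions above.
SAMPLE_PAGES = [
    PageRecord("http://example.cz/a", "cs", "Zprávy",
               back_links=["http://example.org/x", "http://example.cz/b"]),
    PageRecord("http://example.cz/b", "cs",
               back_links=["http://example.cz/a"],
               forward_links=["http://example.cz/a"]),
    PageRecord("http://example.org/x", "en",
               forward_links=["http://example.cz/a"]),
]

if __name__ == "__main__":
    print(pages_per_domain(SAMPLE_PAGES))
    for page in most_linked(SAMPLE_PAGES, "cs"):
        print(page.url, len(page.back_links))
```

A media monitoring or marketing project would presumably start from aggregations of this kind before moving to anything more elaborate; the actual access method and attribute set are described on the application page.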

——————
Anyone with a project covering Data, Search or Analytics can apply to the 2014 StartupYard until 31 January 2014. To find out more about StartupYard.com, please contact Cedric Maloux, Managing Director of StartupYard.

To find out more about the technology Seznam.cz will provide to the 2014 StartupYard teams, please see the introduction on the application page. To find out more about Seznam.cz, please contact Irena Zatloukalová, the spokesperson of Seznam.cz.

Irena Zatloukalová,
Spokesperson