I do academic research in entrepreneurship
As a Researcher in Entrepreneurship, I analyze startups to figure out the drivers of their success. In all my research activities I apply a strong data-driven approach, using data mining, machine learning, deep learning, and natural language processing techniques. Specifically, I am active in the following research streams:
Data-driven venture capital
Developing machine learning models to support equity investors in their decision-making process.
Startup success factors
Analyzing business activities of startup companies using natural language processing techniques.
Generative AI for innovation
Exploring the potential of Large Language Models in entrepreneurship and innovation research.
Here you can find some of my publications; others are on the way.
Ferrati, F., Kim, P. H., & Muffatto, M. (2024). Generative AI in Entrepreneurship Research: Principles and Practical Guidance for Intelligence Augmentation. Foundations and Trends® in Entrepreneurship, 20(3), 245-383. http://dx.doi.org/10.1561/0300000121
Abstract. This article investigates the integration of generative artificial intelligence (AI) into the academic research process of entrepreneurship. Specifically, we explore using Large Language Models (LLMs) like ChatGPT in several research scenarios to support novice and established researchers.
As a practical guide, we introduce researchers to prompt engineering – formulating instructions for the LLMs to generate a desired output. We classify different types of prompts, present various technical strategies, and suggest the design of an effective prompt formula. We illustrate the prompt engineering process with different examples for entrepreneurship research.
To assist researchers in systematically integrating LLMs into their research process, we present the “4D-Framework,” which consists of four phases (Discover, Develop, Discuss, and Deliver). Each phase contains four functions accomplished through four prompts, resulting in 16 functions and 64 specific prompts. The initial stage, “Discover,” involves using LLMs for project initiation tasks such as topic selection and literature review, theory exploration, conceptual or empirical puzzles, and research question identification. During the “Develop” phase, the focus shifts to operational aspects, where LLMs assist in designing methods, executing qualitative and quantitative research, and generating programming code. The third phase, “Discuss,” focuses on using LLMs to analyze findings, evaluate their robustness and limitations, highlight the research contribution, and identify future research directions. Finally, the “Deliver” phase emphasizes using LLMs to draft the manuscript, craft the narrative, prepare for submission, and disseminate the findings.
We describe the application of LLMs in entrepreneurship research from a human-centric perspective, emphasizing an Intelligence Augmentation (IA) perspective for harmonizing human intelligence with AI capabilities. Given the novelty and impact of LLMs in knowledge-based areas, we also address the ethical implications of using AI in academia. We urge scholars to incorporate AI and LLMs into their research responsibly. While showcasing their potential, we also address their current limitations. We empower scholars to adopt a dynamic, AI-enhanced research approach that emphasizes the potential to unlock new insights and enhance the integrity of academic research.
Ferrati, F., Kim, P. H., & Muffatto, M. (2023). “Patterns of Successful Founding Team Composition and Funding Outcomes”. 18th European Conference on Innovation and Entrepreneurship (ECIE), Porto.
Abstract. When it comes to assessing a startup’s chance of success, equity investors apply a specific set of criteria to minimize risk. In their decision-making process, most venture capitalists (VCs) agree on giving priority to team composition, hence the popular saying: “Always consider investing in a grade-A team with a grade-B idea. Never invest in a grade-B team with a grade-A idea.” In this paper, we explore the profile of technology-based startup teams that are most likely to secure a Series-A funding round from VCs. From a methodological point of view, we applied a strongly quantitative approach, integrating several data mining techniques from a multidisciplinary perspective spanning data science and entrepreneurship. As for the company information, we used Crunchbase as our primary source, considering a set of U.S.-based startups founded from 2000 to 2017. For each venture we algorithmically integrated team-related information from the founders’ public LinkedIn profiles. Overall, we analysed more than 2,100 teams, involving a total of about 4,600 founders. Each founder’s experience was analysed by considering their professional background; in total, more than 29,000 work experiences were taken into consideration. Statistical analysis was carried out on both individual founders and their team organization. Both founders and teams were evaluated in terms of heterogeneity of prior experience and similarity of co-founder profiles using the Gini coefficient and Jaccard index, respectively. Statistics are expressed according to the companies’ sector and their fundraising profile. The different sectors are mapped on a 4-quadrant chart to identify different combinations of founders’ profiles (specialists vs. generalists) and team characteristics (co-founders with similar or diverse backgrounds). Results reveal the impact of team similarity and variety in terms of prior working experience.
The findings provide valuable insights for scholars dealing with tech-driven startup teams, for aspiring entrepreneurs looking for co-founders, and for VCs seeking to invest in promising startups.
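The two measures named in this abstract can be illustrated with a minimal Python sketch. It is not the code used in the paper: the abstract does not say what the measures are computed on, so the sketch assumes that a founder's heterogeneity is the Gini coefficient of their work-experience counts per functional area, and that team similarity is the mean pairwise Jaccard index over the sets of areas each co-founder has worked in. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def gini(counts):
    """Gini coefficient of non-negative counts (0 = evenly spread, near 1 = concentrated)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted values: G = sum((2i - n - 1) * x_i) / (n * total)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def team_similarity(profiles):
    """Mean pairwise Jaccard index across all co-founder pairs (assumed aggregation)."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical founder: number of past roles per functional area.
specialist = [0, 0, 0, 4]   # all experience in one area
generalist = [1, 1, 1, 1]   # experience spread evenly

# Hypothetical team: areas each co-founder has worked in.
team = [{"engineering", "product"}, {"product", "sales"}, {"engineering", "product"}]

print(gini(specialist), gini(generalist), team_similarity(team))
```

Under these assumptions, a high Gini value marks a specialist and a low one a generalist, while a high mean Jaccard value marks a team of co-founders with similar backgrounds.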
Ferrati, F. & Muffatto, M. (2021). “Entrepreneurial Finance: Emerging Approaches Using Machine Learning and Big Data”. Foundations and Trends® in Entrepreneurship: Vol. 17: No. 3, pp 232-329.
Abstract. For equity investors the identification of ventures that most likely will achieve the expected return on investment is an extremely complex task. To select early-stage companies, venture capitalists and business angels traditionally rely on a mix of assessment criteria and their own experience. However, given the high level of risk with new, innovative companies, the number of financially successful startups within an investment portfolio is generally very low.
In this context of uncertainty, a data-driven approach to investment decision-making can provide more effective results. Specifically, the application of machine learning techniques can provide equity investors and scholars in entrepreneurial finance with new insights on patterns common to successful startups.
This study presents a comprehensive overview of the applications of machine learning algorithms to the Crunchbase database. We highlight the main research goals that can be addressed and then we review all the variables and algorithms used for each goal. For each machine learning algorithm, we analyze the respective performance metrics to identify a baseline model. This study aims to be a reference for researchers and practitioners on the use of machine learning as an effective tool to support decision-making processes in equity investments.
Ferrati, F. & Muffatto, M. (2021). “Reviewing Equity Investors’ Funding Criteria: A Comprehensive Classification and Research Agenda”. Venture Capital: Vol. 23: No. 2, pp. 1-22.
Abstract. Venture capitalists and angel investors usually apply a set of assessment criteria in order to evaluate all the key elements of entrepreneurial projects. However, since each investor considers a different set of criteria, previous researchers who analysed investors’ decision-making ended up analysing a variety of divergent aspects. In this paper, a systematic literature review on the assessment criteria applied by equity investors was carried out. The purpose of this study was to identify and classify all the criteria considered by previous researchers in order to determine whether some aspects were investigated more extensively than others and to understand the reasons for this type of approach.
After screening the abstracts of 894 journal publications, 53 articles were selected for a detailed analysis. In total, 208 unique criteria were identified and were subsequently classified into 35 specific categories, 11 generic classes and 4 main domains of analysis. The high level of detail and granularity of this work is one of its added values and can provide a knowledge base for future researchers who intend to apply new methodologies for the analysis of investors’ decision-making.
Starting from the results obtained so far, a new agenda for future research in this field is suggested to encourage a more data-driven approach leveraging data science techniques.
Ferrati, F., Chen, H. & Muffatto, M. (2021). “A Deep Learning Model for Startups Evaluation Using Time Series Analysis”. 16th European Conference on Innovation and Entrepreneurship (ECIE), Lisbon.
Abstract. In the field of entrepreneurial finance, both academic researchers and venture capital firms are exploring the use of data-driven approaches to the analysis of entrepreneurial projects. For example, using the data provided by Crunchbase, some researchers have developed machine learning models aimed at predicting the exit event of startup companies. However, these previous contributions have always looked at ventures as static entities over time, only considering the values assumed by the key variables at the time of data extraction.
This paper aims to propose a new modelling approach, based on the analysis of the evolution of companies over time. The work considers a sample of 10,211 US-based companies, appropriately selected through a sequence of data processing activities. The rationale applied to reorganize the information and design a database ready to be used for a temporal analysis is described. In particular, each firm is modelled considering three different groups of features whose values change as the company evolves and therefore describe the key milestones achieved. In this regard, the number and amount of funding rounds over time, the number of investors involved and the number of patents obtained over the years are considered. To highlight the importance of the evolution of these variables over time, their statistical trends are reported within a 10-year time window from the companies’ foundation.
Considering a binary classification problem aimed at predicting whether or not a startup exit event will occur, statistics are presented for the two groups of companies, those that have made an exit or not. Figures show how this approach makes it possible to achieve a greater level of detail on the characteristics of the companies, not otherwise obtainable without considering the time factor. The obtained dataset is then used to train a binary deep learning classifier designed to perform time series analysis. The results obtained confirm the effectiveness of the applied modelling strategy. The obtained model is in fact able to predict whether a company will make an exit within 10 years of its foundation with a recall equal to 93%.
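As a rough illustration of the temporal modelling idea in this abstract, the Python sketch below turns a list of dated events (for example, funding rounds) into a cumulative yearly series over a fixed window from the founding year, and computes recall for a binary exit prediction. It is a sketch under assumptions, not the paper's pipeline: the yearly granularity, the cumulative encoding, the function names and the sample values are all hypothetical, and the deep learning classifier itself is not reproduced.

```python
def yearly_cumulative(events, founded_year, window=10):
    """Cumulative totals per year since founding.

    events: list of (year, amount) tuples, e.g. funding rounds.
    Events outside the window are ignored (assumed handling).
    """
    series = [0.0] * window
    for year, amount in events:
        offset = year - founded_year
        if 0 <= offset < window:
            series[offset] += amount
    running = 0.0
    for i in range(window):
        running += series[i]
        series[i] = running
    return series


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical company founded in 2010 with two rounds inside a 5-year window.
rounds = [(2010, 1.0), (2012, 2.0), (2025, 5.0)]
print(yearly_cumulative(rounds, founded_year=2010, window=5))

# Hypothetical labels: 1 = exit, 0 = no exit.
print(recall([1, 1, 0, 1], [1, 0, 0, 1]))
```

One such series per feature group (funding, investors, patents) would give each company a small multivariate time series, which is the kind of input a sequence classifier expects.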
Ferrati, F. & Muffatto, M. (2021). “Startup Exits by Acquisition: A Cross Industry Analysis of Speed and Funding”. 16th European Conference on Innovation and Entrepreneurship (ECIE), Lisbon.
Abstract. Being acquired by a larger company represents the final step in a startup life cycle and is often the ultimate objective of both founders and equity investors. In fact, the occurrence of an acquisition allows shareholders to transfer their equity stake to the acquiring company and thus realize a return on their initial investment, hopefully resulting in a capital gain. From an investor’s point of view, two important elements to estimate are the time needed to take a company to an exit and the capital it will require to reach that result. These two factors are related to the sector in which the venture operates. Since acquisitions represent the most frequent case of exit, this paper focuses on their analysis.
A sample from Crunchbase of more than 17,000 U.S.-based tech startups founded after the year 2000 and acquired before 2021 was analysed. Starting from the original 744 categories used by Crunchbase for company classification, 64 sectors were identified through a clustering process. For each sector, the following elements were calculated: the number of acquired companies, the average number of months it takes for companies to be acquired, and the average amount of capital raised before their acquisition.
By combining these analyses, it was then possible to create a matrix in which each sector has been positioned within four quadrants, considering the variables “acquisition speed” and “required capital”. Considering also the number of companies in each sector, the weight of each sector in terms of investments can be estimated.
In addition, more than 10,000 acquiring companies involved in the considered exits were analysed, highlighting that 74% of them made only a single acquisition. The top 15 acquirers were also identified, and their behaviour in terms of speed of acquisition and funding raised by target companies was investigated.
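The four-quadrant positioning described in this abstract can be sketched in a few lines of Python. The abstract does not state where the quadrant boundaries are drawn, so the sketch assumes a median split on each variable; the function name, the labels and the sector values are hypothetical.

```python
from statistics import median


def quadrants(sectors):
    """Assign each sector to a quadrant.

    sectors: dict mapping sector name to
             (average months to acquisition, average capital raised).
    Thresholds are the medians across sectors (an assumption).
    """
    month_cut = median(m for m, _ in sectors.values())
    capital_cut = median(c for _, c in sectors.values())
    result = {}
    for name, (months, capital) in sectors.items():
        speed = "fast" if months <= month_cut else "slow"
        funding = "low-capital" if capital <= capital_cut else "high-capital"
        result[name] = f"{speed} / {funding}"
    return result


# Hypothetical sectors: (months to acquisition, capital raised in $M).
example = {"A": (30, 5), "B": (60, 50), "C": (40, 80), "D": (90, 10)}
for sector, label in quadrants(example).items():
    print(sector, label)
```

Weighting each sector by its number of acquired companies, as the abstract suggests, would be a natural extension of this mapping.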
Ferrati, F. & Muffatto, M. (2020). “Setting Crunchbase for Data Science: Preprocessing, Data Integration and Feature Engineering”. 3rd International Conference on Advanced Research Methods and Analytics (CARMA), Valencia, pp. 221-229.
Abstract. In order to support equity investors in their decision-making process, researchers are exploring the potential of machine learning algorithms to predict the financial success of startup ventures. In this context, a key role is played by the significance of the data used, which should reflect most of the variables considered by investors in their screening and evaluation activity.
This paper provides a detailed description of the data management process that can be followed to obtain such a dataset. Using Crunchbase as the main data source, other databases have been integrated to enrich the information content and support the feature engineering process. Specifically, the following sources have been considered: USPTO PatentsView, Kauffman Indicators of Entrepreneurship, Academic Ranking of World Universities, and the CB Insights ranking of top investors. The final dataset contains the profiles of 138,637 US-based ventures founded between 2000 and 2019.
For each company the elements assessed by equity investors have been analyzed. Among others, the following specific areas were considered for each company: location, industry, founding team, intellectual property and funding round history. Data related to each area have been formalized in a series of features ready to be used in a machine learning context.
Ferrati, F. & Muffatto, M. (2020). “Using Crunchbase for Research in Entrepreneurship: Data Content and Structure”. 19th European Conference on Research Methodology for Business and Management Studies (ECRM), Aveiro, pp. 342-351.
Abstract. The large amount of business-related data available today allows researchers in entrepreneurship to explore new methodologies for data analysis. This paper aims to present an overview of the database provided by Crunchbase for research purposes. Founded in 2007, Crunchbase collects worldwide data on companies, investors, funding rounds and key people of the entrepreneurial ecosystem. As of May 2019, Crunchbase had collected records on 760,590 organizations (of which 708,558 are companies), 121,509 investors of different types, 263,426 funding rounds, 890,429 people, 17,068 initial public offerings (IPOs) and 89,959 acquisitions.
The main purpose of this work is to give a detailed description of the Crunchbase database in order to highlight its potential and help future researchers who intend to use this source of data. To achieve this goal, three main topics are covered. Since the database is provided as seventeen independent datasets, the linking logic has been reconstructed by applying a reverse engineering approach. The relationships between the individual files have been identified and then summarized in an original diagram. For each dataset all the available variables are provided.
Afterwards, in order to quantify the scope and coverage of the database, some key variables have been analysed, resulting in descriptive statistics for three areas of interest: companies, funding rounds and investors. Specifically, analysis is provided of the geographical distribution of companies, the number of companies per year of foundation and current operating status, the number of companies by amount and number of investments raised, as well as the number of investors by number and amount of investments made.
Finally, some indications on the potential uses of Crunchbase for research in entrepreneurship are given. Considering the characteristics of the available variables we focused on the applications of machine learning algorithms for the analysis and modeling of equity investment processes.
Ferrati, F. & Muffatto, M. (2019). “A Systematic Literature Review of the Assessment Criteria Applied by Equity Investors”. 14th European Conference on Innovation and Entrepreneurship (ECIE), Kalamata, pp. 304-312.
Abstract. What assessment criteria are most widely used by equity investors during their funding decisions? In the context of the so-called “picking winners” problem, which aspect do they consider most? Is it the jockey (entrepreneurial team), the horse (product/service), the race-track (market) or the odds (financials) that makes the difference? Despite the investment evaluation funnel being very selective, about 35% of venture-backed firms actually fail and, on a conservative estimate, an additional 20% do not provide the expected return on investment. The data therefore indicate that the investment process has considerable room for improvement.
This paper is a systematic literature review of the research about the assessment criteria used by equity investors (venture capital and angel investors) during their investment decision making process. The research is designed around three research questions. RQ.1: what are the criteria used by equity investors to support their decision-making process in venture funding? RQ.2: what are the investment criteria that have been most discussed in the literature? RQ.3: which aspects of the company are mostly assessed by investors?
After screening the abstracts of 894 unique journal publications, 53 articles were selected for a detailed analysis. The criteria mentioned in every study were registered and 208 distinct drivers were identified. The criteria were classified into 35 specific categories, 11 generic classes and 4 main domains of analysis (respectively related to the venture, the investor, the risk factors and the environment). The high detail and granularity of the analysis is one of the added values of this work compared with previous literature.
The authors propose a new approach to research, based on the use of large databases on venture funding (e.g. Crunchbase). By analysing data on thousands of actual investments, researchers could introduce a radical change of perspective in this field of research.