The 2024 edition of InfoShare was a landmark two-day conference for IT professionals, attracting data and platform engineers, software developers, marketers, company representatives and more. The event was a hub for networking, trend-following, job hunting and knowledge sharing. This year, the DataMass stage made its debut at InfoShare, and it was our stage. This was a significant milestone for us: thanks to our partnership with DataMass, we were able to deliver even more valuable content to attendees! In the rest of this post, we review the presentations from this stage.
AI was the central theme at InfoShare 2024, featuring prominently on every stage. Topics ranged from inspiration and trends to big data, development architecture, coding, marketing and more.
In his talk titled “GenAI, Vector/Semantic/Hybrid Search, RRF, NLP, LLM, RAG, FUD, FOMO, and Other Buzzwords”, Piotr Przybył set out to persuade participants that it’s easy to get started with AI, and that now is the time to do so. He emphasized that while it’s uncertain whether people in the IT industry will be replaced by AI, it’s certain that they will be replaced by those who use AI effectively.
The talk started with the concept of neural networks (NNs), computer programs that simulate the behavior of neurons to perform tasks such as image recognition. A network consists of multiple layers of artificial neurons connected to each other, and the strength (weight) of these connections is adjusted during the learning process, loosely similar to how connections between neurons in our brains strengthen with use.
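To make "connections adjusted during learning" concrete, here is a minimal sketch of a single artificial neuron (a perceptron) learning the logical AND function. This is a textbook simplification, not the multi-layer networks discussed in the talk; the function names and the AND dataset are our own illustrative choices.

```python
# Minimal perceptron: one artificial neuron with two weighted input connections.
def step(x):
    # Fire (1) only if the weighted input sum is positive
    return 1 if x > 0 else 0

def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=20, lr=1):
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                errors += 1
                # Strengthen or weaken each connection in proportion to its input
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
        if errors == 0:  # every sample classified correctly, so stop early
            break
    return weights, bias

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "->", predict(weights, bias, inputs))
```

Real networks stack many such neurons in layers and use gradient-based training instead of this simple error rule, but the principle of nudging connection weights based on mistakes is the same.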
The next acronym, NLP, stands for Natural Language Processing: the field of teaching computers, nowadays mostly through neural networks, to recognize natural language, understand its semantic context and generate appropriate responses. This leads us to the concept of the Large Language Model (LLM), a neural network pre-trained on massive amounts of text. This extensive training allows it to detect hidden patterns and rules in language and generate fluent responses in natural language.
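An LLM is vastly more sophisticated, but the core idea of learning from data which words tend to follow which, and then generating text one word at a time, can be illustrated with a toy bigram model. This is a pre-neural statistical technique chosen purely for illustration; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    # For every word, count which words follow it in the training text
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_words=5):
    # Greedily append the most frequent follower of the last word
    words = [start]
    for _ in range(max_words):
        followers = counts.get(words[-1])  # .get avoids creating empty entries
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

CORPUS = [
    "data engineers build pipelines",
    "data engineers love streaming",
    "data engineers build dashboards",
    "engineers build pipelines daily",
]
counts = train_bigrams(CORPUS)
print(generate(counts, "data"))
```

An LLM replaces these raw counts with a neural network that conditions on the whole preceding context rather than only the previous word, which is what lets it capture far subtler patterns.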
As Piotr mentioned, context is king. An LLM is trained on a static dataset, so without additional information its usefulness may be limited, for example when a question concerns recent events or private company data. The Retrieval Augmented Generation (RAG) technique addresses this: relevant documents are retrieved from an external source, such as a search index, and added to the prompt, so the LLM can base its response on information beyond its training data.
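The retrieval half of RAG can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the "embedding" is a bag-of-words term count standing in for a real embedding model, the documents are made up, and the resulting prompt would be sent to an LLM, which this sketch does not call.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: lowercase term counts
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=2):
    # Rank documents by similarity to the question and keep the top k
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, documents, k=2):
    # Augment the prompt with the retrieved context before it goes to the LLM
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"

DOCS = [
    "The DataMass stage debuted at InfoShare 2024.",
    "Elasticsearch can combine vector search with a match query.",
    "LangChain4j integrates Java applications with LLMs.",
]
print(build_prompt("Which stage debuted at InfoShare?", DOCS, k=1))
```

In a production setup, `embed` would call a real embedding model and `retrieve` would query a vector store or search engine; the augmentation step stays essentially the same.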
The next step is Generative AI (GenAI): systems that can create new data similar to the data they were trained on, which allows AI to be creative. The ultimate goal in this field is Artificial General Intelligence (AGI), a hypothetical system capable of understanding and continuously learning across all domains. It would possess human-like cognitive abilities and, for now, remains a vision of the future. Remember Skynet from the Terminator movies?
While all of this might seem complicated, it's actually quite easy to use in practice. Piotr demonstrated a simple application using the LangChain4j library, which allows for the integration of Java applications with LLMs.
Piotr also showcased Elasticsearch's AI capabilities. You can import a model from Hugging Face and use it to vectorize (embed) selected data; a nearest-neighbor search over these vectors then finds the entries closest to a query, to provide accurate answers or categorize data. Another advantage is hybrid search, which combines a nearest-neighbor vector search with a traditional match query.
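Hybrid search has to merge two differently scored result lists. One common way to do this, and the "RRF" from the talk title, is Reciprocal Rank Fusion, which scores each document as the sum of 1 / (k + rank) over the lists it appears in. The sketch below implements that formula with k = 60, the constant proposed in the original RRF paper; the document IDs are hypothetical, and this is not Elasticsearch's internal code.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: lists of document IDs, each ordered from best to worst
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked high in any list get a large contribution
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]   # e.g. from a kNN vector search
keyword_hits = ["doc1", "doc5", "doc3"]  # e.g. from a full-text match query
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Because RRF uses only ranks, it sidesteps the problem that vector similarity scores and full-text relevance scores live on incomparable scales.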
Hurry up and learn how to use LLMs and other AI techniques. Those who stand still will be left behind! A good way to start is by reading our blog posts on these topics.
People in the IT industry are constantly under pressure, needing to learn new technologies and continuously adapt. It isn’t surprising that 2 out of 5 people experienced burnout in 2022. How can this be avoided, and how can one find the right work-life balance? Aleksandra Knysz gave an excellent speech entitled “Czy można pracować inaczej? Prosta droga branży IT do wypalenia” (Can work be done differently? The IT industry's straightforward path to burnout).
Aleksandra split the problem into three categories, providing some tips and advice:
Profit and Loss Balance
Teamwork
Sense of Influence
Aleksandra highlighted the statistic that more than half of IT employees feel they are not doing enough. In most cases, this feeling is unfounded, yet it leads people to work overtime even though nobody expects them to!
These tips can help prevent burnout, resulting in better job quality, higher satisfaction, and a healthy work-life balance!
How can you learn efficiently and make the process satisfying? Turn it into a game! In his speech, “Breaking Barriers: The Art of (Free) Gamified Security Training,” Joseph Katsioloudes explained how gamification can help achieve this goal.
Joseph asserts that software security should start with developers. However, developers often find traditional training boring, ineffective, too theoretical and detached from their real development environment. After such training, they often still don’t know how to fix security issues without introducing new ones.
As an example, Joseph presented statistics from a PropTech startup. GitHub Advanced Security (CodeQL, Security Scanning, Dependabot) identified about 180 vulnerabilities. Low- and medium-severity issues could be fixed easily, often automatically, but some required significant code changes and took more time. As a result, the company managed to reduce the number of issues tenfold within five months. This cost each developer four hours a week, plus two additional hours for meetings. Note that the company still needs to address new security issues, and these costs may grow along with the company and its codebase. This clashes with the startup mindset, which treats such time as unproductive and this way of working as unsustainable.
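To put these figures in perspective, here is a rough back-of-the-envelope estimate. It assumes that the weekly effort held for the whole five-month period and that a month averages about 4.35 weeks; both are our assumptions, not figures from the talk.

```latex
\begin{aligned}
\text{weekly effort} &= 4\,\text{h} + 2\,\text{h} = 6\,\text{h per developer} \\
\text{five months} &\approx 5 \times 4.35 = 21.75\ \text{weeks} \\
\text{total effort} &\approx 6\,\text{h} \times 21.75 = 130.5\,\text{h} \approx 130\,\text{h per developer} \\
\text{remaining issues} &\approx 180 / 10 = 18
\end{aligned}
```

In other words, roughly three full working weeks per developer went into fixing existing issues, which is exactly the kind of cost a startup is reluctant to bear.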
How can you avoid spending time fixing code? By writing it better! And what is the best way to train developers? Gamification! Joseph launched an open-source project called secure-code-game and organized a weekly hackathon for developers in Vancouver. The idea was well-received and brought joy to employees. The company saw the benefits: the total time spent on fixing security issues was reduced by 97%! Moreover, 9 out of 10 developers felt that security was in their hands.
The secure-code-game is free and doesn’t require installation. It runs in GitHub Codespaces, cloud-hosted virtual machines that come with 60 free hours per month, which is more than enough to complete the challenge. Currently, the game contains two seasons (the second one created by the community), each with several tasks organized into levels. Assignments are available in Python, JavaScript, C, Go and GitHub Actions. Each task provides functional code, together with unit tests, that contains vulnerabilities. Your mission is to fix them. Afterwards, you can check the reference solution.
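To give a flavor of this kind of exercise, here is a hypothetical task in the same spirit: a function vulnerable to SQL injection, its fix, and checks that act as unit tests. It is our own illustration, not a task taken from the secure-code-game; the table and function names are made up, and it uses only Python's built-in `sqlite3` module.

```python
import sqlite3

def make_db():
    # In-memory database with two users for the exercise
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn

def find_role_vulnerable(conn, name):
    # VULNERABLE: user input is pasted directly into the SQL string
    row = conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchone()
    return row[0] if row else None

def find_role_fixed(conn, name):
    # FIXED: a parameterized query treats the input strictly as data
    row = conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

conn = make_db()
payload = "x' OR role = 'admin"  # attacker-controlled input
print(find_role_vulnerable(conn, payload))  # the injected condition leaks a role
print(find_role_fixed(conn, payload))       # no user has this literal name
```

The fix keeps legitimate lookups working while closing the hole, which is the balance such exercises train: removing the vulnerability without breaking functionality.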
Why is gamification so effective? It triggers real human emotions: a reward or a sense of achievement gives a dopamine boost. Moreover, people love competition! Gamification is ubiquitous: social media likes and followers, loyalty cards, payback programs and community achievement apps. It’s no surprise that the value of the gamification market has surged from $9 billion to $21 billion.